AI Systems · May 06, 2026 · 9 min

NLP-based lead extraction with Gemini — a chatbot case study

Pulling names, emails, and phone numbers out of natural chat without users feeling interrogated. What worked while building AslasChat — schema design, prompt patterns, and the verification layer.

AslasChat's whole value proposition was capturing qualified leads from natural conversation — not bolting a form onto a chat window. That meant extracting names, emails, and phone numbers as they surfaced organically, validating them, and landing them in a tenant's dashboard without the user ever feeling processed. Here's the approach that worked.

Use the LLM for extraction, not generation

The key framing: Gemini's job is structured NLP, not free-form output. Extracting entities from a conversation is a narrow, auditable task with a result you can check. Asking it to 'be helpful and also capture leads' is not.

Every extraction returns a strict object — name, email, phone, confidence — validated against a schema before it lands anywhere. Using the model narrowly keeps results predictable and cheap, and it makes the feature explainable to tenants who want to know why a lead was captured.
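As a rough illustration of what "validated against a schema" can mean in practice, here is a minimal sketch using only the Python standard library. It assumes the model has been asked to reply with a JSON object holding exactly the four fields named above. The names `LeadExtraction` and `parse_extraction`, the nullable fields, and the 0 to 1 confidence range are assumptions for this example, not AslasChat's actual code.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumption: the model replies with a JSON object holding exactly these keys.
ALLOWED_KEYS = {"name", "email", "phone", "confidence"}


@dataclass(frozen=True)
class LeadExtraction:
    name: Optional[str]
    email: Optional[str]
    phone: Optional[str]
    confidence: float


def parse_extraction(raw: str) -> LeadExtraction:
    """Parse raw model output and raise ValueError on anything off-schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(data, dict) or set(data) != ALLOWED_KEYS:
        raise ValueError("expected exactly: name, email, phone, confidence")

    # Each entity is either a string or null (not mentioned in the chat yet).
    for key in ("name", "email", "phone"):
        if data[key] is not None and not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string or null")

    conf = data["confidence"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    return LeadExtraction(data["name"], data["email"], data["phone"], float(conf))
```

Rejecting unknown or missing keys outright, rather than quietly filling defaults, is one way to keep a drifting prompt from producing half-formed records.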

The verification layer is not optional

Models can hallucinate plausible-looking emails and mis-segment phone numbers. A validation pass (regex and format checks, plus a confidence threshold) sits between extraction and storage. Anything that fails a check or falls below the threshold is flagged for review rather than silently saved.

This matters because a CRM full of malformed leads is worse than no leads: it erodes trust in the whole system. The verification layer is what makes the automation safe to leave running.
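The validation pass described above might look something like the sketch below. The regex, the 7 to 15 digit phone rule (15 is the E.164 maximum, 7 is an arbitrary floor), the 0.8 threshold, and the names `verify`, `normalize_phone`, and `Verdict` are all illustrative assumptions. The article does not state the actual rules or threshold.

```python
import re
from enum import Enum
from typing import List, Optional, Tuple

# Deliberately simple format check; not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")
CONFIDENCE_THRESHOLD = 0.8  # hypothetical value, tune per deployment


class Verdict(Enum):
    ACCEPT = "accept"
    FLAG = "flag"


def normalize_phone(raw: str) -> Optional[str]:
    """Strip punctuation; return None if the digit count is implausible."""
    digits = re.sub(r"\D", "", raw)
    if not 7 <= len(digits) <= 15:
        return None
    prefix = "+" if raw.strip().startswith("+") else ""
    return prefix + digits


def verify(email: Optional[str], phone: Optional[str],
           confidence: float) -> Tuple[Verdict, List[str]]:
    """Return ACCEPT, or FLAG with the list of reasons for a human to review."""
    reasons: List[str] = []
    if email is None and phone is None:
        reasons.append("no contact detail")
    if email is not None and not EMAIL_RE.fullmatch(email):
        reasons.append("email format")
    if phone is not None and normalize_phone(phone) is None:
        reasons.append("phone format")
    if confidence < CONFIDENCE_THRESHOLD:
        reasons.append("low confidence")
    return (Verdict.FLAG if reasons else Verdict.ACCEPT), reasons
```

Returning the reasons alongside the verdict keeps the flag explainable: a tenant can see why a lead was held back. Format checks alone cannot catch a well-formed but invented address, so a further check that the value literally appears in the conversation would be a natural extension.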

Treat the prompt + schema as a versioned artifact

Extraction quality lives or dies by the prompt and schema pair. I version them together and review every change like a code change, rather than treating the prompt as a string someone tweaks in passing. A prompt edit can shift extraction behavior across every tenant at once, so it deserves the same review discipline as a migration.
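One way to make "version them together" concrete is to bundle the prompt and schema into a single immutable object and derive a fingerprint from both. The sketch below is an assumption about how that could be done, not a description of AslasChat's setup. `ExtractionSpec`, its fields, and the example prompt text are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionSpec:
    """Prompt and schema travel together as one reviewed, versioned unit."""
    version: str   # human-assigned, bumped in the same commit as the change
    prompt: str
    schema: dict

    def fingerprint(self) -> str:
        # Hash the prompt and schema together so an edit to either one
        # produces a different fingerprint.
        payload = json.dumps(
            {"prompt": self.prompt, "schema": self.schema}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Hypothetical example spec; the prompt wording is invented for illustration.
SPEC_V1 = ExtractionSpec(
    version="1.0.0",
    prompt="Extract name, email, phone and a confidence score as JSON.",
    schema={"required": ["name", "email", "phone", "confidence"]},
)
```

Storing the version and fingerprint alongside each captured lead would make it possible to trace a change in extraction behavior back to the exact prompt and schema pair that produced it.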

Multi-tenant isolation from day one

Every extracted lead is scoped to a tenant from the moment it's created. Tenant isolation went into the data model on day one — retrofitting it after the first paying customer is the kind of refactor that delays everything else.
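To show what tenant scoping in the data model can look like, here is a small sketch using an in-memory SQLite database. The table layout and the function names `open_db`, `save_lead`, and `list_leads` are assumptions for illustration. The article does not say which database or schema AslasChat uses.

```python
import sqlite3
from typing import List, Optional, Tuple


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # tenant_id is NOT NULL, so a lead cannot exist without an owner.
    conn.execute(
        """CREATE TABLE leads (
               id INTEGER PRIMARY KEY,
               tenant_id TEXT NOT NULL,
               name TEXT,
               email TEXT,
               phone TEXT
           )"""
    )
    conn.execute("CREATE INDEX leads_by_tenant ON leads (tenant_id)")
    return conn


def save_lead(conn: sqlite3.Connection, tenant_id: str, name: Optional[str],
              email: Optional[str], phone: Optional[str]) -> int:
    if not tenant_id:
        raise ValueError("tenant_id is required")
    cur = conn.execute(
        "INSERT INTO leads (tenant_id, name, email, phone) VALUES (?, ?, ?, ?)",
        (tenant_id, name, email, phone),
    )
    conn.commit()
    return cur.lastrowid


def list_leads(conn: sqlite3.Connection, tenant_id: str) -> List[Tuple]:
    # Every read takes a tenant_id; there is no unscoped query helper.
    rows = conn.execute(
        "SELECT name, email, phone FROM leads WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return rows.fetchall()
```

The design choice here is that the only read and write paths require a tenant ID, so forgetting the filter becomes a missing argument rather than a silent data leak.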

Takeaways

  • Use the LLM for narrow, structured extraction rather than open-ended generation to keep results auditable and cheap.
  • A validation layer with a confidence threshold is what makes automated lead capture safe to trust.
  • Version the prompt and schema together and review changes like code.
  • Build tenant isolation into the data model before the first customer, not after.

Related case study

AslasChat

AI-powered chatbot SaaS — automated customer interactions and NLP-based lead capture.
