NLP-based lead extraction with Gemini — a chatbot case study

AslasChat's whole value proposition was capturing qualified leads from natural conversation — not bolting a form onto a chat window. That meant extracting names, emails, and phone numbers as they surfaced organically, validating them, and landing them in a tenant's dashboard without the user ever feeling processed. Here's the approach that worked.

Use the LLM for extraction, not generation

The key framing: Gemini's job is structured NLP, not free-form output. Extracting entities from a conversation is a narrow, auditable task: every field it returns can be checked against the messages it came from. Asking it to 'be helpful and also capture leads' is not.

Every extraction returns a strict object — name, email, phone, confidence — validated against a schema before it lands anywhere. Using the model narrowly keeps results predictable and cheap, and it makes the feature explainable to tenants who want to know why a lead was captured.
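A minimal sketch of that schema gate, in Python. The four field names come from the description above; everything else is an assumption made for illustration: that the model replies with JSON, that missing fields arrive as null, that confidence is a number between 0 and 1, and the names Extraction and parse_extraction.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The only keys an extraction is allowed to carry.
ALLOWED_KEYS = {"name", "email", "phone", "confidence"}


@dataclass(frozen=True)
class Extraction:
    name: Optional[str]
    email: Optional[str]
    phone: Optional[str]
    confidence: float  # assumed to be in [0, 1]


def parse_extraction(raw: str) -> Extraction:
    """Parse a model response and reject anything that is off-schema."""
    data = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict) or set(data) != ALLOWED_KEYS:
        raise ValueError("extraction must contain exactly the schema keys")
    for key in ("name", "email", "phone"):
        if data[key] is not None and not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string or null")
    conf = data["confidence"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return Extraction(data["name"], data["email"], data["phone"], float(conf))
```

Rejecting extra or missing keys outright, rather than ignoring them, is the design choice here: an off-schema response is treated as a failed extraction, not a partial one.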

The verification layer is not optional

Models hallucinate plausible emails and mis-segment phone numbers. A validation pass — regex and format checks, plus a confidence threshold — sits between extraction and storage. Anything below the threshold is flagged rather than silently saved.

This matters because a CRM full of malformed leads is worse than no leads: it erodes trust in the whole system. The verification layer is what makes the automation safe to leave running.
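The validation pass can be sketched roughly as follows. This is an illustration under stated assumptions, not the production code: the 0.8 threshold, the deliberately loose email pattern, the 7 to 15 digit phone range, and the names verify and normalize_phone are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Deliberately loose: one "@", no whitespace, a dot in the domain part.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
CONFIDENCE_THRESHOLD = 0.8  # hypothetical value, tuned per deployment


def normalize_phone(phone: str) -> Optional[str]:
    """Strip formatting; return None if the digit count is implausible."""
    digits = re.sub(r"\D", "", phone)
    if not 7 <= len(digits) <= 15:  # assumed plausible length range
        return None
    prefix = "+" if phone.strip().startswith("+") else ""
    return prefix + digits


def verify(
    email: Optional[str], phone: Optional[str], confidence: float
) -> Tuple[str, List[str]]:
    """Return ("accepted" | "flagged", reasons) for one extraction."""
    reasons: List[str] = []
    if confidence < CONFIDENCE_THRESHOLD:
        reasons.append("low_confidence")
    if email is not None and not EMAIL_RE.match(email):
        reasons.append("bad_email_format")
    if phone is not None and normalize_phone(phone) is None:
        reasons.append("bad_phone_format")
    # Anything with a reason is flagged for review, never silently saved.
    return ("flagged" if reasons else "accepted", reasons)
```

Returning the reasons alongside the status keeps the flag explainable: a tenant reviewing a flagged lead can see which check failed.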

Treat the prompt + schema as a versioned artifact

Extraction quality lives or dies by the prompt and schema pair. I version them together and review changes the way I review code, rather than treating the prompt as a string someone tweaks in passing. A prompt edit can shift extraction behavior across every tenant at once, so it deserves the same review discipline as a migration.
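One way to make "versioned together" enforceable is to bundle the prompt and schema into a single immutable object and fingerprint its contents. The sketch below is an assumption about how this could be done, not a description of AslasChat's code; ExtractionSpec, fingerprint, and check_release are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExtractionSpec:
    version: str
    prompt: str
    schema: dict

    def fingerprint(self) -> str:
        """Short content hash over the prompt and schema together."""
        payload = json.dumps(
            {"prompt": self.prompt, "schema": self.schema}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def check_release(previous: ExtractionSpec, candidate: ExtractionSpec) -> None:
    """Fail if the content changed but the version string did not."""
    changed = candidate.fingerprint() != previous.fingerprint()
    if changed and candidate.version == previous.version:
        raise ValueError("prompt or schema changed without a version bump")


# Usage: an edited prompt only passes once it carries a new version.
v1 = ExtractionSpec(
    version="1.0.0",
    prompt="Extract name, email, and phone from the conversation.",
    schema={"required": ["name", "email", "phone", "confidence"]},
)
v2 = replace(v1, prompt="Extract contact details.", version="1.1.0")
check_release(v1, v2)  # passes: content changed and version was bumped
```

A check like this could run in CI, so a prompt tweak cannot ship unless someone deliberately bumps the version and the change goes through review.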

Multi-tenant isolation from day one

Every extracted lead is scoped to a tenant from the moment it's created. Tenant isolation went into the data model on day one — retrofitting it after the first paying customer is the kind of refactor that delays everything else.
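In data-model terms, this means a lead cannot exist without a tenant, and every read goes through a tenant-scoped query. A minimal in-memory sketch of the idea follows; the Lead fields and the LeadStore class are hypothetical, and a real system would enforce the same rule in its database layer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Lead:
    tenant_id: str  # required from creation; there is no tenantless lead
    lead_id: str
    email: str


class LeadStore:
    """In-memory stand-in for storage, partitioned by tenant."""

    def __init__(self) -> None:
        self._by_tenant: Dict[str, Dict[str, Lead]] = {}

    def add(self, lead: Lead) -> None:
        if not lead.tenant_id:
            raise ValueError("lead must belong to a tenant")
        self._by_tenant.setdefault(lead.tenant_id, {})[lead.lead_id] = lead

    def list_for(self, tenant_id: str) -> List[Lead]:
        # The only read path takes a tenant_id, so cross-tenant reads
        # cannot happen by accident.
        return list(self._by_tenant.get(tenant_id, {}).values())
```

The point of the sketch is the shape of the API: there is no "list all leads" method to misuse.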

Takeaways

Use the LLM for narrow, structured extraction rather than open-ended generation to keep results auditable and cheap.
A validation layer with a confidence threshold is what makes automated lead capture safe to trust.
Version the prompt and schema together and review changes like code.
Build tenant isolation into the data model before the first customer, not after.

