Gemini & Vertex AI in production — what actually works

A demo that calls an LLM once and prints the result is easy. A system that calls it ten thousand times a day across languages, networks, and failure modes is a different animal. Building the AdShort AI pipeline on Gemini and Vertex AI taught me that the model is the least of your problems — the engineering around it is the product. Here is what actually held up under load.

Define the output schema before you write the prompt

The single biggest reliability win was forcing structured outputs. Instead of asking the model for prose and parsing it downstream, I defined a strict schema for every call and validated the response against it before anything touched the database.

Vertex AI and Gemini both support response schemas — use them. A response that doesn't match the schema is a retry, not a 3 a.m. incident. This one decision eliminated an entire class of bugs where a slightly reworded model response silently broke a downstream parser.

The discipline pays off twice: the schema documents what each stage expects, and it gives you a clean seam to swap models later without rewriting the consumers.
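
A minimal sketch of that validate-or-retry loop, using only the standard library. The AdCopy fields, the SchemaError type, and the call_model callable are hypothetical stand-ins rather than the pipeline's real schema or client. In practice the same schema would also be passed to the API's response-schema setting, so local validation is a second line of defense, not the only one.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AdCopy:
    """Hypothetical schema for one pipeline stage (field names are assumptions)."""
    headline: str
    body: str
    language: str


class SchemaError(ValueError):
    """Raised when a model response does not match the expected schema."""


def parse_ad_copy(raw: str) -> AdCopy:
    """Parse raw model output and validate it; raise SchemaError on any mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaError("top-level value must be an object")
    expected = {"headline", "body", "language"}
    if set(data) != expected:
        raise SchemaError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    for key in expected:
        if not isinstance(data[key], str) or not data[key].strip():
            raise SchemaError(f"{key!r} must be a non-empty string")
    return AdCopy(**data)


def generate_validated(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> AdCopy:
    """Call the model until a response validates, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_ad_copy(call_model(prompt))
        except SchemaError as exc:
            last_error = exc  # malformed response: retry instead of storing bad data
    raise SchemaError(f"no valid response after {max_attempts} attempts: {last_error}")
```

Only a validated AdCopy ever leaves generate_validated, so consumers depend on the schema rather than on any particular model's phrasing.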

Idempotent jobs beat clever retry state

Every job in the pipeline carries enough context to be safely re-run from scratch. No distributed locks, no half-written state to reconcile. A worker that crashes mid-render is a non-event — the job simply runs again and produces the same result.

This is less elegant than a stateful retry machine and far more operable. When you're orchestrating prompt generation, media synthesis, translation, and publishing across multiple networks, the operational model has to be simple enough to reason about at a glance.
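
One way to get that property, sketched under assumptions: derive a key from the job's complete input and write the result exactly once, at the end. The names job_key and run_job and the dict-backed store are hypothetical; a real store would need an atomic write. Skipping work that has already completed is an addition of this sketch, not something the pipeline is known to do.

```python
import hashlib
import json
from typing import Callable


def job_key(payload: dict) -> str:
    """Derive a stable key from the job's full input: same input, same key."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_job(payload: dict, render: Callable[[dict], str], store: dict) -> str:
    """Run a job so that re-running it from scratch is always safe."""
    key = job_key(payload)
    if key in store:
        # Already completed by an earlier attempt: reuse the stored result.
        return store[key]
    result = render(payload)  # a crash here leaves nothing behind to reconcile
    store[key] = result       # the only write happens after the work succeeds
    return result
```

Because nothing is written until the work succeeds, a crashed attempt leaves no partial state, and the retry is indistinguishable from a first run.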

Translation quality varies sharply by language pair

Multilingual automation sounds like a solved problem until you ship it. Some language pairs come back clean; the long tail needs review. Rather than chasing 100% automation, I surfaced a manual-review queue for low-confidence translations and let the high-confidence majority flow through untouched.

Chasing the last 5% of automated quality would have cost more than the human review it replaced. Knowing where to stop is part of the engineering.
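
The routing itself can be very small. The sketch below assumes each translation arrives with a confidence score between 0 and 1. How that score is produced is not covered here, and the threshold values, class names, and per-pair overrides are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Translation:
    source_lang: str
    target_lang: str
    text: str
    confidence: float  # hypothetical 0.0-1.0 quality score


@dataclass
class TranslationRouter:
    default_threshold: float = 0.85
    # Stricter thresholds for language pairs known to need more review.
    pair_thresholds: dict = field(default_factory=dict)
    review_queue: list = field(default_factory=list)
    published: list = field(default_factory=list)

    def route(self, item: Translation) -> str:
        """Publish high-confidence translations; queue the rest for a human."""
        pair = (item.source_lang, item.target_lang)
        threshold = self.pair_thresholds.get(pair, self.default_threshold)
        if item.confidence >= threshold:
            self.published.append(item)
            return "publish"
        self.review_queue.append(item)
        return "review"
```

Per-pair thresholds let the clean language pairs flow through untouched while the long tail gets a stricter bar.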

Pick the platform for its operational guarantees

I chose Vertex AI over raw API access not for marginal quality differences but for region controls and predictable enterprise quotas. The client needed EU-region inference and billing they could predict and audit. Bleeding-edge capability matters less than a system you can deploy, bill, and audit predictably.

Per-tenant rate limits went in from day one. Retrofitting them under load is painful — by the time you need them, you're already firefighting.
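
As an illustration, a per-tenant limiter can be a token bucket keyed by tenant ID. This is a single-process sketch with hypothetical names and numbers, not the limiter used in the pipeline; a multi-worker deployment would need shared state behind it.

```python
import time
from typing import Callable


class TenantRateLimiter:
    """Token bucket per tenant: `rate` tokens per second, at most `burst` stored."""

    def __init__(
        self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic
    ):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so tests can control time
        self._buckets = {}  # tenant_id -> (tokens, last_refill_time)

    def allow(self, tenant_id: str) -> bool:
        """Return True and spend one token if the tenant has budget left."""
        now = self.clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant_id, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[tenant_id] = (tokens - 1.0, now)
            return True
        self._buckets[tenant_id] = (tokens, now)
        return False
```

Each tenant draws from its own bucket, so one noisy tenant exhausts only its own budget and cannot starve the others.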

Takeaways

Structured outputs validated against a schema turn malformed responses into cheap retries instead of production incidents.
Idempotent jobs make a crashed worker a non-event — favor them over stateful retry machinery.
Surface a manual-review queue for the long tail of translation quality rather than chasing full automation.
Choose your AI platform for region, quota, and billing guarantees, not just model quality.
