Enterprise voice AI migration / FinOps architecture story

How I designed a Gemini-powered English tutor for airline-wide adoption

A Latin American airline had already bought 50K Gemini licenses and had an English tutor assistant built on OpenAI technology. I helped design the target system, migration strategy, cost optimization architecture and rollout plan so the tutor could become a practical adoption engine for Gemini.

50K Gemini licenses to activate

5 cost optimization mechanisms

2 voice architecture paths

~EUR0.40M estimated project investment

The architecture move

The tutor was the product surface. The real work was adoption, migration and cost architecture.

I structured the proposal around the employee experience, the Google-native platform, two voice routes, cumulative efficiency mechanisms, a governed knowledge brain and an executable investment plan.

01 / Employee journey

A tutor experience designed to create repeated Gemini adoption

I designed the functional journey around short, useful learning sessions: diagnosis, live conversation, feedback, progress tracking and continuous improvement. The English tutor was not only a product idea; it was the adoption vehicle for the airline to make 50K Gemini licenses visible, useful and measurable for employees.

Employee adoption · Short practice sessions · Progress evidence · Continuous improvement

End-to-end employee journey for an English tutor with diagnosis, conversation, feedback, progress and continuous improvement

02 / Target architecture

A Google-native architecture that keeps the voice engine interchangeable

I proposed a layered architecture on Google foundations where channel, identity, voice, orchestration, RAG, cache, evaluation, observability, security and deployment are explicit layers. This made the migration more than a model substitution: the assistant platform could support Gemini Live as the primary route while keeping an open-source voice path available.

Google-native platform · Shared layers · Gemini path A · LiveKit path B

Google-native layered target architecture for the English tutor assistant
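Here is a minimal Python sketch of what keeping the voice engine interchangeable could look like. It assumes a single `respond` method is enough to represent a voice route. `VoiceRoute`, `TutorPlatform` and both stub routes are hypothetical names for illustration. They do not wrap the real Gemini Live or LiveKit SDKs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class VoiceRoute(Protocol):
    """Contract any voice engine must meet (hypothetical interface)."""

    name: str

    def respond(self, user_id: str, utterance: str) -> str: ...


@dataclass
class TutorPlatform:
    """Shared layers live here; only the injected voice route changes."""

    route: VoiceRoute
    audit_log: list[str] = field(default_factory=list)

    def handle_turn(self, user_id: str, utterance: str) -> str:
        # Shared concerns (here just an audit trail standing in for
        # observability) run the same way whichever route is active.
        self.audit_log.append(f"{self.route.name}:{user_id}")
        return self.route.respond(user_id, utterance)

    def swap_route(self, new_route: VoiceRoute) -> None:
        # Replacing the voice engine leaves the shared layers untouched.
        self.route = new_route


class StubGeminiLiveRoute:
    """Placeholder for Via A; a real adapter would wrap the Live API."""

    name = "via_a"

    def respond(self, user_id: str, utterance: str) -> str:
        return f"[A] {utterance}"


class StubLiveKitRoute:
    """Placeholder for Via B; a real adapter would wrap the cascade."""

    name = "via_b"

    def respond(self, user_id: str, utterance: str) -> str:
        return f"[B] {utterance}"


platform = TutorPlatform(route=StubGeminiLiveRoute())
print(platform.handle_turn("emp-001", "Hello"))  # [A] Hello
platform.swap_route(StubLiveKitRoute())
print(platform.handle_turn("emp-001", "Hello"))  # [B] Hello
```

The design choice is dependency injection. The platform only knows the `VoiceRoute` contract, so moving traffic from Via A to Via B becomes a configuration change rather than a rewrite of the shared layers.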

03 / Via A

Gemini Live as the productive baseline for real-time voice sessions

I mapped how a Gemini Live session should work end to end: ephemeral token, WebSocket setup, audio streaming, transcription, barge-in, tool calls and fallback behavior. This gave the client a clear route to activate the licenses with a premium real-time experience.

Ephemeral token · Streaming audio · Tool calls · Fallback behavior

Gemini Live session cycle with client app, backend and Gemini Live API
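The session cycle above can be read as a small state machine. The sketch below uses hypothetical event names: `token_issued`, `setup_complete`, `user_audio`, `model_audio`, `tool_call`, `tool_result`, `turn_complete`, `error` and `close`. These are labels for the steps described here, not the Gemini Live API's actual message types. Minting the ephemeral token on the backend appears only as an event.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    TOKEN_ISSUED = auto()
    CONNECTED = auto()
    LISTENING = auto()
    SPEAKING = auto()
    TOOL_CALL = auto()
    FALLBACK = auto()
    CLOSED = auto()


# Allowed transitions: (current state, event) -> next state.
# Event names are hypothetical labels, not Live API message types.
TRANSITIONS = {
    (State.IDLE, "token_issued"): State.TOKEN_ISSUED,
    (State.TOKEN_ISSUED, "setup_complete"): State.CONNECTED,
    (State.CONNECTED, "user_audio"): State.LISTENING,
    (State.LISTENING, "user_audio"): State.LISTENING,
    (State.LISTENING, "model_audio"): State.SPEAKING,
    (State.SPEAKING, "model_audio"): State.SPEAKING,
    # Barge-in: the learner talks over the tutor.
    (State.SPEAKING, "user_audio"): State.LISTENING,
    (State.LISTENING, "tool_call"): State.TOOL_CALL,
    (State.SPEAKING, "tool_call"): State.TOOL_CALL,
    (State.TOOL_CALL, "tool_result"): State.SPEAKING,
    (State.SPEAKING, "turn_complete"): State.LISTENING,
}


class LiveSession:
    def __init__(self) -> None:
        self.state = State.IDLE
        self.barge_ins = 0  # how many times the learner interrupted

    def on_event(self, event: str) -> State:
        if self.state in (State.FALLBACK, State.CLOSED):
            return self.state  # terminal states ignore further events
        if event == "close":
            self.state = State.CLOSED
        elif (self.state, event) in TRANSITIONS:
            if self.state is State.SPEAKING and event == "user_audio":
                self.barge_ins += 1  # client would stop playback here
            self.state = TRANSITIONS[(self.state, event)]
        else:
            # Errors and out-of-order events switch to the fallback path.
            self.state = State.FALLBACK
        return self.state


session = LiveSession()
for event in ["token_issued", "setup_complete", "user_audio",
              "model_audio", "user_audio"]:
    session.on_event(event)
print(session.state, session.barge_ins)  # State.LISTENING 1
```

Routing every unexpected event to `FALLBACK` makes the failure behavior explicit. The client can then switch to whatever degraded mode the product defines instead of waiting on a broken socket.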

04 / Via B

LiveKit open-source cascade as the cost-efficient scaling path

I also designed an alternative route using LiveKit and a cascading pipeline: VAD, STT, LLM and TTS. This path gives the airline a benchmark-driven way to reduce single-vendor dependency and to scale more efficiently when the cost profile justifies interchangeable providers.

Open-source route · Provider flexibility · Benchmark-driven · Cost scaling

LiveKit cascading pipeline with VAD, speech-to-text, LLM, text-to-speech and audio out
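A cascade is easiest to reason about as four swappable functions. This minimal sketch models each stage as a plain callable. It does not use the LiveKit Agents API. `toy_vad`, `stub_stt`, `stub_llm` and `stub_tts` are hypothetical stand-ins for real providers.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Vad = Callable[[bytes], bool]
Stt = Callable[[bytes], str]
Llm = Callable[[str], str]
Tts = Callable[[str], bytes]


@dataclass
class CascadePipeline:
    """VAD -> STT -> LLM -> TTS; each stage is injected so it can be swapped."""

    vad: Vad
    stt: Stt
    llm: Llm
    tts: Tts

    def process(self, audio_frame: bytes) -> Optional[bytes]:
        # Silence is dropped before any paid stage is called.
        if not self.vad(audio_frame):
            return None
        text = self.stt(audio_frame)
        reply = self.llm(text)
        return self.tts(reply)


def toy_vad(frame: bytes) -> bool:
    # Toy energy check: any sample above an arbitrary threshold counts as speech.
    return any(sample > 10 for sample in frame)


def stub_stt(frame: bytes) -> str:
    return "how are you"


def stub_llm(text: str) -> str:
    return f"You said: {text}"


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadePipeline(toy_vad, stub_stt, stub_llm, stub_tts)
print(pipeline.process(bytes([0, 1, 2])))   # None (silence)
print(pipeline.process(bytes([0, 50, 2])))  # b'You said: how are you'
```

The VAD gate runs first, so silent frames never reach the paid STT, LLM or TTS stages. Replacing one provider only means passing a different callable.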

05 / Cost architecture

A cumulative efficiency architecture to reduce consumption cost

Because tens of thousands of potential users can generate significant AI consumption costs, I designed efficiency into the architecture from the start. The proposal combines multilevel cache, model routing, context control, RAG grounding and the LiveKit path so savings compound instead of depending on a single optimization.

L0/L1/L2 cache · Model routing · Context control · RAG grounding

Cost-saving architecture for Gemini usage with multilevel cache, model routing, context control, RAG and open-source path B
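Simple arithmetic shows why "cumulative" matters. Assume each mechanism removes a fraction of the cost left over by the previous ones. That is an independence assumption the real system would need to validate. Under it, the remaining cost is the product of (1 - rate) across all mechanisms. The rates below are placeholders for illustration, not measured figures from this project.

```python
from math import prod


def remaining_cost_fraction(savings: dict) -> float:
    """Fraction of baseline cost left after applying each saving in turn.

    Each rate acts on the cost remaining after the previous mechanisms,
    so savings compound multiplicatively instead of simply adding up.
    """
    for name, rate in savings.items():
        if not 0.0 <= rate < 1.0:
            raise ValueError(f"{name}: rate must be in [0, 1)")
    return prod(1.0 - rate for rate in savings.values())


# Placeholder rates for illustration only; not project measurements.
example = {"cache": 0.30, "routing": 0.20, "context": 0.10}
print(f"{remaining_cost_fraction(example):.3f}")  # 0.504
```

With these placeholder rates, the combined saving is 49.6%, not the 60% a simple sum would suggest. Each mechanism should therefore be benchmarked against the cost that remains after the others have been applied.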

06 / Self-improving brain

A governed second brain that improves without bypassing control

I designed the knowledge brain as a governed improvement loop. Runtime conversations create observations, Gemini can propose improvements, but production changes only happen after quality gates and human approval. This protects quality while still letting the tutor learn from real usage patterns.

Observation loop · Quality gate · Human approval · RAG reindexing

Self-improving brain under a governed cycle with runtime, observation, proposal, quality gate, human approval and reindexing
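The key property of the governed loop is that no proposal reaches production without passing both checks. This minimal sketch assumes each proposal has a numeric evaluation score and receives a yes/no human decision. `GovernedBrain`, `Proposal` and the 0.8 threshold are hypothetical. The reindex queue stands in for whatever job rebuilds the RAG index.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    GATE_FAILED = "gate_failed"
    AWAITING_APPROVAL = "awaiting_approval"
    REJECTED = "rejected"
    PUBLISHED = "published"


@dataclass
class Proposal:
    """A model-suggested knowledge change (hypothetical shape)."""

    doc_id: str
    new_text: str
    eval_score: float = 0.0
    status: Status = Status.PROPOSED


class GovernedBrain:
    def __init__(self, min_score: float = 0.8) -> None:
        self.min_score = min_score  # quality-gate threshold (assumed)
        self.production: dict[str, str] = {}
        self.reindex_queue: list[str] = []

    def quality_gate(self, p: Proposal) -> Proposal:
        # Automated check: only proposals above the threshold go to a human.
        if p.eval_score >= self.min_score:
            p.status = Status.AWAITING_APPROVAL
        else:
            p.status = Status.GATE_FAILED
        return p

    def human_review(self, p: Proposal, approved: bool) -> Proposal:
        # A reviewer can only act on proposals that passed the gate.
        if p.status is not Status.AWAITING_APPROVAL:
            raise ValueError("proposal has not passed the quality gate")
        if approved:
            self.production[p.doc_id] = p.new_text
            self.reindex_queue.append(p.doc_id)  # trigger RAG reindexing
            p.status = Status.PUBLISHED
        else:
            p.status = Status.REJECTED
        return p


brain = GovernedBrain()
proposal = brain.quality_gate(
    Proposal("past-tense-tips", "Updated guidance", eval_score=0.91))
brain.human_review(proposal, approved=True)
print(proposal.status, brain.reindex_queue)  # Status.PUBLISHED ['past-tense-tips']
```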

07 / Investment and plan

A four-month delivery plan with a clear investment frame

I translated the architecture into an investment and rollout plan: approximately EUR0.40M for a four-month project with a lean team, phased delivery, pilot validation and go-live. This helped stakeholders compare technical ambition with delivery cost, scope and adoption risk.

~EUR0.40M · 4 months · Lean team · Pilot to rollout

Investment breakdown and four-month roadmap for go-live and rollout start

01 / Problem framing

I reframed 50K Gemini licenses as an adoption and product architecture challenge

A Latin American airline had purchased 50K Gemini licenses and wanted to turn that investment into practical usage. The entry point was an English tutor for employees, but the real question was broader: how to activate the licenses through a useful employee experience.

The client already had an assistant developed with OpenAI technology. My first contribution was to frame the migration as a product and platform decision, not as a simple provider swap. The tutor had to create repeated usage, generate learning evidence and justify the new Gemini ecosystem through measurable adoption.

Client

Latin American airline with 50K Gemini licenses

Starting point

Existing English tutor assistant built on OpenAI technology

Design shift

From license purchase to adoption architecture

My role

Shape the target system and migration logic

02 / Challenge

The tutor had to drive license usage while reducing AI consumption cost

The challenge was not only to make the assistant work on Google technology. The client needed a system that employees would actually use, and a cost architecture that would remain sustainable with tens of thousands of potential users.

I set a clear architectural objective: make the English tutor a compelling reason to use Gemini while reducing the cost of consumption. That meant designing for efficiency from day one: cache, routing, context discipline, RAG and an open-source path that could scale more cheaply when appropriate.

Adoption lever

Employees practice English through mobile or desktop sessions

Cost risk

High usage volume can create material AI consumption costs

Architectural target

Activate Gemini without losing cost control

Proposal stance

Efficiency is a design requirement, not an afterthought

03 / Architecture design

I designed a shared Google-native platform with two interchangeable voice routes

The target system separates the assistant platform from the voice engine. Channel, identity, orchestration, RAG, cache, evaluation, observability, security and deployment remain shared, while the voice route can change depending on cost, quality and delivery needs.

Via A uses Gemini Live as the primary route to activate the licenses with a strong real-time experience. Via B uses LiveKit with a cascading pipeline of VAD, STT, LLM and TTS to introduce an open-source scaling option. This gave the client a practical baseline and a strategic escape valve for cost optimization.

Via A

Gemini Live session cycle for real-time voice experience

Via B

LiveKit cascade with interchangeable STT, LLM and TTS providers

Shared platform

RAG, cache, observability, security and deployment layers

My design choice

Keep the assistant stable while making the voice engine replaceable
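A replaceable voice engine raises the question of when to switch. The sketch below shows one possible policy under assumed inputs: stay on Via A unless Via B is cheaper per minute and its benchmarked quality score falls within a tolerated drop. `RouteStats`, `choose_route`, the 0.05 tolerance and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteStats:
    """Benchmark inputs for one voice route (values you would measure)."""

    name: str
    cost_per_minute: float  # any consistent currency unit
    quality_score: float    # e.g. an offline evaluation score in [0, 1]


def choose_route(primary: RouteStats, alternative: RouteStats,
                 max_quality_drop: float = 0.05) -> RouteStats:
    """Keep the primary route unless the alternative is cheaper and its
    quality loss stays within the tolerance (assumed policy)."""
    cheaper = alternative.cost_per_minute < primary.cost_per_minute
    quality_drop = primary.quality_score - alternative.quality_score
    return alternative if cheaper and quality_drop <= max_quality_drop else primary


# Placeholder figures for illustration only.
via_a = RouteStats("via_a_gemini_live", cost_per_minute=1.0, quality_score=0.90)
via_b = RouteStats("via_b_livekit", cost_per_minute=0.6, quality_score=0.87)
print(choose_route(via_a, via_b).name)  # via_b_livekit
```

A real policy would likely also account for latency and per-use-case quality. The point of the sketch is that the switch decision is driven by benchmark data, not hard-coded.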

04 / Cost optimization

I proposed five cumulative mechanisms to reduce consumption cost

One of the main axes of the proposal was cost efficiency. A system with tens of thousands of potential users can generate significant costs if repeated prompts, long contexts and expensive model choices are not controlled from the beginning.

The solution combines multilevel cache, model routing by task, context control, a governed knowledge brain plus RAG, and the open-source Via B. These mechanisms work together: cache avoids repeated calls, routing avoids overusing premium models, context control reduces token load, RAG grounds answers, and LiveKit gives the client another scaling route.
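The L0 and L1 cache levels can be sketched in a few lines. This sketch uses a bag-of-words vector as a stand-in for a real embedding model, and a linear scan instead of a vector index such as pgvector. L2 context caching happens on the provider side and is not modeled. `TwoLevelCache` and `toy_embed` are hypothetical names.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Callable


def toy_embed(text: str) -> Counter:
    """Bag-of-words stand-in for a real embedding model (assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class TwoLevelCache:
    """L0 exact match, then L1 semantic match; misses call the model.
    L2 (provider-side context caching) is out of scope here."""

    def __init__(self, model: Callable[[str], str], threshold: float = 0.9) -> None:
        self.model = model
        self.threshold = threshold
        self.exact: dict[str, str] = {}
        self.semantic: list[tuple[Counter, str]] = []
        self.model_calls = 0

    def ask(self, prompt: str) -> str:
        key = " ".join(prompt.lower().split())  # normalize case and spacing
        if key in self.exact:                   # L0: exact hit
            return self.exact[key]
        vec = toy_embed(key)
        for cached_vec, answer in self.semantic:  # L1: close-enough hit
            if cosine(vec, cached_vec) >= self.threshold:
                return answer
        self.model_calls += 1                   # miss: pay for a model call
        answer = self.model(prompt)
        self.exact[key] = answer
        self.semantic.append((vec, answer))
        return answer


cache = TwoLevelCache(model=lambda p: f"answer to: {p}", threshold=0.8)
cache.ask("What is the past tense of fly?")
cache.ask("what is the  past tense of fly?")  # L0 hit after normalization
cache.ask("what is the past tense of fly")    # L1 hit (cosine 6/7)
print(cache.model_calls)  # 1
```

The similarity threshold is the main tuning knob. Set it too low and learners may get answers to a different question; set it too high and L1 rarely hits. For a tutor, personalized feedback is probably a poor candidate for semantic caching, while generic explanations are a better fit.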

Cache

L0 exact, L1 semantic and L2 context caching

Routing

Use Gemini Flash, Pro or alternate providers based on task complexity

Context

Control prompts, reusable state and token budgets

Second brain

RAG improves answer quality while reducing free-form model dependency
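Routing and context control can both run before every model call. This minimal sketch assumes tasks arrive with a type label and that word count is an acceptable proxy for tokens. The tier labels, task names and `fit_context` helper are hypothetical. They are not Gemini model identifiers or SDK calls.

```python
from __future__ import annotations

# Tier labels are placeholders, not exact model identifiers.
TIERS = {"simple": "flash-tier", "complex": "pro-tier"}
COMPLEX_TASKS = {"essay_feedback", "placement_diagnosis"}  # assumed task names


def route_model(task: str) -> str:
    """Send only complex tasks to the premium tier."""
    return TIERS["complex"] if task in COMPLEX_TASKS else TIERS["simple"]


def approx_tokens(text: str) -> int:
    # Crude word-count proxy; a real system would use the provider's tokenizer.
    return len(text.split())


def fit_context(system: str, history: list[str], budget: int) -> list[str]:
    """Keep the system prompt plus the most recent turns that fit the budget."""
    used = approx_tokens(system)
    kept: list[str] = []
    for turn in reversed(history):  # walk from newest to oldest
        cost = approx_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return [system] + list(reversed(kept))  # restore chronological order


print(route_model("vocab_drill"))     # flash-tier
print(route_model("essay_feedback"))  # pro-tier
print(fit_context("You are a tutor", ["a b c", "d e", "f g h i"], budget=9))
# ['You are a tutor', 'f g h i']
```

Trimming from the oldest turn keeps the latest exchange intact, which usually matters most in conversational practice. Durable learner state would be stored separately rather than replayed as history.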

05 / Investment and rollout

I translated the architecture into investment, phases and rollout governance

The proposal also needed to be executable. I mapped the target architecture into a four-month delivery plan with a lean team, a fixed investment frame and clear phases for architecture, migration, cache, integration, security, evaluation, pilot and go-live.

This is where the work moved from solution design to delivery planning. I broke the investment into phases, clarified what was included, separated cloud consumption assumptions from project cost, and showed how pilot validation and adoption management would take the assistant from design to rollout.

Investment

~EUR0.40M estimated project investment

Timeline

Four months to go-live and rollout start

Delivery model

Lean team, phased scope and pilot validation

My contribution

Connect architecture, cost logic, governance and implementation plan

What this proves

I connected product adoption, cloud architecture, FinOps and rollout planning.

The strongest signal in this project is the ability to turn a platform license investment into a concrete adoption product, and then support that product with architecture, cost controls, governance and an implementation roadmap.

I turned a license adoption problem into a product and platform architecture proposal.

I designed the migration from an OpenAI-based assistant toward a Google-native Gemini ecosystem.

I created a dual-route voice strategy: Gemini Live for the baseline and LiveKit for efficient scale.

I embedded cost optimization into the architecture through cache, routing, context control, RAG and provider flexibility.

I translated the solution into investment, delivery phases, pilot validation and rollout governance.

Architecture stack: Gemini Live · LiveKit · ADK · Vertex AI RAG Engine · pgvector · L0/L1/L2 cache · OpenTelemetry · BigQuery · Terraform