Back to BlogEngineering

How We Built an AI Calling Platform for INR 1.83 per Minute

10 February 2026


title: "How We Built an AI Calling Platform for INR 1.83 per Minute" date: "2026-02-10" category: "Engineering" excerpt: "A transparent breakdown of our voice AI cost structure. STT, LLM, TTS, and telephony — every component and what it costs."

Radical Transparency on Costs

Most AI calling companies hide their cost structure. We're doing the opposite. Here's exactly what every minute of an AI call costs us and how we keep it under INR 2.

The Voice AI Pipeline

Every AI call runs through five components:

  1. Telephony — connecting the phone call
  2. VAD — detecting when someone is speaking
  3. STT — converting speech to text
  4. LLM — generating the AI's response
  5. TTS — converting text back to speech

Each component has a cost. Let's break it down.

Component-by-Component Cost Breakdown

Telephony: Plivo — INR 0.40/min

We chose Plivo over Twilio because Plivo offers India-local numbers at INR 0.40/min vs Twilio's INR 0.80+/min. For a platform targeting Indian SMBs, this 50% saving on telephony matters.

STT: Deepgram Nova-2 — $0.0043/min (INR 0.36/min)

Deepgram Nova-2 gives us accurate real-time transcription with excellent Hindi support at $0.0043 per minute. This runs for the entire call duration since the AI needs to listen continuously.

LLM: GPT-4o-mini — $0.0026/min (INR 0.22/min)

We use GPT-4o-mini for response generation. At roughly 120 input tokens and 50 output tokens per speaking turn, each minute of conversation costs about $0.0026. The LLM is actually the cheapest component.

TTS: Cartesia Sonic — $0.0139/min (INR 1.16/min)

Text-to-Speech is our largest cost. The AI speaks roughly 50% of each call minute, generating about 375 characters per minute. Cartesia Sonic Multilingual gives us 40ms first-byte latency with native Hindi support.

VAD: Silero — Free

Voice Activity Detection runs locally using Silero's open-source model. No API cost.

Total: INR 1.83/min at 30,000 minutes/month

| Component | Provider | Cost/min (INR) | |-----------|----------|----------------| | Telephony | Plivo | 0.40 | | STT | Deepgram Nova-2 | 0.36 | | LLM | GPT-4o-mini | 0.22 | | TTS | Cartesia Sonic | 1.16 | | VAD | Silero (open-source) | 0.00 | | Total | | 1.83 |

Our Pricing Tiers

With a cost of INR 1.83/min, here's how our tariff-based pricing works:

  • Free (15 min): Try the platform at no cost — no credit card required
  • Starter (INR 5.5/min, 300 min): Revenue INR 1,650, cost INR 549, margin 67%
  • Growth (INR 5/min, 5,000 min): Revenue INR 25,000, cost INR 9,150, margin 63%
  • Scale (INR 4.5/min, 10,000 min): Revenue INR 45,000, cost INR 18,300, margin 59%

No monthly subscription — you pay per minute at a fixed rate. Higher volume tiers get lower rates. These margins fund engineering, infrastructure, support, and growth.

What's Next: Reducing Costs Further

We're exploring three paths to bring costs below INR 1.50/min:

Phase 2: Groq + Llama 70B — Replacing GPT-4o-mini with Groq-hosted Llama could reduce LLM costs by 40%.

Phase 3: IndiaAI Self-Hosted — The Indian government's IndiaAI initiative offers 40% GPU subsidies. Self-hosting models on subsidized H100s could cut both LLM and TTS costs.

TTS Optimization — Sarvam Bulbul v2 at INR 0.50/min could replace Cartesia for Hindi-only calls, saving INR 0.66/min per call.

Why We're Sharing This

Because we believe the AI calling market in India has a transparency problem. When competitors hide pricing behind "talk to sales," it usually means the price is too high for SMBs.

We want Indian businesses to know exactly what they're paying for and why.