

Despite it being a critical service for operators, actual voice AI cost per minute is still an open question in telecom. Yes, voice AI services and the use of AI voice agents are growing – fast – and genuine revenue is coming in. But ask most AI voice agent providers what it really costs to serve a specific customer, and the answer is not a number. It’s a spreadsheet exercise that happens once a month, if it happens at all.

The question behind the per-minute rate

Behind every customer interaction, multiple vendors are generating charges simultaneously. Worse, each charge in that overall voice AI cost is calculated in different units, on different billing cycles, and in different dashboards. The company bills its customer one number. Its vendors collectively bill something else. The gap between those two figures is where margin lives – and, for most voice AI businesses, that gap is not measured per account.

So we’ve decided to dive deep into voice AI cost at a granular level. We’re going to break down all the charges behind a single voice AI call: the services running in parallel, what each one costs at current market rates, where production adds charges that no rate card accounts for, and why per-customer profitability for AI voice agents is hard to pin down. We drew all of the numbers we use here from published provider rates and public pricing guides. (And we tell you which ones.)

Our aim is not to argue that voice AI pricing is too high. Rather, we’re trying to reveal how fragmented the cost structure is across providers, in a way that makes per-account attribution difficult. We’re also going to show you how that lack of transparency has an impact – one that compounds as call volumes rise and AI voice contracts get larger.

The four services behind one AI voice call


Platforms typically advertise their voice AI cost as a per-minute rate. In developer-oriented, component-based setups, that rate covers the orchestration layer – the software managing the conversation flow. The underlying AI and telephony services are billed separately.

Here is what each of these services costs in mid-2026, according to provider pricing pages and public rate guides:

Speech-to-text: Deepgram’s Nova-3, widely used in voice agent stacks, lists at $0.0077/min for streaming audio on its pay-as-you-go tier, dropping to $0.0065/min on the Growth plan. Pre-recorded transcription is lower at $0.0043/min. Deepgram bills per second, not rounded to the minute.

Large language model: Most 2026 voice AI cost references cite GPT-4o. (OpenAI retired this from ChatGPT in February 2026 but continues to offer it through its developer API – the route most voice AI platforms use.) At typical conversational token volumes, GPT-4o-equivalent inference runs $0.01–0.03/min. Heavier models or longer context windows push this higher. Lighter alternatives bring it down. LLM inference is priced per token, so the per-minute equivalent varies with the amount of text the conversation generates and which model the platform routes to.

Text-to-speech: ElevenLabs bills in credits mapped to characters (1 credit per character on Multilingual v2, 0.5 on Flash and Turbo models) with overage rates from $0.12 to $0.30 per 1,000 characters, depending on plan tier. Conversational AI agents are billed per minute. Cartesia and Deepgram Aura are priced per 1,000 characters at lower rates. Secondary sources convert these to per-minute equivalents ranging from $0.03–$0.10/min for ElevenLabs and $0.03–$0.04/min for Cartesia. (These conversions depend on speech rate and output length.)

Telephony: Twilio, the most common carrier for voice AI, lists US local rates at $0.0085/min for inbound and $0.014/min for outbound. International destinations vary significantly. Telnyx and other carriers offer comparable domestic pricing.
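Token- and character-priced services only become per-minute figures after a conversion, which is why the LLM and TTS ranges above are wide. The Python sketch below shows both conversions. Every input is an assumption for illustration, not a quoted rate: 4,000 input and 500 output tokens per minute at $2.50 and $10.00 per million tokens for the LLM; for TTS, the $0.12 per 1,000 characters low end quoted above, roughly 900 characters per minute of synthesized speech, and an agent that speaks for half the call. The function names are hypothetical.

```python
def llm_cost_per_minute(
    input_tokens_per_min: float,
    output_tokens_per_min: float,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Convert per-token LLM pricing into a cost per minute of conversation."""
    input_cost = input_tokens_per_min * input_price_per_million / 1_000_000
    output_cost = output_tokens_per_min * output_price_per_million / 1_000_000
    return input_cost + output_cost


def tts_cost_per_call_minute(
    price_per_1k_chars: float,
    chars_per_speaking_minute: float,
    agent_talk_share: float,
) -> float:
    """Convert per-character TTS pricing into a cost per minute of the whole call.

    agent_talk_share is the fraction of the call during which the agent speaks;
    the caller's turns and silences generate no TTS characters.
    """
    cost_per_speaking_minute = price_per_1k_chars * chars_per_speaking_minute / 1_000
    return cost_per_speaking_minute * agent_talk_share


# Illustrative assumptions only; replace with your own logs and current rate cards.
llm_per_min = llm_cost_per_minute(4_000, 500, 2.50, 10.00)
tts_per_min = tts_cost_per_call_minute(0.12, 900, 0.5)
print(f"LLM ~ ${llm_per_min:.4f}/min")  # LLM ~ $0.0150/min
print(f"TTS ~ ${tts_per_min:.3f}/min")  # TTS ~ $0.054/min
```

Both results land inside the ranges quoted above only because of the inputs chosen. Doubling the tokens or characters per minute doubles the figure, so the conversion is only as good as the usage data fed into it.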

Costs in concert

How does it all look together? Here is a sample stack for one three-minute inbound call, using mid-range providers:

| Component | Provider example | Rate | 3-min call |
|---|---|---|---|
| STT | Deepgram Nova-3 (streaming) | $0.0077/min | $0.023 |
| LLM | GPT-4o via API (conversational) | ~$0.02/min | $0.060 |
| TTS | ElevenLabs (Multilingual, equiv.) | ~$0.06/min | $0.180 |
| Telephony | Twilio (US inbound) | $0.0085/min | $0.026 |
| Total | | ~$0.10/min | $0.289 |
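The table’s arithmetic is simple enough to keep in code, which makes it easy to swap a provider and rerun. A minimal Python sketch using the rates above; like the table, it excludes the platform orchestration fee and the production-only charges covered in the next section, and the dictionary keys are just labels.

```python
# Per-minute rates from the sample stack (three-minute US inbound call).
STACK_RATES_PER_MIN = {
    "STT (Deepgram Nova-3 streaming)": 0.0077,
    "LLM (GPT-4o via API, approx.)": 0.02,
    "TTS (ElevenLabs equiv., approx.)": 0.06,
    "Telephony (Twilio US inbound)": 0.0085,
}


def call_cost(rates_per_min: dict[str, float], minutes: float) -> dict[str, float]:
    """Cost of each component for a call of the given length."""
    return {name: rate * minutes for name, rate in rates_per_min.items()}


breakdown = call_cost(STACK_RATES_PER_MIN, minutes=3)
for name, cost in breakdown.items():
    print(f"{name:35s} ${cost:.4f}")

total = sum(breakdown.values())
print(f"{'Total':35s} ${total:.4f}  (${total / 3:.4f}/min)")
```

Run as written, it reports a total of $0.2886, or about $0.0962/min, which the table rounds to $0.289 and ~$0.10/min.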

The platform orchestration fee sits on top of this. Managed platforms that bundle these services into a single rate typically charge $0.25–$0.50/min all-in, according to Aircall’s 2026 report. In bring-your-own-key (BYOK) setups, Klariqo puts the lower bound at $0.13–$0.14/min with budget providers. Retell, testing against GPT-4o with Deepgram and ElevenLabs, reports $0.25–$0.33/min.

The spread is wide because provider configuration – not the platform’s headline rate – determines the real voice AI cost per minute. Two companies on the same platform, running different model and voice setups, will have materially different economics per interaction.

The voice AI cost that only appears after the call runs

Provider rate cards show the billing units, but not the hidden costs of voice AI that stack up during a real conversation.


LLM inference is priced per token, but token consumption during a call is not constant. As the conversation progresses, the context window grows as the model carries more history into each new response. Softcery’s voice AI calculator, which is calibrated against production deployment data, applies a 1.8× multiplier to the base LLM figure to account for context growth, interruptions, and tool calls. True, that is one benchmark from one production dataset, not a universal constant. But it reflects what extended conversations look like compared to rate-card assumptions.
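Why does context growth inflate the LLM line? If each request resends the conversation so far, the history portion of input tokens grows roughly with the square of the number of turns, while a rate-card estimate assumes every request is the same size. The Python toy model below uses a hypothetical 12-turn call, a 500-token system prompt, and 100 tokens per turn to illustrate that effect, then applies Softcery’s 1.8× figure to the sample stack’s LLM line. The toy model is our simplification, not how Softcery derives its multiplier, which also covers interruptions and tool calls.

```python
def growing_context_tokens(turns: int, system_tokens: int, tokens_per_turn: int) -> int:
    """Input tokens when every request resends the whole conversation so far."""
    total = 0
    for k in range(1, turns + 1):
        # Request k carries the system prompt plus k turns of text:
        # k - 1 turns of history and the new turn being answered.
        total += system_tokens + k * tokens_per_turn
    return total


def flat_context_tokens(turns: int, system_tokens: int, tokens_per_turn: int) -> int:
    """Rate-card style estimate: every request is the same size."""
    return turns * (system_tokens + tokens_per_turn)


# Hypothetical call: 12 turns, 500-token system prompt, 100 tokens per turn.
grown = growing_context_tokens(12, 500, 100)
flat = flat_context_tokens(12, 500, 100)
print(grown, flat, round(grown / flat, 2))  # 13800 7200 1.92

# Softcery's published multiplier applied to the base LLM line of the sample stack.
SOFTCERY_MULTIPLIER = 1.8
base_llm_cost = 0.060  # three-minute call, from the sample stack
print(f"${base_llm_cost * SOFTCERY_MULTIPLIER:.3f}")  # $0.108
```

The closeness of 1.92 to 1.8 is a coincidence of the chosen inputs; longer calls push the toy ratio higher, which is the point: the gap widens with conversation length.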

Tool calls add to the bill separately. When the agent looks up a booking, checks a CRM record, or queries a knowledge base mid-conversation, those API requests generate charges outside the four core services. In multi-step workflows, these charges quickly compound to increase the overall voice AI cost.
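Tool charges are easiest to see per step of a workflow. A minimal Python sketch; the tool names and per-request prices are made up for illustration, and real tools may bill per seat, per record, or not per request at all.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    requests: int             # API requests this step makes during one call
    price_per_request: float  # hypothetical price, in dollars


def tool_call_cost(steps: list[ToolCall]) -> float:
    """Charges incurred outside the STT, LLM, TTS, and telephony lines."""
    return sum(step.requests * step.price_per_request for step in steps)


# Hypothetical multi-step workflow with made-up per-request prices.
workflow = [
    ToolCall("crm_lookup", requests=2, price_per_request=0.002),
    ToolCall("booking_check", requests=3, price_per_request=0.001),
    ToolCall("knowledge_base_query", requests=4, price_per_request=0.0015),
]
print(f"${tool_call_cost(workflow):.4f} per call")  # $0.0130 per call
```

In typical function-calling setups, the tool’s result is also passed back to the model as extra input, so each lookup can add LLM tokens on top of its own charge.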

Then there is dead time. Most platforms meter the full duration of the interaction, including silence, hold, and ringing. On a two-minute call with thirty seconds of dead air, roughly a quarter of the metered time covers no productive exchange.
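The dead-time arithmetic, as a short Python sketch. It assumes the duration-metered components are the streaming STT and telephony lines from the sample stack ($0.0077 + $0.0085 per minute); LLM and TTS charges do not accrue during silence, so in this sketch dead time is costed only against the duration-metered lines. Any platform fee billed per minute would be added to that rate.

```python
def dead_time_share(total_seconds: float, dead_seconds: float) -> float:
    """Fraction of metered call time with no productive exchange."""
    return dead_seconds / total_seconds


def dead_time_cost(duration_rate_per_min: float, dead_seconds: float) -> float:
    """Spend on silence, hold, and ringing for duration-metered components."""
    return duration_rate_per_min * dead_seconds / 60


print(dead_time_share(120, 30))  # 0.25

# Assumption: only streaming STT and telephony meter call duration here.
duration_metered_rate = 0.0077 + 0.0085
print(f"${dead_time_cost(duration_metered_rate, 30):.4f}")  # $0.0081
```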

These charges emerge from how the conversation runs in production, not from how vendors list services on their pricing page.

What 10,000 minutes a month exposes in your overall voice AI cost


At low volumes (say, a few hundred interactions a month), the gap between the published rate and the all-in number is absorbed into general overhead. Nobody tracks this part of voice AI cost in detail, and it doesn’t attract attention.

But when volume increases (for example, to 10,000 minutes per month), things change quickly. A $0.20/min discrepancy across that volume is $2,000 per month in unattributed spend. At 50,000 minutes, that grows to $10,000. Multiple providers estimate enterprise voice AI deployments at $40,000 to $70,000 per year – figures that include these stacked charges whether or not anyone is reconciling them.
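The scaling is linear, so a per-minute gap that is invisible at low volume becomes a line item at scale. A Python sketch of the figures above, using the same hypothetical $0.20/min gap; the annual column is simple multiplication by 12.

```python
def unattributed_spend(minutes_per_month: int, gap_per_min: float) -> float:
    """Monthly spend hidden by a gap between the assumed and all-in per-minute cost."""
    return minutes_per_month * gap_per_min


GAP_PER_MIN = 0.20  # hypothetical discrepancy, as in the example above

for minutes in (500, 10_000, 50_000):
    monthly = unattributed_spend(minutes, GAP_PER_MIN)
    print(f"{minutes:>6,} min/month: ${monthly:,.0f}/month, ${monthly * 12:,.0f}/year")
```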

Volume does not create one cost problem. Rather, each account creates its own. A two-minute appointment confirmation on a lightweight model has a different per-minute cost profile from a ten-minute support interaction that triggers CRM lookups and runs on a heavier LLM. Both accounts might sit on the same flat rate, yet one may generate healthy margins while the other’s profitability can only be confirmed with a per-account breakdown that most companies don’t produce.
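A minimal Python sketch of two accounts on the same flat rate. The $0.30/min price, the 5,000 monthly minutes, and the per-minute costs are hypothetical, chosen only to show how one price can produce very different margins once costs are measured per account.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    minutes: int                 # billed minutes per month
    service_cost_per_min: float  # STT + LLM + TTS + telephony
    tool_cost_per_min: float     # CRM lookups etc., averaged per minute


def monthly_margin(account: Account, flat_rate_per_min: float) -> tuple[float, float]:
    """Return (gross margin in dollars, gross margin as a share of revenue)."""
    revenue = account.minutes * flat_rate_per_min
    cost = account.minutes * (account.service_cost_per_min + account.tool_cost_per_min)
    return revenue - cost, (revenue - cost) / revenue


FLAT_RATE = 0.30  # hypothetical price both customers pay per minute

accounts = [
    Account("appointment-confirmations", 5_000, 0.08, 0.00),  # lightweight model
    Account("support-desk", 5_000, 0.20, 0.06),  # heavier LLM plus CRM lookups
]
for acct in accounts:
    dollars, share = monthly_margin(acct, FLAT_RATE)
    print(f"{acct.name:26s} ${dollars:,.0f}/month  ({share:.0%})")
```

With these inputs, the confirmation account keeps $1,100 a month (73%) and the support account $200 (13%). The blended margin across both, about 43%, describes neither customer.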

The bottom line is that voice AI unit economics diverge by customer, but the data needed to see that divergence – split by account, interaction type, and model – doesn’t exist inside any single provider dashboard. It sits across four to six separate invoices, reconciled monthly in a spreadsheet.

The question that decides margin


Answering the question “what did it cost to serve Customer X last month?” requires every component – STT, LLM, TTS, telephony, tool calls, overhead – to be attributed to a single account and broken down by service, rather than averaged across the customer base or estimated from the platform rate.

Deriving that figure today means pulling records from four to six provider dashboards, normalizing different units (tokens, minutes, characters, API calls), and reconciling them by hand. Companies that attempt it might do this once a month. For most companies, it doesn’t happen at all.
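Normalizing units and attributing them to accounts is mechanical once every usage record carries an account identifier. That tagging is the hard part in practice, and the sketch simply assumes it. A minimal Python sketch: the record layout, unit names, and price table are hypothetical. The STT, TTS, and telephony prices come from the rates quoted above; the token prices and the $0.002 tool price are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    account: str    # assumed to be tagged on every provider record
    service: str    # "stt", "llm", "tts", "telephony", "tools"
    unit: str       # the unit the provider bills in
    quantity: float


# Hypothetical price table: dollars per billed unit.
PRICE_PER_UNIT = {
    ("stt", "second"): 0.0077 / 60,
    ("llm", "input_token"): 2.50 / 1_000_000,
    ("llm", "output_token"): 10.00 / 1_000_000,
    ("tts", "character"): 0.12 / 1_000,
    ("telephony", "minute"): 0.0085,
    ("tools", "api_call"): 0.002,
}


def cost_by_account_and_service(records: list[UsageRecord]) -> dict[str, dict[str, float]]:
    """Normalize every record to dollars and total it per account, per service."""
    totals: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for r in records:
        price = PRICE_PER_UNIT[(r.service, r.unit)]  # KeyError flags an unpriced unit
        totals[r.account][r.service] += r.quantity * price
    return totals


records = [
    UsageRecord("customer-x", "stt", "second", 180),
    UsageRecord("customer-x", "llm", "input_token", 12_000),
    UsageRecord("customer-x", "llm", "output_token", 1_500),
    UsageRecord("customer-x", "tts", "character", 1_350),
    UsageRecord("customer-x", "telephony", "minute", 3),
    UsageRecord("customer-x", "tools", "api_call", 2),
]
report = cost_by_account_and_service(records)
for service, dollars in report["customer-x"].items():
    print(f"{service:10s} ${dollars:.4f}")
print(f"{'total':10s} ${sum(report['customer-x'].values()):.4f}")
```

Unknown (service, unit) pairs raise a `KeyError` rather than being silently dropped, so a new vendor unit cannot slip out of the total unnoticed.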

The consequence of that lack of transparency is operational. Companies set their rates before this breakdown existed. They signed contracts against estimated margins. They designed new pricing tiers around assumptions that were not verified against real per-account economics.

Here is the sequence most voice AI companies run today: an interaction happens; each provider records a charge in a different unit; those records sit in separate dashboards; finance reconciles at month end; pricing is decided based on averages. The information to support those decisions exists, but it is sitting in vendor invoices and API logs, disconnected from what a company bills its customer on the other side.

Who has solved this problem before

Long before voice AI cost existed as a question, telecoms spent decades learning this lesson. A telephone call was never a single billing record: its cost was a chain of supplier charges, rating rules, settlement records, and customer billing events. Over time, telecoms learned to connect those charges at the transaction level, per interaction, in real time.


Voice AI is facing the same structural problem, much earlier in its market’s evolution. One customer interaction can touch a platform, an STT provider, an LLM, a TTS provider, a carrier, and several business tools. Each one bills in a different unit and on a different cycle, so no one can see the real overall voice AI cost for a given customer.

So, the company running the service still has to answer one question: Did this customer make money last month?

That answer should not require four dashboards and a spreadsheet. It should come from the same system that connects usage, provider cost, customer pricing, and billing – per interaction, per account. So, where and how is the problem of transparency for per-customer, per-minute voice AI cost being solved right now?
