From CPaaS to Voice AI Agents: What a Decade of Global Infrastructure Taught Us
Plivo powers billions of voice minutes across 190+ countries. We’ve spent over a decade building carrier relationships, optimizing global routing, and fixing infrastructure problems that most companies never encounter.
When Voice AI stopped being a research idea and started showing up in real products, we didn’t feel the need to change direction. This is what we’d already been building toward for over 10 years.
The global network. The carrier connections. The ability to run voice reliably at scale. None of this was old tech we had to move away from. It was the foundation for what comes next.
So, what changed recently?
Two things.
First - The AI caught up.
LLMs got fast enough for real-time conversation. Response times that once took seconds now happen in milliseconds. Speech-to-text and text-to-speech improved dramatically. The core AI components for voice agents finally work well enough for production use.
Second - The market moved past demos.
Businesses aren’t experimenting with Voice AI anymore; they’re trying to deploy it. And they’re discovering that the impressive demo doesn’t survive contact with production.
Calls drop in certain regions. Latency spikes under load. Audio quality degrades. Carriers reject traffic.
The AI works fine. But everything around it breaks.
What We Started Noticing
Over the past two years, we started noticing the same thing again and again. Voice AI companies would build on top of our telephony APIs. Their demos were impressive - fast responses, natural conversations, slick interfaces.
Then they’d scale. And things started to break.
What worked fine in a demo didn’t hold up in the real world.
The same issues appeared every time:
- Latency that was acceptable in a demo became unacceptable across real networks
- Systems built in one region struggled when calls came in globally
- Reliability assumptions broke down under concurrent call load
- Carrier complexity they’d never considered suddenly mattered
The AI wasn’t the problem. The infrastructure was.
These weren’t edge cases. They were predictable consequences of building Voice AI without understanding how voice actually works at scale.
The 80/20 Reality
Here’s something that isn’t discussed enough in Voice AI: the AI is only part of the delay.
When a caller speaks to a voice agent, time passes in multiple places:
- Some of it is obvious: AI processing. The agent has to listen, think, and speak back.
- A lot of it happens behind the scenes: the infrastructure. Calls moving across networks, carriers routing audio, distance between regions, small bits of delay adding up.
Most Voice AI conversation focuses on optimizing the first bucket. Faster models, better prompts, streaming responses.
But the second bucket often contributes just as much total latency, and teams focused only on the AI usually don’t notice it until users do.
We’ve spent 10 years optimizing that second bucket. Shaving milliseconds off carrier routing. Building points of presence closer to users. Establishing direct relationships with carriers instead of routing through intermediaries.
You won’t see this in AI benchmarks.
But you feel it when you use the product.
A voice agent with a 200ms LLM response time feels slow if there’s 300ms of infrastructure latency on top of it.
Another voice agent with a 300ms LLM response time feels snappy if the infrastructure adds only 50ms.
The total user experience is what matters. And this experience depends on both halves.
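To make the arithmetic concrete, here is a minimal sketch of that latency budget. The numbers are the illustrative figures from the comparison above, not measurements from any real deployment:

```python
# What the caller perceives is the sum of both buckets:
# AI processing time plus infrastructure overhead.

def total_latency_ms(llm_ms: int, infra_ms: int) -> int:
    """Round-trip delay the caller actually experiences, in milliseconds."""
    return llm_ms + infra_ms

# Fast model on a slow network path:
agent_a = total_latency_ms(llm_ms=200, infra_ms=300)  # 500 ms total

# Slower model on an optimized path:
agent_b = total_latency_ms(llm_ms=300, infra_ms=50)   # 350 ms total

# Despite the slower model, agent B responds 150 ms sooner.
print(agent_a, agent_b)
```

The faster-feeling agent is the one with the slower model, which is the whole point: optimizing only the AI bucket can still lose to optimizing both.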
What This Means
Voice AI is entering a new phase - the ‘Infrastructure’ phase.
The early excitement was about AI - could it understand speech? Could it generate natural responses? Could it handle a real conversation? Those questions are largely answered. The models work.
Now the questions are different. Can it work globally? Can it handle thousands of concurrent calls? Can it maintain quality when the network isn’t perfect? Can it integrate with existing phone systems?
These aren’t AI questions. These are infrastructure questions.
And they’re the questions that determine whether Voice AI moves from impressive demos to actual products.
We’ve spent a decade answering these questions for CPaaS. The same disciplines apply directly to Voice AI: reliability engineering, global optimization, carrier expertise.
The difference is that now, the intelligence layer is ready to take advantage of it.
Building What’s Next
The best Voice AI experiences will be built by teams that understand both halves of the equation.
The AI half is getting all the attention.
The infrastructure half is where the real differentiation happens.
We’re not starting from scratch. We’re building on a foundation that’s been tested by billions of calls, refined over a decade, and deployed across every major market in the world.
That’s the advantage of having spent 10 years in voice before Voice AI arrived.
If you’re working on voice agents and hitting scaling challenges, we should talk.