Plivo now supports Amazon Polly, with more than 40 voices, 27 languages, and new APIs to give developers control over synthesized speech output in applications that use text-to-speech. With Amazon Polly, developers can control the volume, pitch, rate, and pronunciation of the voices that interact with their users.
Text-to-speech is an important tool in a developer’s armory. It allows developers to create interactive voice applications by generating speech dynamically, rather than playing prerecorded media files. But with simple text-to-speech, developers can only choose from a basic male or female voice in a subset of languages, without pauses, tonal modulations, or the other qualities that natural speech possesses. The result is often mechanical-sounding speech that doesn’t provide a lifelike experience to customers.
Enter Speech Synthesis Markup Language (SSML), an XML-based markup language designed by the W3C to help generate natural-sounding synthesized speech. Amazon Polly is one of the leading speech synthesis services that support SSML.
For text-to-speech, listening is believing. Listen to the difference between basic text-to-speech and Amazon Polly:
With Amazon Polly’s dozens of lifelike voices across a variety of languages, you can now select the ideal voice and adjust speech rate, pitch, loudness, and even emphasis to provide a localized voice experience to your customers.
Integrating advanced text-to-speech in your application with Plivo
To synthesize SSML speech on Plivo, specify one of the many Amazon Polly voices in the voice attribute of Plivo’s <Speak> XML. Note that Polly voice names must carry the Polly. prefix, in the form Polly.<VoiceName>.
For example:
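The response below is a minimal sketch rather than an official sample. It assumes that Polly.Joanna is one of the voices enabled for your account and that the standard SSML tags <break> and <prosody> are among those Plivo accepts inside <Speak>. The spoken text is purely illustrative.

```xml
<Response>
    <!-- Assumed voice name: confirm it against Plivo's list of supported Polly voices -->
    <Speak voice="Polly.Joanna">
        Thanks for calling.
        <!-- Standard SSML pause of half a second -->
        <break time="500ms"/>
        <!-- Standard SSML prosody: slow the rate and raise the volume for this sentence -->
        <prosody rate="slow" volume="loud">Your order has shipped.</prosody>
    </Speak>
</Response>
```

Swapping the voice attribute for another supported Polly voice changes the speaker without touching the SSML inside the element.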
We’ve created a Getting Started with SSML page that documents the SSML tags supported in Plivo’s XML and lists the Amazon Polly voices you can use with it. Check it out to see what’s possible with SSML and Amazon Polly.