Text-to-Speech and Speech-to-Text

  • Can I use Plivo's text-to-speech feature for international calls?

    Yes, you can use text-to-speech (TTS) for international calls. Our documentation lists the languages that you can use.

  • Can I transcribe international calls?

    Yes, you can transcribe international calls; however, the call recording must be in English.

  • I had a call recording for 19 seconds but the transcription is blank. Why?

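    Our transcription feature works only for calls recorded in English. Calls recorded in any other language will not be transcribed. In addition, transcription is limited to recordings between 20 seconds and two minutes long, so a 19-second recording falls just below the minimum duration.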

  • How can I transcribe a voice call to text?

    Transcription ties in with recording features: if a call can be recorded, it can also be transcribed. Plivo lets you record and transcribe a call through the Record API or Record XML. Both use the same set of parameters with respect to transcription:

    1. Transcription type: Auto
    2. Transcription URL: the URL to which Plivo will send the transcription text once the call has been transcribed.
    3. Transcription method: the HTTP method Plivo uses to send the request to the transcription URL. Valid values are GET and POST.
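
    As a rough illustration of how the three parameters above fit together, the sketch below builds a Record XML response in Python. The attribute names (transcriptionType, transcriptionUrl, transcriptionMethod) are assumed to match the Record XML reference and should be checked against it; the helper name build_record_xml and the example.com URL are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_record_xml(transcription_url: str, transcription_method: str = "POST") -> str:
    """Return an XML response that asks for a call to be recorded and transcribed.

    Assumption: the Record element accepts the attributes transcriptionType,
    transcriptionUrl, and transcriptionMethod. Verify the exact names in the
    Record XML reference before relying on this.
    """
    method = transcription_method.upper()
    if method not in ("GET", "POST"):
        # The FAQ lists GET and POST as the only valid values.
        raise ValueError("transcription method must be GET or POST")

    response = ET.Element("Response")
    ET.SubElement(
        response,
        "Record",
        {
            "transcriptionType": "auto",
            "transcriptionUrl": transcription_url,  # where the transcript is sent
            "transcriptionMethod": method,
        },
    )
    return ET.tostring(response, encoding="unicode")


# Hypothetical callback URL, used only for illustration.
print(build_record_xml("https://example.com/transcription/"))
```

    Building the XML with a library rather than string concatenation keeps the callback URL correctly escaped if it contains characters such as "&".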

  • What languages are supported for text-to-speech?

    See our documentation for the full list of languages and voices.

  • What is speech-to-text transcription?

    Transcription converts recorded calls (in English) to text. Plivo provides two options for speech-to-text transcription. 

    1. Auto: $0.05/min

    Transcription is performed entirely by a computer. This option is fast (typically completing in under 5 minutes) but may be less accurate than the hybrid method.

    2. Hybrid: $0.35/min

    Transcription is performed by a computer and then checked by a person. This option generally produces more accurate transcripts and is usually completed within 20 minutes.

    Note: Our transcription service is intended primarily for voicemail and is limited to recordings between 20 seconds and two minutes long.
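
    To make the rates and duration limits above concrete, here is a small Python sketch that checks whether a recording falls inside the supported range and estimates its transcription cost. It assumes the limits are inclusive and that cost is pro-rated per second; the actual billing increment is not stated here, so treat the result as an estimate only. The function names are hypothetical.

```python
from decimal import Decimal

# Per-minute rates quoted in this FAQ, in USD.
RATES_PER_MINUTE = {"auto": Decimal("0.05"), "hybrid": Decimal("0.35")}

# Duration limits quoted in the note above, in seconds.
# Assumption: both ends of the range are inclusive.
MIN_SECONDS = 20
MAX_SECONDS = 120


def is_eligible(duration_seconds: int) -> bool:
    """Return True if the recording length is within the supported range."""
    return MIN_SECONDS <= duration_seconds <= MAX_SECONDS


def estimate_cost(duration_seconds: int, transcription_type: str) -> Decimal:
    """Estimate the transcription cost in USD.

    Assumption: cost is pro-rated per second. Real billing may round up
    to a larger increment.
    """
    if not is_eligible(duration_seconds):
        raise ValueError("recording must be between 20 seconds and two minutes")
    rate = RATES_PER_MINUTE[transcription_type]
    return (rate * duration_seconds / 60).quantize(Decimal("0.0001"))


print(is_eligible(19))  # False: one second short of the minimum
print(estimate_cost(90, "auto"))
print(estimate_cost(90, "hybrid"))
```

    Running an eligibility check like this before requesting transcription can explain a blank result for very short recordings.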

  • What is text-to-speech?

    Text-to-speech (TTS) is a form of speech synthesis that converts text into spoken words. Text-to-speech is an easy way to add dynamic voice to your phone capabilities. Unlike prerecorded audio, TTS can deliver spoken versions of dynamic messages such as sports scores or emergency updates. Simple changes in the code allow you to change the message, change the gender of the voice, and determine who should receive your message.

    For more details, refer to our reference guide to the Speak XML element and our documentation for the PHLO Play Audio component.
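
    As a minimal sketch of the dynamic messaging described above, the Python snippet below wraps a message generated at run time in a Speak XML element. The voice and language attributes and their values ("WOMAN", "en-US") are assumed to match the Speak XML reference and should be confirmed there; the helper name build_speak_xml and the sample score are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_speak_xml(message: str, voice: str = "WOMAN", language: str = "en-US") -> str:
    """Return an XML response that reads `message` aloud to the caller.

    Assumption: the Speak element accepts `voice` and `language` attributes
    with values like these. Check the Speak XML reference for the supported
    set of voices and languages.
    """
    response = ET.Element("Response")
    speak = ET.SubElement(response, "Speak", {"voice": voice, "language": language})
    speak.text = message  # ElementTree escapes characters such as & and <
    return ET.tostring(response, encoding="unicode")


# The message is built at run time, which prerecorded audio cannot do.
home, away = 3, 2  # hypothetical scores
print(build_speak_xml(f"The final score is {home} to {away}."))

# Changing the voice is a one-argument change.
print(build_speak_xml("Severe weather alert for your area.", voice="MAN"))
```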