Text-to-speech & Speech-to-text

  • Can I use Plivo's text-to-speech feature for international calls?

    Yes, you can use text-to-speech (TTS) on international calls. The supported languages are listed here: https://api-reference.plivo.com/latest/python/elements/speak

  • Can I transcribe international calls?

    Yes, you can transcribe international calls; however, the call recording must be in English.

  • My call recording was 19 seconds long, but the transcription is blank. Why?

    Transcription works only for calls recorded in English that are between 20 and 120 seconds long. Recordings shorter than 20 seconds (such as a 19-second call) or longer than 120 seconds are not transcribed. If you have a specific need for longer transcriptions, please reach out to your account manager or our Support Team.
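    As a quick illustration of the duration limits above, the following minimal sketch checks whether a recording falls inside the transcribable range. The function and constant names are hypothetical and not part of any Plivo SDK, and treating both limits as inclusive is an assumption based on the wording of this answer.

```python
# Limits taken from the answer above; the names are illustrative only.
MIN_SECONDS = 20
MAX_SECONDS = 120


def is_transcribable(duration_seconds: float) -> bool:
    """Return True if the recording length is within the stated limits.

    Assumes both limits are inclusive.
    """
    return MIN_SECONDS <= duration_seconds <= MAX_SECONDS


if __name__ == "__main__":
    for seconds in (19, 20, 120, 121):
        print(seconds, is_transcribable(seconds))
```

    A check like this can be run before requesting transcription so that recordings outside the range are not sent for it at all.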

  • How can I transcribe a voice call to text?

    Transcription is tied to recording: if a call can be recorded, it can also be transcribed. Plivo offers two ways to record and transcribe a call:

    1. Record API
    2. Record XML

    For transcription, both options use the same set of parameters:

    1. Transcription type: auto or hybrid 
    2. Transcription URL: the URL to which Plivo will send the transcription text once the call has been transcribed.
    3. Transcription method: the type of HTTP method used by Plivo to send the request to the transcription URL. The valid values are GET or POST.
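    The three parameters above could be set on a Record XML element roughly as follows. This is a minimal sketch using only the Python standard library rather than the Plivo SDK. The attribute names transcriptionType, transcriptionUrl, and transcriptionMethod are assumed to match the Record XML reference, and the callback URL is a placeholder, so please verify both against the current documentation.

```python
import xml.etree.ElementTree as ET


def build_record_xml(transcription_url: str,
                     transcription_type: str = "auto",
                     transcription_method: str = "POST") -> str:
    """Build a Record XML document with the transcription parameters set.

    Attribute names are assumptions; check them against the Record XML docs.
    """
    if transcription_type not in ("auto", "hybrid"):
        raise ValueError("transcription_type must be 'auto' or 'hybrid'")
    if transcription_method not in ("GET", "POST"):
        raise ValueError("transcription_method must be 'GET' or 'POST'")

    response = ET.Element("Response")
    ET.SubElement(response, "Record", {
        "transcriptionType": transcription_type,
        "transcriptionUrl": transcription_url,
        "transcriptionMethod": transcription_method,
    })
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URL; replace it with an endpoint you control.
    print(build_record_xml("https://example.com/transcription"))
```

    The same three values would be supplied as request parameters when using the Record API instead of XML.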

  • What languages are supported for text-to-speech?

    Plivo supports 16 languages for the text-to-speech feature. The full list is available at the link below:

    https://api-reference.plivo.com/latest/python/elements/speak

    Note: If your text contains special characters such as é, you need to encode them as numeric character references (for example, é becomes &#233;). Various online services can convert Unicode characters into numeric references.
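    If you prefer to do the conversion in code rather than with an online tool, the sketch below shows one way using only the Python standard library. The helper name is hypothetical. It replaces every non-ASCII character with a decimal numeric reference and leaves ASCII characters, including & and <, untouched.

```python
def to_numeric_references(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric reference."""
    # The 'xmlcharrefreplace' error handler emits &#NNN; for any character
    # that cannot be represented in ASCII.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_numeric_references("café"))  # caf&#233;
```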

  • What is speech-to-text transcription?

    Transcription is a feature that converts recorded calls (in English) to text. There are two options for speech-to-text transcription. 

    1. Auto: $0.05/min

    Transcription is performed by a computer. This is extremely fast (typically under 5 minutes), but may not be as accurate as the hybrid method. 

    2. Hybrid: $0.35/min

    Transcription is done by a computer and then checked by a person. This method is generally more accurate and can usually be completed within 20 minutes.

    Note: Our transcription service is primarily for voicemails and is limited to recorded files with a duration of up to two minutes.
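    To compare the two options, the per-minute rates above can be turned into a rough cost estimate. This is a sketch only: it assumes cost is prorated by the second, which this FAQ does not confirm (actual billing may round up to a full minute), and the function name is hypothetical.

```python
from decimal import Decimal

# Per-minute rates quoted above, in US dollars.
RATES_PER_MINUTE = {
    "auto": Decimal("0.05"),
    "hybrid": Decimal("0.35"),
}


def estimate_cost(duration_seconds: int, transcription_type: str) -> Decimal:
    """Estimate transcription cost, assuming per-second proration."""
    rate = RATES_PER_MINUTE[transcription_type]
    minutes = Decimal(duration_seconds) / Decimal(60)
    return minutes * rate


if __name__ == "__main__":
    for kind in ("auto", "hybrid"):
        print(kind, estimate_cost(90, kind))
```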

  • What is text-to-speech?

    Text-to-speech (TTS) is a form of speech synthesis that converts text into spoken words. It is an easy way to add a dynamic voice to your phone capabilities. Unlike pre-recorded audio, TTS can deliver spoken versions of text that changes, such as live information (e.g., sports scores), emergency updates, and other dynamic messages. Simple code changes let you change the message, the gender of the voice, who receives the message, and much more.
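    As a rough illustration of changing the message and voice in code, the sketch below builds a Speak XML response with the Python standard library. The language and voice attribute names and the example values en-US and WOMAN are assumptions to be checked against the Speak XML reference; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_speak_xml(message: str,
                    language: str = "en-US",
                    voice: str = "WOMAN") -> str:
    """Build a Speak XML document for the given message.

    Attribute names and default values are assumptions; check the Speak docs.
    """
    response = ET.Element("Response")
    speak = ET.SubElement(response, "Speak",
                          {"language": language, "voice": voice})
    speak.text = message
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Changing the arguments changes the spoken message and the voice.
    print(build_speak_xml("The final score is 3 to 1."))
    print(build_speak_xml("The final score is 3 to 1.", voice="MAN"))
```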

    For more details, refer to the following guides:

    https://www.plivo.com/docs/getting-started/text-to-speech-on-a-call/ 

    https://www.plivo.com/docs/phlo/components-library/#play-audio