Automatic Speech Recognition

  • Can I build voice-controlled IVRs for my calls on Plivo?

    Plivo GetInput XML comes with built-in support for Automatic Speech Recognition. 

    You can use the GetInput XML to prompt callers for input and gather their voice responses in real time.

    The caller's speech is transcribed and forwarded instantly to the Action URL you specify.

    You can then use the transcription to identify the caller's intent and respond with further Plivo XML.

    To learn more about Automatic Speech Recognition with GetInput XML, see the GetInput XML reference documentation.
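
    As a rough illustration of the flow described above, the sketch below builds a GetInput XML response in Python with the standard library. The `action` attribute name, the nested `Speak` element, and the example URL are assumptions made for illustration; check the GetInput XML reference for the exact names.

```python
import xml.etree.ElementTree as ET


def build_speech_prompt(action_url: str, prompt: str) -> str:
    """Return Plivo-style XML that plays a prompt and gathers speech."""
    response = ET.Element("Response")
    # Assumption: the Action URL is passed in an attribute named "action".
    get_input = ET.SubElement(
        response,
        "GetInput",
        {"action": action_url, "inputType": "speech"},
    )
    # Assumption: the prompt is a <Speak> element nested inside <GetInput>.
    speak = ET.SubElement(get_input, "Speak")
    speak.text = prompt
    return ET.tostring(response, encoding="unicode")


xml_body = build_speech_prompt(
    "https://example.com/ivr/intent",  # hypothetical Action URL
    "How can we help you today?",
)
print(xml_body)
```

    Your application would return this XML in reply to Plivo's request for the call, then handle the transcription when it arrives at the Action URL.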

  • Can I gather speech input from callers in real-time?

    Yes. Plivo GetInput XML can gather speech input from callers in real time.

    The caller's speech is transcribed instantly and forwarded to the Action URL specified in the XML.

    To learn more about working with GetInput XML, see the GetInput XML reference documentation.

  • Can I redact speech and DTMF inputs gathered through GetInput XML?

    The log parameter of the GetInput XML can be used to redact DTMF and speech inputs gathered from the caller. 

    When log redaction is enabled, DTMF digits and speech transcriptions are not logged on Plivo servers. 

    Please note that if log redaction is enabled, the above information will not be accessible through the debug logs for the call on your Plivo Console.
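
    A minimal sketch of a GetInput XML response with log redaction turned on is shown below. The source only names the log parameter; the assumption here is that setting it to "false" disables logging of digits and transcriptions. The `action` attribute, the `Speak` element, and the URL are likewise illustrative.

```python
import xml.etree.ElementTree as ET


def build_redacted_input(action_url: str, prompt: str) -> str:
    """Return GetInput XML that asks Plivo not to log the caller's input."""
    response = ET.Element("Response")
    get_input = ET.SubElement(
        response,
        "GetInput",
        {
            "action": action_url,
            "inputType": "dtmf speech",
            # Assumption: log="false" enables redaction (no logging).
            "log": "false",
        },
    )
    ET.SubElement(get_input, "Speak").text = prompt
    return ET.tostring(response, encoding="unicode")


redacted_xml = build_redacted_input(
    "https://example.com/ivr/account",  # hypothetical Action URL
    "Please say or enter your account number.",
)
print(redacted_xml)
```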

  • How can I improve the accuracy of GetInput XML speech recognition?

    You may be able to improve the accuracy of speech recognition of Plivo GetInput XML by experimenting with the following features.

    Speech Adaptation with Hints

    Provide a set of hint words and phrases to improve speech recognition accuracy. Hints can greatly improve the transcription of proper nouns, homophones (e.g., one, won), and domain-specific words that are rare in everyday speech. List the words and phrases you expect from the speaker in the hints attribute of GetInput XML.

    Speech Models

    Experiment with the pre-built speech recognition models depending on your use case. We recommend using the command_and_search model for command-driven IVRs, and the phone_call model for more informal speech.

    To learn more about Automatic Speech Recognition with GetInput XML, read our detailed reference documentation here.
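
    The two features above might be combined as in the following sketch. The hints and speechModel attributes and the command_and_search model come from the text; the comma-separated hint format, the `action` attribute, the `Speak` element, and the URL are assumptions.

```python
import xml.etree.ElementTree as ET


def build_tuned_input(action_url, hints, speech_model="command_and_search"):
    """Return GetInput XML with speech hints and an explicit speech model."""
    response = ET.Element("Response")
    get_input = ET.SubElement(
        response,
        "GetInput",
        {
            "action": action_url,
            "inputType": "speech",
            "speechModel": speech_model,
            # Assumption: hints are passed as one comma-separated string.
            "hints": ",".join(hints),
        },
    )
    ET.SubElement(get_input, "Speak").text = "Which department do you need?"
    return ET.tostring(response, encoding="unicode")


tuned_xml = build_tuned_input(
    "https://example.com/ivr/department",  # hypothetical Action URL
    ["sales", "billing", "technical support"],
)
print(tuned_xml)
```

    For more conversational prompts, pass `speech_model="phone_call"` instead.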

  • How do I limit the maximum duration of speech recognized by Plivo?

    You can limit the maximum duration of speech recognized in a GetInput XML execution through the executionTimeout parameter of the GetInput XML. Note that a maximum of 60 seconds of speech can be recognized in one execution of the GetInput XML.

    To learn more about speech recognition-related timeouts, check out our reference guide here.
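
    The sketch below sets executionTimeout and rejects values above the 60-second limit mentioned above before the XML is built. The helper names, the `action` attribute, the `Speak` element, and the URL are hypothetical; the lower bound of 1 second is also an assumption, so check the reference guide for the real minimum.

```python
import xml.etree.ElementTree as ET

MAX_SPEECH_SECONDS = 60  # per-execution limit stated in the answer above


def execution_timeout_attr(seconds: int) -> dict:
    """Validate the timeout and return it as a GetInput attribute."""
    # Assumption: any positive value up to 60 is acceptable.
    if not 1 <= seconds <= MAX_SPEECH_SECONDS:
        raise ValueError(
            f"executionTimeout must be between 1 and {MAX_SPEECH_SECONDS} seconds"
        )
    return {"executionTimeout": str(seconds)}


def build_timed_input(action_url: str, seconds: int) -> str:
    """Return GetInput XML that stops recognizing speech after `seconds`."""
    attrs = {"action": action_url, "inputType": "speech"}
    attrs.update(execution_timeout_attr(seconds))
    response = ET.Element("Response")
    get_input = ET.SubElement(response, "GetInput", attrs)
    ET.SubElement(get_input, "Speak").text = "Please describe your issue."
    return ET.tostring(response, encoding="unicode")


timed_xml = build_timed_input("https://example.com/ivr/issue", 30)  # hypothetical URL
print(timed_xml)
```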

  • How does Plivo charge for Automatic Speech Recognition?

    Plivo charges for Automatic Speech Recognition based on the usage of Plivo’s speech recognition engine. For every GetInput XML execution, charges are applied based on the duration of speech analyzed.

    Charges are computed in 15-second pulses, with the duration rounded up to the next pulse. For example, if speech is recognized for 35 seconds, the account is billed for 45 seconds (3 × 15) of speech.

    Charges apply only to speech input detection. DTMF input detection with GetInput XML is not charged. So, if you’re using GetInput with inputType set to ‘dtmf’, you will not be charged.

    Head to our pricing page for the current pricing for speech recognition.
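
    The 15-second pulse rounding described above can be sketched as a small Python function. The function name is hypothetical, and the result is a billed duration in seconds, not a price; per-pulse rates are on the pricing page.

```python
import math

PULSE_SECONDS = 15  # billing pulse stated in the answer above


def billed_seconds(analyzed_seconds: float) -> int:
    """Round the analyzed speech duration up to the next 15-second pulse."""
    if analyzed_seconds < 0:
        raise ValueError("duration cannot be negative")
    pulses = math.ceil(analyzed_seconds / PULSE_SECONDS)
    return pulses * PULSE_SECONDS


print(billed_seconds(35))  # 45, matching the example above
print(billed_seconds(15))  # 15, exactly one pulse
print(billed_seconds(16))  # 30, rounded up to two pulses
```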

  • How does Plivo’s Automatic Speech Recognition work?

    Plivo GetInput XML comes with built-in support for Automatic Speech Recognition. Use the GetInput XML to prompt callers for input and to gather their voice responses in real time. Gathered speech is transcribed and forwarded instantly to the specified Action URL.

    You can control speech recognition behavior using GetInput XML parameters such as speechEndTimeout, executionTimeout, hints, and speechModel. To learn more about Automatic Speech Recognition using the GetInput XML, see the GetInput XML reference guide.

  • How does the profanity filter on GetInput XML work?

    The profanity filter detects common expletives in a caller’s speech and redacts them in the transcription posted to the Action URL. A filtered word appears as its first letter followed by asterisks for the remaining characters (e.g., f***).

    Note that the profanity filter operates on single words. It does not detect abusive or offensive speech that is a phrase or a combination of words.
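
    To make the redaction format concrete, the sketch below reproduces the masking pattern (first letter, then asterisks) on a word-by-word basis. It only illustrates what the output looks like; it is not Plivo's detection logic, and the function names and the stand-in blocked word are hypothetical.

```python
def mask_word(word: str) -> str:
    """Keep the first character and replace the rest with asterisks."""
    if not word:
        return word
    return word[0] + "*" * (len(word) - 1)


def mask_transcript(text: str, blocked: set) -> str:
    """Mask each blocked word individually; phrases are not considered."""
    return " ".join(
        mask_word(word) if word.lower() in blocked else word
        for word in text.split()
    )


# "darn" stands in for an expletive in this example.
print(mask_word("darn"))
print(mask_transcript("well darn it", {"darn"}))
```

    Because each word is checked on its own, an offensive phrase made of individually harmless words passes through unchanged, which mirrors the single-word limitation noted above.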

  • Is it possible to gather DTMF and speech inputs simultaneously?

    Yes. GetInput XML supports the simultaneous detection of DTMF and speech inputs. To enable it, set the inputType parameter to ‘dtmf speech’ in the XML response.

    The first input to be detected will be forwarded to the Action URL that you have specified. For example, in response to a prompt like ‘Press 1 or say Yes to accept.’, if the caller presses 1 before saying anything, then the digit 1 will be forwarded to the Action URL. 

    To learn more about detecting DTMF and speech with the GetInput XML, check out our detailed reference guide here.
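
    On the receiving side, an Action URL handler for the ‘Press 1 or say Yes’ prompt above might look like the following sketch. The request field names `Digits` and `Speech` are assumptions about what Plivo posts to the Action URL, and the function names are hypothetical; confirm the field names in the reference guide.

```python
def first_input(params: dict) -> tuple:
    """Return (kind, value) for whichever input the request carries."""
    # Assumption: DTMF digits arrive in "Digits", transcriptions in "Speech".
    digits = params.get("Digits", "")
    speech = params.get("Speech", "")
    if digits:
        return ("dtmf", digits)
    if speech:
        return ("speech", speech)
    return ("none", "")


def caller_accepted(params: dict) -> bool:
    """Treat the digit 1 or the spoken word 'yes' as acceptance."""
    kind, value = first_input(params)
    if kind == "dtmf":
        return value == "1"
    if kind == "speech":
        return value.strip().lower() == "yes"
    return False


print(caller_accepted({"Digits": "1"}))
print(caller_accepted({"Speech": "Yes"}))
print(caller_accepted({"Speech": "no thanks"}))
```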

  • What languages are supported by Plivo’s Automatic Speech Recognition engine?

    Plivo maintains an up-to-date list of the languages supported by its Automatic Speech Recognition engine in its documentation.

  • Why was I charged for speech recognition when no speech was gathered?

    Plivo charges for Automatic Speech Recognition based on the usage of Plivo’s speech recognition engine, so charges apply even if the caller did not speak while speech recognition was enabled.

    When simultaneous input (inputType set to ‘dtmf speech’) is enabled, the caller may have entered a digit instead of responding with speech. These cases still engage Plivo’s speech recognition engine, so speech recognition charges apply.