Automatic Speech Recognition

  • Can I build voice-controlled IVR menus for my calls on Plivo?

    Yes. Plivo's GetInput XML element has built-in support for automatic speech recognition (ASR). Use it to prompt callers for input and gather their voice responses in real time. Spoken responses are transcribed and forwarded immediately to the Action URL you specify. Your application can then use the transcription to identify the caller's intent and respond with another Plivo XML element.

  • Can I gather speech input from callers in real time?

    Yes. You can use Plivo's GetInput XML element to prompt users to provide input and gather their voice responses in real time. Recorded voice responses are transcribed and forwarded to a specified Action URL instantly.

  • Can I redact speech and DTMF inputs gathered through the GetInput XML element?

    Yes. Use the log parameter of the GetInput XML element to redact DTMF and speech inputs gathered from a caller.

    When log redaction is enabled, DTMF digits and speech transcriptions are not logged on Plivo servers.

    As a result, the redacted information is not available in the call's debug logs on your Plivo console.

  • How can I improve the accuracy of GetInput XML speech recognition?

    You may be able to improve the accuracy of speech recognition by experimenting with a couple of GetInput XML features.

    Speech recognition hints

    Provide a set of hint words and phrases to improve speech recognition accuracy. Hints can greatly improve the transcription of proper nouns, homophones (such as "one" and "won"), and domain-specific words that are rare in everyday speech. List the words and phrases you expect callers to say in the hints attribute of the GetInput XML element.

    Speech models

    Experiment with the prebuilt speech recognition models. We recommend using the command_and_search model for command-driven IVR applications, and the phone_call model for more informal speech.
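
    As an illustration, the short Python sketch below builds a GetInput element that sets both hints and speechModel. The action URL, the prompt text, the nested Speak element, and the comma-separated format of the hints value are assumptions made for this example; check the GetInput reference guide for the exact attribute formats.

```python
import xml.etree.ElementTree as ET


def build_get_input(action_url, prompt, hints, speech_model):
    """Return a Plivo XML response that gathers speech with hints and a model."""
    response = ET.Element("Response")
    get_input = ET.SubElement(
        response,
        "GetInput",
        {
            "action": action_url,
            "inputType": "speech",
            "speechModel": speech_model,
            # Assumption: hints are passed as one comma-separated string.
            "hints": ",".join(hints),
        },
    )
    # Assumption: the prompt is played by a Speak element nested in GetInput.
    ET.SubElement(get_input, "Speak").text = prompt
    return ET.tostring(response, encoding="unicode")


xml_doc = build_get_input(
    "https://example.com/ivr/intent",  # hypothetical Action URL
    "Which department would you like to reach?",
    ["billing", "technical support", "sales"],
    "command_and_search",
)
print(xml_doc)
```

    Swapping "command_and_search" for "phone_call" is the only change needed to compare the two models on the same prompt.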

  • How do I limit the maximum duration of speech recognized by Plivo?

    Use the executionTimeout parameter of the GetInput XML element. It specifies the maximum time, in seconds, for which input detection runs during a single GetInput XML execution. The default is 15 seconds, and accepted values range from 5 to 60.
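
    If you generate XML programmatically, it can help to check the value before sending it. The helper below is a hypothetical sketch (its name is not part of any Plivo SDK) that applies the default of 15 and the 5 to 60 range described above.

```python
def execution_timeout_attr(seconds=15):
    """Return a validated executionTimeout value as an XML attribute string."""
    if not isinstance(seconds, int) or not 5 <= seconds <= 60:
        raise ValueError("executionTimeout must be an integer from 5 to 60")
    return str(seconds)


print(execution_timeout_attr())    # uses the default
print(execution_timeout_attr(30))  # explicit 30-second limit
```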

  • How does Plivo charge for automatic speech recognition?

    Plivo charges for automatic speech recognition based on the usage of Plivo’s speech recognition engine. For every GetInput XML execution, charges are applied based on the duration of the speech analyzed.

    Charges are computed per 15-second pulse. For example, if speech is recognized for 35 seconds, the account would be billed for 45 seconds (15 * 3) of speech.

    Charges apply only to speech input detection. DTMF input detection with GetInput XML is not charged, so if you use GetInput with inputType set to "dtmf", you will not be charged.
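
    The pulse rounding described above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch covers only the billed duration, not the price per pulse.

```python
import math

PULSE_SECONDS = 15  # charges are computed per 15-second pulse


def billed_speech_seconds(speech_seconds, input_type="speech"):
    """Round the analyzed speech duration up to whole 15-second pulses."""
    # DTMF-only input detection does not engage speech recognition.
    if "speech" not in input_type.split():
        return 0
    return math.ceil(speech_seconds / PULSE_SECONDS) * PULSE_SECONDS


print(billed_speech_seconds(35))          # 45, i.e. three pulses
print(billed_speech_seconds(35, "dtmf"))  # 0, DTMF only
```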

    Visit our pricing page for the current pricing for speech recognition.

  • How does Plivo’s automatic speech recognition work?

    Plivo's GetInput XML element comes with built-in support for automatic speech recognition. You can use GetInput XML to prompt users for input and to gather their voice responses in real time. Gathered speech is transcribed and forwarded immediately to the Action URL you specify.

    You can control speech recognition behavior using GetInput XML parameters such as speechEndTimeout, executionTimeout, hints, and speechModel. To learn more about automatic speech recognition using the GetInput XML element, visit our GetInput reference guide.

  • How does the GetInput XML profanity filter work?

    The profanity filter detects common expletives in a caller’s speech and redacts them in the transcription posted to the Action URL. Words that are filtered out are represented by their first letter and asterisks for the remaining characters (e.g. f***). 

    Note that the profanity filter operates on single words. It does not detect abusive or offensive speech that's a phrase or a combination of words.
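
    The masking format can be illustrated with a small Python sketch. This is not Plivo's implementation: the word list is a hypothetical stand-in that uses mild placeholder words, and the matching logic is an assumption chosen to show single-word filtering.

```python
import re

# Hypothetical block list; Plivo's actual list is not published here.
BLOCKED_WORDS = {"darn", "heck"}


def mask_word(word):
    """Keep the first letter and replace the remaining characters with asterisks."""
    return word[0] + "*" * (len(word) - 1)


def redact(transcript):
    """Mask blocked words one at a time; multi-word phrases are not detected."""
    def replace(match):
        word = match.group(0)
        return mask_word(word) if word.lower() in BLOCKED_WORDS else word

    return re.sub(r"[A-Za-z']+", replace, transcript)


print(redact("Well darn, that is what the heck I said"))
# Well d***, that is what the h*** I said
```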

  • Is it possible to gather DTMF and speech inputs simultaneously?

    Yes. GetInput XML supports simultaneous detection of DTMF and speech inputs. To enable it, set the inputType parameter to "dtmf speech". With that setting, whichever input is detected first is forwarded to the Action URL you specify. For example, in response to a prompt like "Press 1 or say Yes to accept", if the caller presses 1 before saying anything, the digit 1 is forwarded to the Action URL.
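
    A minimal Python sketch of this flow follows. It builds the prompt with inputType set to "dtmf speech" and then interprets whichever input arrives at the Action URL. The callback field names Digits and Speech, the action URL, and the nested Speak element are assumptions for illustration; confirm them against the GetInput reference guide.

```python
import xml.etree.ElementTree as ET


def build_accept_prompt(action_url):
    """Return Plivo XML that accepts either a key press or a spoken answer."""
    response = ET.Element("Response")
    get_input = ET.SubElement(
        response,
        "GetInput",
        {"action": action_url, "inputType": "dtmf speech"},
    )
    ET.SubElement(get_input, "Speak").text = "Press 1 or say Yes to accept."
    return ET.tostring(response, encoding="unicode")


def caller_accepted(params):
    """Interpret the parameters posted to the Action URL.

    Assumption: DTMF input arrives in "Digits" and the transcription in "Speech".
    """
    digits = params.get("Digits", "")
    speech = params.get("Speech", "").strip(" .!?").lower()
    return digits == "1" or speech == "yes"


print(build_accept_prompt("https://example.com/ivr/accept"))  # hypothetical URL
print(caller_accepted({"Digits": "1"}))
print(caller_accepted({"Speech": "Yes."}))
```

    Because only the first detected input is forwarded, the handler checks both fields and treats a missing one as empty.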

  • What languages are supported by Plivo’s automatic speech recognition engine?

    Plivo’s automatic speech recognition engine supports the languages listed on this documentation page.

  • Why was I charged for speech recognition when no speech was gathered?

    Plivo charges for automatic speech recognition based on the usage of Plivo’s speech recognition engine, so charges can apply even if the caller did not speak while speech recognition was enabled.

    In cases where the inputType attribute of the GetInput XML element is set to "dtmf speech", a caller may enter a digit instead of responding with speech. These cases still engage Plivo’s speech recognition engine, and therefore incur charges for speech recognition.