
Audio Stream

  • Can I switch the audio codec between each playAudio event?

    No. Switching audio codecs between consecutive playAudio events is not currently supported, and attempting it can degrade audio quality. Use the same audio codec for the entire stream.

  • What are the supported audio codecs?

    The supported audio codecs are audio/x-l16 and audio/x-mulaw. Each codec supports the following sampling rates.

    Audio codec    Sampling rate (Hz)
    audio/x-l16    8000, 16000
    audio/x-mulaw  8000
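
As a small illustration of the table above, the sketch below checks a codec and sampling rate pair before any audio is sent. The names `SUPPORTED_FORMATS` and `is_supported` are hypothetical helpers for this example and are not part of any Plivo SDK.

```python
# Supported codec / sampling-rate pairs, copied from the table above.
SUPPORTED_FORMATS = {
    "audio/x-l16": {8000, 16000},
    "audio/x-mulaw": {8000},
}


def is_supported(content_type: str, sample_rate: int) -> bool:
    """Return True if the codec and sampling rate appear together in the table."""
    return sample_rate in SUPPORTED_FORMATS.get(content_type, set())
```

Running a check like this once, when the stream is set up, also helps keep the codec consistent for the whole stream.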
  • How do I know if the buffered audio is played?

    After you send your audio with the playAudio event, you can send a checkpoint event over the WebSocket. Plivo responds with a 'played' event once all media events queued before the checkpoint have been played back to the end user.



    Sample Request

    {
      "event": "checkpoint",
      "streamId": "20170ada-f610-433b-8758-c02a2aab3662",
      "name": "customer greeting audio"
    }

    Sample Response

    {
      "event": "played",
      "streamId": "20170ada-f610-433b-8758-c02a2aab3662",
      "name": "customer greeting audio"
    }
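
The checkpoint exchange can be sketched as follows. This is a minimal example that only builds and inspects the JSON messages. It assumes the 'played' event echoes the same `streamId` and `name` as the checkpoint, as the sample events suggest. The function names are hypothetical, and sending or receiving over the WebSocket is left to whichever client library you use.

```python
import json


def checkpoint_message(stream_id: str, name: str) -> str:
    """Build the checkpoint event to send after the playAudio events."""
    return json.dumps({"event": "checkpoint", "streamId": stream_id, "name": name})


def is_played_ack(raw_message: str, stream_id: str, name: str) -> bool:
    """Return True if raw_message is the 'played' event for this checkpoint."""
    message = json.loads(raw_message)
    return (
        message.get("event") == "played"
        and message.get("streamId") == stream_id
        and message.get("name") == name
    )
```

Giving each checkpoint a distinct name makes it possible to tell which batch of audio has finished playing.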

  • I need to start a bi-directional stream. Can I initiate the bi-directional stream element without including any other elements in my XML instruction?

    Yes. Set the keepCallAlive parameter to true when you initiate the bi-directional stream. Plivo then initializes the audio stream and does not execute any subsequent elements in the XML instruction.

    Parameter      Datatype  Description
    keepCallAlive  Boolean   Specifies that the stream element should be the only one executed. This applies exclusively to the bi-directional audio stream.

    If keepCallAlive is set to true, any elements mentioned after the stream will not be executed. This is relevant only when initiating the stream during a call.

    Allowed values: true, false

    Default: false
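
As a rough sketch, the XML instruction for such a stream could be generated as shown below. Only the `keepCallAlive` parameter comes from this page. The `Response` and `Stream` element names, the `bidirectional` attribute spelling, and the placeholder WebSocket URL are assumptions made for illustration, so check them against the XML documentation before relying on them.

```python
import xml.etree.ElementTree as ET


def stream_response_xml(websocket_url: str) -> str:
    """Build an XML instruction containing only a bi-directional stream element."""
    # Element and attribute names other than keepCallAlive are assumptions.
    response = ET.Element("Response")
    stream = ET.SubElement(
        response,
        "Stream",
        {"bidirectional": "true", "keepCallAlive": "true"},
    )
    stream.text = websocket_url
    return ET.tostring(response, encoding="unicode")


# Hypothetical WebSocket endpoint, used only for this example.
print(stream_response_xml("wss://example.com/audiostream"))
```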

  • Can I interrupt and clear buffered audio?

    Use the clearAudio event to interrupt or clear buffered audio previously sent to Plivo via the playAudio event. Plivo clears all buffered media events, enabling you to initiate new playAudio events tailored to a specific use case.

    Send the clearAudio event in the format shown below. Use the same WebSocket connection and the streamId of the stream whose audio you want to interrupt.

    Sample Request

    {
      "event": "clearAudio",
      "streamId": "b77e037d-4119-44b5-902d-25826b654539"
    }

    Sample Response

    {
      "sequenceNumber": 0,
      "event": "clearedAudio",
      "streamId": "20170ada-f610-433b-8758-c02a2aab3662"
    }
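
A typical use is interrupting playback when the caller starts speaking: clear the buffer, then queue the replacement audio. The sketch below assumes a hypothetical `send` callable that writes one text frame to the open WebSocket, and it does not wait for the clearedAudio response before sending the new messages. Whether waiting is required is not stated here.

```python
import json
from typing import Callable, Iterable


def interrupt_and_replace(
    send: Callable[[str], None],
    stream_id: str,
    new_messages: Iterable[str],
) -> int:
    """Send clearAudio, then the replacement messages; return how many were queued."""
    send(json.dumps({"event": "clearAudio", "streamId": stream_id}))
    queued = 0
    for message in new_messages:
        send(message)
        queued += 1
    return queued
```

Passing `send` as a parameter keeps the sketch independent of any particular WebSocket library.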

  • What is an audio stream?

    Plivo’s audio streaming feature lets businesses stream raw audio from active calls in real time to their applications or third-party systems over WebSockets.

    You can feed this raw audio to AI bots via Amazon Lex or Google Dialogflow to build AI virtual assistants that engage with your customers. Alternatively, you can send it to transcription services such as Amazon Transcribe or Google Speech-to-Text for real-time transcription.

    Refer to our API and XML documentation for more information.

  • Does Plivo support bi-directional audio streaming?

    Yes, Plivo supports bi-directional audio streaming. You can transmit raw audio to Plivo by passing the bi-directional parameter as true when initiating the audio stream. Plivo will then relay this audio back to the caller or end user during the call.

  • What is the expected payload format to send audio to Plivo?

    Plivo expects audio to be sent in the following payload format.

    {
      "event": "playAudio",
      "media": {
        "contentType": "audio/x-l16 or audio/x-mulaw",
        "sampleRate": 8000,
        "payload": "base64 encoded raw audio.."
      }
    }

    • contentType: audio/x-l16 or audio/x-mulaw
    • sampleRate: 8000 or 16000 (16000 is supported only with audio/x-l16)
    • payload: base64 encoded raw audio
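
A minimal sketch of building this payload from raw audio bytes is shown below. The function name `play_audio_message` is hypothetical. The field names follow the format described above, and the example input assumes 16-bit mono samples for audio/x-l16, which this page does not state explicitly.

```python
import base64
import json


def play_audio_message(
    raw_audio: bytes,
    content_type: str = "audio/x-l16",
    sample_rate: int = 8000,
) -> str:
    """Wrap raw audio bytes in a playAudio event serialized as JSON."""
    return json.dumps(
        {
            "event": "playAudio",
            "media": {
                "contentType": content_type,
                "sampleRate": sample_rate,
                "payload": base64.b64encode(raw_audio).decode("ascii"),
            },
        }
    )


# 20 ms of silence at 8000 Hz, assuming 2 bytes per sample: 160 samples * 2 bytes.
silence = bytes(320)
message = play_audio_message(silence)
```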
  • How much does an audio stream cost?

    Audio streaming is priced at $0.003 per minute per stream, in addition to the standard voice-minute charges for the call.
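
As a quick worked example of the rate above, the sketch below estimates the streaming charge for a given number of billed minutes. It covers only the audio stream fee, not the voice minutes, and it takes billed minutes as an input because rounding depends on the billing interval of your account. The names are hypothetical.

```python
from decimal import Decimal

# Per-minute, per-stream rate quoted above, in US dollars.
STREAM_RATE_PER_MINUTE = Decimal("0.003")


def streaming_cost(billed_minutes: int, streams: int = 1) -> Decimal:
    """Return the audio streaming charge, excluding voice-minute charges."""
    return STREAM_RATE_PER_MINUTE * billed_minutes * streams


print(streaming_cost(10))  # 0.030
```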

  • Where can I see the debug logs for audio streams?

    Plivo presents the audio stream details associated with the call on its dedicated CDR page. Find this page by following the steps below. 

    1. Navigate to the call logs screen.
    2. Choose the call that encountered an audio stream problem.
    3. Scroll down to the audio stream section.
    4. Select debug logs for further insights into the audio stream.
  • What is the billing interval for audio streams?

    Audio streams use the same billing interval that applies to your calls.

  • What are the callbacks sent to my application server?

    Plivo posts events to your application server if status_callback_url is configured when the stream is initiated. Refer to our documentation for a comprehensive list of the events sent to your status callback URL.

  • Will Plivo retry if there is a connection failure to my WebSocket?

    In the event of an unsuccessful connection, either on the initial connection attempt or if an established connection is dropped, Plivo will retry the specified WebSocket connection twice before disconnecting.

  • What is the format in which Plivo sends the message over WebSocket?

    Plivo sends messages over the WebSocket in the following formats:

    Event on starting the audio stream


    {
      "sequenceNumber": 0,
      "event": "start",
      "start": {
        "callId": "8c43a765-94fa-4ee9-b9a3-242703e41f63",
        "streamId": "b77e037d-4119-44b5-902d-25826b654539",
        "accountId": "155747",
        "tracks": [...],
        "mediaFormat": {
          "encoding": "audio/x-l16",
          "sampleRate": 8000
        }
      },
      "extra_headers": "{}"
    }


    Event on receiving an inbound media event


    {
      "sequenceNumber": 887,
      "streamId": "20170ada-f610-433b-8758-c02a2aab3662",
      "event": "media",
      "media": {
        "track": "inbound",
        "timestamp": "1687353805345",
        "chunk": 469,
        ...
      },
      "extra_headers": "{}"
    }


    A similar event is sent for outbound audio streams; for them, the track value is “outbound.”
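
To illustrate how an application might consume these messages, the sketch below reads a sequence of raw JSON messages, records the stream details from the start event, and counts media events per track. It relies only on the fields shown in the samples above. The function name and the shape of the returned summary are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_messages(raw_messages: Iterable[str]) -> dict:
    """Collect stream details and per-track media counts from WebSocket messages."""
    summary = {
        "stream_id": None,
        "encoding": None,
        "sample_rate": None,
        "chunks": Counter(),
    }
    for raw in raw_messages:
        message = json.loads(raw)
        event = message.get("event")
        if event == "start":
            start = message["start"]
            summary["stream_id"] = start["streamId"]
            summary["encoding"] = start["mediaFormat"]["encoding"]
            summary["sample_rate"] = start["mediaFormat"]["sampleRate"]
        elif event == "media":
            # The track is "inbound" or "outbound", as described above.
            summary["chunks"][message["media"]["track"]] += 1
        # Other events, such as played or clearedAudio, are ignored in this sketch.
    return summary
```

In a real handler the same branching would run on each message as it arrives rather than on a collected list.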