GStreamer Opus Audio Encapsulation and Transport

This article provides a technical overview of how GStreamer pipelines capture, encode, encapsulate, and transport audio compressed with libopus. It covers the GStreamer elements that encapsulate Opus streams in Ogg, Matroska/WebM, and RTP, and shows how to construct pipelines for both storage and real-time network streaming.

Encoding Audio with libopus in GStreamer

The transition from raw audio (PCM) to a deployable Opus stream begins with the opusenc element. This element wraps the upstream libopus library, converting raw, uncompressed audio into encoded Opus packets.

Opus accepts only a fixed set of input sample rates (8, 12, 16, 24, and 48 kHz) and supports from 1 to 255 channels, so the pipeline typically places helper elements such as audioconvert and audioresample before the encoder to convert the source into a format that opusenc can negotiate.

[Audio Source] -> [audioconvert] -> [audioresample] -> [opusenc]
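
The need for the resampling step can be illustrated with a short Python sketch. It assumes the five sample rates that Opus encoders accept. The function name `opus_input_rate` is hypothetical, and the selection rule (smallest supported rate that is not below the source rate) is only an illustration, not GStreamer's actual caps negotiation logic.

```python
# Sample rates (Hz) that Opus encoders accept as input.
OPUS_RATES = (8000, 12000, 16000, 24000, 48000)

def opus_input_rate(source_rate: int) -> tuple:
    """Return (target_rate, needs_resampling) for a given source rate.

    Illustrative rule: pick the smallest Opus rate that is >= the source
    rate so no audio bandwidth is discarded; use 48 kHz above that.
    """
    for rate in OPUS_RATES:
        if rate >= source_rate:
            return rate, rate != source_rate
    return 48000, True

print(opus_input_rate(44100))  # CD-rate audio has to be resampled
print(opus_input_rate(16000))  # already a supported rate
```

A 44.1 kHz source is the common case in which audioresample does real work; a 48 kHz source passes through unchanged.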

Within opusenc, developers can configure parameters that dictate how the audio is encoded:

* bitrate: Sets the target bitrate in bits per second.
* frame-size: Sets the frame duration, and with it the encoding latency (2.5, 5, 10, 20, 40, or 60 ms).
* bandwidth: Constrains the audio bandwidth (narrowband, mediumband, wideband, super-wideband, or fullband).
* audio-type: Optimizes encoding for either “voice” (speech) or “generic” (music/mixed audio).
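
How frame-size and bitrate interact can be worked out with simple arithmetic. The Python sketch below assumes the encoder hits its target bitrate exactly (real packet sizes vary, especially with variable bitrate) and a 48 kHz sample rate; the helper name `frame_stats` is hypothetical.

```python
# Frame durations (ms) that Opus supports.
FRAME_SIZES_MS = (2.5, 5, 10, 20, 40, 60)

def frame_stats(frame_ms: float, bitrate: int, rate: int = 48000) -> dict:
    """Estimate per-packet figures for a frame duration and target bitrate."""
    if frame_ms not in FRAME_SIZES_MS:
        raise ValueError(f"unsupported frame size: {frame_ms} ms")
    return {
        "samples_per_frame": int(rate * frame_ms / 1000),
        "packets_per_second": 1000 / frame_ms,
        # Average payload size, assuming the target bitrate is met exactly.
        "avg_payload_bytes": bitrate * frame_ms / 1000 / 8,
    }

for ms in FRAME_SIZES_MS:
    print(ms, frame_stats(ms, bitrate=64000))
```

Shorter frames lower latency but raise the packet rate, so any fixed per-packet overhead (container or RTP/UDP/IP headers) takes a larger share of the total bandwidth.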

Encapsulation Formats

Raw Opus packets carry no timing or synchronization metadata, so they cannot be stored or transported reliably without an enclosing format. GStreamer handles this by passing the encoded packets to a multiplexing (“muxing”) element for container formats, or to a payloader element for RTP.

1. Ogg Encapsulation (for local storage or Icecast)

The Ogg container is the traditional standard for Opus audio files. In GStreamer, the oggmux element wraps the Opus packets into Ogg pages.

* Element pipeline: opusenc ! oggmux ! filesink
* Use case: Creating standard .opus or .ogg files playable by most media players.
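
To make the page structure concrete, the following Python sketch parses the fixed 27-byte Ogg page header and the OpusHead identification packet that RFC 7845 places on the first page. It is an illustration of the on-disk layout, not of how oggmux is implemented; the function names are hypothetical, and the demonstration page is synthetic (its CRC is left at zero, so it is not a valid file).

```python
import struct

def parse_ogg_page_header(data: bytes) -> dict:
    """Parse the fixed 27-byte header at the start of an Ogg page."""
    (magic, version, header_type, granule, serial,
     sequence, crc, n_segments) = struct.unpack_from("<4sBBqIIIB", data, 0)
    if magic != b"OggS":
        raise ValueError("not an Ogg page")
    # The segment table follows the fixed header; its entries sum to the body size.
    segment_table = data[27:27 + n_segments]
    return {
        "bos": bool(header_type & 0x02),   # beginning-of-stream flag
        "eos": bool(header_type & 0x04),   # end-of-stream flag
        "granule_position": granule,       # for Opus: count of 48 kHz samples
        "serial": serial,
        "sequence": sequence,
        "body_offset": 27 + n_segments,
        "body_size": sum(segment_table),
    }

def parse_opus_head(body: bytes) -> dict:
    """Parse the OpusHead identification packet (RFC 7845, section 5.1)."""
    magic, version, channels, pre_skip, input_rate, gain, mapping = \
        struct.unpack_from("<8sBBHIhB", body, 0)
    if magic != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    return {"version": version, "channels": channels, "pre_skip": pre_skip,
            "input_sample_rate": input_rate, "mapping_family": mapping}

# Synthetic first page for demonstration only (CRC is 0, so players would reject it).
head = struct.pack("<8sBBHIhB", b"OpusHead", 1, 2, 312, 48000, 0, 0)
page = (struct.pack("<4sBBqIIIB", b"OggS", 0, 0x02, 0, 0x1234, 0, 0, 1)
        + bytes([len(head)]) + head)
info = parse_ogg_page_header(page)
body = page[info["body_offset"]:info["body_offset"] + info["body_size"]]
print(info)
print(parse_opus_head(body))
```

The same two functions can be pointed at the first bytes of a real .opus file to check that it starts with a beginning-of-stream page carrying an OpusHead packet.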

2. Matroska and WebM Encapsulation (for video/audio synchronization)

When multiplexing Opus audio with video, GStreamer uses matroskamux or webmmux. webmmux accepts only the codecs allowed in WebM (for video, VP8 and VP9), so streams that pair Opus with H.264 or H.265 need matroskamux.

* Element pipeline: opusenc ! webmmux ! filesink
* Use case: HTML5-compliant WebM files and MKV video containers.
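
The choice between the two muxers can be expressed as a small lookup. This Python sketch is only an illustration: the function name `pick_muxer` is hypothetical, and the set of WebM video codecs is limited to the two named in this article.

```python
# Video codecs from this article that the WebM profile permits alongside Opus.
WEBM_VIDEO = {"vp8", "vp9"}

def pick_muxer(video_codec=None) -> str:
    """Return a muxer element name for Opus audio plus an optional video codec."""
    if video_codec is None or video_codec.lower() in WEBM_VIDEO:
        return "webmmux"
    # Anything else (for example H.264 or H.265) goes into general Matroska.
    return "matroskamux"

for codec in (None, "VP9", "h264"):
    print(codec, "->", pick_muxer(codec))
```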

3. RTP Payload Encapsulation (for real-time streaming)

For low-latency network streaming, such as WebRTC or VoIP, GStreamer encapsulates Opus packets into Real-time Transport Protocol (RTP) packets. This is handled by the rtpopuspay (payloader) element, which formats the data according to the RFC 7587 specification.

* Element pipeline: opusenc ! rtpopuspay ! udpsink
* Use case: Real-time broadcast and interactive communication.
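
What a payloader adds to each Opus packet can be sketched in a few lines of Python. The example below builds the minimal 12-byte RTP header from RFC 3550 and uses the 48 kHz RTP clock that RFC 7587 specifies for Opus. It is not the rtpopuspay implementation; the function names, the SSRC value, and the zero-filled placeholder payloads are assumptions made for illustration.

```python
import struct

RTP_CLOCK_RATE = 48000  # RFC 7587: the Opus RTP clock always runs at 48 kHz

def build_rtp_packet(payload: bytes, seq: int, timestamp: int,
                     ssrc: int, payload_type: int = 96,
                     marker: bool = False) -> bytes:
    """Prepend a minimal 12-byte RTP header (no CSRCs, no extension)."""
    byte0 = 2 << 6  # version 2, padding=0, extension=0, CSRC count=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

def timestamp_step(frame_ms: float) -> int:
    """RTP timestamp increment per packet for a given Opus frame duration."""
    return int(RTP_CLOCK_RATE * frame_ms / 1000)

# Three placeholder payloads standing in for encoded 20 ms Opus frames.
packets = [
    build_rtp_packet(b"\x00" * 160, seq=i,
                     timestamp=i * timestamp_step(20), ssrc=0xDEADBEEF)
    for i in range(3)
]
print(len(packets[0]), timestamp_step(20))
```

Because the clock is fixed at 48 kHz, the timestamp step depends only on the frame duration, not on the sample rate of the audio that was fed to the encoder.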


Practical Pipeline Examples

The following gst-launch-1.0 commands show these encapsulation and transport pipelines in practice.

Example 1: Encapsulating Opus into an Ogg File

This pipeline generates a test tone, encodes it using libopus, packages it into an Ogg container, and saves it locally.

gst-launch-1.0 audiotestsrc num-buffers=200 ! \
  audioconvert ! \
  audioresample ! \
  opusenc bitrate=64000 ! \
  oggmux ! \
  filesink location=output.opus

Example 2: Transporting Opus over RTP (UDP)

This sender-receiver pair demonstrates how to transmit Opus audio over a network.

Sender Pipeline: The sender encodes the audio, wraps it in RTP packets using rtpopuspay, and sends it over UDP to port 5004.

gst-launch-1.0 audiotestsrc is-live=true ! \
  audioconvert ! \
  audioresample ! \
  opusenc frame-size=20 bitrate=96000 ! \
  rtpopuspay ! \
  udpsink host=127.0.0.1 port=5004

Receiver Pipeline: The receiver listens on UDP port 5004, depayloads the RTP packets with rtpopusdepay, decodes the Opus packets back to raw audio with opusdec, and plays the result through the default audio output.

gst-launch-1.0 udpsrc port=5004 caps="application/x-rtp,media=audio,clock-rate=48000,encoding-name=OPUS,payload=96" ! \
  rtpopusdepay ! \
  opusdec ! \
  audioconvert ! \
  audioresample ! \
  autoaudiosink
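
When these pipelines are launched from a script rather than typed by hand, it can help to assemble the argument lists programmatically. The Python sketch below rebuilds the sender and receiver commands shown above as argument lists suitable for `subprocess.run`; the helper names are hypothetical, and the block only prints the commands, so it runs without GStreamer installed.

```python
import shlex

def rtp_caps(payload: int = 96, clock_rate: int = 48000) -> str:
    """Build the caps string that tells udpsrc the stream is Opus over RTP."""
    return ("application/x-rtp,media=audio,"
            f"clock-rate={clock_rate},encoding-name=OPUS,payload={payload}")

def sender_cmd(host: str, port: int, bitrate: int = 96000,
               frame_size: int = 20) -> list:
    """Argument list for the RTP sender pipeline."""
    pipeline = ("audiotestsrc is-live=true ! audioconvert ! audioresample ! "
                f"opusenc frame-size={frame_size} bitrate={bitrate} ! "
                f"rtpopuspay ! udpsink host={host} port={port}")
    return ["gst-launch-1.0"] + shlex.split(pipeline)

def receiver_cmd(port: int) -> list:
    """Argument list for the RTP receiver pipeline."""
    # Quote the caps so shlex keeps them as a single argument.
    pipeline = (f"udpsrc port={port} caps={shlex.quote(rtp_caps())} ! "
                "rtpopusdepay ! opusdec ! audioconvert ! audioresample ! "
                "autoaudiosink")
    return ["gst-launch-1.0"] + shlex.split(pipeline)

print(shlex.join(sender_cmd("127.0.0.1", 5004)))
print(shlex.join(receiver_cmd(5004)))
```

On a machine with GStreamer installed, each list could be passed to `subprocess.run(..., check=True)`. The payload number and clock rate in the receiver caps must match what the sender's payloader emits.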

Demuxing and Depayloading

On the receiving or playback end of a GStreamer pipeline, the encapsulation process must be reversed:

* Demuxing: Elements like oggdemux or matroskademux extract the encoded Opus stream from its container.
* Depayloading: For network streams, rtpopusdepay strips the RTP headers and outputs the Opus packets. It does not reorder packets itself; on networks that can reorder or delay packets, an rtpjitterbuffer element is typically placed before the depayloader.

Once the container or transport headers are stripped, the packets are forwarded to the opusdec element, which leverages libopus to output the raw PCM audio stream for system playback.
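
The depayloading step can be sketched in Python as the inverse of adding an RTP header. The code below follows the fixed header layout from RFC 3550 (CSRC list, optional header extension, optional padding) and then sorts packets by sequence number. It is an illustration rather than the rtpopusdepay implementation: the function names and the hand-built test packets are hypothetical, and the sort deliberately ignores 16-bit sequence number wraparound.

```python
import struct

def depayload(packet: bytes) -> dict:
    """Split an RTP packet into header fields and the Opus payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack_from("!BBHII", packet, 0)
    if byte0 >> 6 != 2:
        raise ValueError("unsupported RTP version")
    offset = 12 + 4 * (byte0 & 0x0F)       # skip CSRC identifiers
    if byte0 & 0x10:                       # header extension present
        ext_words = struct.unpack_from("!H", packet, offset + 2)[0]
        offset += 4 + 4 * ext_words
    end = len(packet)
    if byte0 & 0x20:                       # padding: last byte holds its length
        end -= packet[-1]
    return {"payload_type": byte1 & 0x7F, "seq": seq, "timestamp": timestamp,
            "ssrc": ssrc, "payload": packet[offset:end]}

def in_playout_order(packets: list) -> list:
    """Sort parsed packets by sequence number (ignores 16-bit wraparound)."""
    return sorted(packets, key=lambda p: p["seq"])

def _pkt(seq: int, ts: int, payload: bytes) -> bytes:
    """Build a test packet with a minimal header (hypothetical SSRC 0x1234)."""
    return struct.pack("!BBHII", 0x80, 96, seq, ts, 0x1234) + payload

# Two hand-built packets that arrive out of order.
received = [depayload(_pkt(2, 960, b"second")), depayload(_pkt(1, 0, b"first"))]
print([p["payload"] for p in in_playout_order(received)])
```

The recovered payloads are what a decoder such as opusdec consumes, one Opus packet per RTP packet.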