GStreamer Opus Audio Encapsulation and Transport
This article provides a technical overview of how GStreamer pipelines capture, encode, encapsulate, and transport libopus compressed audio. It explains the critical GStreamer elements required for containerizing Opus streams into formats like Ogg, Matroska, and RTP, and demonstrates how to construct pipelines for both storage and real-time network streaming.
Encoding Audio with libopus in GStreamer
The transition from raw audio (PCM) to a deployable Opus stream
begins with the opusenc element. This element wraps the
upstream libopus library, converting raw, uncompressed
audio into encoded Opus packets.
Opus encodes at a fixed set of sample rates (8, 12, 16, 24, or 48 kHz)
and supports 1 to 255 channels. Because a source rarely delivers exactly
the format the encoder accepts, the pipeline typically places the helper
elements audioconvert and audioresample before the encoder so that a
compatible input format can be negotiated.
[Audio Source] -> [audioconvert] -> [audioresample] -> [opusenc]
Within opusenc, developers can configure parameters that dictate how the audio is encoded:
* bitrate: Sets the target bitrate in bits per second.
* frame-size: Sets the duration of each encoded frame (2.5 ms to 60 ms); shorter frames lower latency at the cost of more packets per second.
* bandwidth: Constrains the audio bandwidth (narrowband, mediumband, wideband, super-wideband, or fullband).
* audio-type: Optimizes encoding for either “voice” (speech) or “generic” (music/mixed audio).
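The interaction between bitrate and frame-size is easier to see with numbers. The following is a minimal sketch in shell arithmetic, assuming a 48 kHz sample rate and treating the bitrate as a constant average; the helper name opus_frame_stats is hypothetical and not part of GStreamer.

```shell
# Hypothetical helper (not part of GStreamer): estimates per-frame figures
# for a given opusenc configuration.
#   $1 = bitrate in bits per second (the value given to opusenc bitrate=...)
#   $2 = frame duration in microseconds (2500 for frame-size=2.5, 20000 for 20)
opus_frame_stats() {
    bitrate=$1
    frame_us=$2
    # Samples per frame at 48 kHz (48 samples per millisecond).
    samples=$(( 48 * frame_us / 1000 ))
    # Average encoded payload per frame: bits per second * seconds / 8.
    bytes=$(( bitrate * frame_us / 8000000 ))
    # Frames (and therefore packets) produced per second.
    packets=$(( 1000000 / frame_us ))
    echo "samples=$samples bytes=$bytes packets_per_sec=$packets"
}

opus_frame_stats 64000 20000   # prints: samples=960 bytes=160 packets_per_sec=50

# A matching encoder configuration for speech might look like this
# (requires GStreamer; fakesink simply discards the encoded packets):
#   gst-launch-1.0 audiotestsrc num-buffers=200 ! audioconvert ! audioresample ! \
#       opusenc bitrate=32000 frame-size=20 bandwidth=wideband audio-type=voice ! fakesink
```

The byte figure is exact only for constant-bitrate encoding; with variable bitrate it should be read as an average.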
Encapsulation Formats
Raw Opus packets lack timing and synchronization metadata, making them unsuitable for raw transport or storage without a container. GStreamer handles this by passing the encoded payload to specific multiplexing (“muxing”) elements.
1. Ogg Encapsulation (for local storage or icecast)
The Ogg container is the traditional standard for Opus audio files.
In GStreamer, the oggmux element wraps the Opus packets into Ogg pages.
* Element pipeline: opusenc ! oggmux ! filesink
* Use case: Creating standard .opus or .ogg files playable by most media players.
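A quick way to confirm that oggmux produced the expected container is to look at the first bytes of the file. The sketch below assumes a typical mono or stereo stream, where the first Ogg page carries a single-segment identification packet; the helper name is_ogg_opus is hypothetical.

```shell
# Hypothetical helper: succeeds if the file starts like an Ogg Opus stream.
is_ogg_opus() {
    file=$1
    # Bytes 0-3 of every Ogg page are the capture pattern "OggS".
    magic=$(dd if="$file" bs=1 count=4 2>/dev/null)
    # The first page has a 27-byte header plus a 1-byte segment table
    # (assumed here), so the "OpusHead" identification packet starts
    # at byte offset 28.
    ident=$(dd if="$file" bs=1 skip=28 count=8 2>/dev/null)
    [ "$magic" = "OggS" ] && [ "$ident" = "OpusHead" ]
}

# Usage, once a file such as output.opus exists:
#   if is_ogg_opus output.opus; then echo "looks like Ogg Opus"; fi
```

This checks only the container signature and the Opus identification header; it does not validate the rest of the stream.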
2. Matroska and WebM Encapsulation (for video/audio synchronization)
When multiplexing Opus audio with video, GStreamer uses matroskamux or
webmmux. webmmux is limited to WebM-compatible video such as VP8 or VP9,
while matroskamux also accepts H.264/H.265.
* Element pipeline: opusenc ! webmmux ! filesink
* Use case: HTML5-compliant WebM files and MKV video containers.
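Muxing audio together with video needs a named muxer that both branches link to. The following sketch composes such a pipeline description as a string so it can be inspected before being run; the helper name av_mux_pipeline is hypothetical, and the choice of vp8enc and x264enc assumes those encoder plugins are installed.

```shell
# Hypothetical helper: prints a gst-launch-1.0 pipeline description that
# muxes a test video stream and an Opus test tone into one file.
#   $1 = muxer element (webmmux or matroskamux), $2 = output path
av_mux_pipeline() {
    muxer=$1
    out=$2
    case "$muxer" in
        # WebM does not carry H.264/H.265, so pair it with VP8.
        webmmux)     venc="vp8enc" ;;
        # Matroska also accepts H.264; x264enc is assumed to be available.
        matroskamux) venc="x264enc ! h264parse" ;;
        *) echo "unsupported muxer: $muxer" >&2; return 1 ;;
    esac
    # The muxer is declared once with name=mux; each branch ends in "mux."
    # to request its own sink pad on that muxer.
    echo "$muxer name=mux ! filesink location=$out" \
         "videotestsrc num-buffers=150 ! videoconvert ! $venc ! queue ! mux." \
         "audiotestsrc num-buffers=250 ! audioconvert ! audioresample ! opusenc ! queue ! mux."
}

av_mux_pipeline webmmux out.webm

# To execute the printed description (requires GStreamer):
#   gst-launch-1.0 $(av_mux_pipeline webmmux out.webm)
```

The queue elements decouple the two branches so that neither encoder stalls the other while the muxer interleaves their output.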
3. RTP Payload Encapsulation (for real-time streaming)
For low-latency network streaming, such as WebRTC or VoIP, GStreamer
encapsulates Opus packets into Real-time Transport Protocol (RTP)
packets. This is handled by the rtpopuspay (payloader) element, which
formats the data according to the RFC 7587 specification.
* Element pipeline: opusenc ! rtpopuspay ! udpsink
* Use case: Real-time broadcast and interactive communication.
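To make the RTP encapsulation concrete, the sketch below decodes the fixed 12-byte RTP header from a file holding one packet. RFC 7587 fixes the Opus RTP clock at 48000 Hz, so a 20 ms frame advances the timestamp by 960. The helper name rtp_header_info is hypothetical, and the sample bytes are synthetic rather than captured from a real stream.

```shell
# Hypothetical helper: decodes the fixed 12-byte RTP header found at the
# start of a file that contains one RTP packet.
rtp_header_info() {
    # od prints the first 12 bytes as unsigned decimal numbers, which
    # then become the positional parameters $1..$12.
    set -- $(od -An -tu1 -N12 "$1")
    version=$(( $1 >> 6 ))            # top two bits of byte 0
    payload_type=$(( $2 & 127 ))      # low seven bits of byte 1
    sequence=$(( $3 * 256 + $4 ))     # 16-bit big-endian sequence number
    timestamp=$(( (($5 * 256 + $6) * 256 + $7) * 256 + $8 ))
    echo "version=$version pt=$payload_type seq=$sequence ts=$timestamp"
}

# Synthetic header for demonstration: V=2, PT=96, seq=1, timestamp=960.
sample=$(mktemp)
printf '\200\140\000\001\000\000\003\300\022\064\126\170' > "$sample"
rtp_header_info "$sample"   # prints: version=2 pt=96 seq=1 ts=960
rm -f "$sample"
```

Payload type 96 is a dynamic value that sender and receiver must agree on, which is why it also has to appear in the receiver's caps.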
Practical Pipeline Examples
The following command-line examples demonstrate how GStreamer
pipelines construct and execute encapsulation and transport in practice
using gst-launch-1.0.
Example 1: Encapsulating Opus into an Ogg File
This pipeline generates a test tone, encodes it using
libopus, packages it into an Ogg container, and saves it
locally.
gst-launch-1.0 audiotestsrc num-buffers=200 ! \
audioconvert ! \
audioresample ! \
opusenc bitrate=64000 ! \
oggmux ! \
filesink location=output.opus
Example 2: Transporting Opus over RTP (UDP)
This sender-receiver pair demonstrates how to transmit Opus audio over a network.
Sender Pipeline: The sender encodes the audio, wraps
it in RTP packets using rtpopuspay, and sends it over UDP
to port 5004.
gst-launch-1.0 audiotestsrc is-live=true ! \
audioconvert ! \
audioresample ! \
opusenc frame-size=20 bitrate=96000 ! \
rtpopuspay ! \
udpsink host=127.0.0.1 port=5004
Receiver Pipeline: The receiver listens on UDP port 5004, strips the
RTP headers with rtpopusdepay, decodes the Opus packets to raw audio
with opusdec, and plays the result through the default audio output.
gst-launch-1.0 udpsrc port=5004 caps="application/x-rtp,media=audio,clock-rate=48000,encoding-name=OPUS,payload=96" ! \
rtpopusdepay ! \
opusdec ! \
audioconvert ! \
audioresample ! \
autoaudiosink
Demuxing and Depayloading
On the receiving or playback end of a GStreamer pipeline, the
encapsulation process must be reversed:
* Demuxing: Elements like oggdemux or matroskademux extract the raw Opus stream from their container wrappers.
* Depayloading: For network streams, rtpopusdepay strips the RTP headers and outputs the Opus packets. It does not reorder them; buffering and reordering of late or out-of-order packets is normally handled by an rtpjitterbuffer placed before the depayloader.
Once the container or transport headers are stripped, the packets are
forwarded to the opusdec element, which leverages
libopus to output the raw PCM audio stream for system
playback.
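The two reverse paths can be sketched side by side. The helper below prints a playback pipeline description for either a stored file or a live RTP stream; its name, opus_playback_pipeline, is hypothetical, the 100 ms jitter buffer latency is an arbitrary illustrative value, and the file variants assume audio-only files.

```shell
# Hypothetical helper: prints a playback pipeline description.
#   $1 = "ogg", "mkv" or "rtp"; $2 = file path (ogg/mkv) or UDP port (rtp)
opus_playback_pipeline() {
    kind=$1
    src=$2
    # Every variant ends in the same decode-and-play chain.
    decode="opusdec ! audioconvert ! audioresample ! autoaudiosink"
    case "$kind" in
        ogg) echo "filesrc location=$src ! oggdemux ! $decode" ;;
        mkv) echo "filesrc location=$src ! matroskademux ! $decode" ;;
        # The caps tell the RTP elements what the UDP packets contain;
        # rtpjitterbuffer absorbs network jitter before depayloading.
        rtp) echo "udpsrc port=$src caps=application/x-rtp,media=audio,clock-rate=48000,encoding-name=OPUS,payload=96 ! rtpjitterbuffer latency=100 ! rtpopusdepay ! $decode" ;;
        *) echo "unknown kind: $kind" >&2; return 1 ;;
    esac
}

opus_playback_pipeline ogg output.opus

# To execute a printed description (requires GStreamer):
#   gst-launch-1.0 $(opus_playback_pipeline rtp 5004)
```

Only the front of each pipeline changes with the encapsulation; from opusdec onward the chain is identical.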