How to Stream Libopus Packets over RTP
This article outlines the standard, RFC-compliant methodology for streaming libopus audio packets over Real-time Transport Protocol (RTP) networks. It details the core specifications defined in RFC 7587, including the mandatory 48 kHz clock rate rule, Session Description Protocol (SDP) parameters, packetization configurations, and best practices for achieving low-latency, high-resilience audio transmission.
The Standard: RFC 7587
The standard specification for encapsulating Opus audio in RTP is RFC 7587 (“RTP Payload Format for the Opus Speech and Audio Codec”). To interoperate with VoIP, WebRTC, and broadcast systems, your streaming implementation should follow this standard rather than use custom framing.
The RTP Timestamp and Clock Rate Rule
Unlike other codecs where the RTP timestamp clock rate matches the sampling rate of the encoded audio, RFC 7587 defines a strict rule for Opus:
- The RTP timestamp clock rate MUST always be 48,000 Hz, regardless of the actual sampling rate of the audio being encoded (which can be 8, 12, 16, 24, or 48 kHz).
- The RTP timestamp increases by 48,000 per second. For example, a 20 ms packet always increases the RTP timestamp by exactly 960 (0.020 seconds × 48,000), even if you are transmitting narrow-band 8 kHz audio.
- The 48 kHz value is the unit of the RTP timestamp, not a required output rate. Decoders must accept these timestamps and may produce audio at their native output rate, resampling if necessary.
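The timestamp arithmetic above can be sketched in a few lines of Python. The function names `timestamp_increment` and `next_timestamp` are illustrative, not part of libopus or any RTP library; the 32-bit wraparound follows from the width of the RTP timestamp field.

```python
RTP_CLOCK_HZ = 48_000  # fixed Opus RTP clock rate (RFC 7587)


def timestamp_increment(frame_ms: float) -> int:
    """Return the RTP timestamp ticks covered by one packet of frame_ms milliseconds."""
    ticks = RTP_CLOCK_HZ * frame_ms / 1000
    if not ticks.is_integer():
        raise ValueError(f"{frame_ms} ms is not a whole number of 48 kHz ticks")
    return int(ticks)


def next_timestamp(current: int, frame_ms: float) -> int:
    """Advance a 32-bit RTP timestamp by one packet, wrapping at 2**32."""
    return (current + timestamp_increment(frame_ms)) & 0xFFFFFFFF


if __name__ == "__main__":
    # The increment depends only on packet duration, never on the input sample rate.
    for ms in (2.5, 5, 10, 20, 40, 60):
        print(f"{ms} ms -> +{timestamp_increment(ms)}")
```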
RTP Payload Structure
The RTP payload for an Opus packet is straightforward:
- RTP Header: Standard 12-byte header. The Payload Type (PT) must be dynamically allocated (typically in the range of 96–127).
- Opus Payload: The libopus compressed audio data is placed directly into the RTP payload field.
- TOC (Table of Contents) Byte: The very first byte of the libopus payload is the TOC byte, which describes the packet's configuration (coding mode, audio bandwidth, and frame duration), whether it is mono or stereo, and how many frames it contains. No extra encapsulation headers are allowed between the RTP header and the Opus payload.
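As an illustration of what the TOC byte encodes, here is a minimal Python sketch that decodes it according to the layout in RFC 6716 (5-bit configuration number, 1-bit stereo flag, 2-bit frame count code). The function name `parse_toc` and the returned dictionary keys are hypothetical choices for this example.

```python
def parse_toc(toc: int) -> dict:
    """Decode an Opus TOC byte: config (5 bits), stereo flag (1 bit), frame count code (2 bits)."""
    if not 0 <= toc <= 0xFF:
        raise ValueError("TOC must be a single byte")
    config = toc >> 3
    stereo = bool((toc >> 2) & 0x01)
    code = toc & 0x03

    if config < 12:      # configs 0..11: SILK-only, NB/MB/WB
        mode = "SILK"
        bandwidth = ("NB", "MB", "WB")[config // 4]
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:    # configs 12..15: Hybrid, SWB/FB
        mode = "Hybrid"
        bandwidth = ("SWB", "FB")[(config - 12) // 2]
        frame_ms = (10, 20)[config % 2]
    else:                # configs 16..31: CELT-only, NB/WB/SWB/FB
        mode = "CELT"
        bandwidth = ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = (2.5, 5, 10, 20)[config % 4]

    return {
        "config": config,
        "mode": mode,
        "bandwidth": bandwidth,
        "frame_ms": frame_ms,
        "stereo": stereo,
        "frame_count_code": code,  # 0: one frame, 1-2: two frames, 3: arbitrary count
    }
```

A receiver does not need this to decode audio, since the decoder reads the TOC itself, but it is handy for logging and debugging captured streams.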
Each RTP packet carries exactly one Opus packet. That packet usually holds a single frame, which minimizes packetization latency, but it may also be a multi-frame Opus packet generated by the encoder.
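The following Python sketch shows one way to assemble such a packet: a fixed 12-byte RTP header (version 2, no padding, no extension, no CSRC entries) followed immediately by the Opus packet. The function name `build_rtp_packet` and the example payload type 111 are assumptions for illustration; the payload bytes would come from your libopus encoder.

```python
import struct


def build_rtp_packet(payload_type: int, seq: int, timestamp: int,
                     ssrc: int, opus_packet: bytes, marker: bool = False) -> bytes:
    """Prepend a minimal 12-byte RTP header to a single Opus packet."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    byte0 = 2 << 6                               # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | payload_type    # M bit + 7-bit payload type
    header = struct.pack(
        "!BBHII",                                # network byte order
        byte0,
        byte1,
        seq & 0xFFFF,                            # 16-bit sequence number
        timestamp & 0xFFFFFFFF,                  # 32-bit timestamp, 48 kHz units
        ssrc & 0xFFFFFFFF,                       # 32-bit synchronization source
    )
    # The Opus packet (starting with its TOC byte) follows the header directly.
    return header + opus_packet
```

The resulting bytes can be handed to a UDP socket with `sendto`; sequence number and timestamp bookkeeping is left to the caller.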
SDP Signaling and Media Negotiation
To establish a stream, you must negotiate the connection using SDP. Because Opus uses dynamic payload types, you must define the mapping explicitly.
An RFC 7587 compliant SDP media description looks like this:
m=audio 5004 RTP/AVP 111
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10; useinbandfec=1; stereo=1; maxaveragebitrate=128000
Key SDP Parameter Rules:
- opus/48000/2: The encoding name is opus, the clock rate is 48000, and the channel parameter must be 2 (the standard mandates 2 channels for the RTP mapping, even if the source is mono).
- stereo: Specifies whether the decoder prefers to receive stereo (1) or mono (0).
- sprop-stereo: Declares whether the sender is actually transmitting stereo audio.
- useinbandfec=1: Enables In-Band Forward Error Correction (FEC). This is highly recommended for lossy networks, allowing the decoder to reconstruct lost packets using lower-bitrate data embedded in subsequent packets.
- maxaveragebitrate: Defines the maximum average target bitrate in bits per second.
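To act on these parameters, an application has to read them out of the a=fmtp line. Below is a minimal Python sketch of that step; `parse_fmtp` is a hypothetical helper, not a function from any SDP library, and it keeps all values as strings rather than validating them.

```python
def parse_fmtp(line: str) -> tuple:
    """Split an 'a=fmtp:<pt> key=value; ...' line into (payload_type, params dict)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    pt_str, _, param_str = line[len(prefix):].strip().partition(" ")
    params = {}
    for item in param_str.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled semicolons
        key, _, value = item.partition("=")
        params[key.strip()] = value.strip()
    return int(pt_str), params


if __name__ == "__main__":
    pt, params = parse_fmtp(
        "a=fmtp:111 minptime=10; useinbandfec=1; stereo=1; maxaveragebitrate=128000"
    )
    print(pt, params)
```

Parameters the receiver does not recognize can simply be left in the dictionary and ignored.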
Packetization and Latency (ptime)
Opus supports frame sizes of 2.5, 5, 10, 20, 40, and 60 ms.
- Recommended Default: A packet duration (ptime) of 20 ms is the industry standard. It offers the best balance between low latency and network overhead.
- Low-Latency Applications: Use 10 ms or 5 ms frames if your network can handle the increased packet-per-second overhead.
- Signaling: Use the ptime and maxptime attributes in the SDP to negotiate these limits between the sender and receiver.
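The overhead trade-off can be made concrete with a little arithmetic. The sketch below assumes IPv4 (20 bytes), UDP (8 bytes), and a 12-byte RTP header with no extensions, for 40 header bytes per packet; those figures and the function names are assumptions for illustration, and real deployments (IPv6, SRTP, header extensions) will differ.

```python
HEADER_BYTES = 20 + 8 + 12  # assumed: IPv4 + UDP + RTP, no options or extensions


def packets_per_second(ptime_ms: float) -> float:
    """Number of packets sent each second for a given packet duration."""
    return 1000 / ptime_ms


def header_overhead_bps(ptime_ms: float) -> float:
    """Bits per second spent on headers alone, independent of the Opus bitrate."""
    return packets_per_second(ptime_ms) * HEADER_BYTES * 8


if __name__ == "__main__":
    for ptime in (5, 10, 20, 40, 60):
        print(f"ptime={ptime} ms: {packets_per_second(ptime):.1f} pkt/s, "
              f"{header_overhead_bps(ptime) / 1000:.1f} kbit/s of headers")
```

Under these assumptions, halving the packet duration doubles the header cost, which is why very short frames only pay off when latency matters more than bandwidth.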
Congestion Control and DTX
To optimize network utilization, the streaming application should leverage libopus’s native features:
- Discontinuous Transmission (DTX): When enabled (a=fmtp:... usedtx=1), the encoder stops sending packets or drops the bitrate significantly during periods of silence, reducing bandwidth consumption.
- Variable Bitrate (VBR): Ensure VBR is enabled in libopus to allow the encoder to dynamically allocate bits based on audio complexity, which optimizes bandwidth without sacrificing quality.
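On the sending side, DTX mainly affects sequence number and timestamp bookkeeping. The Python sketch below assumes the encoder has already produced one payload per frame interval, and that payloads of 2 bytes or less are DTX frames that need not be transmitted (the libopus documentation for opus_encode describes this convention). The function name `schedule_packets` and its parameters are hypothetical. Timestamps keep advancing through silence while sequence numbers advance only for packets actually sent, so the receiver can tell a deliberate gap from packet loss.

```python
def schedule_packets(encoded_frames, frame_ms=20, first_seq=0, first_ts=0):
    """Assign (seq, timestamp) to each payload worth sending, skipping DTX frames.

    encoded_frames: iterable of bytes objects, one per frame interval.
    frame_ms: whole-millisecond packet duration (10, 20, 40, or 60 in this sketch).
    """
    ts_step = 48_000 * frame_ms // 1000   # 48 kHz RTP clock ticks per packet
    seq, ts = first_seq, first_ts
    scheduled = []
    for payload in encoded_frames:
        if len(payload) > 2:
            # Real audio: send it and consume a sequence number.
            scheduled.append((seq, ts, payload))
            seq = (seq + 1) & 0xFFFF
        # Media time passes whether or not a packet was sent.
        ts = (ts + ts_step) & 0xFFFFFFFF
    return scheduled
```

For example, two skipped 20 ms frames between two sent packets show up as consecutive sequence numbers with a timestamp difference of 2880 instead of 960.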