Libopus Unsupported PCM Frame Size Error
This article explains how the libopus library handles PCM input frame sizes that do not match its strictly supported durations. We will explore the specific error codes returned by the encoder, the technical reasons behind these strict requirements, and how developers can implement buffering to resolve this issue in their audio applications.
The Result of Invalid Frame Sizes
When you call the libopus encoder (opus_encode or opus_encode_float) with a frame_size that does not exactly match one of its supported sizes, the encoder rejects the input. It does not automatically pad, truncate, or split the audio data.
Instead, the encoding function will fail immediately and return the
error code OPUS_BAD_ARG (which is defined
as -1 in the API).
Supported Opus Frame Sizes
Opus is designed to operate only with specific, highly optimized frame sizes. The library measures frame sizes in terms of audio duration per channel. The only supported frame sizes are:
- 2.5 ms
- 5 ms
- 10 ms
- 20 ms
- 40 ms
- 60 ms
- 80 ms (libopus 1.2 and later; encoded as a single packet containing multiple shorter frames)
- 100 ms (libopus 1.2 and later; encoded the same way)
- 120 ms (libopus 1.2 and later; encoded the same way)
To calculate the number of samples required per channel for a valid frame, multiply the frame duration in seconds by the sampling rate (for 20 ms at 48 kHz, 0.020 × 48000 = 960). At the commonly used Opus sampling rate of 48 kHz, this gives:
- 2.5 ms: 120 samples
- 5 ms: 240 samples
- 10 ms: 480 samples
- 20 ms: 960 samples
- 40 ms: 1920 samples
- 60 ms: 2880 samples
If your input buffer contains any other number of samples per channel
(for example, 500 samples at 48 kHz), libopus will refuse to process it
and return OPUS_BAD_ARG.
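The arithmetic above can be checked in code before the encoder is ever called. The following is a minimal sketch, not part of libopus: the names frame_samples and is_valid_frame_size are hypothetical, and the check deliberately covers only the 2.5 ms to 60 ms durations, on the assumption that the longer ones depend on the library version in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Durations in units of 0.5 ms so that 2.5 ms stays an integer:
 * 5 = 2.5 ms, 10 = 5 ms, 20 = 10 ms, 40 = 20 ms, 80 = 40 ms, 120 = 60 ms. */
static const int kHalfMs[] = { 5, 10, 20, 40, 80, 120 };

/* Hypothetical helper: samples per channel for a duration given in
 * half-milliseconds at the given sampling rate. */
static int frame_samples(int sample_rate_hz, int half_ms)
{
    assert(sample_rate_hz > 0 && half_ms > 0);
    return (int)((long long)sample_rate_hz * half_ms / 2000);
}

/* Hypothetical helper: true when frame_size (samples per channel)
 * matches one of the durations in kHalfMs at this sampling rate. */
static bool is_valid_frame_size(int sample_rate_hz, int frame_size)
{
    for (size_t i = 0; i < sizeof kHalfMs / sizeof kHalfMs[0]; ++i) {
        /* Cross-multiply instead of dividing so that integer rounding
         * cannot hide a mismatch. */
        if ((long long)frame_size * 2000 ==
            (long long)sample_rate_hz * kHalfMs[i])
            return true;
    }
    return false;
}
```

With these helpers, is_valid_frame_size(48000, 960) is true and is_valid_frame_size(48000, 500) is false, which mirrors the OPUS_BAD_ARG case described above.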
Why Opus Requires Strict Frame Sizes
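The Opus codec combines two coding layers: CELT, a low-latency transform codec built on the Modified Discrete Cosine Transform (MDCT), and SILK, a speech codec built on linear prediction (LPC) that operates on frames of 10 ms or longer. The shortest frame sizes (2.5 ms and 5 ms) are therefore available only through CELT.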
Both layers are designed around a small set of fixed block sizes. The MDCT in CELT depends on fixed-size windows and an overlap-add process for seamless audio reconstruction, and an arbitrary frame length would not fit that scheme. In addition, the header byte at the start of every Opus packet can only signal a duration from this fixed set, so a frame of any other length could not be described to the decoder.
How to Handle Arbitrary Input Sizes
If your audio source generates PCM blocks of arbitrary or non-standard sizes, you must regroup the samples into supported frame sizes before passing them to libopus. The usual solution is a FIFO (first-in, first-out) buffer:
- Accumulate PCM Data: Push the incoming, arbitrarily-sized PCM samples into a local FIFO buffer.
- Check Buffer Size: Query the FIFO buffer to see if it contains enough samples to meet one of the supported Opus frame sizes (e.g., 960 samples for 20 ms at 48 kHz).
- Extract and Encode: Pop the exact number of required samples from the FIFO, pass them to opus_encode, and transmit the resulting packet.
- Repeat or Hold: Leave any remaining fractional samples in the FIFO to be combined with the next incoming block of audio.
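The steps above can be sketched in C as follows. This is an illustration under stated assumptions rather than a definitive implementation: PcmFifo, FrameSink, pcm_fifo_push, and count_frames_sink are hypothetical names, the frame size is fixed at 960 samples (20 ms at 48 kHz, mono), and the encoder call is hidden behind a callback so the buffering logic can be read and tested on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SAMPLES 960                 /* 20 ms at 48 kHz, per channel */
#define CHANNELS      1                   /* assumption: mono input */
#define FIFO_CAPACITY (FRAME_SAMPLES * CHANNELS * 8)

typedef struct {
    int16_t data[FIFO_CAPACITY];          /* interleaved PCM samples */
    size_t  count;                        /* samples currently stored */
} PcmFifo;

/* Called once per complete frame. A real sink would call
 * opus_encode(enc, frame, frame_size, out, max_out_bytes) and return
 * its result, which is negative on error (e.g. OPUS_BAD_ARG). */
typedef int (*FrameSink)(const int16_t *frame, int frame_size, void *user);

/* Appends n interleaved samples, then hands every complete frame to
 * sink. Returns the number of frames emitted, or -1 if the input does
 * not fit or the sink fails (the failed frame stays in the FIFO). */
static int pcm_fifo_push(PcmFifo *f, const int16_t *pcm, size_t n,
                         FrameSink sink, void *user)
{
    const size_t frame_len = (size_t)FRAME_SAMPLES * CHANNELS;
    size_t consumed = 0;
    int frames = 0, status = 0;

    assert(f != NULL && pcm != NULL && sink != NULL);
    if (n > FIFO_CAPACITY - f->count)
        return -1;

    /* Step 1: accumulate the incoming block. */
    memcpy(f->data + f->count, pcm, n * sizeof *pcm);
    f->count += n;

    /* Steps 2 and 3: emit frames while a full one is available. */
    while (f->count - consumed >= frame_len) {
        if (sink(f->data + consumed, FRAME_SAMPLES, user) < 0) {
            status = -1;
            break;
        }
        consumed += frame_len;
        ++frames;
    }

    /* Step 4: keep the leftover samples at the front for the next push. */
    memmove(f->data, f->data + consumed,
            (f->count - consumed) * sizeof f->data[0]);
    f->count -= consumed;
    return status < 0 ? -1 : frames;
}

/* Example sink that only counts frames instead of encoding them. */
static int count_frames_sink(const int16_t *frame, int frame_size, void *user)
{
    (void)frame;
    (void)frame_size;
    ++*(int *)user;
    return 0;
}
```

With 500-sample input blocks, the first push emits nothing, and the second push emits one 960-sample frame and leaves 40 samples in the FIFO. A flat array with memmove keeps the sketch short; a ring buffer would avoid the copy if profiling shows it matters.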