Libopus Multistream Channel Mapping Matrix Math
This article explains the mathematical purpose and mechanics of
channel mapping in the libopus multistream API, treating the mapping
as a matrix. It details how this matrix acts as a linear
transformation that projects audio signals between the input channel space
and the encoded stream space. Understanding this mathematical
relationship helps developers compress, downmix, and
reconstruct arbitrary multi-channel and spatial audio layouts.
The Linear Algebra of Channel Mapping
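At its mathematical core, channel mapping performs
a linear transformation that maps a vector of input
audio channels to a vector of encoded stream channels, and vice
versa. In the libopus multistream API (used for Ogg Opus mapping
families 1, 2, and 255) the mapping is stored as a table of channel
indices, which is equivalent to a matrix whose entries are all 0 or 1;
mapping family 3, exposed through the libopus projection API,
generalizes this to a matrix of real-valued gains.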
In a standard multi-channel setup, we represent the input audio at any given discrete time sample as an \(N\)-dimensional vector:
\[\mathbf{x} = [x_1, x_2, \dots, x_N]^T\]
where \(N\) is the number of input channels (e.g., 6 for 5.1 surround sound).
To compress this audio efficiently, libopus groups the
audio into \(M\) encoded streams (where
\(M \le N\)). Some of these streams may
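be coupled (stereo) and others uncoupled (mono). If \(C\) of the \(M\) streams are coupled, the total number of coded channels
across all streams is \(K = M + C\), because a coupled stream carries two channels and an uncoupled stream carries one. The mapping matrix acts as the operator that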
transforms the input vector \(\mathbf{x}\) into the stream vector \(\mathbf{s}\):
\[\mathbf{s} = \mathbf{M}_{\text{enc}} \mathbf{x}\]
where:
* \(\mathbf{s}\) is a \(K \times 1\) vector of the channels to be encoded.
* \(\mathbf{M}_{\text{enc}}\) is a \(K \times N\) encoding matrix (or downmix matrix) containing gain coefficients.
* Each element \(m_{i,j}\) in \(\mathbf{M}_{\text{enc}}\) dictates the contribution of input channel \(j\) to encoded stream channel \(i\).
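In the multistream API the encoder's mapping table has one entry per input channel, giving the index of the coded channel that input feeds. The sketch below, which assumes that reading of the table, expands such a table into the equivalent \(K \times N\) matrix of 0s and 1s and applies it to one sample vector. The helper names (coded_channel_count, encoder_selection_matrix, mat_vec) are hypothetical and not part of libopus, and the sketch simply rejects two inputs feeding the same coded channel rather than modeling what the library does in that case.

```python
# Hypothetical helpers (not libopus API): view a multistream-style encoder
# mapping table as the K x N selection matrix M_enc and apply it.

def coded_channel_count(streams, coupled_streams):
    """K = M + C: a coupled stream carries two channels, an uncoupled one carries one."""
    return streams + coupled_streams


def encoder_selection_matrix(mapping, streams, coupled_streams):
    """Build M_enc where mapping[j] is the coded channel fed by input channel j."""
    k = coded_channel_count(streams, coupled_streams)
    n = len(mapping)
    m_enc = [[0.0] * n for _ in range(k)]
    used = set()
    for j, i in enumerate(mapping):
        if not 0 <= i < k:
            raise ValueError(f"input channel {j} maps to invalid coded channel {i}")
        if i in used:
            # This sketch requires a one-to-one routing of inputs to coded channels.
            raise ValueError(f"coded channel {i} is fed by more than one input")
        used.add(i)
        m_enc[i][j] = 1.0  # input j contributes with gain 1 to coded channel i
    return m_enc


def mat_vec(matrix, vec):
    """Matrix-vector product: s_i = sum_j m_ij * x_j."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


# Three input channels, two streams of which one is coupled (K = 2 + 1 = 3).
# Coded channels 0 and 1 belong to the coupled stream and channel 2 to the
# mono stream, since libopus numbers coupled streams first.
m_enc = encoder_selection_matrix([2, 0, 1], streams=2, coupled_streams=1)
s = mat_vec(m_enc, [0.1, 0.2, 0.3])
print(s)  # [0.2, 0.3, 0.1]
```

Here input channel 0 feeds coded channel 2, so \(m_{2,0} = 1\) and \(s_2 = x_0\); every column of the matrix contains exactly one 1.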
Reconstructing the Output (The Decoding Matrix)
The receiver decodes the transmitted streams to obtain \(\mathbf{s}'\), an approximation of \(\mathbf{s}\) that differs from it by coding noise. To render this back to the physical speaker channels, a second linear transformation is applied using a decoding matrix (or upmix matrix) \(\mathbf{M}_{\text{dec}}\):
\[\mathbf{y} = \mathbf{M}_{\text{dec}} \mathbf{s}'\]
where:
* \(\mathbf{y}\) is the \(P \times 1\) reconstructed output channel vector (where \(P\) is the number of playback speakers).
* \(\mathbf{M}_{\text{dec}}\) is a \(P \times K\) decoding matrix.
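On the decoding side, the libopus multistream decoder takes a mapping table with one entry per output channel: entry \(p\) names the decoded channel that feeds output \(p\), and the value 255 makes output \(p\) silent. Read as a matrix, this is a \(P \times K\) \(\mathbf{M}_{\text{dec}}\) with at most one 1 per row, and two outputs may share the same decoded channel. The helpers below are hypothetical illustrations of that reading, not library functions.

```python
# Hypothetical helpers (not libopus API): view a multistream decoder mapping
# table as the P x K matrix M_dec and apply it to decoded channels s'.

SILENT = 255  # decoder mapping value meaning "this output channel is silent"


def decoder_selection_matrix(mapping, streams, coupled_streams):
    """Build M_dec where mapping[p] is the decoded channel sent to output p."""
    k = streams + coupled_streams
    m_dec = [[0.0] * k for _ in mapping]  # one row per output channel
    for p, i in enumerate(mapping):
        if i == SILENT:
            continue  # leave row p all zero: output p stays silent
        if not 0 <= i < k:
            raise ValueError(f"output channel {p} refers to invalid decoded channel {i}")
        m_dec[p][i] = 1.0
    return m_dec


def mat_vec(matrix, vec):
    """Matrix-vector product: y_p = sum_i m_pi * s'_i."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


# One coupled stream (K = 2) routed to four speakers: decoded channel 0 is
# duplicated to outputs 0 and 2, and output 3 is silent.
m_dec = decoder_selection_matrix([0, 1, 0, SILENT], streams=1, coupled_streams=1)
y = mat_vec(m_dec, [0.5, -0.25])
print(y)  # [0.5, -0.25, 0.5, 0.0]
```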
Mathematical Purposes of the Matrices
The custom mapping matrix serves three primary mathematical purposes in spatial audio processing:
1. Dimensionality Reduction and Redundancy Elimination
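By projecting the \(N\)-dimensional input space onto the \(K\)-dimensional stream space (\(K < N\)), the matrix discards every component of \(\mathbf{x}\) that lies in its null space. Nothing is lost when those components carry no information, for example when every input vector lies in the row space of \(\mathbf{M}_{\text{enc}}\). In Ambisonics, for example, the signal reaches Opus already expressed as spherical harmonic coefficients, since converting physical microphone signals to B-format happens before encoding. Mapping family 2 then codes those channels through the ordinary index table, while mapping family 3 applies a mixing matrix before coding and stores the corresponding demixing matrix in the stream header so the decoder can invert the projection.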
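A two-channel example makes the null space concrete. The hypothetical downmix below maps stereo to a single coded channel with equal gains: an in-phase signal survives, while an antiphase signal (left equal to minus right) lies in the null space and is removed entirely, so no linear decoding matrix can bring it back.

```python
import math

# Hypothetical 1 x 2 stereo-to-mono downmix (K = 1, N = 2) with a unit-norm row.
g = 1.0 / math.sqrt(2.0)
m_enc = [[g, g]]


def mat_vec(matrix, vec):
    """Matrix-vector product: s_i = sum_j m_ij * x_j."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


in_phase = [0.5, 0.5]     # proportional to the row (g, g): lies in the row space
anti_phase = [0.5, -0.5]  # orthogonal to the row: lies in the null space

s_in = mat_vec(m_enc, in_phase)
s_anti = mat_vec(m_enc, anti_phase)
print(s_in, s_anti)  # s_in[0] equals g (about 0.7071); s_anti[0] is 0.0
```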
2. Energy Preservation and Normalization
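To prevent digital clipping and maintain consistent loudness, the mapping matrices are normalized, and the two goals constrain different parts of the matrix. Clipping is a row condition: if every input satisfies \(|x_j| \le 1\), then \(|s_i| \le \sum_{j} |m_{i,j}|\), so keeping every row's absolute sum at or below \(1\) guarantees that no coded channel clips. Loudness is a column condition: to preserve the energy of each input channel during a downmix, the sum of the squares of its coefficients across the coded channels is often constrained to \(1\) for every input channel \(j\), which preserves the total expected energy whenever the input channels are mutually uncorrelated: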
\[\sum_{i=1}^{K} m_{i,j}^2 = 1\]
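The two conditions can be checked numerically. This sketch, using a hypothetical three-input downmix, normalizes each column to unit \(L_2\) norm and then computes the worst-case row gain \(\max_i \sum_j |m_{i,j}|\); the result shows that energy normalization alone does not rule out clipping when a shared channel such as a centre speaker is folded into several coded channels. The helper names normalize_columns and peak_gain are illustrative only.

```python
import math


def normalize_columns(matrix):
    """Scale each column j so that sum_i m_ij^2 = 1 (unit L2 norm per input)."""
    rows, cols = len(matrix), len(matrix[0])
    out = [row[:] for row in matrix]
    for j in range(cols):
        norm = math.sqrt(sum(matrix[i][j] ** 2 for i in range(rows)))
        if norm == 0.0:
            continue  # an unused input channel keeps an all-zero column
        for i in range(rows):
            out[i][j] = matrix[i][j] / norm
    return out


def peak_gain(matrix):
    """Largest row sum of |m_ij|: if every |x_j| <= 1, then every |s_i| <= this."""
    return max(sum(abs(m) for m in row) for row in matrix)


# Hypothetical 3 -> 2 downmix: inputs (L, R, centre), centre shared by both rows.
raw = [[1.0, 0.0, 1.0],
       [0.0, 1.0, 1.0]]
m = normalize_columns(raw)  # centre column becomes (1/sqrt(2), 1/sqrt(2))
print(peak_gain(m))         # about 1.707: unit columns alone can still clip

# Scaling the whole matrix by 1/peak removes the clipping risk,
# at the cost of shrinking every column norm to 1/peak.
peak = peak_gain(m)
safe = [[v / peak for v in row] for row in m]
```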
3. Spatial Decoding and Pseudoinverse Relationships
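Ideally, the decoding matrix \(\mathbf{M}_{\text{dec}}\) undoes the encoding matrix \(\mathbf{M}_{\text{enc}}\) as closely as possible. When the matrix is non-square (\(K < N\)) no ordinary inverse exists, so, assuming the playback layout matches the input layout (\(P = N\)), a natural choice is the Moore-Penrose pseudoinverse \(\mathbf{M}_{\text{enc}}^+\). If \(\mathbf{M}_{\text{enc}}\) has full row rank, \(\mathbf{M}_{\text{enc}}^+\mathbf{s}\) is the minimum-norm input vector that reproduces \(\mathbf{s}\) exactly, and \(\mathbf{M}_{\text{enc}}^+\mathbf{M}_{\text{enc}}\) is the orthogonal projection onto the row space of \(\mathbf{M}_{\text{enc}}\), so components in the null space remain lost: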
\[\mathbf{M}_{\text{dec}} \approx \mathbf{M}_{\text{enc}}^+ = \mathbf{M}_{\text{enc}}^T (\mathbf{M}_{\text{enc}} \mathbf{M}_{\text{enc}}^T)^{-1}\]
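The pseudoinverse formula can be evaluated directly for a small full-row-rank matrix. The sketch below is plain Python (Gauss-Jordan inversion of \(\mathbf{M}_{\text{enc}}\mathbf{M}_{\text{enc}}^T\) with partial pivoting) rather than a library routine, and the \(2 \times 3\) matrix is a hypothetical example; for ill-conditioned matrices a numerical library's pseudoinverse, typically computed via the SVD, is more robust.

```python
def transpose(a):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]


def mat_mul(a, b):
    """Matrix product a @ b for list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(a):
    """Invert a square, non-singular matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + [1.0 if r == c else 0.0 for c in range(n)] for r, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))  # partial pivoting
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("M_enc M_enc^T is singular: M_enc lacks full row rank")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def pinv_full_row_rank(m_enc):
    """M_enc^+ = M_enc^T (M_enc M_enc^T)^{-1} for a K x N matrix of rank K."""
    mt = transpose(m_enc)
    return mat_mul(mt, invert(mat_mul(m_enc, mt)))


# Hypothetical 3 -> 2 downmix (K = 2 < N = 3) with full row rank.
m_enc = [[1.0, 0.0, 0.5],
         [0.0, 1.0, 0.5]]
m_dec = pinv_full_row_rank(m_enc)           # 3 x 2 decoding matrix
should_be_identity = mat_mul(m_enc, m_dec)  # M_enc M_enc^+ = I_K up to rounding
projector = mat_mul(m_dec, m_enc)           # M_enc^+ M_enc: projection onto the row space
```

Multiplying projector by itself returns projector again (it is idempotent), but it is not the identity: the null-space direction \((1, 1, -2)^T\) of this example maps to zero and cannot be reconstructed.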
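In the libopus multistream API, the mapping passed to
opus_multistream_encoder_create() and opus_multistream_decoder_create()
is a table of channel indices, so it lets developers bypass predefined
channel layouts (like stereo or 5.1) by routing channels, which amounts
to choosing a matrix of 0s and 1s. Arbitrary real-valued coefficients
enter either through the projection API (mapping family 3), whose
decoder is created from an explicit demixing matrix, or through mixing
that the application applies to the PCM before encoding and after
decoding. Either way, this enables precise mathematical control
over how sound fields are compressed and rendered across arbitrary
speaker arrays.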
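Because the multistream encoder only routes channels, arbitrary gains have to be applied by the application to the PCM before encoding. libopus encode calls take interleaved samples (frame_size samples per channel, with channels interleaved), so the sketch below applies \(\mathbf{M}_{\text{enc}}\) to one interleaved frame. premix_interleaved is a hypothetical helper, and the matrix is an illustrative example; the resulting \(K\)-channel buffer would then be handed to a multistream encoder configured for \(K\) input channels.

```python
def premix_interleaved(pcm, frame_size, matrix):
    """Apply a K x N matrix to an interleaved N-channel frame.

    pcm holds frame_size * N samples laid out as
    [x_0(t0), ..., x_{N-1}(t0), x_0(t1), ...]; the result holds
    frame_size * K samples in the same interleaved layout.
    """
    n = len(matrix[0])
    if len(pcm) != frame_size * n:
        raise ValueError("pcm must contain frame_size * N samples")
    out = []
    for t in range(frame_size):
        x = pcm[t * n:(t + 1) * n]  # the N-dimensional vector at sample t
        out.extend(sum(m * v for m, v in zip(row, x)) for row in matrix)
    return out


# Hypothetical 3 -> 2 premix of (L, R, centre), applied to a 2-sample frame.
matrix = [[1.0, 0.0, 0.5],
          [0.0, 1.0, 0.5]]
pcm = [0.2, 0.4, 0.2,
       -0.2, 0.0, 0.4]
stream_pcm = premix_interleaved(pcm, frame_size=2, matrix=matrix)
# stream_pcm is approximately [0.3, 0.5, 0.0, 0.2]
```

Gains applied this way must also be undone or rendered by the application after decoding, since the multistream decoder likewise only routes decoded channels to outputs.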