On October 24, 2019, Agora officially released SOLO, its self-developed packet-loss-resilient audio codec, to all developers. The codec suits any scenario that requires real-time audio interaction and is specifically optimized for poor network conditions: under the same weak network, it achieves a higher MOS score than Opus. SOLO works with all kinds of RTC applications and is not tied to the Agora SDK.

Author: Zhao Xiaohan, audio algorithm engineer at Agora

Solo source code analysis (2): narrowband coding

The previous installment of this series analyzed Solo's bandwidth extension system; this installment covers Solo's narrowband encoding process. Because Solo's coding framework is a modified version of Silk's, native Silk code is covered only briefly here.

SOLO source: github.com/AgoraIO-Com…

Part 1: Encoding module

The entry function for Solo's narrowband encoder is SKP_Silk_SDK_Encode:

SKP_int SKP_Silk_SDK_Encode( 
    void                              *encState,      /* I/O: State                    */
    const SKP_SILK_SDK_EncControlStruct  *encControl, /* I:   Control structure        */
    const SKP_int16                   *samplesIn,     /* I:   Input samples            */
    SKP_int                           nSamplesIn,     /* I:   Number of samples        */
    SKP_uint8                         *outData,       /* O:   Encoded output           */
    SKP_int16                         *nBytesOut      /* I/O: I: Max bytes O:out bytes */
)

Within this function, Solo first performs bandwidth detection and, if necessary, resampling. The signal that is finally passed to SKP_Silk_encode_frame_FLP has a sampling rate of 8 kHz.

SKP_int SKP_Silk_encode_frame_FLP( 
    SKP_Silk_encoder_state_FLP      *psEnc,             /* I/O  Encoder state FLP      */
    SKP_uint8                       *pCode,             /* O    Payload                */
    SKP_int16                       *pnBytesOut,        /* I/O  Payload bytes          */
                                                        /* input: max ;  output: used  */
    const SKP_int16                 *pIn                /* I    Input speech frame     */
)

In this function, SKP_Silk_VAD_FLP first performs voice activity detection on the signal and produces the probability that the current signal is speech. This probability then helps control LBRR coding, the LSF conversion, noise shaping, and other modules.

SKP_int SKP_Silk_VAD_FLP(
    SKP_Silk_encoder_state_FLP      *psEnc,             /* I/O  Encoder state FLP      */
    SKP_Silk_encoder_control_FLP    *psEncCtrl,         /* I/O  Encoder control FLP    */
    const SKP_int16                 *pIn                /* I    Input signal           */
)
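
To make the idea of a "speech probability" concrete, here is a minimal sketch of a soft voice activity measure. It is not Solo's or Silk's detector, which is considerably more elaborate; the function name toy_speech_probability and the noise_energy parameter are assumptions made up for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Solo or Silk: returns a value in [0, 1]
 * that grows with the ratio of the frame's mean energy to an assumed
 * noise-floor energy supplied by the caller. */
double toy_speech_probability(const short *frame, size_t n, double noise_energy)
{
    double energy = 0.0;
    for (size_t i = 0; i < n; i++) {
        energy += (double)frame[i] * (double)frame[i];
    }
    if (n > 0) {
        energy /= (double)n;                 /* mean energy per sample */
    }
    if (energy + noise_energy <= 0.0) {
        return 0.0;                          /* silent frame and zero noise floor */
    }
    return energy / (energy + noise_energy); /* 0 for silence, near 1 for loud frames */
}
```

A frame whose energy equals the noise floor maps to 0.5; an encoder could compare such a value against thresholds to decide how much effort later modules should spend on the frame.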

The main steps that follow include long-term prediction, short-term prediction, noise shaping, and coding. The main functions and their roles are as follows:

SKP_Silk_find_pitch_lags_FLP analyzes the pitch period and decides whether the frame is voiced. Voiced frames are strongly periodic and therefore need long-term prediction (LTP); unvoiced frames show no clear periodicity and do not need it.

void SKP_Silk_find_pitch_lags_FLP(
    SKP_Silk_encoder_state_FLP      *psEnc,             /* I/O  Encoder state FLP      */
    SKP_Silk_encoder_control_FLP    *psEncCtrl,         /* I/O  Encoder control FLP    */
    SKP_float                       res[],              /* O    Residual               */
    const SKP_float                 x[]                 /* I    Speech signal          */
)
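
The voiced/unvoiced distinction can be illustrated with a plain normalized-correlation pitch search. This is a textbook sketch under simplifying assumptions, not the multi-stage search Silk and Solo actually use; toy_pitch_lag and its threshold parameter are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: searches lags in [min_lag, max_lag] for the one that
 * maximizes the squared normalized correlation between the signal and its
 * delayed copy (only positive correlations count). Returns the best lag, or
 * 0 when the best normalized correlation is below `threshold` (0..1), which
 * this sketch treats as "unvoiced". */
int toy_pitch_lag(const float *x, size_t n, int min_lag, int max_lag, float threshold)
{
    int best_lag = 0;
    double best_score = 0.0;                 /* squared normalized correlation */
    if (min_lag < 1) {
        min_lag = 1;
    }
    for (int lag = min_lag; lag <= max_lag && (size_t)lag < n; lag++) {
        double corr = 0.0, e0 = 0.0, e1 = 0.0;
        for (size_t i = (size_t)lag; i < n; i++) {
            corr += (double)x[i] * x[i - lag];
            e0   += (double)x[i] * x[i];
            e1   += (double)x[i - lag] * x[i - lag];
        }
        if (corr <= 0.0 || e0 <= 0.0 || e1 <= 0.0) {
            continue;                        /* no positive periodicity at this lag */
        }
        double score = corr * corr / (e0 * e1);
        if (score > best_score) {            /* strict: the shortest best lag wins */
            best_score = score;
            best_lag = lag;
        }
    }
    return (best_score >= (double)threshold * threshold) ? best_lag : 0;
}
```

A return value of 0 corresponds to the unvoiced case in which long-term prediction would be skipped; a nonzero lag would feed the LTP analysis.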

SKP_Silk_noise_shape_analysis_FLP performs noise shaping analysis. Noise shaping adjusts the quantization gain so that the quantization noise rises and falls with the energy of the original signal; the masking effect then makes the noise hard to perceive. In addition to Silk's original gain control, this function contains Solo's own gain calculation, whose logic resembles Silk's but differs in some parameter details. Because Solo uses dual-stream coding, it redistributes the bit rate, computes the theoretical SNR of each stream from its allocated rate, and then derives a gain from that SNR. This gain later controls the ratio in which the residual amplitude is split when the residual signal is processed.

void SKP_Silk_noise_shape_analysis_FLP(
    SKP_Silk_encoder_state_FLP      *psEnc,             /* I/O  Encoder state FLP      */
    SKP_Silk_encoder_control_FLP    *psEncCtrl,         /* I/O  Encoder control FLP    */
    const SKP_float                 *pitch_res,         /* I    LPC residual           */
    const SKP_float                 *x                  /* I    Input signal           */
)
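
As a rough illustration of how a rate allocation can be turned into a theoretical SNR, the sketch below splits a total bit rate between two descriptions and applies the textbook high-resolution rule of about 6.02 dB per bit per sample. Solo's actual mapping from bit rate to SNR is not shown in this article and is not reproduced here; both function names and the share_first parameter are hypothetical.

```c
/* Hypothetical helper: textbook approximation of roughly 6.02 dB of SNR per
 * bit spent on each sample. Returns 0 for non-positive inputs. */
double toy_theoretical_snr_db(int bitrate_bps, int sample_rate_hz)
{
    if (sample_rate_hz <= 0 || bitrate_bps <= 0) {
        return 0.0;
    }
    double bits_per_sample = (double)bitrate_bps / (double)sample_rate_hz;
    return 6.02 * bits_per_sample;
}

/* Hypothetical split: the first description gets the fraction share_first of
 * the total rate (0.5 is an even split), the second gets the remainder. */
void toy_split_bitrate(int total_bps, double share_first, int *first_bps, int *second_bps)
{
    *first_bps  = (int)(total_bps * share_first);
    *second_bps = total_bps - *first_bps;
}
```

With a 16 kbps budget split evenly at 8 kHz, each description gets 1 bit per sample, so this crude rule gives about 6 dB per stream. The point is only the direction of the dependency: the SNR, and hence the gain, follows from the allocated rate.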

SKP_Silk_find_pred_coefs_FLP performs the linear prediction analysis: both the short-term and the long-term prediction coefficients are computed here. The LPC coefficients are converted to LSF coefficients, quantized and dequantized, and then converted back to LPC coefficients for use by the subsequent signal reconstruction function SKP_Silk_NSQ_wrapper_FLP.

void SKP_Silk_find_pred_coefs_FLP(
    SKP_Silk_encoder_state_FLP      *psEnc,         /* I/O  Encoder state FLP          */
    SKP_Silk_encoder_control_FLP    *psEncCtrl,     /* I/O  Encoder control FLP        */
    const SKP_float                 res_pitch[]     /* I    Residual                   */
)
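
For readers unfamiliar with how short-term prediction coefficients are obtained, here is a minimal sketch of the classic Levinson-Durbin recursion. It stands in for the general technique only; the solver in Silk and Solo may well use a different method, and toy_levinson_durbin and TOY_MAX_ORDER are names invented for this example. The LPC-to-LSF conversion and quantization are not shown.

```c
#define TOY_MAX_ORDER 16

/* Hypothetical helper: Levinson-Durbin recursion. Given autocorrelation
 * values r[0..order], fills a[1..order] so that the prediction is
 * x_hat[n] = sum over k = 1..order of a[k] * x[n - k]; a[0] is unused.
 * Returns the residual (prediction error) energy, or -1.0 on invalid input. */
double toy_levinson_durbin(const double *r, int order, double *a)
{
    double tmp[TOY_MAX_ORDER + 1];
    if (order < 1 || order > TOY_MAX_ORDER || r[0] <= 0.0) {
        return -1.0;
    }
    double err = r[0];
    for (int i = 1; i <= order; i++) {
        a[i] = 0.0;
    }
    for (int i = 1; i <= order; i++) {
        double acc = r[i];
        for (int k = 1; k < i; k++) {
            acc -= a[k] * r[i - k];
        }
        double refl = acc / err;             /* reflection coefficient */
        for (int k = 1; k < i; k++) {
            tmp[k] = a[k] - refl * a[i - k];
        }
        for (int k = 1; k < i; k++) {
            a[k] = tmp[k];
        }
        a[i] = refl;
        err *= (1.0 - refl * refl);
        if (err <= 0.0) {
            break;                           /* signal is perfectly predictable */
        }
    }
    return err;
}
```

For an autocorrelation sequence of the form 0.9^k, the recursion recovers a[1] close to 0.9 and a[2] close to 0, which is the expected first-order predictor.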

SKP_Silk_NSQ_wrapper_FLP is the reconstruction and analysis function that runs before the coding module. It follows the analysis-by-synthesis idea: the function contains a simulated decoder that reconstructs the speech signal from the linear prediction parameters, gains, quantized residual, and so on described above. The reconstructed signal is compared directly with the signal being encoded, and noise shaping, random perturbation of the residual, and other methods are used to minimize the error between the reconstruction in the simulated decoder and the input signal, so that the output of the real decoder is as close to the original signal as possible.

void SKP_Silk_NSQ_wrapper_FLP(
    SKP_Silk_encoder_state_FLP      *psEnc,         /* I/O  Encoder state FLP          */
    SKP_Silk_encoder_control_FLP    *psEncCtrl,     /* I/O  Encoder control FLP        */
    const SKP_float                 x[],            /* I    Prefiltered input signal   */
    SKP_int8                        q[],            /* O    Quantized pulse signal     */
    SKP_int8                       *q_md[],         /* O    Quantized pulse signal     */
    const SKP_int                   useLBRR         /* I    LBRR flag                  */
)
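
The analysis-by-synthesis principle can be shown with a much smaller example than the real noise-shaping quantizer. The sketch below is a first-order predictive quantizer whose encoder carries the same reconstruction state a decoder would have. It omits noise shaping, long-term prediction, and perturbation entirely, and all names in it are hypothetical.

```c
#include <stddef.h>

/* Hypothetical encoder: the prediction is always made from the previously
 * reconstructed sample (what a decoder would have), never from the clean
 * input. Writes quantization indices to q[] and the simulated decoder
 * output to recon[]. */
void toy_abs_encode(const float *x, size_t n, float pred_coef, float step,
                    int *q, float *recon)
{
    float prev = 0.0f;                       /* mirrored decoder state */
    for (size_t i = 0; i < n; i++) {
        float pred = pred_coef * prev;       /* what the decoder will predict */
        float res = x[i] - pred;             /* residual to be quantized */
        float scaled = res / step;
        q[i] = (int)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f); /* round to nearest */
        prev = pred + (float)q[i] * step;    /* simulated decoder output */
        recon[i] = prev;
    }
}

/* Matching decoder: needs only the indices and the same two parameters. */
void toy_abs_decode(const int *q, size_t n, float pred_coef, float step, float *out)
{
    float prev = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float pred = pred_coef * prev;
        prev = pred + (float)q[i] * step;
        out[i] = prev;
    }
}
```

Because the encoder predicts from reconstructed samples, the quantization error does not accumulate: each reconstructed sample stays within half a quantization step of the input, and the decoder reproduces the encoder's reconstruction.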

This function chooses between two modes, SKP_Silk_NSQ and SKP_Silk_NSQ_del_dec. The main difference is that SKP_Silk_NSQ_del_dec uses delayed decision, which makes it more complex than SKP_Silk_NSQ. In essence, delayed decision turns Silk's scalar quantization of each residual sample into a vector quantization over 32 samples, which gives better results. Silk therefore uses SKP_Silk_NSQ_del_dec by default, and this article analyzes only that function.

void SKP_Silk_NSQ_del_dec(
    SKP_Silk_encoder_state          *psEncC,                                                 /* I/O  Encoder State                       */
    SKP_Silk_encoder_control        *psEncCtrlC,                                             /* I    Encoder Control                     */
    SKP_Silk_nsq_state              *NSQ,                                                     /* I/O  NSQ state                           */
    SKP_Silk_nsq_state              NSQ_md[MAX_INTERLEAVE_NUM],                               /* I/O  NSQ state                           */
    const SKP_int16                 x[],                                                     /* I    Prefiltered input signal            */
    SKP_int8                        q[],                                                     /* O    Quantized pulse signal              */
    SKP_int8                        *q_md[ MAX_INTERLEAVE_NUM ],                             /* O    Quantized pulse signal              */
    SKP_int32                       r[],                                                     /* O    Output residual signal              */
    const SKP_int                   LSFInterpFactor_Q2,                                       /* I    LSF interpolation factor in Q2      */
    const SKP_int16                 PredCoef_Q12[ 2 * MAX_LPC_ORDER ],                       /* I    Prediction coefs                    */
    const SKP_int16                 LTPCoef_Q14[ LTP_ORDER * NB_SUBFR ],                     /* I    LT prediction coefs                 */
    const SKP_int16                 AR2_Q13[ NB_SUBFR * MAX_SHAPE_LPC_ORDER ],               /* I    Noise shaping filter                */
    const SKP_int                   HarmShapeGain_Q14[ NB_SUBFR ],                           /* I    Smooth coefficients                 */
    const SKP_int                   Tilt_Q14[ NB_SUBFR ],                                     /* I    Spectral tilt                       */
    const SKP_int32                 LF_shp_Q14[ NB_SUBFR ],                                   /* I    Short-term shaping coefficients     */
    const SKP_int32                 Gains_Q16[ NB_SUBFR ],                                   /* I    Gain for each subframe              */
    const SKP_int32                 MDGains_Q16[ NB_SUBFR ],                                 /* I    New gain, no use now                */
    const SKP_int32                 DeltaGains_Q16,                                           /* I    Gain for odd subframe               */
    const SKP_int                   Lambda_Q10,                                               /* I    Quantization coefficient            */
    const SKP_int                   LTP_scale_Q14                                             /* I    LTP state scaling                   */
)

The core operations in this function are as follows:

1) AgorA_silk_deldec_rebroke, Agora_Silk_DelDec_Rewhitening_Side, and SKP_Silk_nsq_del_dec_scale_states perform the initialization, preparing the simulated decoding inside the encoder for each stream: the combined stream and the two multiple-description streams.

2) The remaining operations before coding are carried out in SKP_Silk_md_noise_shape_quantizer_del_dec, which performs all of the simulated decoding and analysis. The first step of the analysis is to compute the residual from the quantized parameters.

3) Once the residual is available, Solo departs from Silk, which analyzes a single residual. Solo uses the special gain DeltaGains_Q16 to split the residual, subframe by subframe, into two energy-complementary residual streams, with the gain allocation reversed between adjacent subframes.

4) Next, in Agora_Silk_RDCx1, Solo computes the errors of two different quantization choices for each of the two streams and stores the accumulated errors. Then, in Agora_Silk_CenterRD, Solo computes the error between the actual residual and the synthesized residual obtained by combining the two residuals. From this error and the errors of the two individual streams it computes a weighted error, and the accumulated weighted error determines which pair of residual streams is encoded. The weighting can be tuned through INTERNAL_JOINT_LAMBDA: the more weight the synthesis error receives, the smaller the error of the audio synthesized from both streams at the decoder when no packets are lost; the more weight the individual description errors receive, the smaller the error when each stream is decoded alone, although the error of the two streams combined may then be larger.
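
Steps 3) and 4) can be sketched in a few lines. The code below is a simplified illustration in floating point, not the fixed-point logic of the functions named above. The split rule, the delta and lambda parameters, and both function names are assumptions made for this example; in particular, whether a larger INTERNAL_JOINT_LAMBDA favors the combined error in Solo is not stated in this article.

```c
#include <stddef.h>

/* Hypothetical split: in even subframes stream A carries the share `delta`
 * (assumed to lie in 0.5..1) of each residual sample and stream B the rest;
 * in odd subframes the shares swap. A + B equals the residual up to rounding. */
void toy_split_residual(const float *res, size_t n, size_t subfr_len, float delta,
                        float *a, float *b)
{
    if (subfr_len == 0) {
        subfr_len = (n > 0) ? n : 1;         /* treat the frame as one subframe */
    }
    for (size_t i = 0; i < n; i++) {
        int odd = (int)((i / subfr_len) & 1u);
        float share_a = odd ? 1.0f - delta : delta;
        a[i] = share_a * res[i];
        b[i] = res[i] - a[i];
    }
}

/* Hypothetical cost: lambda near 1 favors the error of the combined
 * (both streams received) signal, lambda near 0 favors the mean error of
 * the two streams decoded on their own. */
float toy_joint_cost(float err_center, float err_a, float err_b, float lambda)
{
    return lambda * err_center + (1.0f - lambda) * 0.5f * (err_a + err_b);
}
```

With delta = 0.75, stream A carries three quarters of the residual amplitude in even subframes and one quarter in odd ones, so either stream alone still describes every subframe, just at alternating precision.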

Finally, the function outputs the two residual signals to be encoded and an initial perturbation seed. Because the seed evolves with the amplitude of each time-domain sample, encoding only the frame's initial seed is enough for the decoder to derive the perturbation for every sample in the frame.

Solo's low-band coding follows Silk's coding scheme (the high band uses an independent scheme; see the previous Solo code analysis for its implementation). All low-band information is encoded in SKP_Silk_encode_parameters using range coding. The probability density functions that range coding requires for the new parameters were trained on a large corpus of Chinese and English speech, which yields reasonably high coding efficiency.
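
The reason a single seed suffices can be seen in a small sketch. Here a linear congruential generator produces one sign per sample, and each transmitted pulse value is folded back into the seed. The constants are illustrative and the function names are hypothetical; this shows the mechanism only, not Solo's exact perturbation rule.

```c
#include <stdint.h>
#include <stddef.h>

/* Linear congruential step; unsigned arithmetic wraps modulo 2^32. */
static uint32_t toy_rand(uint32_t seed)
{
    return 907633515u + seed * 196314165u;
}

/* Hypothetical helper: derives one sign (+1 or -1) per sample from the seed,
 * then adds the pulse value to the seed, so the sequence depends on both the
 * initial seed and the pulses. Encoder and decoder can run this same loop
 * because both know q[] and the initial seed. */
void toy_sign_sequence(uint32_t seed, const int8_t *q, size_t n, int8_t *sign)
{
    for (size_t i = 0; i < n; i++) {
        seed = toy_rand(seed);
        sign[i] = (seed >> 31) ? -1 : 1;     /* top bit selects the sign */
        seed += (uint32_t)(int32_t)q[i];     /* seed evolves with the signal */
    }
}
```

Running the loop twice with the same initial seed and the same pulses yields the same sign sequence, which is exactly what lets the decoder regenerate the per-sample perturbation without any extra side information.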

void SKP_Silk_encode_parameters(
    SKP_Silk_encoder_state          *psEncC,        /* I/O  Encoder state              */
    SKP_Silk_encoder_control        *psEncCtrlC,    /* I/O  Encoder control            */
    SKP_Silk_range_coder_state      *psRC,          /* I/O  Range encoder state        */
    SKP_int                         md_type,        /* I    Use MDC or not             */
    const SKP_int8                  *q              /* I    Quantization indices       */
)
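
Why the quality of the probability tables matters can be seen from a single interval-narrowing step of a range coder. The sketch below is deliberately incomplete: it has no renormalization, carry propagation, or byte output, and the function name and table layout are assumptions for this illustration rather than Silk's range coder API.

```c
#include <stdint.h>

/* Hypothetical helper: one interval-narrowing step of a range coder.
 * cdf[] holds cumulative probabilities scaled so that cdf[0] = 0 and the
 * last entry is 65536; symbol s owns the slice [cdf[s], cdf[s + 1]). */
void toy_range_narrow(uint32_t *low, uint32_t *range, const uint32_t *cdf, int symbol)
{
    uint32_t unit = *range >> 16;                        /* range / 65536 */
    *low += unit * cdf[symbol];                          /* move to the symbol's slice */
    *range = unit * (cdf[symbol + 1] - cdf[symbol]);     /* shrink by its probability */
}
```

A symbol with probability 3/4 leaves three quarters of the interval (about 0.4 bits), while one with probability 1/4 leaves a quarter (2 bits). Tables that match the real statistics of the parameters therefore shrink the interval less on average and produce fewer bytes.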

Part 2: Decoding module

After the high-band and low-band streams are separated, the stream carrying the low-band information is sent to the low-band decoder, which can be viewed as two parallel Silk decoders plus pre-processing and post-processing modules. The main job of the pre-processing module is to split the bitstream according to the packet-reception case. There are four cases: (1) only the first description stream is received; (2) only the second description stream is received; (3) both streams are received; (4) neither stream for this frame is received.

Solo sets different parameters for each packet-reception case and passes them to AgoraSateDecodeTwoDesps for decoding.
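
The four cases map naturally onto a small classification routine. The enum and function below are hypothetical: Solo's own desp_type values are not shown in this article, and treating a payload length of zero as "not received" is an assumption of this sketch.

```c
/* Hypothetical names for the four packet-reception cases. */
typedef enum {
    TOY_RX_FIRST_ONLY,   /* (1) only the first description arrived  */
    TOY_RX_SECOND_ONLY,  /* (2) only the second description arrived */
    TOY_RX_BOTH,         /* (3) both descriptions arrived           */
    TOY_RX_NONE          /* (4) nothing arrived for this frame      */
} toy_rx_case;

/* Classifies a frame from the two payload lengths; a length of zero or less
 * is taken to mean that the description was lost. */
toy_rx_case toy_classify_rx(int n_bytes1, int n_bytes2)
{
    if (n_bytes1 > 0 && n_bytes2 > 0) {
        return TOY_RX_BOTH;
    }
    if (n_bytes1 > 0) {
        return TOY_RX_FIRST_ONLY;
    }
    if (n_bytes2 > 0) {
        return TOY_RX_SECOND_ONLY;
    }
    return TOY_RX_NONE;
}
```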

SKP_int AgoraSateDecodeTwoDesps(
    SKP_Silk_decoder_state          *psDec,              /* I/O  Silk decoder state    */
    SKP_Silk_decoder_control        *psDecCtrl,
    SKP_int16                       pOut[],              /* O    Output speech frame   */
    const SKP_int                   nBytes1,             /* I    Payload length        */
    const SKP_int                   nBytes2,             /* I    Payload length        */
    const SKP_uint8                 pCode1[],            /* I    Pointer to payload    */
    const SKP_uint8                 pCode2[],            /* I    Pointer to payload    */
    SKP_int                         desp_type,
    SKP_int                         decBytes[]           /* O    Used bytes            */
)

In this function, SKP_Silk_decode_parameters decodes from the bitstream the gains, linear prediction coefficients, residual signal, and other information needed to reconstruct the audio. Note that in the first two reception cases the decoded residual is not the complete residual but only one of the two complementary residuals; because the other stream did not reach the decoder in time, the other complementary residual is unavailable. After decoding the current residual, the decoder therefore uses the special gain, computed in the encoder and transmitted to the decoder, to restore a complete residual signal. In the third case, the complete residual is obtained simply by adding the two decoded complementary residuals. Once the residual signal is available, the speech signal can be reconstructed together with the other parameters. In the fourth case, Solo invokes the packet loss concealment module, which generates a concealment frame from the gain and linear prediction coefficients of the previous frame and a random residual signal.

Finally, after the same post-processing steps as in Silk, decoding is complete.
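
The residual recovery for the four cases can be sketched for a single subframe as follows. This is an illustration under simplifying assumptions, not Solo's decoder: it assumes the first stream carried the fraction share1 of each residual sample and the second stream the rest, and the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper for one subframe. res1 / res2 are the decoded partial
 * residuals; pass NULL for a stream that was lost. share1 is the assumed
 * fraction of the full residual carried by the first stream.
 * Returns 1 if out[] was filled, 0 if the caller must run loss concealment. */
int toy_rebuild_subframe(const float *res1, const float *res2, float share1,
                         size_t n, float *out)
{
    if (res1 && res2) {                      /* case 3: add the two halves */
        for (size_t i = 0; i < n; i++) {
            out[i] = res1[i] + res2[i];
        }
        return 1;
    }
    if (res1 || res2) {                      /* cases 1 and 2: rescale the one we have */
        const float *res = res1 ? res1 : res2;
        float share = res1 ? share1 : 1.0f - share1;
        if (share <= 0.0f) {
            return 0;                        /* the received stream carried nothing */
        }
        for (size_t i = 0; i < n; i++) {
            out[i] = res[i] / share;
        }
        return 1;
    }
    return 0;                                /* case 4: nothing received */
}
```

In this idealized sketch, without quantization error, a single description rescaled by its share yields the same residual as the sum of both. In a real codec the single-description result is coarser, which is why receiving both streams sounds better.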

If you have questions about SOLO, you can discuss them with the author of this article in the RTC Developer community (rtcdeveloper.com).

Previous installment: Solo source code analysis (1): bandwidth extension