These are notes on the AAC format that I collected from online sources and wrote down for my own reference. Reference: www.pianshen.com/article/142…

1.AAC

AAC stands for Advanced Audio Coding. It first appeared in 1997 as an audio coding technology based on MPEG-2, developed jointly by Fraunhofer IIS, Dolby Laboratories, AT&T, Sony and others with the aim of replacing the MP3 format. After the MPEG-4 standard came out in 2000, AAC was extended with further technologies (SBR, PS). To distinguish it from the original MPEG-2 AAC, AAC that includes SBR or PS is also called MPEG-4 AAC.

AAC is a newer generation of lossy audio compression. Combined with additional coding tools (such as SBR and PS) it yields three variants: LC-AAC, HE-AAC and HE-AAC v2. LC-AAC is the traditional AAC and is mainly used at medium and high bit rates (>= 80 kbps). HE-AAC (equivalent to AAC + SBR) is mainly used at low and medium bit rates (<= 80 kbps), while the more recent HE-AAC v2 (equivalent to AAC + SBR + PS) is mainly used at low bit rates (<= 48 kbps). In practice, most encoders enable PS automatically at <= 48 kbps and leave it off above 48 kbps, in which case the output is equivalent to ordinary HE-AAC.
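The bit rate thresholds above can be written down as a small helper. This is only a sketch of the rule of thumb in the text, not how any particular encoder decides: the enum and function names are made up for illustration, and because the text gives 80 kbps as the boundary for both LC and HE, the sketch arbitrarily assigns exactly 80 kbps to LC.

```c
#include <assert.h>

/* Hypothetical labels for the three variants discussed above. */
enum aac_variant { AAC_VARIANT_LC, AAC_VARIANT_HE, AAC_VARIANT_HE_V2 };

/* Rule of thumb from the text: <= 48 kbps -> HE-AAC v2,
 * below 80 kbps -> HE-AAC, otherwise LC-AAC. */
static enum aac_variant suggest_aac_variant(int kbps)
{
    assert(kbps > 0);
    if (kbps <= 48)
        return AAC_VARIANT_HE_V2;
    if (kbps < 80)
        return AAC_VARIANT_HE;
    return AAC_VARIANT_LC;
}
```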

LC and HE (the latter suited to low bit rates) are currently the most widely used. The popular Nero AAC encoder supports only three profiles: LC, HE and HEv2. The profile displayed for AAC audio encoded with it is LC, since HE is AAC (LC) + SBR and HEv2 is AAC (LC) + SBR + PS.

2.AAC file format

ADIF: Audio Data Interchange Format. Its defining feature is that the start of the audio data can be located unambiguously. Decoding cannot begin in the middle of the stream; it must start at the clearly defined beginning. For that reason this format is mostly used for files on disk.

ADTS: Audio Data Transport Stream. The characteristic of this format is that it is a bit stream with synchronization words, and decoding can start anywhere in this stream. Its features are similar to the MP3 data stream format.

Simply put, ADTS can be decoded starting at any frame, because every frame carries its own header. ADIF has a single header for the whole stream, so decoding has to start from the beginning of the data. The two header formats also differ. In practice, encoded and extracted AAC audio streams are usually in ADTS format.
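To make the difference concrete, here is a minimal sketch that guesses which of the two formats a buffer holds by looking at its first bytes. It relies on two facts: an ADIF file begins with the ASCII signature "ADIF", and an ADTS frame begins with the 12-bit syncword 0xFFF followed by a layer field of 0. The enum and function names are hypothetical, and a real probe would check more than the first two bytes before trusting an ADTS match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum aac_container { AAC_UNKNOWN = 0, AAC_ADIF = 1, AAC_ADTS = 2 };

/* Guess the container from the first bytes of a buffer.
 * ADIF: starts with the ASCII signature "ADIF".
 * ADTS: starts with the 12-bit syncword 0xFFF, and the 2-bit layer is 0. */
static enum aac_container guess_aac_container(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len >= 4 && memcmp(buf, "ADIF", 4) == 0)
        return AAC_ADIF;

    /* Mask 0xF6 keeps the low 4 sync bits and the layer bits,
     * and ignores the ID and protection_absent bits. */
    if (len >= 2 && buf[0] == 0xFF && (buf[1] & 0xF6) == 0xF0)
        return AAC_ADTS;

    return AAC_UNKNOWN;
}
```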

3. What is ADTS

ADTS (Audio Data Transport Stream) is a very common transport format for AAC.

I remember the first time I wrote a demuxer: the AAC audio ES extracted from an FLV container would not play when I fed it to the hardware decoder. I saved it to disk and tried a PC player, and that could not play it either. I was stumped. After some research I found out why: a typical AAC decoder expects the AAC ES to be packaged in ADTS format, that is, with a 7-byte ADTS header in front of each AAC frame. So you can think of the ADTS header as the frame header of AAC.

4. Content and structure of ADTS

The most useful information in the ADTS header is the sampling rate, the number of channels and the frame length. That makes sense: if I were a decoder and you handed me a pile of raw AAC ES data, I would have no way to decode it. With an ADTS header on every AAC frame, the decoder is told exactly what it needs to know.

Typically, the ADTS header is 7 bytes (9 bytes when a CRC is present, that is, when protection_absent is 0), divided into two parts:

adts_fixed_header();

adts_variable_header();

1:adts_fixed_header

syncword: always 0xFFF (all 12 bits set to 1); marks the start of an ADTS frame.

ID: MPEG version. 0 for MPEG-4, 1 for MPEG-2.

layer: always '00'.

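profile: indicates which AAC profile is used. Some chips only support AAC LC. MPEG-2 AAC defines three:

  • 0: Main profile
  • 1: Low Complexity profile (LC)
  • 2: Scalable Sampling Rate profile (SSR)
  • 3: Reserved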

sampling_frequency_index: index of the sampling rate in use. The actual sampling rate is looked up by this index in the table below.

There are 13 supported frequencies:

  • 0: 96000 Hz
  • 1: 88200 Hz
  • 2: 64000 Hz
  • 3: 48000 Hz
  • 4: 44100 Hz
  • 5: 32000 Hz
  • 6: 24000 Hz
  • 7: 22050 Hz
  • 8: 16000 Hz
  • 9: 12000 Hz
  • 10: 11025 Hz
  • 11: 8000 Hz
  • 12: 7350 Hz
  • 13: Reserved
  • 14: Reserved
  • 15: frequency is written explicitly (escape value; not valid in an ADTS header)
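The table maps directly onto a lookup array. The following is a minimal sketch with made-up function names; unlike the helper later in these notes, which falls back to 44100 Hz, it returns -1 for anything outside the table so the caller can decide what to do.

```c
#include <assert.h>

/* Sampling rates in Hz for sampling_frequency_index 0..12, as listed above. */
static const int k_adts_sample_rates[13] = {
    96000, 88200, 64000, 48000, 44100, 32000, 24000,
    22050, 16000, 12000, 11025, 8000, 7350
};

/* Index -> Hz; returns -1 for the reserved and escape values (13..15). */
static int adts_index_to_sample_rate(int index)
{
    if (index < 0 || index > 12)
        return -1;
    return k_adts_sample_rates[index];
}

/* Hz -> index; returns -1 if the rate is not in the table. */
static int adts_sample_rate_to_index(int hz)
{
    assert(hz > 0);
    for (int i = 0; i < 13; i++) {
        if (k_adts_sample_rates[i] == hz)
            return i;
    }
    return -1;
}
```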

channel_configuration: indicates the number of channels

  • 0: Defined in AOT Specific Config
  • 1: 1 channel: front-center
  • 2: 2 channels: front-left, front-right
  • 3: 3 channels: front-center, front-left, front-right
  • 4: 4 channels: front-center, front-left, front-right, back-center
  • 5: 5 channels: front-center, front-left, front-right, back-left, back-right
  • 6: 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel
  • 7: 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel
  • 8-15: Reserved
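Note that channel_configuration is not the channel count itself: value 7 means 8 channels. A minimal sketch of the mapping, with a hypothetical function name, could look like this. It returns 0 for configuration 0 (the layout is defined elsewhere) and -1 for reserved values. Since the ADTS field is only 3 bits wide, values above 7 cannot occur in an ADTS header anyway.

```c
#include <assert.h>

/* Map channel_configuration to the number of channels.
 * Returns 0 for configuration 0 (layout defined elsewhere)
 * and -1 for the reserved values 8..15. */
static int adts_channel_count(int channel_configuration)
{
    /* Index = channel_configuration, value = number of channels. */
    static const int counts[8] = { 0, 1, 2, 3, 4, 5, 6, 8 };

    assert(channel_configuration >= 0);
    if (channel_configuration > 7)
        return -1;
    return counts[channel_configuration];
}
```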

2:adts_variable_header

frame_length (aac_frame_length): the length of one ADTS frame in bytes, including both the ADTS header and the raw AAC data.

adts_buffer_fullness: 0x7FF indicates a variable bit rate stream.
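Putting the fields together, a reader for the 7-byte header might look like the sketch below. The struct and function names are hypothetical; the bit positions follow the field order described above (12-bit syncword, ID, layer, protection_absent, profile, sampling_frequency_index, private bit, channel_configuration, four 1-bit flags, 13-bit frame length, 11-bit buffer fullness). It does not handle the optional CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the header fields discussed above. */
struct adts_info {
    int id;                       /* 0 = MPEG-4, 1 = MPEG-2 */
    int protection_absent;        /* 1 = no CRC, header is 7 bytes */
    int profile;                  /* 2-bit profile field */
    int sampling_frequency_index; /* 4 bits */
    int channel_configuration;    /* 3 bits */
    int frame_length;             /* header + payload, in bytes */
    int buffer_fullness;          /* 0x7FF signals variable bit rate */
};

/* Returns 0 on success, -1 if buf does not start with a plausible header. */
static int adts_parse_header(const uint8_t *buf, size_t len,
                             struct adts_info *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 7)
        return -1;
    /* syncword 0xFFF and layer == 0 */
    if (buf[0] != 0xFF || (buf[1] & 0xF6) != 0xF0)
        return -1;

    out->id                       = (buf[1] >> 3) & 0x01;
    out->protection_absent        = buf[1] & 0x01;
    out->profile                  = (buf[2] >> 6) & 0x03;
    out->sampling_frequency_index = (buf[2] >> 2) & 0x0F;
    out->channel_configuration    = ((buf[2] & 0x01) << 2) |
                                    ((buf[3] >> 6) & 0x03);
    out->frame_length             = ((buf[3] & 0x03) << 11) |
                                    (buf[4] << 3) |
                                    ((buf[5] >> 5) & 0x07);
    out->buffer_fullness          = ((buf[5] & 0x1F) << 6) |
                                    ((buf[6] >> 2) & 0x3F);

    /* A frame can never be shorter than its own header. */
    if (out->frame_length < 7)
        return -1;
    return 0;
}
```

For example, the bytes FF F1 50 80 0D 7F FC describe an MPEG-4 frame with profile field 1 (LC), sampling_frequency_index 4 (44100 Hz), 2 channels, a total frame length of 107 bytes and buffer fullness 0x7FF.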

5. Package AAC into ADTS format

With the ADTS format understood, packaging AAC as ADTS is straightforward. We only need the audio sampling rate, the number of channels, the length of the raw data and the AAC type (profile), and then prefix each raw AAC frame with an ADTS header.

Here is the code FFmpeg uses to write the ADTS header; it shows the header structure clearly:

  int ff_adts_write_frame_header(ADTSContext *ctx, uint8_t *buf, int size, int pce_size)
  {
      PutBitContext pb;

      init_put_bits(&pb, buf, ADTS_HEADER_SIZE);

      /* adts_fixed_header */
      put_bits(&pb, 12, 0xfff);   /* syncword */
      put_bits(&pb, 1, 0);        /* ID */
      put_bits(&pb, 2, 0);        /* layer */
      put_bits(&pb, 1, 1);        /* protection_absent */
      put_bits(&pb, 2, ctx->objecttype); /* profile_objecttype */
      put_bits(&pb, 4, ctx->sample_rate_index);
      put_bits(&pb, 1, 0);        /* private_bit */
      put_bits(&pb, 3, ctx->channel_conf); /* channel_configuration */
      put_bits(&pb, 1, 0);        /* original_copy */
      put_bits(&pb, 1, 0);        /* home */

      /* adts_variable_header */
      put_bits(&pb, 1, 0);        /* copyright_identification_bit */
      put_bits(&pb, 1, 0);        /* copyright_identification_start */
      put_bits(&pb, 13, ADTS_HEADER_SIZE + size + pce_size); /* aac_frame_length */
      put_bits(&pb, 11, 0x7ff);   /* adts_buffer_fullness */
      put_bits(&pb, 2, 0);        /* number_of_raw_data_blocks_in_frame */

      flush_put_bits(&pb);

      return 0;
  }
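The FFmpeg function depends on its internal PutBitContext, so it cannot be compiled on its own. To see what the sequence of put_bits calls produces, here is a self-contained sketch with a tiny MSB-first bit writer. The names bit_writer, bw_init, bw_put and write_adts_header are made up for this example and are not FFmpeg API; the field order and widths are copied from the function above, without the pce_size term.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal bit writer; not FFmpeg's PutBitContext. */
struct bit_writer {
    uint8_t *buf;
    size_t   size;     /* capacity in bytes */
    size_t   bit_pos;  /* number of bits written so far */
};

static void bw_init(struct bit_writer *bw, uint8_t *buf, size_t size)
{
    assert(bw != NULL && buf != NULL);
    memset(buf, 0, size);
    bw->buf = buf;
    bw->size = size;
    bw->bit_pos = 0;
}

/* Append the low `nbits` bits of `value`, most significant bit first. */
static void bw_put(struct bit_writer *bw, int nbits, uint32_t value)
{
    assert(nbits >= 1 && nbits <= 32);
    assert(bw->bit_pos + (size_t)nbits <= bw->size * 8);

    for (int i = nbits - 1; i >= 0; i--) {
        if ((value >> i) & 1u)
            bw->buf[bw->bit_pos / 8] |= (uint8_t)(0x80u >> (bw->bit_pos % 8));
        bw->bit_pos++;
    }
}

/* Write a 7-byte ADTS header using the same field order as the listing above. */
static void write_adts_header(uint8_t out[7], int profile, int sr_index,
                              int channel_conf, int payload_size)
{
    struct bit_writer bw;
    bw_init(&bw, out, 7);

    bw_put(&bw, 12, 0xFFF);                 /* syncword */
    bw_put(&bw, 1, 0);                      /* ID */
    bw_put(&bw, 2, 0);                      /* layer */
    bw_put(&bw, 1, 1);                      /* protection_absent */
    bw_put(&bw, 2, (uint32_t)profile);      /* profile */
    bw_put(&bw, 4, (uint32_t)sr_index);     /* sampling_frequency_index */
    bw_put(&bw, 1, 0);                      /* private_bit */
    bw_put(&bw, 3, (uint32_t)channel_conf); /* channel_configuration */
    bw_put(&bw, 1, 0);                      /* original_copy */
    bw_put(&bw, 1, 0);                      /* home */
    bw_put(&bw, 1, 0);                      /* copyright_identification_bit */
    bw_put(&bw, 1, 0);                      /* copyright_identification_start */
    bw_put(&bw, 13, (uint32_t)(7 + payload_size)); /* aac_frame_length */
    bw_put(&bw, 11, 0x7FF);                 /* adts_buffer_fullness */
    bw_put(&bw, 2, 0);                      /* number_of_raw_data_blocks_in_frame */
}
```

The field widths add up to 56 bits, which is exactly the 7 bytes of the header. Calling write_adts_header(out, 1, 4, 2, 100) fills out with FF F1 50 80 0D 7F FC.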

A second approach, more low-level but easier to follow, builds the header byte by byte:

#include <stdint.h>
#include <stdio.h>

/* aactype appears to use FFmpeg's profile numbering:
 * 0 = Main, 1 = LC, 2 = SSR, 3 = LTP, 4 = HE, 28 = HEv2 */
static int get_audio_obj_type(int aactype){
    // AAC HE v2 = AAC LC + SBR + PS
    // AAC HE    = AAC LC + SBR
    // So both AAC_HEv2 and AAC_HE are signalled as AAC_LC
    switch(aactype){
        case 0:
        case 2:
        case 3:
            return aactype + 1;
        case 1:
        case 4:
        case 28:
            return 2;
        default:
            return 2;
    }
}

static int get_sample_rate_index(int freq, int aactype){

    int i = 0;
    int freq_arr[13] = {
        96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050, 16000, 12000, 11025, 8000, 7350
    };

    // For AAC HEv2 or AAC HE the core sampling rate is half the output rate
    if(aactype == 28 || aactype == 4){
        freq /= 2;
    }

    for(i = 0; i < 13; i++){
        if(freq == freq_arr[i]){
            return i;
        }
    }
    return 4; // The default is 44100
}

static int get_channel_config(int channels, int aactype){
    // For AAC HEv2 the number of channels is halved
    if(aactype == 28){
        return (channels / 2);
    }
    return channels;
}

static void adts_header(char *szAdtsHeader, int dataLen, int aactype, int frequency, int channels){

    int audio_object_type = get_audio_obj_type(aactype);
    int sampling_frequency_index = get_sample_rate_index(frequency, aactype);
    int channel_config = get_channel_config(channels, aactype);

    printf("aot=%d, freq_index=%d, channel=%d\n", audio_object_type, sampling_frequency_index, channel_config);

    int adtsLen = dataLen + 7;      // frame length = 7-byte header + raw AAC data

    szAdtsHeader[0] = 0xff;         // syncword: 0xfff, high 8 bits
    szAdtsHeader[1] = 0xf0;         // syncword: 0xfff, low 4 bits
    szAdtsHeader[1] |= (0 << 3);    // MPEG version: 0 for MPEG-4, 1 for MPEG-2, 1 bit
    szAdtsHeader[1] |= (0 << 1);    // layer: 0, 2 bits
    szAdtsHeader[1] |= 1;           // protection absent: 1, 1 bit

    szAdtsHeader[2] = (audio_object_type - 1) << 6;            // profile: audio_object_type - 1, 2 bits
    szAdtsHeader[2] |= (sampling_frequency_index & 0x0f) << 2; // sampling frequency index, 4 bits
    szAdtsHeader[2] |= (0 << 1);                               // private bit: 0, 1 bit
    szAdtsHeader[2] |= (channel_config & 0x04) >> 2;           // channel configuration: high 1 bit

    szAdtsHeader[3] = (channel_config & 0x03) << 6;     // channel configuration: low 2 bits
    szAdtsHeader[3] |= (0 << 5);                        // original: 0, 1 bit
    szAdtsHeader[3] |= (0 << 4);                        // home: 0, 1 bit
    szAdtsHeader[3] |= (0 << 3);                        // copyright id bit: 0, 1 bit
    szAdtsHeader[3] |= (0 << 2);                        // copyright id start: 0, 1 bit
    szAdtsHeader[3] |= ((adtsLen & 0x1800) >> 11);      // frame length: high 2 bits

    szAdtsHeader[4] = (uint8_t)((adtsLen & 0x7f8) >> 3);     // frame length: middle 8 bits
    szAdtsHeader[5] = (uint8_t)((adtsLen & 0x7) << 5);       // frame length: low 3 bits
    szAdtsHeader[5] |= 0x1f;                                 // buffer fullness: 0x7ff, high 5 bits
    szAdtsHeader[6] = 0xfc;                                  // buffer fullness: low 6 bits; number of raw data blocks: 0, 2 bits
}
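Once every raw frame carries a header like the one built above, a reader can walk the stream frame by frame using nothing but the frame length field. The sketch below counts the complete frames in a buffer; the function name is hypothetical, and it simply stops at the first position that does not look like a header instead of trying to resynchronize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count complete ADTS frames in buf by following the 13-bit frame length.
 * Stops at the first position that does not look like an ADTS header. */
static int adts_count_frames(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t pos = 0;
    int frames = 0;

    while (len - pos >= 7) {
        const uint8_t *p = buf + pos;

        /* 12-bit syncword plus layer == 0 */
        if (p[0] != 0xFF || (p[1] & 0xF6) != 0xF0)
            break;

        /* frame length: 2 bits in byte 3, 8 bits in byte 4, 3 bits in byte 5 */
        size_t frame_len = ((size_t)(p[3] & 0x03) << 11) |
                           ((size_t)p[4] << 3) |
                           ((size_t)(p[5] >> 5) & 0x07);

        if (frame_len < 7 || frame_len > len - pos)
            break;              /* corrupt or truncated frame */

        pos += frame_len;
        frames++;
    }
    return frames;
}
```

A demuxer would do the same walk but hand the bytes between the header and the next syncword to the decoder instead of just counting.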