Preface

Audio is widely used on mobile devices, and the testing platform we are building inevitably needs to verify audio capability. To run audio tests under different scenarios on a cloud real-device testing platform, audio must be captured from the client device in real time so that it can be tested directly through the Web platform.

Given this scenario, we need to read audio data from the mobile device, i.e. obtain the raw PCM (Pulse Code Modulation) data, and then deliver that PCM data to the Web side for playback.

Technology selection

Coding techniques

Because the captured audio is raw PCM data, storing it on a local disk would produce files of acceptable size; but our scenario is real-time transmission, and sending PCM directly would mean transmitting a very large amount of data. We therefore need to compress the raw data with an audio codec and transmit the compressed stream. Common encoding formats include AAC, MP3, WAV and WMA.

Referring to the comparison of sound quality between audio encoding schemes (AAC, MP3, WMA, etc.) listed in the references, at low bit rates the audio quality of the different schemes ranks as AAC+ > MP3PRO > AAC > RealAudio > WMA > MP3.

AAC is a high-compression-ratio audio compression algorithm: its compression ratio far exceeds older algorithms such as AC-3 and MP3, while its quality is comparable to uncompressed CD audio. AAC was therefore chosen to compress the PCM data.

Front-end decoding scheme

JMuxer is an open-source JavaScript library based on MSE (Media Source Extensions). It allows JavaScript to dynamically build a playable media stream in the browser: raw H.264/AAC data fed to it is remuxed into fragmented MP4 and appended to a `<video>` or `<audio>` element.
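For context, a minimal setup looks like the sketch below, based on the library's documented usage (the element id `player` and the `aacChunk` variable are illustrative placeholders):

```js
import JMuxer from 'jmuxer';

// Assumes <audio id="player"></audio> exists in the page.
const jmuxer = new JMuxer({
    node: 'player', // id of the target <audio>/<video> element
    mode: 'audio'   // decode audio only
});

// Feed raw AAC (ADTS) bytes as they arrive; JMuxer remuxes them into
// fragmented MP4 and appends them to the element through MSE.
// aacChunk: raw AAC bytes received from the device (placeholder).
jmuxer.feed({ audio: new Uint8Array(aacChunk) });
```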

AAC audio coding technology

AAC is an abbreviation for Advanced Audio Coding. First released in 1997 as an audio coding technology based on MPEG-2, it was developed by Fraunhofer IIS, Dolby Laboratories, AT&T, Sony and others, with the aim of replacing the MP3 format. In 2000 the MPEG-4 standard arrived, and AAC incorporated additional technologies (PS, SBR); to distinguish it from traditional MPEG-2 AAC, AAC with SBR or PS features is also called MPEG-4 AAC.

AAC is a new generation of lossy audio compression technology. Its audio file formats include ADIF and ADTS:

  • ADIF: Audio Data Interchange Format. Its characteristic is that the start of the audio data can be located unambiguously; decoding cannot begin in the middle of the stream but must start from a clearly defined beginning. This format is therefore commonly used for files stored on disk.
  • ADTS: Audio Data Transport Stream, the AAC audio transport-stream format. Its characteristic is that it is a bit stream with sync words, so decoding can start anywhere within the stream.

In short, ADTS can be decoded starting from any frame because every frame carries its own header, whereas ADIF has a single unified header, so decoding can only begin once all the data is available.

An ADTS frame is formed by wrapping a raw AAC frame with an ADTS header: each frame of an AAC audio stream consists of an ADTS header followed by the AAC audio data.

In this scheme, because the data is transmitted in real time, frames in ADTS format are used for transmission. An ADTS packet header is divided into two parts (the full field layout is listed after these bullets):

  • Fixed header: its data is identical in every frame; for details, see the AAC ADTS header file information in the references.
  • Variable header: its contents vary from frame to frame.
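
For reference, the full ADTS header occupies 7 bytes (9 bytes when a CRC is present) and, per the MPEG-2/MPEG-4 AAC specifications, breaks down as follows.

Fixed header (28 bits):

  • syncword (12 bits): always 0xFFF
  • ID (1 bit): MPEG version (0 = MPEG-4, 1 = MPEG-2)
  • layer (2 bits): always 00
  • protection_absent (1 bit): 1 means no CRC follows the header
  • profile (2 bits): AAC profile (e.g. LC)
  • sampling_frequency_index (4 bits): index into the standard sample-rate table
  • private_bit (1 bit)
  • channel_configuration (3 bits): channel layout
  • original_copy (1 bit) and home (1 bit)

Variable header (28 bits):

  • copyright_identification_bit (1 bit) and copyright_identification_start (1 bit)
  • aac_frame_length (13 bits): length of the entire frame, header included
  • adts_buffer_fullness (11 bits)
  • number_of_raw_data_blocks_in_frame (2 bits)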

Sometimes the AAC packets we receive are invalid, i.e. the ADTS header is malformed. The fixed-header fields (most obviously the 0xFFF syncword) can be used to validate incoming data.
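
As a sketch of such a check, the function below verifies the syncword and extracts a few fields from the fixed and variable headers, using the bit layout listed above (the function name is ours, not part of any library):

```js
// Minimal ADTS sanity check: verifies the 0xFFF syncword and reads
// a few header fields. `buf` is a Uint8Array holding one ADTS frame.
function parseAdtsHeader(buf) {
    // syncword: the first 12 bits must all be 1 (0xFFF).
    if (buf.length < 7 || buf[0] !== 0xff || (buf[1] & 0xf0) !== 0xf0) {
        return null; // not a valid ADTS frame
    }
    return {
        mpegVersion: (buf[1] & 0x08) ? 2 : 4,          // ID bit
        hasCrc: (buf[1] & 0x01) === 0,                 // protection_absent
        profile: ((buf[2] & 0xc0) >> 6) + 1,           // 1 = Main, 2 = LC, ...
        samplingIndex: (buf[2] & 0x3c) >> 2,           // sampling_frequency_index
        channelConfig: ((buf[2] & 0x01) << 2) | ((buf[3] & 0xc0) >> 6),
        // aac_frame_length: 13 bits spanning bytes 3..5.
        frameLength: ((buf[3] & 0x03) << 11) | (buf[4] << 3) | ((buf[5] & 0xe0) >> 5)
    };
}
```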

JMuxer decoding

AAC packets received over WebSocket arrive as raw binary data (an ArrayBuffer). During decoding, this buffer is wrapped in a typed array (Uint8Array) so the byte stream can be parsed.

The JMuxer library supports both audio and video decoding. Below are the options and methods of JMuxer that are commonly used.

Parameters

| Attribute | Values | Description | Default |
| --- | --- | --- | --- |
| node | tag id | id of the target video/audio element | – |
| mode | audio / video / both | decoding mode | both |
| flushingTime | time (ms) | buffer flushing interval | 1500 |
| maxDelay | time (ms) | maximum delay | 500 |
| clearBuffer | true / false | whether the buffer is cleared automatically | true |
| fps | frame rate | video frame rate; used to derive frame duration when no duration is provided | – |
| onReady | function | callback fired once MSE is ready | – |
| onError | function | callback fired when a buffer error occurs | – |
| debug | true / false | whether to print logs | false |

Methods

| Method | Parameter | Description |
| --- | --- | --- |
| feed | data object | the object may contain audio, video, and duration; if duration is not provided, it is calculated from the fps |
| createStream | – | returns a writable stream for feeding data (usable in Node.js) |
| reset | – | resets and restarts JMuxer |
| destroy | – | destroys the instance |

Overall code implementation

The complete demo can be found at github.com/Lewage59/pr…

```js
/**
 * Audio receiving processor
 */
import JMuxer from 'jmuxer';
import Socket from './socket';

const DEFAULT_WS_URL = 'ws://localhost:8080';

export default class AudioProcessor {
    constructor(options) {
        const wsUrl = options.wsUrl || DEFAULT_WS_URL;
        /**
         * Typical options:
         *   node: 'player',
         *   mode: 'audio',
         *   debug: true,
         *   flushingTime: 0,
         *   wsUrl
         */
        this.jmuxer = new JMuxer({
            mode: 'audio',
            flushingTime: 0,
            onReady() {
                console.log('JMuxer audio init onReady!');
            },
            onError(data) {
                console.error('Buffer error encountered', data);
            },
            ...options
        });
        this.audioDom = document.getElementById(options.node);
        this.initWebSocket(wsUrl);
    }

    initWebSocket(url) {
        const that = this;
        this.ws = new Socket({
            url,
            binaryType: 'arraybuffer',
            onmessage: function (event) {
                const data = that.parse(event.data);
                data && that.jmuxer.feed(data);
            }
        });
    }

    /**
     * Audio parsing
     * @param {*} data AAC buffer stream
     * @returns object suitable for jmuxer.feed()
     */
    parse(data) {
        const input = new Uint8Array(data);

        return {
            audio: input
        };
    }

    onPlay() {
        this.audioDom.load();
        const playPromise = this.audioDom.play();
        if (playPromise !== undefined) {
            playPromise.then(() => {
                this.audioDom.play();
            });
        }
    }

    onPause() {
        this.audioDom.pause();
    }

    onReload() {
        this.audioDom.load();
    }

    onDestroy() {
        this.ws.handleClose();
        this.audioDom.pause();
        this.jmuxer = null;
    }
}
```
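
With the processor above in place, wiring it into a page might look like the following sketch (the element ids and URL are illustrative assumptions, not part of the demo):

```js
// Assumes <audio id="player" controls></audio> exists in the page.
const processor = new AudioProcessor({
    node: 'player',
    wsUrl: 'ws://localhost:8080'
});

// Hook playback controls up to hypothetical UI buttons.
document.getElementById('play').addEventListener('click', () => processor.onPlay());
document.getElementById('pause').addEventListener('click', () => processor.onPause());
```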

Sometimes the front end needs to mock the audio service for debugging. First, convert MP3 audio to AAC format with FFmpeg; then split the audio file into data frames; then start a WebSocket service with Node to transmit the frames, as sketched below. For details, see Node Server AAC Audio Transmission.
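
A minimal sketch of such a mock service, assuming the `ws` npm package and an `audio.aac` (ADTS) file prepared with FFmpeg beforehand; the frame splitting relies on the aac_frame_length field described earlier:

```js
// mock-server.js — stream an ADTS AAC file frame by frame over WebSocket.
// Prepare the file first, e.g.: ffmpeg -i input.mp3 -c:a aac audio.aac
const fs = require('fs');
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (socket) => {
    const data = fs.readFileSync('./audio.aac');
    let offset = 0;

    // Send one ADTS frame every 100 ms to roughly simulate real-time capture.
    const timer = setInterval(() => {
        if (offset >= data.length || socket.readyState !== WebSocket.OPEN) {
            clearInterval(timer);
            return;
        }
        // aac_frame_length: 13 bits spanning bytes 3..5 of the ADTS header.
        const frameLength = ((data[offset + 3] & 0x03) << 11)
            | (data[offset + 4] << 3)
            | ((data[offset + 5] & 0xe0) >> 5);
        socket.send(data.subarray(offset, offset + frameLength));
        offset += frameLength;
    }, 100);
});
```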

Closing thoughts

During the pre-research and implementation of this project, I gained a new understanding of this technology. This article therefore stays at the application level and does not dig into the details of each technique, but I will continue to follow audio and video technology in the future.

If you are interested in UI automation testing, remote control, and similar topics, check out the Sonic cloud testing platform.

Sonic: a one-stop open-source distributed cluster cloud real-device testing platform dedicated to client UI testing for small and medium-sized enterprises (free forever).

References

  • Comparison of sound quality between audio encoding schemes (AAC, MP3, WMA, etc.)
  • 7. Master basic knowledge of audio and render PCM data with AudioTrack and OpenSL ES
  • AAC file parsing and decoding process
  • AAC ADTS format analysis
  • AAC ADTS header file information