Audio Concept Introduction

Sampling rate

The number of times a sound signal is sampled per second is called the sampling rate, measured in Hz. The higher the sampling rate, the more faithfully the sampled data restores the original sound wave, and the more storage space it requires (see the worked example after the list below). Common sampling rates in digital audio are:

  • 8000Hz: the sampling rate used by telephones

  • 22050Hz: the sampling rate used for radio broadcasts

  • 32000Hz: the sampling rate used by miniDV digital camcorders and DAT (LP mode)

  • 44100Hz: the sampling rate of audio CDs, also commonly used for MPEG-1 audio (VCD, SVCD, MP3)

  • 48000Hz: the sampling rate used for miniDV, digital TV, DVD, DAT, film, and professional audio

  • 96000Hz or 192000Hz: sampling rates used for DVD-Audio, some LPCM DVD tracks, BD-ROM audio tracks, and HD-DVD audio tracks
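
To make the storage cost concrete, here is a small worked example that computes the raw PCM data rate of CD-quality audio (44100Hz, 16-bit, stereo); the helper name is ours, and the numbers follow directly from the definitions above:

#include <cstdio>

// Raw PCM data rate in bytes per second:
// sampling rate x channel count x bytes per sample
static int PcmBytesPerSecond(int sample_rate, int channel_count, int bytes_per_sample) {
    return sample_rate * channel_count * bytes_per_sample;
}

int main() {
    // CD-quality audio: 44100Hz, 2 channels, 16 bits (2 bytes) per sample
    int rate = PcmBytesPerSecond(44100, 2, 2);
    printf("CD audio: %d bytes per second (~%.1f MB per minute)\n",
           rate, rate * 60 / (1024.0 * 1024.0));
    return 0;
}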

Sampling precision

Each sample of the sound signal is represented in the computer as a number. The larger the range of that number, the larger the range of sound amplitudes that can be represented. Android supports three sampling precisions, defined in AudioFormat:

public static final int ENCODING_PCM_16BIT = 2;
public static final int ENCODING_PCM_8BIT = 3;
public static final int ENCODING_PCM_FLOAT = 4;
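
When PCM buffers are handed to native code, the sampling precision determines how many bytes make up one sample of one channel. A minimal C++ sketch, assuming the constant values are mirrored by hand from android.media.AudioFormat (they are not provided by any NDK header):

#include <cstdlib>

// Values mirrored manually from android.media.AudioFormat (assumption)
enum PcmEncoding {
    kEncodingPcm16Bit = 2,
    kEncodingPcm8Bit = 3,
    kEncodingPcmFloat = 4,
};

// Bytes occupied by one sample of a single channel
static int BytesPerSample(PcmEncoding encoding) {
    switch (encoding) {
        case kEncodingPcm8Bit:  return 1;
        case kEncodingPcm16Bit: return 2;
        case kEncodingPcmFloat: return 4;  // 32-bit IEEE float
        default: abort();
    }
}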

Mixing implementation

Here we mix by linear superposition, adding the two audio streams sample by sample and averaging the result. Before mixing, we must ensure that the two audio streams have the same sampling rate, sampling bit depth, and number of channels.

Mixing implementation when the sampling bit depth is 8 bits

When the sampling bit depth is 8 bits, each sample occupies one byte, so we only need to add the bytes at corresponding positions of the two PCM buffers and average them:

public static void mixAsByte(byte[] audioSrc, byte[] audioDst) {
    // Mix in place: average each pair of samples at the same position
    for (int i = 0; i < audioSrc.length; ++i) {
        audioDst[i] = (byte) ((audioDst[i] + audioSrc[i]) / 2);
    }
}

Mixing implementation when the sampling bit depth is 16 bits

When the sampling bit depth is 16 bits, each sample is represented by two bytes, so every two consecutive bytes must first be converted into a short before being added and averaged. We also have to consider byte order, which is either big-endian (high byte first, low byte last) or little-endian (low byte first, high byte last). On Android, the byte order of audio data decoded by MediaCodec or recorded from AudioRecord can be obtained with the ByteOrder.nativeOrder() method and is generally little-endian.

// Requires java.nio.ByteBuffer, java.nio.ByteOrder and java.nio.ShortBuffer
public static void mixAsShort(byte[] audioSrc, byte[] audioDst) {
    // View both byte arrays as 16-bit samples in the platform's native byte order
    ShortBuffer sAudioSrc = ByteBuffer.wrap(audioSrc).order(ByteOrder.nativeOrder()).asShortBuffer();
    ShortBuffer sAudioDst = ByteBuffer.wrap(audioDst).order(ByteOrder.nativeOrder()).asShortBuffer();
    for (int i = 0; i < sAudioSrc.capacity(); ++i) {
        sAudioDst.put(i, (short) ((sAudioSrc.get(i) + sAudioDst.get(i)) / 2));
    }
}
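
If the same mixing is later moved into native code, the little-endian handling described above can be spelled out by hand. A C++ sketch under the assumption of little-endian input (the common case on Android, as noted above); the function name is illustrative:

#include <cstddef>
#include <cstdint>

// Mix two little-endian 16-bit PCM buffers in place; the result is written to dst.
// byte_count must be the same for both buffers and a multiple of 2.
static void MixPcm16LittleEndian(const uint8_t* src, uint8_t* dst, size_t byte_count) {
    for (size_t i = 0; i + 1 < byte_count; i += 2) {
        // Assemble each 16-bit sample from two bytes, low byte first (little-endian)
        int16_t s = (int16_t) (src[i] | (src[i + 1] << 8));
        int16_t d = (int16_t) (dst[i] | (dst[i + 1] << 8));
        int16_t mixed = (int16_t) ((s + d) / 2);
        // Write the averaged sample back, low byte first
        dst[i] = (uint8_t) (mixed & 0xFF);
        dst[i + 1] = (uint8_t) ((mixed >> 8) & 0xFF);
    }
}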

Using FFmpeg for resampling

Mixing requires the two audio streams to have the same format (sampling rate, bit depth, and number of channels), so the original audio data must be resampled before mixing. Here we use FFmpeg, of which only the libswresample and libavformat libraries need to be compiled.
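
The snippets that follow reference fields such as in_sample_rate_, in_sample_bytes_ and in_buffer_address_ without showing their declarations. A minimal sketch of how this state might be declared, with names inferred from the snippets and the sample format assumed to be 16-bit PCM (the article itself does not show these declarations):

extern "C" {
#include <libavutil/opt.h>
#include <libswresample/swresample.h>
}

// Resampler state shared across JNI calls; swr_context_ itself is created
// by swr_alloc() in the setup snippet below.
static int in_sample_rate_ = 0, out_sample_rate_ = 0;       // sampling rates, in Hz
static int in_channel_count_ = 0, out_channel_count_ = 0;   // numbers of channels
static int in_sample_bytes_ = 0, out_sample_bytes_ = 0;     // bytes per sample of one channel
static AVSampleFormat in_sample_fmt_ = AV_SAMPLE_FMT_S16;   // assumption: 16-bit PCM
static AVSampleFormat out_sample_fmt_ = AV_SAMPLE_FMT_S16;  // assumption: 16-bit PCM
static void* in_buffer_address_ = nullptr;                  // backed by a direct ByteBuffer
static void* out_buffer_address_ = nullptr;                 // backed by a direct ByteBuffer
static int error_ = 0;                                      // result of swr_init()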

SwrContext *swr_context_ = swr_alloc();
// Get the default channel layout for the given number of channels
int64_t in_channel_layout = av_get_default_channel_layout(in_channel_count_);
int64_t out_channel_layout = av_get_default_channel_layout(out_channel_count_);
// Channel layouts before and after resampling
av_opt_set_channel_layout(swr_context_, "in_channel_layout", in_channel_layout, 0);
av_opt_set_channel_layout(swr_context_, "out_channel_layout", out_channel_layout, 0);
// Sampling rates before and after resampling
av_opt_set_int(swr_context_, "in_sample_rate", in_sample_rate_, 0);
av_opt_set_int(swr_context_, "out_sample_rate", out_sample_rate_, 0);
// Sample formats before and after resampling
av_opt_set_sample_fmt(swr_context_, "in_sample_fmt", in_sample_fmt_, 0);
av_opt_set_sample_fmt(swr_context_, "out_sample_fmt", out_sample_fmt_, 0);
error_ = swr_init(swr_context_);

The code above creates SwrContext, the context used for resampling, and sets the parameters of the audio before and after resampling.
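
As an aside, the same configuration can be done in a single call with swr_alloc_set_opts(), which allocates and configures the context together. A sketch using the same assumed field names (on FFmpeg 5.1 and later this function is deprecated in favor of swr_alloc_set_opts2()):

// Allocate and configure the resampling context in one call
SwrContext *swr_context_ = swr_alloc_set_opts(
        nullptr,                                            // allocate a new context
        av_get_default_channel_layout(out_channel_count_),  // output channel layout
        out_sample_fmt_,                                    // output sample format
        out_sample_rate_,                                   // output sampling rate
        av_get_default_channel_layout(in_channel_count_),   // input channel layout
        in_sample_fmt_,                                     // input sample format
        in_sample_rate_,                                    // input sampling rate
        0, nullptr);                                        // log offset and log context
error_ = swr_init(swr_context_);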

int Resample(JNIEnv *env, jobject caller, jint byte_count) {
    // Number of input samples = input bytes / (input channel count * bytes per sample of one channel)
    int in_samples = byte_count / (in_channel_count_ * in_sample_bytes_);
    // Upper bound on the number of output samples, including samples
    // still buffered inside the resampler
    int out_samples = av_rescale_rnd(swr_get_delay(swr_context_, in_sample_rate_) + in_samples,
                                     out_sample_rate_, in_sample_rate_, AV_ROUND_UP);
    // Resample; swr_convert() returns the number of samples actually output per channel
    out_samples = swr_convert(swr_context_,
                              // output: resampled data is written to out_buffer_address_
                              reinterpret_cast<uint8_t **>(&out_buffer_address_),
                              // the number of samples expected to be output
                              out_samples,
                              // input audio data is stored in in_buffer_address_
                              (const uint8_t **) (&in_buffer_address_),
                              // the number of input samples
                              in_samples);
    // Return the number of output bytes
    return out_samples * out_channel_count_ * out_sample_bytes_;
}

The in_buffer_address_ and out_buffer_address_ in the code above are input/output buffers created in advance on the Android side with ByteBuffer.allocateDirect(). Because every call from Android into C++ goes through JNI, and the resampling method is called frequently, creating the buffers up front avoids the cost of repeatedly allocating and freeing memory. Note also that the actual number of output samples is not necessarily the same as the number of input samples, so when creating the output buffer, make it slightly larger than expected or adjust its size dynamically.
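
One way the native side can obtain the addresses of those direct buffers is the standard JNI direct-buffer API. A sketch assuming a hypothetical SetBuffers() native method and Java class (neither is named in the article):

#include <jni.h>

// Hypothetical JNI entry point: cache the addresses of the direct ByteBuffers
// that the Java side allocated once with ByteBuffer.allocateDirect().
extern "C" JNIEXPORT void JNICALL
Java_com_example_resampler_Resampler_SetBuffers(JNIEnv *env, jobject /* caller */,
                                                jobject in_buffer, jobject out_buffer) {
    // GetDirectBufferAddress() returns the memory backing a direct ByteBuffer
    in_buffer_address_ = env->GetDirectBufferAddress(in_buffer);
    out_buffer_address_ = env->GetDirectBufferAddress(out_buffer);
}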

Finally, after resampling is complete, the resources held by the context need to be released:

swr_free(&swr_context_);
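
Since swr_get_delay() above showed that the resampler can hold samples internally, it may be worth draining it before freeing: passing a NULL input to swr_convert() flushes the remaining samples. A sketch, where max_out_samples_ is an assumed field holding the output buffer's capacity in samples:

// Flush samples still buffered inside the resampler: a NULL input
// with 0 input samples asks swr_convert() to drain its internal state
int flushed_samples = swr_convert(swr_context_,
                                  reinterpret_cast<uint8_t **>(&out_buffer_address_),
                                  max_out_samples_,  // assumed output capacity, in samples
                                  nullptr, 0);
// flushed_samples * out_channel_count_ * out_sample_bytes_ bytes were written
// swr_free() releases the context and resets swr_context_ to NULL
swr_free(&swr_context_);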

Conclusion

In this article, FFmpeg is first used to resample the source audio data into the target audio format, and then two audio streams with the same format are mixed into one. The mixing method adds the samples at corresponding positions of the two streams and averages them. This method introduces no additional noise, but when the number of audio streams is large, the overall volume drops. Readers interested in other mixing algorithms can find additional material online.