This article was first published on the WeChat official account Byteflow

FFmpeg development series serial:

  • FFmpeg Development (01): FFmpeg compilation and integration

  • FFmpeg Development (02): FFmpeg + ANativeWindow video decoding and playback

  • FFmpeg Development (03): FFmpeg + OpenSLES audio decoding and playback

  • FFmpeg Development (04): FFmpeg + OpenGLES audio visualization playback

  • FFmpeg Development (05): FFmpeg + OpenGLES video decoding playback and video filters

  • FFmpeg Development (06): Three ways for an FFmpeg player to achieve audio-video synchronization

  • FFmpeg Development (07): FFmpeg + OpenGLES implementation of a 3D panorama player

  • FFmpeg Development (08): FFmpeg player video rendering optimization

  • FFmpeg Development (09): FFmpeg, X264 and FDK-AAC compilation and integration

  • FFmpeg Development (10): FFmpeg video recording – adding filters to video and encoding

In the previous article, FFmpeg was used to record the preview frames collected by the Android Camera2, with OpenGL adding filters, and the rendered results were finally encoded into an MP4 file.

This article uses the Android AudioRecorder to collect PCM audio and then encodes it with FFmpeg to generate an AAC file.

In the next article in this series, FFmpeg will encode both the Android Camera preview frames and the audio data collected by the AudioRecorder into a single MP4 file.

Using AudioRecorder

Here, the Android AudioRecorder API is used to collect raw PCM audio data, which is then passed to the Native layer through JNI for FFmpeg to encode.

The following code encapsulates the AudioRecorder in a thread and delivers PCM data through an interface callback. The default sample rate is 44.1 kHz, and the sample format is 16-bit PCM.

public class AudioRecorder extends Thread {
	private static final String TAG = "AudioRecorder";
	private AudioRecord mAudioRecord = null;
	private static final int DEFAULT_SAMPLE_RATE = 44100;
	private static final int DEFAULT_CHANNEL_LAYOUT = AudioFormat.CHANNEL_IN_STEREO;
	private static final int DEFAULT_SAMPLE_FORMAT = AudioFormat.ENCODING_PCM_16BIT;
	private final AudioRecorderCallback mRecorderCallback;

	public AudioRecorder(AudioRecorderCallback callback) {
		this.mRecorderCallback = callback;
	}

	@Override
	public void run() {
		final int mMinBufferSize = AudioRecord.getMinBufferSize(DEFAULT_SAMPLE_RATE, DEFAULT_CHANNEL_LAYOUT, DEFAULT_SAMPLE_FORMAT);
		Log.d(TAG, "run() called mMinBufferSize=" + mMinBufferSize);

		mAudioRecord = new AudioRecord(android.media.MediaRecorder.AudioSource.MIC, DEFAULT_SAMPLE_RATE, DEFAULT_CHANNEL_LAYOUT, DEFAULT_SAMPLE_FORMAT, mMinBufferSize);
		try {
			mAudioRecord.startRecording();
		} catch (IllegalStateException e) {
			mRecorderCallback.onError(e.getMessage() + " [startRecording failed]");
			return;
		}

		byte[] sampleBuffer = new byte[4096];
		try {
			while (!Thread.currentThread().isInterrupted()) {
				int result = mAudioRecord.read(sampleBuffer, 0, 4096);
				if (result > 0) {
					mRecorderCallback.onAudioData(sampleBuffer, result);
				}
			}
		} catch (Exception e) {
			mRecorderCallback.onError(e.getMessage());
		}

		mAudioRecord.release();
		mAudioRecord = null;
	}

	public interface AudioRecorderCallback {
		void onAudioData(byte[] data, int dataSize);
		void onError(String msg);
	}
}

Audio encoding process

The audio encoding process is basically the same as the video encoding process. To show it more clearly, a flow chart is drawn in the figure below.

The PCM audio collected by the AudioRecorder is put into an audio queue; an encoding loop on a worker thread continuously takes data from the queue and encodes it, and the encoded data is finally written into the media file.
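
The ThreadSafeQueue used for this is project code that is not listed in this article; a minimal sketch of such a queue, built on std::mutex, might look like the following (the Push/Pop/Empty method names follow the usage later in this article):

#include <queue>
#include <mutex>

template <typename T>
class ThreadSafeQueue {
public:
    void Push(T item) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push(item);
    }

    T Pop() {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_queue.empty()) return T();  // e.g. nullptr for pointer types
        T item = m_queue.front();
        m_queue.pop();
        return item;
    }

    bool Empty() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_queue.empty();
    }

private:
    std::queue<T> m_queue;
    std::mutex m_mutex;  // guards m_queue across the producer and consumer threads
};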

The two FFmpeg sample formats

Since newer FFmpeg versions no longer support encoding audio in the AV_SAMPLE_FMT_S16 sample format (the built-in AAC encoder accepts only AV_SAMPLE_FMT_FLTP), swr_convert is required to convert the data to AV_SAMPLE_FMT_FLTP. AV_SAMPLE_FMT_S16 corresponds to the AudioRecorder's AudioFormat.ENCODING_PCM_16BIT.
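
You can confirm this at runtime by iterating the encoder's sample_fmts array; a quick sketch (with FFmpeg's built-in AAC encoder this prints only "fltp"):

// A quick check of which sample formats the AAC encoder accepts.
void PrintAacSampleFormats() {
    AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_AAC);
    const enum AVSampleFormat *fmt = codec->sample_fmts;  // array terminated by AV_SAMPLE_FMT_NONE
    for (; fmt && *fmt != AV_SAMPLE_FMT_NONE; ++fmt) {
        LOGCATE("AAC encoder supports: %s", av_get_sample_fmt_name(*fmt));
    }
}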

Features of the two sampling formats:

  • AV_SAMPLE_FMT_S16: 16-bit width, short, value range [-32768, 32767];
  • AV_SAMPLE_FMT_FLTP: 32-bit width, float, value range [-1.0, 1.0].

As the list shows, in the mono case the two formats differ only in value range; in the dual-channel case they also differ in storage structure.

Dual-channel AV_SAMPLE_FMT_S16 and AV_SAMPLE_FMT_FLTP storage structures

As can be seen from the figure, the left and right channels of dual-channel AV_SAMPLE_FMT_S16 data are stored interleaved, while the left and right channels of dual-channel AV_SAMPLE_FMT_FLTP data are each stored in their own plane. Readers familiar with YUV formats will notice that this arrangement is similar to the difference between YUV420SP and YUV420P.
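
A small sketch makes the difference concrete; here s16Frame and fltpFrame are hypothetical AVFrames holding stereo data in the respective formats:

// Stereo AV_SAMPLE_FMT_S16 (packed): both channels interleaved in one buffer
//   data[0]: L0 R0 L1 R1 L2 R2 ...
int16_t *interleaved = (int16_t *) s16Frame->data[0];
int16_t left0  = interleaved[0];   // first sample, left channel
int16_t right0 = interleaved[1];   // first sample, right channel

// Stereo AV_SAMPLE_FMT_FLTP (planar): one plane per channel
//   data[0]: L0 L1 L2 ...    data[1]: R0 R1 R2 ...
float *leftPlane  = (float *) fltpFrame->data[0];
float *rightPlane = (float *) fltpFrame->data[1];
float l0 = leftPlane[0];           // first sample, left channel
float r0 = rightPlane[0];          // first sample, right channel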

Sample format conversion is done with the swr_convert function from FFmpeg's swresample library:

/**
 * Convert audio.
 *
 * @param s         allocated Swr context, with parameters set
 * @param out       output buffers, only the first one need be set in case of packed audio
 * @param out_count amount of space available for output in samples per channel
 * @param in        input buffers, only the first one need to be set in case of packed audio
 * @param in_count  number of input samples available in one channel
 *
 * @return number of samples output per channel, negative value on error
 */
int swr_convert(struct SwrContext *s, uint8_t **out, int out_count,
                const uint8_t **in, int in_count);

Here in_count and out_count are the number of input and output samples per channel, not the total number of samples across both channels. For example, if a 4096-byte buffer of dual-channel AV_SAMPLE_FMT_S16 data is collected, the number of samples per channel is 4096 / 2 (channels) / 2 (bytes per 16-bit sample) = 1024.

Likewise, nb_samples in AVFrame also represents the number of samples per channel.
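
For reference, the destination frame (m_pFrame in the code below) would typically be allocated along these lines, a sketch assuming 1024 samples per channel of stereo FLTP, matching the arithmetic above:

// Destination frame for swr_convert: stereo FLTP, 1024 samples per channel
AVFrame *pFrame = av_frame_alloc();
pFrame->nb_samples     = 1024;                 // samples per channel, as computed above
pFrame->format         = AV_SAMPLE_FMT_FLTP;
pFrame->channel_layout = AV_CH_LAYOUT_STEREO;

// Allocate and attach a buffer big enough for 2 channels x 1024 float samples
int bufferSize = av_samples_get_buffer_size(nullptr, 2, 1024, AV_SAMPLE_FMT_FLTP, 1);
uint8_t *frameBuffer = (uint8_t *) av_malloc(bufferSize);
avcodec_fill_audio_frame(pFrame, 2, AV_SAMPLE_FMT_FLTP, frameBuffer, bufferSize, 1);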

Use of swr_convert:

// audioFrame->data: AV_SAMPLE_FMT_S16 data
// dataSize / 4 = samples per channel (2 channels x 2 bytes per 16-bit sample)
int result = swr_convert(recorder->m_swrCtx, pFrame->data, pFrame->nb_samples, (const uint8_t **) &(audioFrame->data), audioFrame->dataSize / 4);
// result is normally equal to in_count; a value less than 0 means the conversion failed.
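
The m_swrCtx used above must be created before the encoding loop runs; a minimal sketch with the parameters used in this article (stereo, 44.1 kHz, S16 in, FLTP out):

// Output: stereo FLTP @ 44.1 kHz; input: stereo S16 @ 44.1 kHz
SwrContext *swrCtx = swr_alloc_set_opts(nullptr,
        AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_FLTP, 44100,
        AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16,  44100,
        0, nullptr);
if (swrCtx == nullptr || swr_init(swrCtx) < 0) {
    LOGCATE("SwrContext init failed");
}
// ... call swr_convert in the encoding loop ...
swr_free(&swrCtx);  // release when recording stops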

Code implementation

FFmpeg encodes audio data in much the same way as it encodes video data: the Android AudioRecorder passes PCM data through JNI into a queue in the Native layer, from which FFmpeg takes it for encoding.

extern "C"
JNIEXPORT void JNICALL
Java_com_byteflow_learnffmpeg_media_MediaRecorderContext_native_1OnAudioData(JNIEnv *env, jobject thiz, jbyteArray data, jint size) {
    int len = env->GetArrayLength(data);
    unsigned char* buf = new unsigned char[len];
    // Copy the PCM data out of the Java byte array
    env->GetByteArrayRegion(data, 0, len, reinterpret_cast<jbyte*>(buf));
    MediaRecorderContext *pContext = MediaRecorderContext::GetContext(env, thiz);
    // Queue the buffer for FFmpeg encoding
    if(pContext) pContext->OnAudioData(buf, len);
    delete[] buf;
}

For the convenience of readers running the demo, the FFmpeg audio-encoding implementation is likewise wrapped in a single class.

class SingleAudioRecorder {
public:
    SingleAudioRecorder(const char *outUrl, int sampleRate, int channelLayout, int sampleFormat);
    ~SingleAudioRecorder();

    // Start recording
    int StartRecord();
    // Receive audio data
    int OnFrame2Encode(AudioFrame *inputFrame);
    // Stop recording
    int StopRecord();

private:
    // Encoding loop
    static void StartAACEncoderThread(SingleAudioRecorder *context);
    // Encode one frame
    int EncodeFrame(AVFrame *pFrame);
private:
    ThreadSafeQueue<AudioFrame *> m_frameQueue;
    char m_outUrl[1024] = {0};
    int m_frameIndex = 0;
    int m_sampleRate;
    int m_channelLayout;
    int m_sampleFormat;
    AVPacket m_avPacket;
    AVFrame  *m_pFrame = nullptr;
    uint8_t *m_pFrameBuffer = nullptr;
    int m_frameBufferSize;
    AVCodec  *m_pCodec = nullptr;
    AVStream *m_pStream = nullptr;
    AVCodecContext *m_pCodecCtx = nullptr;
    AVFormatContext *m_pFormatCtx = nullptr;
    SwrContext *m_swrCtx = nullptr;
    thread *m_encodeThread = nullptr;
    volatile int m_exit = 0;
};
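
The implementations of StartRecord and OnFrame2Encode are not listed in this article; a condensed sketch of the setup they need to perform (error handling omitted, and the 96 kbps bit rate is an assumed value for illustration):

int SingleAudioRecorder::StartRecord() {
    // Create the muxer context for the output file
    avformat_alloc_output_context2(&m_pFormatCtx, nullptr, nullptr, m_outUrl);

    // Configure the AAC encoder; FLTP is the only sample format it accepts
    m_pCodec = avcodec_find_encoder(AV_CODEC_ID_AAC);
    m_pCodecCtx = avcodec_alloc_context3(m_pCodec);
    m_pCodecCtx->sample_fmt     = AV_SAMPLE_FMT_FLTP;
    m_pCodecCtx->sample_rate    = m_sampleRate;
    m_pCodecCtx->channel_layout = m_channelLayout;
    m_pCodecCtx->channels       = av_get_channel_layout_nb_channels(m_channelLayout);
    m_pCodecCtx->bit_rate       = 96000;  // assumed value for illustration
    avcodec_open2(m_pCodecCtx, m_pCodec, nullptr);

    // Add an audio stream and copy the encoder parameters into it
    m_pStream = avformat_new_stream(m_pFormatCtx, m_pCodec);
    avcodec_parameters_from_context(m_pStream->codecpar, m_pCodecCtx);

    // Open the output file, write the header, then start the encoder thread
    avio_open(&m_pFormatCtx->pb, m_outUrl, AVIO_FLAG_WRITE);
    avformat_write_header(m_pFormatCtx, nullptr);
    m_encodeThread = new std::thread(StartAACEncoderThread, this);
    return 0;
}

// The producer side simply queues the PCM frame received from JNI
int SingleAudioRecorder::OnFrame2Encode(AudioFrame *inputFrame) {
    m_frameQueue.Push(inputFrame);
    return 0;
}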

The SingleAudioRecorder starts an encoding loop in a thread that continuously retrieves data from the audio queue for encoding.

// Encoding loop
void SingleAudioRecorder::StartAACEncoderThread(SingleAudioRecorder *recorder) {
    LOGCATE("SingleAudioRecorder::StartAACEncoderThread start");
    while (!recorder->m_exit || !recorder->m_frameQueue.Empty()) {
        if (recorder->m_frameQueue.Empty()) {
            // Queue is empty; sleep briefly before polling again
            usleep(10 * 1000);
            continue;
        }

        AudioFrame *audioFrame = recorder->m_frameQueue.Pop();
        AVFrame *pFrame = recorder->m_pFrame;
        // Audio sampling format conversion
        int result = swr_convert(recorder->m_swrCtx, pFrame->data, pFrame->nb_samples, (const uint8_t **) &(audioFrame->data), audioFrame->dataSize / 4);
        LOGCATE("SingleAudioRecorder::StartAACEncoderThread result=%d", result);
        if(result >= 0) {
            pFrame->pts = recorder->m_frameIndex++;
            recorder->EncodeFrame(pFrame);
        }
        delete audioFrame;
    }

    LOGCATE("SingleAudioRecorder::StartAACEncoderThread end");
}

// Encode one frame
int SingleAudioRecorder::EncodeFrame(AVFrame *pFrame) {
    LOGCATE("SingleAudioRecorder::EncodeFrame pFrame->nb_samples=%d", pFrame != nullptr ? pFrame->nb_samples : 0);
    int result = 0;
    result = avcodec_send_frame(m_pCodecCtx, pFrame);
    if(result < 0)
    {
        LOGCATE("SingleAudioRecorder::EncodeFrame avcodec_send_frame fail. ret=%d", result);
        return result;
    }
    while (!result) {
        result = avcodec_receive_packet(m_pCodecCtx, &m_avPacket);
        if (result == AVERROR(EAGAIN) || result == AVERROR_EOF) {
            return 0;
        } else if (result < 0) {
            LOGCATE("SingleAudioRecorder::EncodeFrame avcodec_receive_packet fail. ret=%d", result);
            return result;
        }
        LOGCATE("SingleAudioRecorder::EncodeFrame frame pts=%ld, size=%d", m_avPacket.pts, m_avPacket.size);
        m_avPacket.stream_index = m_pStream->index;
        av_interleaved_write_frame(m_pFormatCtx, &m_avPacket);
        av_packet_unref(&m_avPacket);
    }
    return 0;
}
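
When recording stops, the encoder also has to be flushed and the file trailer written. A sketch of what StopRecord would do (the join-then-flush order here is an assumption of this sketch):

int SingleAudioRecorder::StopRecord() {
    m_exit = 1;                      // tell the encoding loop to drain the queue and exit
    if (m_encodeThread != nullptr) {
        m_encodeThread->join();      // wait until all queued frames have been encoded
        delete m_encodeThread;
        m_encodeThread = nullptr;
    }
    EncodeFrame(nullptr);            // null frame puts the encoder into drain mode, flushing delayed packets
    av_write_trailer(m_pFormatCtx);  // finalize the AAC file
    return 0;
}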

The complete implementation can be found in the LearnFFmpeg project.

Technical exchange

For technical exchange or to get the video tutorials, add me on WeChat: byte-flow