This article was first published on the WeChat official account ByteFlow.

FFmpeg development series:

FFmpeg development (01): FFmpeg compilation and integration

FFmpeg development (02): FFmpeg + ANativeWindow video decoding and playback

FFmpeg development (03): FFmpeg + OpenSL ES audio decoding and playback

FFmpeg development (04): FFmpeg + OpenGL ES audio visualization playback

FFmpeg development (05): FFmpeg + OpenGL ES video decoding playback and video filters

In the previous articles, we implemented rendering of FFmpeg-decoded video and audio using OpenGL ES and OpenSL ES respectively. This article implements the last important feature of the player: audio and video synchronization.

Veterans often say that there is no absolutely static synchronization between audio and video playback, only relatively dynamic synchronization; audio and video sync is essentially a process of "you catch up with me".

There are three audio and video synchronization modes: synchronizing both audio and video to the system clock, synchronizing audio to video, and synchronizing video to audio.

Audio and video decoder structure

Before implementing audio and video synchronization, let us briefly go over the general structure of the player, which will make it easier to implement the different synchronization modes.

As shown in the figure above, audio decoding and video decoding each occupy an independent thread. Each thread runs a decoding loop that continuously decodes the encoded audio or video data. The decoded frames are not cached in a buffer but rendered in real time, which greatly simplifies the implementation of audio and video synchronization.

This player model, with independent audio and video decoding threads, is simple and flexible; the code is small and beginner-friendly, and it makes audio and video synchronization very convenient to implement.

The audio and video decoding processes are very similar, so we can abstract the decoders for both as a base class:

class DecoderBase : public Decoder {
public:
    DecoderBase()
    {};
    virtual ~DecoderBase()
    {};
    // Start playing
    virtual void Start();
    // Pause playback
    virtual void Pause();
    // Stop
    virtual void Stop();
    // Get the duration
    virtual float GetDuration()
    {
        // ms to s
        return m_Duration * 1.0f / 1000;
    }
    // Seek to a certain point in time
    virtual void SeekToPosition(float position);
    // The current playing position, used for updating the progress bar and for A/V sync
    virtual float GetCurrentPosition();
    virtual void ClearCache()
    {};
    virtual void SetMessageCallback(void* context, MessageCallback callback)
    {
        m_MsgContext = context;
        m_MsgCallback = callback;
    }
    // Set the audio and video synchronization callback
    virtual void SetAVSyncCallback(void* context, AVSyncCallback callback)
    {
        m_AVDecoderContext = context;
        m_AVSyncCallback = callback;
    }

protected:
    void * m_MsgContext = nullptr;
    MessageCallback m_MsgCallback = nullptr;
    virtual int Init(const char *url, AVMediaType mediaType);
    virtual void UnInit();
    virtual void OnDecoderReady() = 0;
    virtual void OnDecoderDone() = 0;
    // Decoded data callback
    virtual void OnFrameAvailable(AVFrame *frame) = 0;

    AVCodecContext *GetCodecContext() {
        return m_AVCodecContext;
    }

private:
    int InitFFDecoder();
    void UnInitDecoder();
    // Start the decoding thread
    void StartDecodingThread();
    // Audio and video decoding loop
    void DecodingLoop();
    // Update the display timestamp
    void UpdateTimeStamp();
    // Audio and video synchronization
    void AVSync();
    // Decode one packet of encoded data
    int DecodeOnePacket();
    // Thread function
    static void DoAVDecoding(DecoderBase *decoder);

    // Demuxing (format) context
    AVFormatContext *m_AVFormatContext = nullptr;
    // Decoder context
    AVCodecContext  *m_AVCodecContext = nullptr;
    // Decoder
    AVCodec         *m_AVCodec = nullptr;
    // Encoded packet
    AVPacket        *m_Packet = nullptr;
    // Decoded frame
    AVFrame         *m_Frame = nullptr;
    // Type of the data stream
    AVMediaType      m_MediaType = AVMEDIA_TYPE_UNKNOWN;
    // File path
    char             m_Url[MAX_PATH] = {0};
    // Current playback time
    long             m_CurTimeStamp = 0;
    // Start time of playback
    long             m_StartTimeStamp = -1;
    // Total duration in ms
    long             m_Duration = 0;
    // Stream index
    int              m_StreamIndex = -1;
    // Lock and condition variable
    mutex               m_Mutex;
    condition_variable  m_Cond;
    thread             *m_Thread = nullptr;
    // Seek position
    volatile float      m_SeekPosition = 0;
    volatile bool       m_SeekSuccess = false;
    // Decoder state
    volatile int  m_DecoderState = STATE_UNKNOWN;
    void* m_AVDecoderContext = nullptr;
    AVSyncCallback m_AVSyncCallback = nullptr; // For audio and video synchronization
};

Space is limited and too much code easily leads to visual fatigue, so only a few key functions are posted here; see the full implementation in the project source code.

The decoding loop:

void DecoderBase::DecodingLoop() {
    LOGCATE("DecoderBase::DecodingLoop start, m_MediaType=%d", m_MediaType);
    {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_DecoderState = STATE_DECODING;
        lock.unlock();
    }

    for(;;) {
        while (m_DecoderState == STATE_PAUSE) {
            std::unique_lock<std::mutex> lock(m_Mutex);
            LOGCATE("DecoderBase::DecodingLoop waiting, m_MediaType=%d", m_MediaType);
            m_Cond.wait_for(lock, std::chrono::milliseconds(10));
            m_StartTimeStamp = GetSysCurrentTime() - m_CurTimeStamp;
        }

        if(m_DecoderState == STATE_STOP) {
            break;
        }

        if(m_StartTimeStamp == -1)
            m_StartTimeStamp = GetSysCurrentTime();

        if(DecodeOnePacket() != 0) {
            // Pause the decoder after decoding is done
            std::unique_lock<std::mutex> lock(m_Mutex);
            m_DecoderState = STATE_PAUSE;
        }
    }

    LOGCATE("DecoderBase::DecodingLoop end");
}

Updating the current timestamp:

void DecoderBase::UpdateTimeStamp() {
    LOGCATE("DecoderBase::UpdateTimeStamp");
    // Refer to ffplay
    std::unique_lock<std::mutex> lock(m_Mutex);
    if(m_Frame->pkt_dts != AV_NOPTS_VALUE) {
        m_CurTimeStamp = m_Frame->pkt_dts;
    } else if(m_Frame->pts != AV_NOPTS_VALUE) {
        m_CurTimeStamp = m_Frame->pts;
    } else {
        m_CurTimeStamp = 0;
    }

    m_CurTimeStamp = (int64_t)((m_CurTimeStamp * av_q2d(m_AVFormatContext->streams[m_StreamIndex]->time_base)) * 1000);
}
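To make the final conversion concrete: FFmpeg timestamps are counted in units of the stream's time_base. A minimal worked example (the values here are assumed for illustration, not taken from any particular stream):

extern "C" {
#include <libavutil/rational.h>   // AVRational, av_q2d()
}
#include <cstdint>

// Assumed values for illustration only.
AVRational timeBase = {1, 90000};         // a 1/90000 time_base, common for MPEG-TS video
int64_t pts = 180000;                     // raw timestamp, counted in time_base ticks
double seconds = pts * av_q2d(timeBase);  // 180000 * (1/90000) = 2.0 s
long ms = (long)(seconds * 1000);         // 2000 ms, the same math as UpdateTimeStamp()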

Decoding one packet of encoded data:

int DecoderBase::DecodeOnePacket() {
    int result = av_read_frame(m_AVFormatContext, m_Packet);
    while(result == 0) {
        if(m_Packet->stream_index == m_StreamIndex) {
            if(avcodec_send_packet(m_AVCodecContext, m_Packet) == AVERROR_EOF) {
                // End of decoding
                result = -1;
                goto __EXIT;
            }

            // How many frames does one packet contain?
            int frameCount = 0;
            while (avcodec_receive_frame(m_AVCodecContext, m_Frame) == 0) {
                // Update the timestamp
                UpdateTimeStamp();
                // Synchronize
                AVSync();
                // Render
                LOGCATE("DecoderBase::DecodeOnePacket 000 m_MediaType=%d", m_MediaType);
                OnFrameAvailable(m_Frame);
                LOGCATE("DecoderBase::DecodeOnePacket 0001 m_MediaType=%d", m_MediaType);
                frameCount++;
            }
            LOGCATE("BaseDecoder::DecodeOneFrame frameCount=%d", frameCount);
            // Check whether a packet was decoded
            if(frameCount > 0) {
                result = 0;
                goto __EXIT;
            }
        }
        av_packet_unref(m_Packet);
        result = av_read_frame(m_AVFormatContext, m_Packet);
    }

__EXIT:
    av_packet_unref(m_Packet);
    return result;
}

Audio and video are synchronized to the system clock

Synchronizing audio and video to the system clock means, as the name implies, using the system clock as the reference: it advances steadily with time, and every decoded audio and video frame is aligned against it.

In short, whenever the current audio or video timestamp is ahead of the elapsed time measured by the system clock, the decoding thread sleeps until the two are aligned. For example, if the current frame's timestamp is 500 ms but only 460 ms have elapsed since playback started, the thread sleeps for 40 ms before rendering.

Synchronizing against the system clock:

void DecoderBase::AVSync() {
    LOGCATE("DecoderBase::AVSync");
    long curSysTime = GetSysCurrentTime();
    // Calculate the elapsed time since the start of playback based on the system clock
    long elapsedTime = curSysTime - m_StartTimeStamp;

    // Synchronize with the system clock
    if(m_CurTimeStamp > elapsedTime) {
        // Sleep time in ms
        auto sleepTime = static_cast<unsigned int>(m_CurTimeStamp - elapsedTime);
        av_usleep(sleepTime * 1000);
    }
}
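GetSysCurrentTime() is a small helper not shown in these excerpts; a minimal sketch of what it could look like, assuming a monotonic millisecond clock (the original implementation may differ):

#include <chrono>

// Hypothetical helper: current monotonic time in milliseconds.
static long GetSysCurrentTime() {
    auto now = std::chrono::steady_clock::now().time_since_epoch();
    return (long) std::chrono::duration_cast<std::chrono::milliseconds>(now).count();
}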

Synchronizing audio and video to the system clock minimizes frame drops and frame skips, but only when the system clock is not disturbed by other time-consuming tasks.

Audio is synchronized to video

Audio-to-video synchronization aligns the audio timestamp with the video timestamp. Since video has a fixed refresh rate, namely FPS, we can determine the render time of each frame from the FPS and use that to derive each video frame's timestamp.
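For example, at 25 fps each frame occupies 40 ms, so the n-th frame plays at n × 40 ms. A small sketch of that mapping (a hypothetical helper, not from the original source):

// Hypothetical helper: derive a video frame's timestamp (ms) from its index and FPS.
long FrameIndexToTimestampMs(int frameIndex, double fps) {
    // At 25 fps a frame lasts 1000 / 25 = 40 ms, so frame n plays at n * 40 ms.
    return (long)(frameIndex * 1000.0 / fps);
}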

When the audio timestamp runs ahead of the video timestamp beyond a certain threshold, the audio player generally inserts a silent frame, sleeps, or slows playback down; in the opposite case it skips ahead, drops frames, or speeds audio playback up.
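Inserting a silent frame can be as simple as handing the audio renderer a buffer of zero samples; a hedged sketch, assuming 16-bit signed PCM output:

#include <cstdint>
#include <vector>

// Hypothetical: build one silent frame of 16-bit PCM audio.
// frameSamples and channels are assumed to match the renderer's configuration.
std::vector<int16_t> MakeSilentFrame(int frameSamples, int channels) {
    // Zeroed samples play back as silence in signed PCM.
    return std::vector<int16_t>(frameSamples * channels, 0);
}

With that in mind, the decoder-side AVSync() for audio-to-video synchronization: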

void DecoderBase::AVSync() {
    LOGCATE("DecoderBase::AVSync");
    if(m_AVSyncCallback != nullptr) {
        // Audio is synchronized to video: m_AVSyncCallback returns the video timestamp
        long elapsedTime = m_AVSyncCallback(m_AVDecoderContext);
        LOGCATE("DecoderBase::AVSync m_CurTimeStamp=%ld, elapsedTime=%ld", m_CurTimeStamp, elapsedTime);

        if(m_CurTimeStamp > elapsedTime) {
            // Sleep time in ms
            auto sleepTime = static_cast<unsigned int>(m_CurTimeStamp - elapsedTime);
            av_usleep(sleepTime * 1000);
        }
    }
}

Decoder setup when audio is synchronized to video:

// Create the decoders
m_VideoDecoder = new VideoDecoder(url);
m_AudioDecoder = new AudioDecoder(url);

// Set the renderers
m_VideoDecoder->SetVideoRender(OpenGLRender::GetInstance());
m_AudioRender = new OpenSLRender();
m_AudioDecoder->SetAudioRender(m_AudioRender);

// Set the video timestamp callback
m_AudioDecoder->SetAVSyncCallback(m_VideoDecoder, VideoDecoder::GetVideoDecoderTimestampForAVSync);
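The AVSyncCallback type is not shown in the excerpts above; conceptually it is just a function that, given a context pointer, returns the reference clock's current position in milliseconds. A hedged sketch of the callback type and the video decoder's getter (names and internals are assumptions; see the full source for the real implementation):

// Assumed callback signature, consistent with SetAVSyncCallback(void*, AVSyncCallback).
typedef long (*AVSyncCallback)(void* context);

// Hypothetical sketch: a static member that hands back the video decoder's
// current position so the audio decoder can align against it.
long VideoDecoder::GetVideoDecoderTimestampForAVSync(void* context) {
    VideoDecoder* decoder = static_cast<VideoDecoder*>(context);
    return (long) decoder->GetCurrentPosition(); // assumed to return milliseconds
}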

The advantage of audio-to-video synchronization is that every video frame can be rendered, so picture smoothness is optimal.

However, because the human ear is more sensitive to sound than the eye is to images, silent frames, dropped frames, or variable-speed playback introduced while aligning audio to video are easily noticed by users, resulting in a poor experience.

Video is synchronized to audio

Syncing video to audio is the most common approach; it takes advantage of the fact that the ear is more sensitive to changes in sound than the eye is to changes in images.

The audio plays at a fixed sample rate and thus provides an alignment baseline for the video. When the video timestamp runs ahead of the audio timestamp, the renderer waits, either not rendering or repeating the previous frame; in the opposite case it skips frames to catch up.

void DecoderBase::AVSync() {
    LOGCATE("DecoderBase::AVSync");
    if(m_AVSyncCallback != nullptr) {
        // Video is synchronized to audio: m_AVSyncCallback returns the audio timestamp
        long elapsedTime = m_AVSyncCallback(m_AVDecoderContext);
        LOGCATE("DecoderBase::AVSync m_CurTimeStamp=%ld, elapsedTime=%ld", m_CurTimeStamp, elapsedTime);

        if(m_CurTimeStamp > elapsedTime) {
            // Sleep time in ms
            auto sleepTime = static_cast<unsigned int>(m_CurTimeStamp - elapsedTime);
            av_usleep(sleepTime * 1000);
        }
    }
}
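The sleep above only handles video running ahead of audio; frame dropping for the opposite case happens on the render side and is not shown here. A hedged sketch of such a policy (the threshold and helper are assumptions, not the article's code):

// Hypothetical frame-drop policy, timestamps in ms.
const long kDropThresholdMs = 100; // assumed tolerance

bool ShouldDropFrame(long videoTimestampMs, long audioTimestampMs) {
    // If the video frame lags the audio clock by more than the threshold,
    // skip rendering it so the video can catch up.
    return audioTimestampMs - videoTimestampMs > kDropThresholdMs;
}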

Decoder setup when video is synchronized to audio:

// Create the decoders
m_VideoDecoder = new VideoDecoder(url);
m_AudioDecoder = new AudioDecoder(url);

// Set the renderers
m_VideoDecoder->SetVideoRender(OpenGLRender::GetInstance());
m_AudioRender = new OpenSLRender();
m_AudioDecoder->SetAudioRender(m_AudioRender);

// Set the audio timestamp callback
m_VideoDecoder->SetAVSyncCallback(m_AudioDecoder, AudioDecoder::GetAudioDecoderTimestampForAVSync);

Conclusion

Among the three methods of audio and video synchronization, which one is appropriate depends on the specific usage scenario. For example, if picture smoothness matters most, choose audio-to-video synchronization; if you implement video-only or audio-only playback, syncing directly to the system clock is easier.

Contact and exchange

For technical exchange or to get the source code, add my WeChat: byte-flow, where you can also get video tutorials on audio and video development.