One, overview

This article mainly addresses the synchronization of sound and picture during playback and during capture and encoding.

Two, technical points, principles, and terminology

  • Timestamps

Audio and video each carry their own timestamps. To keep them synchronized, the difference between the current audio timestamp and the current video timestamp must stay within a certain range, so that users do not perceive a timing mismatch between sound and picture. The concepts of DTS and PTS are described as follows:

DTS (Decoding Time Stamp): tells the player when to decode this frame of data. The unit is 1/90,000 of a second.

PTS (Presentation Time Stamp): tells the player when this frame should be displayed. The unit is 1/90,000 of a second.

CompositionTime: the composition time offset, also called the offset or CTS for short. The PTS is not stored with AVC video frames; instead, it is calculated from the DTS and the CTS. The CTS offset is CTS = (PTS - DTS) / 90, and it is measured in milliseconds.
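The relationship between DTS, PTS, and CTS can be sketched in a few lines of Python. This is a minimal illustration, assuming PTS and DTS are integers in 1/90,000-second ticks as described above; the function names `cts_ms` and `pts_ticks_from` are hypothetical and not taken from any library.

```python
# 90,000 ticks per second divided by 1,000 ms per second.
TICKS_PER_MS = 90


def cts_ms(pts_ticks: int, dts_ticks: int) -> int:
    """Composition time offset in milliseconds: CTS = (PTS - DTS) / 90."""
    # Integer division drops any sub-millisecond remainder.
    return (pts_ticks - dts_ticks) // TICKS_PER_MS


def pts_ticks_from(dts_ticks: int, cts_ms_value: int) -> int:
    """Recover the PTS (in ticks) from a DTS (in ticks) and a CTS (in ms)."""
    return dts_ticks + cts_ms_value * TICKS_PER_MS


# Hypothetical frame: decoded at 2.000 s, displayed at 2.080 s.
dts = 180_000
pts = 187_200
offset = cts_ms(pts, dts)
recovered_pts = pts_ticks_from(dts, offset)
```

Because the CTS is stored in whole milliseconds, a PTS that is not a multiple of 90 ticks away from its DTS cannot be recovered exactly.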

Note that although DTS and PTS are used to guide the behavior of the player side, they are generated by the encoder at encoding time.

When the video stream contains no B-frames, DTS and PTS are usually in the same order. With B-frames, however, we are back to the problem mentioned earlier: the decoding order and the display order are inconsistent.

For example, suppose the display order of the frames in a video is I, B, B, P. Decoding a B-frame requires information from the P-frame, so the frames may appear in the video stream in the order I, P, B, B. This is where having both a DTS and a PTS on each frame matters: DTS tells us in what order to decode the frames, and PTS tells us in what order to display them. The order is roughly as follows:

   PTS: 1 4 2 3
   DTS: 1 2 3 4
Stream: I P B B
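The reordering in the table above can be reproduced with a short Python sketch. It assumes each frame simply carries the DTS and PTS values from the table; the `Frame` class is a hypothetical stand-in, not a type from any real decoder API.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str  # "I", "P", or "B"
    dts: int   # decoding timestamp
    pts: int   # presentation timestamp


# Frames in the order they appear in the stream (values from the table).
stream = [
    Frame("I", dts=1, pts=1),
    Frame("P", dts=2, pts=4),
    Frame("B", dts=3, pts=2),
    Frame("B", dts=4, pts=3),
]

# Decode in DTS order, display in PTS order.
decode_order = "".join(f.kind for f in sorted(stream, key=lambda f: f.dts))
display_order = "".join(f.kind for f in sorted(stream, key=lambda f: f.pts))
```

Sorting by DTS leaves the stream order unchanged (I, P, B, B), while sorting by PTS gives the display order (I, B, B, P). A real player would not sort the whole stream; it would hold decoded frames in a small reorder buffer.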
  • Synchronization methods

Capture-side synchronization: with B-frames and without B-frames

Playback-side synchronization: with B-frames and without B-frames

  • Synchronization threshold

As shown in the figure, this "threshold" is defined by an international standard, ITU-R BT.1359 (sometimes cited as RFC-1359), as follows:

1. Undetectable: the timestamp difference between audio and video is between -100ms and +25ms (negative means the audio lags, positive means it leads)
2. Detectable: the audio lags by more than 100ms, or leads by more than 25ms
3. Unacceptable: the audio lags by more than 185ms, or leads by more than 90ms
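The thresholds listed above can be turned into a small classifier. This is only a sketch: it assumes the offset is the audio timestamp minus the video timestamp in milliseconds (so a negative value means the audio lags), and the function name `classify_av_offset` is hypothetical.

```python
def classify_av_offset(offset_ms: float) -> str:
    """Classify an audio-minus-video offset using the thresholds above."""
    # Check the most severe range first, since it is a subset of "detectable".
    if offset_ms < -185 or offset_ms > 90:
        return "unacceptable"
    if offset_ms < -100 or offset_ms > 25:
        return "detectable"
    return "undetectable"


# Hypothetical current playback positions, in milliseconds.
audio_ms = 5_000
video_ms = 5_120
status = classify_av_offset(audio_ms - video_ms)  # audio lags by 120 ms
```

A player could run a check like this periodically and correct the drift (for example by dropping or repeating video frames) only once the offset leaves the "undetectable" range.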

Three, open source projects

Four, references

  • 1. Video and audio synchronization in FFmpeg