Introduction

As of this post, our cross-platform player will not be Qt-based. Why? Since Qt is not our main area of study, we will instead learn how to build a true cross-platform player SDK on top of basic libraries such as FFmpeg.

The plan is to redevelop the core of the player based on ffplay. Why redevelop? Because ffplay is very difficult to extend: essentially all of its logic lives in a single C file. To solve this extensibility problem, I came up with a design based on secondary development of ffplay (on mobile, ijkplayer takes the same approach; I will come back to it later). Since this is secondary development, the feature set will be stronger than the original, so I planned the following:

Android

  • CPU: armv7a, arm64-v8a
  • API: similar to MediaPlayer
  • Video rendering: OpenGL ES
  • Audio rendering: OpenSL ES
  • Hardware decoding: NDK MediaCodec (Android 5.0+)
  • Variable-speed playback
  • Capture a clip and save it as a GIF

iOS

  • CPU: armv7, arm64
  • API: similar to MediaPlayer.framework
  • Video rendering: OpenGL ES
  • Audio output: AudioQueue, AudioUnit
  • Hardware decoding: VideoToolbox (iOS 8+)
  • Variable-speed playback
  • Capture a clip and save it as a GIF

PC

  • SDL

At present, the PC-side code is almost finished; you can see the project structure and the running effect:

The code will be open-sourced later, once it is more complete ~

The following articles are basically organized in two parts: the first analyzes the source code, and the second works through the implementation ideas in practice.

Below we mainly analyze the overall framework of ffplay.

What is ffplay

ffplay is FFmpeg's own cross-platform player, written in C. When you compile FFmpeg with the --enable-ffplay parameter, an ffplay executable is generated in output/bin/, and you can play a media file with ffplay XXX.mp4. It is essentially a player implemented with FFmpeg + SDL. In fact, the famous ijkplayer is secondary development based on ffplay.c, so mastering the principles of ffplay is very helpful for developing our own player.

ffplay framework analysis

At present, ffplay is mainly composed of five threads: main (UI), read_thread, audio_thread, video_thread, and audioqueue_thread (SDL audio rendering). The following figure shows the overall flow:

stream_open: player initialization

stream_open does four things internally:

  1. Initialize the frame queues that hold decoded audio/video/subtitle data, with the configured fixed queue sizes -> frame_queue_init
  2. Initialize the packet queues that hold audio/video/subtitle data waiting to be decoded -> packet_queue_init
  3. Initialize the audio/video/external clocks -> init_clock
  4. Create the thread that reads (demuxes) the media data, via the SDL wrapper function SDL_CreateThread

Threads

main: the main (UI) thread

  1. Parse the command-line parameters, call stream_open to initialize the parameters the player needs, and finally start the read_thread thread
  2. Render the video in the main thread via SDL

read_thread: the demuxing thread

  1. avformat_open_input opens the media file
  2. Find the corresponding audio/video/subtitle stream information
  3. Open the corresponding audio/video/subtitle decoder according to the stream information, and start a separate decoding thread for each
  4. Start a loop around av_read_frame that reads each packet to be decoded, then calls packet_queue_put to save the AVPacket

audio_decoder/video_decoder/subtitle_decoder: the decoder threads

  1. Read an AVPacket from the corresponding packet queue, call the corresponding decoder, and store the decoded AVFrame in the frame queue

audioqueue_thread -> sdl_audio_callback: the audio render thread

  1. Copy PCM data from the decoded AVFrame into the SDL audio buffer for playback
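The copy loop at the heart of the audio callback can be sketched in a self-contained way. This is a simplified model, not the real ffplay code: AudioState and refill_audio_buf are hypothetical stand-ins for VideoState and audio_decode_frame, and the stub "decoder" just produces a counter pattern so the loop can run on its own.

```c
#include <stdint.h>
#include <string.h>

// Simplified stand-in for the audio fields of VideoState (hypothetical names).
typedef struct AudioState {
    uint8_t audio_buf[1024];       // current decoded frame's PCM data
    unsigned int audio_buf_size;   // bytes available in audio_buf
    int audio_buf_index;           // next byte to copy into the SDL buffer
    uint8_t next_byte;             // stub decoder state, for this sketch only
} AudioState;

// Stand-in for decoding/resampling the next audio frame.
static void refill_audio_buf(AudioState *s) {
    for (unsigned int i = 0; i < sizeof(s->audio_buf); i++)
        s->audio_buf[i] = s->next_byte++;
    s->audio_buf_size = sizeof(s->audio_buf);
    s->audio_buf_index = 0;
}

// Mirrors the copy loop of sdl_audio_callback: SDL asks for exactly `len`
// bytes; we drain the current frame and refill when it is exhausted.
static void audio_callback_sketch(AudioState *s, uint8_t *stream, int len) {
    while (len > 0) {
        if (s->audio_buf_index >= (int)s->audio_buf_size)
            refill_audio_buf(s);                 // "decode" the next frame
        int n = s->audio_buf_size - s->audio_buf_index;
        if (n > len)
            n = len;
        memcpy(stream, s->audio_buf + s->audio_buf_index, n);
        stream += n;
        len -= n;
        s->audio_buf_index += n;                 // advance within the frame
    }
}
```

Note how audio_buf_index tracks the position inside the current frame across callbacks; this is exactly the role the audio_buf_index/audio_write_buf_size fields play in VideoState below.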

Internal core functions

Audio and video synchronization mechanism

  • Audio as the master clock
  • Video as the master clock
  • External clock as the master clock
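When audio is the master clock, the video thread stretches or shrinks each frame's display delay according to how far the video clock drifts from the master clock. Below is a simplified, self-contained sketch of that decision, modeled on compute_target_delay in ffplay.c but minus its frame-duplication branch; the threshold constants match ffplay's AV_SYNC_THRESHOLD_MIN/MAX, and the dmax/dmin/dabs helpers stand in for libm calls.

```c
/* Sync thresholds in seconds (same values as AV_SYNC_THRESHOLD_MIN/MAX in ffplay.c). */
#define SYNC_THRESHOLD_MIN 0.04
#define SYNC_THRESHOLD_MAX 0.1

static double dmax(double a, double b) { return a > b ? a : b; }
static double dmin(double a, double b) { return a < b ? a : b; }
static double dabs(double a) { return a < 0 ? -a : a; }

/* delay: nominal duration of the current frame, in seconds.
 * diff:  video clock minus master clock, in seconds.
 * Returns the adjusted delay to wait before showing the next frame. */
static double compute_target_delay_sketch(double delay, double diff,
                                          double max_frame_duration)
{
    /* The threshold scales with the frame duration but stays within fixed bounds. */
    double sync_threshold = dmax(SYNC_THRESHOLD_MIN, dmin(SYNC_THRESHOLD_MAX, delay));
    if (diff == diff && dabs(diff) < max_frame_duration) { /* diff == diff filters NaN */
        if (diff <= -sync_threshold)
            delay = dmax(0, delay + diff);   /* video lags audio: show next frame sooner */
        else if (diff >= sync_threshold)
            delay = 2 * delay;               /* video leads audio: hold the frame longer */
    }
    return delay;                            /* within the threshold: leave unchanged */
}
```

Drifts beyond max_frame_duration are treated as timestamp discontinuities (a seek or a broken stream) and are not "corrected" by stretching delays.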

Audio processing

  • Volume control

  • Mute control

  • PCM normalization

Video processing

  • YUV -> RGB
  • scale

Playback control

  • Play, pause, stop, fast forward, fast rewind, play frame by frame, mute

Data structure analysis

VideoState: the overall player management structure

VideoState encapsulates the global properties of the player, with the following fields:

typedef struct VideoState {
    SDL_Thread *read_tid;                // Demuxing (read) thread handle
    AVInputFormat *iformat;              // User-specified input format, pointing to the demuxer
    int abort_request;                   // If =1, request to quit playing
    int force_refresh;                   // if =1, you need to refresh the screen immediately
    int paused;                          // Pause when =1, play when =0
    int last_paused;           					 // Pause/Play status temporarily
    int queue_attachments_req;					 // Request the cover of mp3, AAC and other audio files
    int seek_req;              					 // Identify a seek request
    int seek_flags;            					 // seek flags, such as AVSEEK_FLAG_BYTE
    int64_t seek_pos;          					 // Request the target position of seek (current position + increment)
    int64_t seek_rel;          					 // This time seek position increment
    int read_pause_return;
    AVFormatContext *ic;       					 // Unpack to get format context
    int realtime;           						 // =1 indicates the real-time stream

    Clock audclk;             					 // Audio clock
    Clock vidclk;             				 	 // Video clock
    Clock extclk;             					 // External clock

    FrameQueue pictq;          					 // Decoded video Frame queue
    FrameQueue subpq;          					 // Decoded subtitle Frame queue
    FrameQueue sampq;          					 // Decoded audio Frame queue

    Decoder auddec;             				 // Audio decoder
    Decoder viddec;             				 // Video decoder
    Decoder subdec;             				 // Subtitle decoder

    int audio_stream;          					 // Audio stream index

    int av_sync_type;           				 // Audio and video synchronization type. By default, video is synchronized to audio

    double audio_clock;                  // PTS of current audio frame + Duration of current frame
    int audio_clock_serial;              // Play sequence, seek can change this value
    // The following four parameters are not used in audio master synchronization mode
    double audio_diff_cum;               // used for AV difference average computation
    double audio_diff_avg_coef;
    double audio_diff_threshold;
    int audio_diff_avg_count;
    // end

    AVStream *audio_st;                  // Audio stream information
    PacketQueue audioq;                  // Audio packet queue
    int audio_hw_buf_size;               // The size of the SDL audio buffer in bytes
    // Points to one frame of audio data to be played. The data it points to will be copied into the SDL audio buffer. If resampled, it points to audio_buf1;
    // otherwise it points to the data in the decoded frame
    uint8_t *audio_buf;
    uint8_t *audio_buf1;                 // Points to the resampled data
    unsigned int audio_buf_size;         // Size of the frame of audio data to be played (pointed to by audio_buf)
    unsigned int audio_buf1_size;        // Actual size of the audio_buf1 buffer
    // Index of the first byte of the current audio frame not yet copied into the SDL audio buffer
    int audio_buf_index;
    // The amount of data in the current audio frame that has not yet been copied into the SDL audio buffer:
    // audio_buf_size = audio_buf_index + audio_write_buf_size
    int audio_write_buf_size;
    int audio_volume;                    // Volume
    int muted;                           // =1 muted, =0 not muted
    struct AudioParams audio_src;        // Audio frame parameters
#if CONFIG_AVFILTER
    struct AudioParams audio_filter_src;
#endif
    struct AudioParams audio_tgt;       // Audio parameters supported by SDL, resampling: audio_src->audio_tgt
    struct SwrContext *swr_ctx;         // Audio resampling context
    int frame_drops_early;              // Discard video packet count
    int frame_drops_late;               // Discard the video frame count

    enum ShowMode {
        SHOW_MODE_NONE = -1,    // No display
        SHOW_MODE_VIDEO = 0,    // Display video
        SHOW_MODE_WAVES,        // Display audio waveform
        SHOW_MODE_RDFT,         // Adaptive filter display
        SHOW_MODE_NB
    } show_mode;

    // Audio waveform display is used
    int16_t sample_array[SAMPLE_ARRAY_SIZE];    // Sample array
    int sample_array_index;                     // Sampling index
    int last_i_start;                           // Last display start index
    RDFTContext *rdft;                          // Adaptive filter context
    int rdft_bits;                              // Size of the RDFT, in bits
    FFTSample *rdft_data;                       // Fast Fourier transform samples

    int xpos;
    double last_vis_time;
    SDL_Texture *vis_texture;       						// Audio visualization texture

    SDL_Texture *sub_texture;       						// Subtitle display
    SDL_Texture *vid_texture;       						// Video display

    int subtitle_stream;           							// Subtitle stream index
    AVStream *subtitle_st;          						// Subtitle stream information
    PacketQueue subtitleq;          						// Subtitle packet queue

    double frame_timer;             						// Record the time when the last frame is played
    double frame_last_returned_time;    				// Last return time
    double frame_last_filter_delay;     				// Last filter delay

    int video_stream;               						// Video stream index
    AVStream *video_st;             						// Video stream information
    PacketQueue videoq;             						// Video packet queue
    double max_frame_duration;      						// Above this duration, a pts jump is considered a timestamp discontinuity
    struct SwsContext *img_convert_ctx; 				// Change the video size format
    struct SwsContext *sub_convert_ctx; 				// Subtitle size format change
    int eof;            												// Whether reading is complete

    char *filename;     												// File name
    int width, height, xleft, ytop; 						// Width, height, x start, y start
    int step;           												// =1 step playback mode, =0 other modes

#if CONFIG_AVFILTER
    int vfilter_idx;
    AVFilterContext *in_video_filter;   				// the first filter in the video chain
    AVFilterContext *out_video_filter;  				// the last filter in the video chain
    AVFilterContext *in_audio_filter;   				// the first filter in the audio chain
    AVFilterContext *out_audio_filter;  				// the last filter in the audio chain
    AVFilterGraph *agraph;              				// audio filter graph
#endif
    // Keep the steam index of the latest corresponding audio, video, and subtitle streams
    int last_video_stream, last_audio_stream, last_subtitle_stream;

    SDL_cond *continue_read_thread; 						// Condition used to wake the read thread when it sleeps after the queues are full
} VideoState;

Clock: the clock structure

Clock mainly encapsulates the timestamps of audio/video/subtitles. Its structure is as follows:

// The system clock is defined by av_gettime_relative()
typedef struct Clock {
    double pts;            // Display timestamp of the current (to-be-played) frame on this clock; once played, the current frame becomes the previous frame
    // The difference between the current pts and the current system clock; the audio and video clocks each maintain their own value
    double pts_drift;      // clock base minus time at which we updated the clock
    // Time when the current clock (such as the video clock) was last updated
    double last_updated;   // Last update of the system clock
    double speed;          // Clock speed control, used to control playback speed
    // Play sequence is a sequence of play action, a seek operation will start a new sequence of play
    int serial;            // clock is based on a packet with this serial
    int paused;            // = 1 indicates the pause state
    // Points to the packet queue's serial
    int *queue_serial;      /* pointer to the current packet queue serial, used for obsolete clock detection */
} Clock;

The main API calls are as follows:

// Initializes the clock, queue_serial Specifies the sequence number to play in the current queue
void init_clock(Clock *c, int *queue_serial);

// Set the timestamp
void set_clock(struct Clock *c, double pts, int serial);

// Get the corresponding timestamp
double get_clock(struct Clock *c);

// Get the master timestamp based on the synchronization type
int get_master_sync_type(struct VideoState *is);
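To make pts_drift concrete, here is a simplified, self-contained model of the clock. It is a sketch, not the real ffplay code: the time source is passed in explicitly (ffplay uses av_gettime_relative()), and the serial/queue_serial bookkeeping is omitted.

```c
// Simplified Clock: pts_drift stores the gap between the media timestamp and
// the wall clock at the last update, so the current position can be
// extrapolated at any moment without updating the clock every frame.
typedef struct ClockSketch {
    double pts;            // timestamp set at the last update, in seconds
    double pts_drift;      // pts minus wall-clock time at the last update
    double last_updated;   // wall-clock time of the last update
    double speed;          // playback speed (1.0 = normal)
    int paused;            // =1 frozen at pts
} ClockSketch;

static void set_clock_at(ClockSketch *c, double pts, double now_sec) {
    c->pts = pts;
    c->last_updated = now_sec;
    c->pts_drift = pts - now_sec;
}

static double get_clock_at(const ClockSketch *c, double now_sec) {
    if (c->paused)
        return c->pts;
    // Extrapolate: drift + now, corrected when playback speed is not 1.0.
    return c->pts_drift + now_sec - (now_sec - c->last_updated) * (1.0 - c->speed);
}
```

For example, a clock set to pts 10 s reads 12 s two wall-clock seconds later at normal speed, and 14 s at 2x speed; this is how ffplay's speed field implements variable-speed playback at the clock level.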

PacketQueue: the AVPacket queue

The queue is designed with the following ideas:

  1. Thread safety: mutual exclusion, wait, wake-up
  2. Statistics on the number of cached packets
  3. Statistics on the total cached data size
  4. Statistics on the cumulative cached duration
  5. Basic save/fetch/release operations

PacketQueue is mainly used to store the demuxed audio/video/subtitle packets. Its structure is as follows:

typedef struct MyAVPacketList {
    AVPacket pkt; 								// The demuxed packet
    struct MyAVPacketList *next;  // Next node
    int serial;										// Playback serial
} MyAVPacketList;


typedef struct PacketQueue {
    MyAVPacketList *first_pkt, *last_pkt;  // Head and tail pointers of the queue
    int nb_packets; 											 // Number of packets, i.e. number of queue elements
    int size;       											 // Total data size of all elements in the queue
    int64_t duration;											 // Total playback duration of all elements in the queue
    int abort_request;										 // User exit request flag
    int serial; 													 // Playback serial; same meaning as MyAVPacketList's serial, but updated at slightly different times
    SDL_mutex *mutex;											 // Mutex protecting the queue
    SDL_cond *cond;												 // Condition variable used for wait/wake-up
    AVPacket *flush_pkt;
} PacketQueue;

Main API calls:

/** * Initializes the values of each field * @param q * @return */
int packet_queue_init(struct PacketQueue *q);

/** * Destroy the queue and free its memory * @param q */
void packet_queue_destroy(struct PacketQueue *q);

/** * Start the queue * @param q */
void packet_queue_start(struct PacketQueue *q);

/** * Abort the queue * @param q */
void packet_queue_abort(struct PacketQueue *q);


/** * Fetch a node from the queue * @param q * @param pkt * @param block * @param serial * @return */
int packet_queue_get(struct PacketQueue *q, AVPacket *pkt, int block, int *serial);

/** * Save a packet into the queue * @param q * @param pkt * @return */
int packet_queue_put(struct PacketQueue *q, AVPacket *pkt);

/** * Put a null packet, used to flush the decoder * @param q * @param stream_index * @return */
int packet_queue_put_nullpacket(struct PacketQueue *q, int stream_index);

/** * Clear all existing nodes * @param q */
void packet_queue_flush(struct PacketQueue *q);

/** * Save a node into the queue (internal, called with the lock held) * @param q * @param pkt * @return */
int packet_queue_put_private(struct PacketQueue *q, AVPacket *pkt);
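The serial field is the subtle part of this queue: a seek flushes the queue and bumps q->serial, so any node queued before the seek carries a stale serial and can be detected and dropped by the consumer. A minimal, self-contained sketch of the put/get/flush discipline, with pthreads standing in for SDL_mutex/SDL_cond and an int payload standing in for AVPacket:

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct Node { int value; int serial; struct Node *next; } Node;

typedef struct Queue {
    Node *first, *last;      // head and tail, like first_pkt/last_pkt
    int nb_packets;          // like PacketQueue.nb_packets
    int serial;              // bumped on every flush (i.e. every seek)
    pthread_mutex_t mutex;
    pthread_cond_t cond;
} Queue;

static void queue_init(Queue *q) {
    q->first = q->last = NULL;
    q->nb_packets = 0;
    q->serial = 0;
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->cond, NULL);
}

// Append a node stamped with the queue's current serial, then wake a consumer.
static void queue_put(Queue *q, int value) {
    Node *n = malloc(sizeof(*n));
    n->value = value;
    n->next = NULL;
    pthread_mutex_lock(&q->mutex);
    n->serial = q->serial;
    if (!q->last) q->first = n; else q->last->next = n;
    q->last = n;
    q->nb_packets++;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->mutex);
}

// Non-blocking get: returns 1 and fills value/serial, or 0 if the queue is empty.
static int queue_get(Queue *q, int *value, int *serial) {
    int ret = 0;
    pthread_mutex_lock(&q->mutex);
    Node *n = q->first;
    if (n) {
        q->first = n->next;
        if (!q->first) q->last = NULL;
        q->nb_packets--;
        *value = n->value;
        *serial = n->serial;
        free(n);
        ret = 1;
    }
    pthread_mutex_unlock(&q->mutex);
    return ret;
}

// Drop everything queued so far and start a new playback sequence.
static void queue_flush(Queue *q) {
    pthread_mutex_lock(&q->mutex);
    for (Node *n = q->first; n; ) { Node *next = n->next; free(n); n = next; }
    q->first = q->last = NULL;
    q->nb_packets = 0;
    q->serial++;
    pthread_mutex_unlock(&q->mutex);
}
```

The real packet_queue_get additionally supports blocking on the condition variable and honoring abort_request; the sketch keeps only the serial mechanics.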

FrameQueue: the AVFrame queue

Design idea:

  1. Thread-safe, supporting mutual exclusion, wait, wake up
  2. Controls the size of the cache queue

FrameQueue is mainly used to store the decoded raw audio/video/subtitle data

// Cache decoded audio, video, and subtitle data
typedef struct Frame {
    AVFrame *frame;         // Point to a data frame
    AVSubtitle sub;         // For subtitles
    int serial;             // Frame sequence, the serial will change when seeking
    double pts;             // Timestamp, in seconds
    double duration;        // The frame duration, in seconds
    int64_t pos;            // The byte position of the frame in the input file
    int width;              // Image width
    int height;             // Image height
    int format;             // For video: image format (enum AVPixelFormat);
    // for audio: sample format (enum AVSampleFormat)
    AVRational sar;         // Image aspect ratio (16:9, 4:3...) , 0/1 if unknown or not specified
    int uploaded;           // This is used to record whether the frame has been displayed.
    int flip_v;             // if =1, flip vertically; if = 0, play normally
} Frame;

/* This is a circular queue: rindex is the read (head) position and windex is the write (tail) position. */
typedef struct FrameQueue {
    Frame queue[FRAME_QUEUE_SIZE];      // FRAME_QUEUE_SIZE Maximum size. If a number is too large, it consumes a lot of memory. Set this parameter carefully
    int rindex;                         // Read the index. This frame is read and played when it is to be played. After playing, this frame becomes the last frame
    int windex;                         // Write index
    int size;                           // The current number of frames
    int max_size;                       // The maximum number of frames can be stored
    int keep_last;                      // = 1 to keep the last frame of data in the queue, and only release it when the queue is destroyed
    int rindex_shown;                   // Initialize to 0 with keep_last=1
    SDL_mutex *mutex;                   // Mutex
    SDL_cond *cond;                     // Condition variable
    PacketQueue *pktq;                  // Packet buffer queue
} FrameQueue;

The main operation apis are as follows:

/** * Initialize FrameQueue * @param f frame queue * @param pktq associated packet queue * @param max_size maximum cache size * @param keep_last * @return */
int frame_queue_init(struct FrameQueue *f,struct PacketQueue *pktq, int max_size, int keep_last);

/** * destroy all frames in the queue * @param f * @return */
int frame_queue_destory(struct FrameQueue *f);

/** * release a reference to the cached frame * @param vp */
void frame_queue_unref_item(struct Frame *vp);

/** * Wake up the queue (signal waiters) * @param f */
void frame_queue_signal(struct FrameQueue *f);

/** * Peek the next frame to display; the caller should first ensure a frame is readable via frame_queue_nb_remaining * @param f * @return */
struct Frame *frame_queue_peek(struct FrameQueue *f);

/** * Peek the frame after the next one; no matter when it is called, it never returns NULL * @param f * @return */
struct Frame *frame_queue_peek_next(struct FrameQueue *f);

/** * Peek the currently shown frame; if rindex_shown=0, same effect as frame_queue_peek * @param f * @return */
struct Frame *frame_queue_peek_last(struct FrameQueue *f);

/** * get a writable Frame, which can be done in blocking or non-blocking mode * @param f * @return */
struct Frame *frame_queue_peek_writable(struct FrameQueue *f);

/** * get a readable Frame, which can be done in blocking or non-blocking mode * @param f * @return */
struct Frame *frame_queue_peek_readable(struct FrameQueue *f);

/** * the number of frames in the queue is increased by 1 * @param f */
void frame_queue_push(struct FrameQueue *f);

/** * Advance the read index after a frame is shown; when keep_last is 1 and rindex_shown is 0, rindex is not advanced and the current Frame is not released * @param f */
void frame_queue_next(struct FrameQueue *f);

/** * Return the number of frames remaining to be displayed * @param f * @return */
int frame_queue_nb_remaining(struct FrameQueue *f);

/** * Return the byte position in the file of the last shown frame * @param f * @return */
int64_t frame_queue_last_pos(struct FrameQueue *f);

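The interplay of rindex, rindex_shown and keep_last is easiest to see in a stripped-down model (no locking, an int payload instead of Frame): the first frame_queue_next only marks the frame as shown, so peek_last can keep returning the frame still on screen while peek looks at the next one.

```c
#define FQ_MAX 16

// Stripped-down FrameQueue: a ring buffer of int "frames", no locking.
typedef struct FQ {
    int queue[FQ_MAX];
    int rindex;        // read index: the shown frame once rindex_shown=1
    int windex;        // write index
    int size;          // frames currently stored
    int max_size;
    int keep_last;     // keep the shown frame in the queue
    int rindex_shown;  // whether queue[rindex] has been displayed
} FQ;

static void fq_init(FQ *f, int max_size, int keep_last) {
    f->rindex = f->windex = f->size = f->rindex_shown = 0;
    f->max_size = max_size;
    f->keep_last = keep_last;
}

static void fq_push(FQ *f, int v) {            // assumes size < max_size
    f->queue[f->windex] = v;
    if (++f->windex == f->max_size) f->windex = 0;
    f->size++;
}

static int fq_nb_remaining(FQ *f) {            // frames left to display
    return f->size - f->rindex_shown;
}

static int fq_peek(FQ *f) {                    // next frame to display
    return f->queue[(f->rindex + f->rindex_shown) % f->max_size];
}

static int fq_peek_last(FQ *f) {               // frame currently on screen
    return f->queue[f->rindex];
}

static void fq_next(FQ *f) {                   // called after a frame is displayed
    if (f->keep_last && !f->rindex_shown) {
        f->rindex_shown = 1;                   // first call: just mark as shown
        return;
    }
    if (++f->rindex == f->max_size) f->rindex = 0;
    f->size--;
}
```

Keeping the last shown frame around is what lets ffplay redraw the screen on a window expose event, and it is why nb_remaining is size - rindex_shown rather than size.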

Decoder: the decoder structure

Decoder mainly encapsulates decoding for audio/video/subtitles. Its structure is as follows:

/** * Decoder encapsulation */
typedef struct Decoder {
    AVPacket pkt;               // Packet currently being decoded
    PacketQueue *queue;         // Packet queue
    AVCodecContext *avctx;      // Decoder context
    int pkt_serial;             // Packet serial
    int finished;               // =0, the decoder is working; != 0, the decoder is idle
    int packet_pending;         // =1, a packet is pending and must be resent to the decoder; =0, no pending packet
    SDL_cond *empty_queue_cond; // Signaled when the packet queue is empty, to wake read_thread to read more data
    int64_t start_pts;          // Start time of the stream at initialization
    AVRational start_pts_tb;    // time_base of the stream at initialization
    int64_t next_pts;           // PTS following the last decoded frame; used when a decoded frame has no valid PTS
    AVRational next_pts_tb;     // Unit (time base) of next_pts
    SDL_Thread *decoder_tid;    // Thread handle
} Decoder;

The main API calls are as follows:

/** * decoder initialization * @param is * @return */
int ff_decoder_init(struct VideoState *is);

/** * Destroy the decoder * @param d */
void ff_decoder_destroy(struct Decoder *d);


/** * decoder parameter initialization * @param d * @param avctx * @param queue * @param empty_queue_cond */
void ff_decoder_parameters_init(struct Decoder *d, AVCodecContext *avctx, struct PacketQueue *queue,
                                struct SDL_cond *empty_queue_cond);

/** * open decoder component * @brief stream_component_open * @param is * @param stream_index Stream index * @return return 0 if OK */
int ff_stream_component_open(struct VideoState *is, int stream_index);

/** * create a decoding thread, audio/video has a separate thread */
int ff_decoder_start(struct Decoder *d, int (*fn)(void *), const char *thread_name, void *arg);

/** * stop decoding * @param d * @param fq */
void ff_decoder_abort(struct Decoder *d, struct FrameQueue *fq);

/** * Decode one frame * @param d * @param frame * @param sub * @return */
int ff_decoder_decode_frame(struct Decoder *d, AVFrame *frame, AVSubtitle *sub, struct FFOptions *ops);


Conclusion

This article introduced the main components of ffplay; the implementation principles will be covered in subsequent articles, so stay tuned!