Hello everyone, I am Guo Shuyu, the maintainer of the open source GSY series of projects on GitHub; GSYVideoPlayer for Android is one of them. I am also the author of the book Practical Explanation of Flutter Development. I am usually active on GitHub and on the Juejin (Nuggets) platform, and this time I mainly want to share some interesting aspects of mobile audio and video development and of GSYVideoPlayer.

Basic knowledge

First up is basic knowledge, which takes up a large share of this talk. Why spend so much time on audio and video fundamentals? That brings me to a classic issue from a long time ago, as shown in the picture:

Over the more than five years of maintaining GSYVideoPlayer, I have actually received a lot of similar basic questions; this is just a typical example, and the process of dealing with them is always similar. The video playback question I receive most often is probably:

“Playback shows a black screen / playback fails — why can player XXX play this video but GSY can’t?”

Every time I get this question, I ask:

“What’s the video codec?”

And there’s a good chance that the response I get is:

MP4

Then I need to explain further: “MP4 is not a video codec; here is how to check the video codec…” and so on.

Therefore, audio and video fundamentals are also the content I most often find myself explaining. There is a lot to this topic, so here I will pick out a few common points, or ones I explain most often, to share with you.

Encapsulation

As shown in the figure, in general a video stream goes through several stages between loading and being ready to play: protocol handling, de-encapsulation (demuxing), and decoding. Among them:

  • Protocol refers to the streaming media protocol;
  • Encapsulation refers to the container format of the video;
  • Coding is divided into video coding and audio coding;

Common protocols include HTTP, RTSP, and RTMP, of which HTTP is by far the most common. RTSP and RTMP are generally used for live streaming or for scenarios that need control signaling, such as remote surveillance. HTTP also supports live streaming, for example HLS (often referred to by its index format, M3U8).

Video container formats are the familiar file suffixes such as MP4, AVI, RMVB, MKV, and FLV. They are multimedia encapsulation formats that pack the audio and the video together for transmission and storage, so this layer has to be unpacked first in order to extract the corresponding encoded audio and encoded video.

So MP4 is a container format: it mainly packs the encoded video and encoded audio into one file for easier transmission and storage. Before playback this layer of packaging generally has to be opened up so that we can get at the audio track and the video track.

Audio coding

Audio encoding refers to how the audio data is encoded, for example MP3, PCM, WAV, AAC, AC-3, and so on. The raw audio data is generally too large to transmit directly; its size can be estimated as sample rate * bits per sample * channel count * duration. Suppose a video has an audio track with a 44100 Hz sample rate, 16-bit samples, one channel, and 24 seconds of audio; its raw size would be

44100 * 16 * 1 * 24 / 8 ≈ 2 MB

whereas the audio actually extracted from the file is only about 200 KB, and that is where audio coding comes in.
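
As a quick sanity check, here is a minimal sketch in plain Java of that estimate (the sample rate, bit depth, channel count, and duration are just the example values above):

    // Rough raw PCM size: sampleRate * bitsPerSample * channels * seconds / 8 (bytes)
    public class PcmSize {
        static long rawPcmBytes(int sampleRate, int bitsPerSample, int channels, int seconds) {
            return (long) sampleRate * bitsPerSample * channels * seconds / 8;
        }

        public static void main(String[] args) {
            // 44100 Hz, 16-bit, mono, 24 seconds ≈ 2 MB of raw PCM
            System.out.println(rawPcmBytes(44100, 16, 1, 24) / (1024.0 * 1024.0) + " MB");
        }
    }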

So audio is generally compressed with one of several codecs before transmission to remove redundancy. Uncompressed WAV/PCM sounds better but is much larger; MP3 is lossy and shrinks the audio considerably while keeping acceptable quality; AAC is also lossy but comes in several profiles such as LC-AAC and HE-AAC.

For example, the normal hearing range of the human ear is roughly 20 Hz to 20 kHz, and MP3 encoding cuts away a lot of redundant and irrelevant signal outside what we can actually perceive.

Video coding

Video coding refers to how the picture is encoded and compressed; common codecs are H.263, H.264, HEVC (H.265), MPEG-2, MPEG-4, and so on, with H.264 being the most common at the moment.

We usually think of a picture as a combination of RGB values, but the video field more often uses the YUV format, where Y represents luminance (brightness, essentially the grayscale value) and U and V represent chrominance, which describes the color and saturation and determines the color of each pixel.

The most common representation uses 8 bits for each of Y, U, and V. Y has a value range of 16 to 235 and U/V of 16 to 240; this restricted range was originally meant to prevent signal overshoot during transmission, and this form is also known as YCbCr.

The most common YUV sampling format is 4:2:0; YUV420 can be understood as storing the chroma at a 2:1 sampling ratio relative to the luma.

YUV420 does not mean that the V component is not sampled; rather, every 4 Y samples share one set of U and V, and the picture is then reconstructed from luma plus chroma. I won't go further into YUV here, but one of the reasons YUV is used at all is backward compatibility with old black-and-white TV.

Why not just transmit the original YUV? Suppose the MOV video above (1080 x 1920) used raw YUV420 directly; then the size of one frame would be:

1080 * 1920 * 1 + 1080 * 1920 * 0.5 ≈ 2.9 MB (Y = width * height; U = Y / 4; V = Y / 4)

----------------------
|     Y      | U|V |
----------------------

You can see that a YUV420 image already saves half the space compared with RGB, but once you multiply by the frame rate (say 30 fps) and the duration (say one hour), the raw size of a video becomes astronomical, which is clearly unsuitable for network transmission — and that is why video coding is used to compress the images.
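
A minimal sketch of the same arithmetic, extended to a whole hour of raw footage (the resolution, frame rate, and duration are the assumed example values above):

    public class RawVideoSize {
        public static void main(String[] args) {
            int width = 1080, height = 1920;
            // YUV420: full-resolution Y plus quarter-resolution U and V -> 1.5 bytes per pixel
            double frameBytes = width * height * 1.5;
            double frameMB = frameBytes / (1024.0 * 1024.0);
            // One hour of raw YUV420 at 30 fps
            double hourGB = frameBytes * 30 * 3600 / (1024.0 * 1024.0 * 1024.0);
            System.out.printf("one frame ≈ %.2f MB, one raw hour ≈ %.0f GB%n", frameMB, hourGB);
        }
    }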

So the role of video coding is to compress the frames being transmitted, getting a much smaller size while restoring the picture as faithfully as possible.

Here are the basic concepts for the most common video coding:

  • 1. I/P/B frames are a common frame compression scheme. An I frame is a key frame and serves as the reference for the pictures around it; a P frame is a forward-predicted frame; a B frame is a bidirectionally predicted frame. Put simply, an I frame can produce a complete picture on its own, a P frame needs the preceding I or P frame to help decode a complete picture, and a B frame needs both the preceding I/P frame and the following P frame to reconstruct its picture.

I frames are therefore very important. Compressing I frames mainly squeezes spatial size, while compressing P/B frames removes temporal redundancy. I frames are also critical when seeking: if the picture jumps forward after a seek, it is very likely that the video's key frames have been compressed away too aggressively.

  • 2. There is also the concept of an IDR frame. Because H.264 uses multi-frame prediction, an ordinary I frame cannot by itself serve as an independent reference point, so a special kind of I frame, the IDR frame, was added. The key property of an IDR frame is: as soon as the decoder receives an IDR frame, it immediately empties the reference frame buffer and treats the IDR frame as the new reference.

An IDR frame forces an immediate refresh, so any corrupted pictures before it cannot keep propagating: encoding restarts as a new sequence from the IDR frame, and P/B frames after an IDR frame may not reference any I frame before it. Hence every IDR frame is an I frame, but not every I frame is an IDR frame.

  • 3. Video decoding also involves DTS (Decoding Time Stamp) and PTS (Presentation Time Stamp). The DTS is mainly used for decoding, while the PTS is mainly used for synchronization and output of the decoded pictures.

Because the packets in a video stream are not decoded in presentation order, the DTS obtained from the data source determines when a packet should be decoded, and the resulting PTS determines when the decoded picture should be drawn.

  • 4. GOP (Group Of Pictures) is the distance between two I frames. Generally, the larger the GOP, the higher the compression efficiency, but the longer decoding takes; with a fixed bit rate, a larger GOP also means more P/B frames.

So video coding is largely about removing redundancy: a single packet cannot reconstruct a picture on its own, it needs its associated data to produce a complete frame, the decoding order does not necessarily match the display order, and the I frame is the most critical reference point.

Having covered these common basics, there are two more special but common topics I want to briefly cover: HLS and RTSP.

M3U8

Generally speaking, HLS and M3U8 refer to the same thing. HLS (HTTP Live Streaming) is the HTTP-based streaming solution developed by Apple, which provides both on-demand and live streaming on top of ordinary HTTP servers.

In HLS, the video stream is split into small segment files (TS) plus an index file (M3U8). In most cases the video is encoded as H.264 and the audio as AAC.

What makes M3U8 special is that playback first fetches the index file and then fetches each TS segment. For example, if you open an M3U8 link in a desktop browser, it most likely will not play but will simply download the M3U8 file.

Although we habitually call it M3U8, the underlying format is M3U with UTF-8 encoding, hence M3U8. It is a plain text file containing a number of tags:

  • #EXTM3U: the required tag that identifies the file as M3U;

  • #EXTINF: the duration of a TS media file, in seconds, applying only to the URL that follows it;

  • #EXT-X-TARGETDURATION: the maximum segment duration in the playlist, in seconds; every #EXTINF duration must be less than or equal to it, and this tag may appear only once;

  • #EXT-X-KEY: indicates that encryption is used; more on this later;

  • #EXT-X-PLAYLIST-TYPE: optional. With VOD the server may not change the segment list at all; with EVENT the server may not change or delete any existing part of the list but may append new content;

  • #EXT-X-ENDLIST: marks the end of the playlist. It can appear anywhere in the file, but only once;

  • #EXT-X-MEDIA: used for alternative renditions, such as multiple audio languages;

  • #EXT-X-STREAM-INF: used to nest M3U8 playlists, for example to provide links to the same content at different bit rates;

  • #EXT-X-MEDIA-SEQUENCE: the media sequence number of the first segment in the playlist; we'll talk about its effects later;
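
For reference, a minimal illustrative media playlist might look like this (the segment names and durations are made up):

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:10
    #EXT-X-MEDIA-SEQUENCE:0
    #EXT-X-PLAYLIST-TYPE:VOD
    #EXTINF:9.8,
    segment0.ts
    #EXTINF:10.0,
    segment1.ts
    #EXT-X-ENDLIST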

So to play an M3U8 video, we first obtain the index file, parse the tags, and then build a playlist to play from. Over the years of maintaining GSYVideoPlayer, one of the most frequent M3U8 problems we received was actually about encryption.

An M3U8 playlist can declare encryption with the #EXT-X-KEY tag:

#EXT-X-KEY:METHOD=<method>[,URI=<uri>][,IV=<iv>]

In M3U8 the encryption method is AES-128, which in ExoPlayer corresponds to AES/CBC/PKCS7Padding.

To better understand the encryption logic, we need to break down what AES/CBC/PKCS7Padding is composed of. In brief, the general process is:

  • Encryption: Padding -> CBC encryption -> Base64 encoding
  • Decryption: Base64 decoding -> CBC decryption -> Unpadding

PKCS7Padding

First, PKCS7Padding. In AES the data is processed in fixed-size blocks, and sometimes the data does not fill a whole block, i.e. it is not aligned; PKCS7Padding fills in a few extra bytes so that every block is the same size.

With PKCS7Padding, if the data size is not an integer multiple of the block size, bytes are appended until it is, and the value of each appended byte is the total number of bytes appended.

Sounds a bit circular? For example, if the message is 3 bytes short of a multiple of 16, three 0x03 bytes are appended to the end of the block.

If the data is already an exact multiple of the block size, a whole extra block of 16 bytes, each with the value 0x10, is appended. This guarantees that during decryption the end of the message can always be determined: we just read the last byte to learn how many padding bytes to strip.

Besides the usual PKCS7Padding there is also PKCS5Padding; the difference is that PKCS5Padding is defined only for block ciphers with a 64-bit (8-byte) block size, i.e. a fixed block size of 8.
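
To make the padding rule concrete, here is a minimal sketch of PKCS7-style pad/unpad for a 16-byte block (illustration only; real code should let a crypto library handle padding):

    import java.util.Arrays;

    public class Pkcs7 {
        // Append padding so the length becomes a multiple of blockSize;
        // every padding byte holds the number of bytes added (1..blockSize).
        static byte[] pad(byte[] data, int blockSize) {
            int padLen = blockSize - (data.length % blockSize);
            byte[] out = Arrays.copyOf(data, data.length + padLen);
            Arrays.fill(out, data.length, out.length, (byte) padLen);
            return out;
        }

        // Strip padding by reading the value of the last byte.
        static byte[] unpad(byte[] data) {
            int padLen = data[data.length - 1] & 0xFF;
            return Arrays.copyOf(data, data.length - padLen);
        }

        public static void main(String[] args) {
            byte[] padded = pad(new byte[13], 16); // 13 bytes -> three 0x03 bytes appended
            System.out.println(padded.length + " bytes, last byte = " + padded[padded.length - 1]);
        }
    }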

CBC

CBC first cuts the plaintext into blocks, XORs each block with the initialization block (or with the previous block's ciphertext), and then encrypts it with the key; in other words, each block depends on the one before it. This is also where the IV parameter seen in M3U8 comes in.

The IV is often simply 0. If no Initialization Vector (IV) is provided, the segment's media sequence number is used as the IV for encoding and decoding: its big-endian value is placed in a 16-byte buffer, padded on the left with zeros.

The first block is encrypted using the initialization vector (IV) or a 16-byte random value; each subsequent block is then encrypted using the ciphertext of the previous block, which is what gives cipher block chaining (CBC) its name.

AES

As a symmetric cipher, AES uses the same key for encryption and decryption.

AES is also a block cipher: the plaintext is divided into blocks of equal length and each block is encrypted in turn until the whole plaintext has been processed.

In the AES standard the block length is fixed at 128 bits, i.e. each block is 16 bytes (8 bits per byte); the AES-128 used by HLS refers to a 128-bit key, and the key length in general can be 128, 192, or 256 bits. The details involve matrix operations that I won't expand on here; just remember the number: a block is 16 bytes.
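Putting the pieces together, here is a minimal sketch of decrypting one AES-128 encrypted TS segment with javax.crypto (the key, IV, and data are placeholders; "AES/CBC/PKCS7Padding" is available on Android, while desktop JDKs usually expose the equivalent as PKCS5Padding):

    import javax.crypto.Cipher;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public class HlsAes128 {
        // Decrypt an AES-128/CBC encrypted segment; key and iv are both 16 bytes.
        static byte[] decryptSegment(byte[] encrypted, byte[] key, byte[] iv) throws Exception {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS7Padding"); // "AES/CBC/PKCS5Padding" on desktop JDKs
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(encrypted); // padding is stripped automatically
        }
    }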

With encryption out of the way, let's look at how ExoPlayer handles the M3U8 format. If the stream is encrypted, we first need to obtain the key from the URI in #EXT-X-KEY.

There is an interesting problem here: the key is usually transferred Base64-encoded, and a user once ran into a key that should have been 16 bytes after decoding, yet produced:

“java.io.IOException: javax.crypto.IllegalBlockSizeException: error:XXXX Cipher functions: OPENSSL_internal: WRONG_FINAL_BLOCK_LENGTH”

This is because Android's Base64 includes line terminators by default (Base64.DEFAULT); to get the same result as the Apache encoder you need to use Base64.NO_WRAP.

public String encode(byte[] bytes) {
    // NO_WRAP avoids the trailing line terminator added by Base64.DEFAULT
    return Base64.encodeToString(bytes, Base64.NO_WRAP);
}

Why spend so much time on M3U8 encryption and decryption? Because encryption is very common with M3U8 and very error-prone, and being familiar with this flow makes it much easier to track down problems when playing M3U8 content.

RTSP

RTSP (Real Time Streaming Protocol) is a real-time streaming protocol. It can also be used for on-demand playback, but it is mostly used for surveillance and IPTV.

If you look into RTSP you will run into a pile of related concepts. We won't go into the details here, just what they are:

  • RTSP: an application-layer protocol for real-time stream control. RTSP generally uses SDP to negotiate media parameters and RTP (with RTCP) to carry the media streams. It is also a bidirectional protocol: the client can send requests such as play, fast-forward, and rewind to the server, so besides live streaming it also carries control logic.

  • SDP: the session description format. It is not itself a transport protocol but a convention that describes the transport protocol (RTP/UDP/IP, H.320, etc.) and the media formats (H.261 video, MPEG video, etc.), i.e. a session-level and media-level description.

So SDP is used inside RTSP, and both sit basically at the application layer.

  • RTP (Real-time Transport Protocol): the real-time transport protocol. Calling it a transport-layer protocol is not entirely accurate because it runs on top of TCP or UDP (UDP by default). RTP by itself does not guarantee delivery, prevent out-of-order delivery, or ensure the reliability of the underlying network; instead, the sequence number in RTP lets the receiver reorder the sender's packets and determine each packet's proper position, which matters, for example, in video decoding where decoding need not be strictly sequential.

  • RTCP (Real-time Transport Control Protocol): the sister protocol of RTP. RTCP provides out-of-band control for RTP media streams; it does not carry media data itself but works alongside RTP, providing feedback on the delivery of the multimedia data.

In general, RTP carries the media with timing information for stream synchronization, while RTCP looks after quality of service.

So RTSP (together with SDP) sits at the application layer and carries media over RTP at the transport level, and RTP in turn can run over UDP or TCP. Understanding these pieces helps you reason about RTSP as a whole.

In ijkplayer, for example, when playing RTSP you can configure how the stream is transported using the rtsp_transport and rtsp_flags options.
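
A minimal sketch with the IjkMediaPlayer API (the option names come from FFmpeg's RTSP demuxer; check them against the ijkplayer version you actually build):

    import tv.danmaku.ijk.media.player.IjkMediaPlayer;

    public class RtspOptions {
        static IjkMediaPlayer createRtspPlayer() {
            IjkMediaPlayer player = new IjkMediaPlayer();
            // Ask FFmpeg's RTSP demuxer to carry the stream over TCP instead of the default UDP
            player.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "rtsp_transport", "tcp");
            player.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "rtsp_flags", "prefer_tcp");
            return player;
        }
    }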

There is also a related concept called RTC. RTC (Real-Time Communications) refers to real-time communication; the familiar WebRTC is one part of it. It has nothing to do with RTSP and is generally used for video conferencing, real-time messaging, and interactive live streaming. RTC is usually peer-to-peer (P2P) and covers capture, encoding, pre/post processing, transmission, decoding, buffering, rendering, and so on.

Of course, streaming and sharing audio and video over WebSocket is possible, but its API is nowhere near as complete as WebRTC's: put simply, capture, encoding, packetization, reception, and synchronization all have to be handled by yourself.

GSYVideoPlayer

After all these basic concepts, let's move on to GSYVideoPlayer and, through it, some common knowledge about Android audio and video development.

In fact, GSYVideoPlayer is not a very technical project and does not contain much code; it is more an integration of existing resources. As shown in the picture, the main layers are:

  • Player kernel layer

It wraps the real player kernels, such as the common IJKPlayer, MediaPlayer, and ExoPlayer, and integrates their playback capabilities; the differences between them are covered later.

  • The Manager layer

It is the intermediate layer that manages and bridges the kernel layer and the UI layer. The UI normally does not operate on the kernel directly but goes through a unified Manager, for operations such as release, fast-forward, and pause.

  • The UI layer

It is the View the user works with directly; what users of GSYVideoPlayer handle is basically a subclass of a UI-layer object, such as StandardGSYVideoPlayer. The UI layer mainly connects the Manager layer and the rendering layer.

  • Rendering layer

The rendering layer provides different rendering surfaces, i.e. TextureView, SurfaceView, GLSurfaceView, and so on, because different scenarios or device environments call for different rendering behavior.

For example, if you need to manipulate the picture data while rendering, you might need GLSurfaceView; SurfaceView may render more sharply than TextureView on some hardware; and TextureView is what you want for transparency and similar view operations.

  • Buffer layer

It is used for video caching, i.e. implementing download-while-playing logic.

As shown above, GSYVideoPlayer simply splits the player into layers according to the requirements, with the layers assembled through fixed interfaces. So why is it necessary to support multiple player kernels and multiple render surfaces in the first place?

Because in general a single kernel has a hard time covering every kind of playback and rendering requirement and every oddball audio/video source; the safest approach is to support several kernels, which gives a more stable playback result overall.

IJKPlayer, MediaPlayer and ExoPlayer

There are three built-in kernels: IJKPlayer, MediaPlayer, and ExoPlayer.

IJKPlayer

First, IJKPlayer is built on FFmpeg underneath, which means the playback formats it supports can be configured flexibly, because FFmpeg mostly decodes in software, i.e. on the CPU. So you can support and configure whichever video formats you need, although doing so means modifying the build configuration and repackaging.

Since it uses FFmpeg, IJKPlayer decodes in software by default. Software decoding simply means decoding purely on the CPU, so with high-bit-rate video the decoder may hit a bottleneck, showing problems such as a broken (mosaic) picture or audio and video falling out of sync.

Of course, IJKPlayer can also enable hardware decoding, which offloads decoding to dedicated hardware and the GPU, but its hardware decoding path is currently not great.

MediaPlayer & ExoPlayer

In fact, both MediaPlayer and ExoPlayer decode through MediaCodec, so the underlying difference between them is not large; generally speaking, both can be understood as using hardware decoding.

Why only "generally speaking"? Because Android keeps a prioritized list of codecs underneath, and if the device has a hardware decoder for the format, that decoder will usually be chosen.

This priority order is not exposed to the upper layer. By convention, Android distinguishes decoders by name: software decoders usually start with "OMX.google." and hardware decoders usually start with "OMX.[hardware_vendor]."; a codec whose name does not start with "OMX." at all is also treated as a software decoder.

For example, the following code can be used as a simple check for whether H.265 (HEVC) hardware decoding is supported:

    import android.media.MediaCodecInfo;
    import android.media.MediaCodecList;

    // Requires API 21+ for MediaCodecList(REGULAR_CODECS)
    public static boolean isH265HardwareDecoderSupport() {
        MediaCodecList codecList = new MediaCodecList(MediaCodecList.REGULAR_CODECS);
        for (MediaCodecInfo codecInfo : codecList.getCodecInfos()) {
            // Look for a HEVC decoder that is not a software implementation
            if (!codecInfo.isEncoder()
                    && codecInfo.getName().toLowerCase().contains("hevc")
                    && !isSoftwareCodec(codecInfo.getName())) {
                return true;
            }
        }
        return false;
    }

    public static boolean isSoftwareCodec(String codecName) {
        // "OMX.google." prefixed codecs are Google's software implementations
        if (codecName.startsWith("OMX.google.")) {
            return true;
        }
        // Other "OMX." prefixed codecs are vendor (hardware) implementations
        if (codecName.startsWith("OMX.")) {
            return false;
        }
        // Anything else is treated as software here
        return true;
    }

However, using MediaCodec means you cannot interfere with what it supports, so its decoding compatibility is generally weaker, especially for unusual video encodings.

The biggest difference between MediaPlayer and ExoPlayer is that MediaPlayer leaves almost no room for customization: if MediaPlayer fails to play something, it is basically a dead end, because it exposes very little you can adjust — for example, when an HTTPS certificate fails verification there is nothing you can do about it.

ExoPlayer is an open source project that offers much more flexible configuration and debugging. Its two most important concepts are DataSource and MediaSource.

Officially, ExoPlayer provides a rich set of MediaSource implementations for handling data under various protocols, such as:

  • SsMediaSource
  • RtspMediaSource
  • DashMediaSource
  • HlsMediaSource
  • ProgressiveMediaSource

SsMediaSource and DashMediaSource are probably used less often; they correspond to Microsoft Smooth Streaming and MPEG-DASH (Dynamic Adaptive Streaming over HTTP).

They are conceptually similar to Apple's HLS: adaptive-bit-rate streaming schemes over HTTP.

In current ExoPlayer versions, ProgressiveMediaSource is the one used most often; its default DefaultExtractorsFactory works out the appropriate extractor for the video's format.

For all protocols other than RTMP and RTSP, an HTTP DataSource is needed underneath: the DataSource is responsible for downloading the media over the network protocol, and ExoPlayer's official video cache, CacheDataSource, is also implemented at this layer.

Here’s an example:

  • For an RTMP video, that is ProgressiveMediaSource + RtmpDataSource;
  • For an HTTP MP4, that is ProgressiveMediaSource + DefaultDataSource;
  • If you want to cache the video while it plays over HTTP, that is ProgressiveMediaSource + (CacheDataSource wrapping DefaultDataSource);
  • If you want to intervene in the network download process, for example ignoring SSL certificates, you can implement your own DataSource;
  • If you need a preloaded playlist, you can use ConcatenatingMediaSource;
  • If you want to customize M3U8 decryption and TS playlist parsing, you can customize HlsMediaSource and implement its internal HlsPlaylistParser and Aes128DataSource;
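
As an illustration of the cache case, here is a minimal sketch against the ExoPlayer 2.x API (class and factory names shift slightly between versions, so treat this as an assumption to verify against your dependency):

    import android.content.Context;
    import com.google.android.exoplayer2.MediaItem;
    import com.google.android.exoplayer2.source.MediaSource;
    import com.google.android.exoplayer2.source.ProgressiveMediaSource;
    import com.google.android.exoplayer2.upstream.DefaultDataSource;
    import com.google.android.exoplayer2.upstream.cache.Cache;
    import com.google.android.exoplayer2.upstream.cache.CacheDataSource;

    public class CachedSourceFactory {
        // Build a ProgressiveMediaSource that reads through a cache and falls back to the network
        static MediaSource createCachedHttpSource(Context context, Cache cache, String url) {
            DefaultDataSource.Factory upstream = new DefaultDataSource.Factory(context);
            CacheDataSource.Factory cacheFactory = new CacheDataSource.Factory()
                    .setCache(cache)
                    .setUpstreamDataSourceFactory(upstream);
            return new ProgressiveMediaSource.Factory(cacheFactory)
                    .createMediaSource(MediaItem.fromUri(url));
        }
    }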

There used to be a LoopingMediaSource, but this is now deprecated in favor of the setRepeatMode method.

As you can see, ExoPlayer is much more flexible and customizable: by nesting and customizing MediaSource and DataSource you get a far more controllable and debuggable structure, although it is somewhat more complicated to use than MediaPlayer.

There is also news that ExoPlayer may be migrated into the androidx Media project in the future.

Q&A

First, let's look at a picture. It is a partial screenshot of the detailed situations encountered in audio and video development, each of which needs its own handling branch:

In fact, audio and video development is not hard if the goal is merely "good enough to ship", but once you dig into each of these details there is a lot to handle. So next time the boss says "just build a playback page like iQIYI's", it may be worth walking through these detail problems with them, so they appreciate how non-trivial a player really is.

What do you need to know when starting an audio and video project?

Many developers start with the misconception that "isn't it just a matter of passing a URL to the player SDK?", which leaves too many holes to fill later in development.

  • 1. First, when starting audio and video work, determine the container formats, video codecs, and audio codecs you need to support. There are a huge number of encoding formats, and under normal circumstances you cannot support them all, so nail down the required range of formats in the requirements first.

  • 2. If users can upload their own videos, it is best to provide format normalization and bit-rate transcoding on the server side. Checking the format and transcoding on the server standardizes the encoding, which reduces client-side playback failures caused by unsupported codecs; and providing the same video at several bit rates gives a better playback experience across phone models and systems, reducing the A/V sync and stutter problems caused by high bit rates mentioned earlier.

Similar features are available on streaming platforms such as Alibaba Cloud and Tencent Cloud.

Solving adaptation at the data source is much cheaper and more reliable than trying to handle every kind of video on the client.

“Why does my video re-request data after a seek even though it was already buffered?”

This comes down to the difference between caching and buffering:

  • Buffering: think of taking out the garbage: you don't run to the dump every time you have a piece of garbage; you put it in the bin first, and when the bin is full you take it all out at once. Since the buffer lives in memory, you cannot buffer the whole video there, so the buffer is a temporary block of data that is continuously being filled and cleared.

  • Caching: much simpler to explain: the video is downloaded to local storage while playing, so data that is already in the cache never needs to be requested a second time.

The seek problem introduced earlier is also related: it is really a key frame problem. If you seek to a position where there is no key frame, a complete picture cannot be reconstructed, so when the video's key frames are too sparse the picture will jump when you seek.

“Why are my video's size and orientation wrong?”

Video metadata usually carries a rotation angle; for example, video recorded on an Android phone often has one, so rotations such as 90 and 270 degrees have to be taken into account when laying out and drawing.

The video also carries aspect ratio information (the sample aspect ratio); in ijkplayer, for example, it is exposed as videoSarNum/videoSarDen. Only by combining this ratio with the video's width and height do you get the true display width and height.
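
A minimal sketch of reading the rotation with the Android SDK (the file path is a placeholder):

    import android.media.MediaMetadataRetriever;

    public class VideoRotation {
        // Returns the rotation in degrees (0 / 90 / 180 / 270) stored in the video metadata
        static int readRotation(String path) {
            MediaMetadataRetriever retriever = new MediaMetadataRetriever();
            try {
                retriever.setDataSource(path);
                String rotation = retriever.extractMetadata(
                        MediaMetadataRetriever.METADATA_KEY_VIDEO_ROTATION);
                return rotation == null ? 0 : Integer.parseInt(rotation);
            } catch (Exception e) {
                return 0; // metadata missing or unreadable
            } finally {
                try { retriever.release(); } catch (Exception ignored) { }
            }
        }
    }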

Double-speed playback

As mentioned earlier, the audio and the video are decoded separately and then combined at playback time, which requires a synchronization clock; for example, with the audio as the sync clock, the video frames are presented in step with it.

The sync clock is one of the keys to double-speed playback. For example, if the sync clock is the audio but the video is actually a silent video with no audio track at all, then setting a playback speed may simply have no effect.
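
For reference, a minimal sketch of setting double speed on the platform MediaPlayer (API 23+; kernels such as ijkplayer and ExoPlayer expose their own speed APIs):

    import android.media.MediaPlayer;
    import android.os.Build;

    public class SpeedControl {
        // Play at 2x; the sync clock then has to consume audio twice as fast
        static void setDoubleSpeed(MediaPlayer player) {
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
                player.setPlaybackParams(player.getPlaybackParams().setSpeed(2.0f));
            }
        }
    }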

In addition, if the audio and video frames are not interleaved sensibly — for example a long run of audio frames followed by a long run of video frames — a lot of data has to be read ahead and repeatedly skipped over, causing frequent seeks and stuttering during playback.

“I play video A on page A, jump to page B and play video B; when I return to page A, video A does not resume.”

This is one of the most frequently asked questions about GSYVideoPlayer, and the reason is that by default there is only one active playback kernel.

When video B on page B starts to play, the kernel that belonged to page A has already been released, because keeping multiple kernels alive at the same time is awkward to manage and uses too much memory. So after the video on page B has played, going back to page A requires resetting and calling the playback interface again to resume.

Of course, if page A and page B show the same video, for example a list item expanding into a detail page, seamless switching can be achieved by handing over the rendering layer's surface without releasing the kernel.

For scenarios that genuinely need multiple kernels, GSYVideoPlayer also provides corresponding examples, such as:

  • opening a separate kernel for ad playback, so the real video content can be loaded while the ad plays;
  • viewing several surveillance feeds at once, with multiple kernels playing simultaneously.

Overall, GSYVideoPlayer is not a perfect project: it still has plenty of problems, and plenty of scenarios that you need to handle and adapt yourself. But it is free, and it is open source.