
I had worked on live-broadcast development before, but I never had the chance to touch the push-stream and player side of things, so I wrote a prototype of a push-stream module in my spare time.

I also summarized some common knowledge:

(Figure: the overall push-stream process)

(Figure: the module breakdown)

I'll break push-streaming down into five areas of knowledge: capture, processing, encoding, packaging, and pushing.

Capture module

Capture is the process of turning the images and video acquired by the device's hardware into frames of image data handed back to the developer.

Here are some key points about the capture module.

YUV, RGB, and YCbCr color space models

This is dense material; please refer to: blog.csdn.net/u010186001/…

iOS capture knowledge points

On iOS, image data captured by the hardware is handed to you as a CMSampleBufferRef. For video, the payload inside is a CVPixelBuffer holding the RGB/YUV/YCbCr data; for audio, the CVPixelBuffer is replaced by a CMBlockBuffer.
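
To make this concrete, here is a minimal Swift sketch of camera capture with AVFoundation; the preset, queue label, and pixel-format choice are illustrative, not from the article, and it assumes a default video device exists:

```swift
import AVFoundation

// Minimal capture sketch: frames arrive as CMSampleBuffer in the delegate.
final class Capturer: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()

    func start() throws {
        session.sessionPreset = .hd1280x720

        guard let camera = AVCaptureDevice.default(for: .video) else { return }
        session.addInput(try AVCaptureDeviceInput(device: camera))

        let output = AVCaptureVideoDataOutput()
        // Ask for a YCbCr (NV12) layout, one of the pixel formats mentioned above.
        output.videoSettings = [
            kCVPixelBufferPixelFormatTypeKey as String:
                kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
        ]
        output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "capture.queue"))
        session.addOutput(output)
        session.startRunning()
    }

    // Each captured frame is a CMSampleBuffer; for video its payload is a
    // CVPixelBuffer (for audio it would be a CMBlockBuffer instead).
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        _ = pixelBuffer // hand off to the processing/encoding stages
    }
}
```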

GPUImage knowledge

Source: github.com/BradLarson/…

GPUImage is an open-source framework for image and video processing on iOS, built on OpenGL. It comes with a large number of filters, and it is also very convenient to add your own filter on top of the existing ones. Every filter is implemented as an OpenGL shader, so filter effects are processed on the GPU, which is comparatively efficient.

GPUImage is a chain structure made up of the GPUImageOutput class and the GPUImageInput protocol: GPUImageOutput produces textures and GPUImageInput consumes them, and this is what chains the image data along. Image and video sources such as the camera and still images inherit from GPUImageOutput; filters inherit from GPUImageOutput and also implement GPUImageInput; outputs such as views and file writers implement GPUImageInput only.
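
As a sketch of that chain (camera → filter → view), calling GPUImage's Objective-C API from Swift; the preset and the sepia filter are arbitrary choices for illustration:

```swift
import UIKit
import AVFoundation
import GPUImage

// camera (GPUImageOutput) -> filter (Output + Input) -> view (GPUImageInput)
let camera = GPUImageVideoCamera(
    sessionPreset: AVCaptureSession.Preset.hd1280x720.rawValue,
    cameraPosition: .back)
let filter = GPUImageSepiaFilter()                       // output AND input
let preview = GPUImageView(frame: UIScreen.main.bounds)  // pure input (sink)

// Each addTarget(_:) link hands the output texture down the chain.
camera?.addTarget(filter)
filter.addTarget(preview)
camera?.startCameraCapture()
```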

Processing module

OpenGL ES address: www.khronos.org/opengles/

Metal address: developer.apple.com/documentati…

Metal is a low-level graphics programming interface similar to OpenGL ES; its APIs let you drive the GPU directly. It was first announced at WWDC 2014, and this year (2017) Apple released Metal 2; Metal is very much Apple's own favorite child.

The Metal framework supports GPU hardware acceleration, advanced 3D graphics rendering, and big-data parallel computing. It provides streamlined, modern APIs that give fine-grained, low-level control over resource organization, program processing, graphics presentation, and commands and their associated data resources. Its core goal is to reduce CPU overhead as much as possible, handing most of the runtime load over to the GPU.

(We need to continue to study Metal in the future)
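
As a first taste of that "talk to the GPU directly" model, here is a minimal Metal compute sketch: it builds a pipeline and dispatches a tiny kernel that doubles a float buffer. The kernel source, buffer contents, and printout are illustrative only:

```swift
import Metal

// Inline Metal Shading Language kernel: double every element in place.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;
kernel void doubleValues(device float *data [[buffer(0)]],
                         uint id [[thread_position_in_grid]]) {
    data[id] = data[id] * 2.0;
}
"""

guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue() else {
    fatalError("Metal is not available on this device")
}

let library = try! device.makeLibrary(source: kernelSource, options: nil)
let pipeline = try! device.makeComputePipelineState(
    function: library.makeFunction(name: "doubleValues")!)

var values: [Float] = [1, 2, 3, 4]
let buffer = device.makeBuffer(bytes: &values,
                               length: values.count * MemoryLayout<Float>.stride)!

// The CPU only encodes the work; the GPU executes it asynchronously.
let commands = queue.makeCommandBuffer()!
let encoder = commands.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(buffer, offset: 0, index: 0)
encoder.dispatchThreadgroups(
    MTLSize(width: 1, height: 1, depth: 1),
    threadsPerThreadgroup: MTLSize(width: values.count, height: 1, depth: 1))
encoder.endEncoding()
commands.commit()
commands.waitUntilCompleted()

let results = buffer.contents().bindMemory(to: Float.self, capacity: values.count)
print((0..<values.count).map { results[$0] })   // [2.0, 4.0, 6.0, 8.0]
```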

GPUImageFilter

Source: github.com/BradLarson/…

You can refer to: www.jianshu.com/p/468788069…

Encoding knowledge

H.264 is the most widely used format.

I frame

Intra-coded frame. An I frame is a key frame: you can think of it as a completely preserved picture. Decoding needs only this frame's data, because it contains the whole image.

Intra-frame coding is used to reduce the spatial redundancy of an image. To improve the efficiency of H.264's intra-frame encoding, the spatial correlation between adjacent macroblocks within a given frame is fully exploited: adjacent macroblocks usually share similar properties. So when encoding a given macroblock, the codec first predicts it from its neighbors (typically the macroblocks to the upper left, directly above, and to the left, since those have already been encoded), and then encodes only the difference between the actual and predicted values. Relative to encoding the frame directly, this greatly reduces the bit rate.

H.264 provides nine prediction modes for 4×4-pixel blocks: one DC mode and eight directional modes. In the reference diagram (not reproduced here), nine already-encoded pixels in the adjacent blocks, labeled A through I, are available for prediction. If mode 4 is chosen, for example, the four pixels a, b, c, and d are predicted to equal E, and the four pixels e, f, g, and h are predicted to equal F. For smooth areas of the image that contain little spatial detail, H.264 also supports 16×16 intra-frame encoding.
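
A toy numeric illustration of that idea, using DC mode over a 4×4 block; this shows the concept only, not H.264's actual arithmetic, and all names and pixel values are hypothetical:

```swift
// DC intra prediction: predict the whole block as the mean of the
// already-decoded neighbor pixels, then keep only the residuals.
func dcPredict(left: [Int], top: [Int]) -> Int {
    let neighbors = left + top
    return neighbors.reduce(0, +) / neighbors.count
}

let left = [100, 102, 101, 99]   // already-decoded column to the left
let top  = [98, 100, 103, 101]   // already-decoded row above
let prediction = dcPredict(left: left, top: top)

let actual = [[101, 100,  99, 102],
              [100, 101, 102, 100],
              [ 99, 100, 101, 103],
              [102, 101, 100,  99]]
// Only these small residuals get entropy-coded, not the raw pixels.
let residuals = actual.map { row in row.map { $0 - prediction } }
print(prediction, residuals)
```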

I-frame features:

  1. It is a full-frame compression-coded frame: the whole frame's image information is JPEG-compressed, encoded, and transmitted.
  2. At decode time, a complete image can be reconstructed from the I frame's data alone.
  3. I frames describe the details of the image background and of the moving subject.
  4. I frames are generated without reference to other frames.
  5. The I frame is the reference frame for P and B frames (so its quality directly affects the quality of every subsequent frame in the same group).
  6. The I frame is the base frame (the first frame) of a GOP; there is only one I frame per group.
  7. I frames do not need motion vectors.
  8. I frames carry a large amount of data.

P frame

Forward-predictive coded frame. A P frame encodes the difference between this frame and the nearest preceding key frame (or P frame). At decode time, the differences defined by this frame are superimposed on the previously cached picture to produce the final image. Prediction and reconstruction of a P frame: the P frame takes an I frame as its reference, finds the predicted value and motion vector of each point in that reference, and transmits the prediction residual together with the motion vector. At the receiving end, the predicted value of each point is located in the I frame according to the motion vector and summed with the residual, yielding that point's sample value and thus the complete P frame.
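
Here is a toy Swift sketch of that reconstruction step: the predicted value comes from the reference frame at the motion-vector offset, and the transmitted residual is added back. The types and per-pixel granularity are hypothetical (real codecs work on blocks), and it assumes every vector stays inside the frame:

```swift
struct MotionVector { let dx: Int; let dy: Int }

// Rebuild a P frame from its reference (I) frame, one motion vector and
// one residual per pixel in this toy model.
func reconstructPFrame(reference: [[Int]],
                       motionVectors: [[MotionVector]],
                       residuals: [[Int]]) -> [[Int]] {
    var frame = residuals
    for y in 0..<frame.count {
        for x in 0..<frame[y].count {
            let mv = motionVectors[y][x]
            // predicted value fetched from the reference at the MV offset
            let predicted = reference[y + mv.dy][x + mv.dx]
            frame[y][x] = predicted + residuals[y][x]
        }
    }
    return frame
}
```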

P frame features:

  1. A P frame is a coded frame that follows an I frame by 1 to 2 frames.
  2. A P frame uses motion compensation to transmit its difference from the preceding I or P frame, together with motion vectors (the prediction error).
  3. At decode time, the prediction derived from the reference frame must be summed with the prediction error to reconstruct the complete P-frame image.
  4. P frames use forward-predictive inter-frame coding; a P frame references only the I or P frame closest to it.
  5. A P frame can serve as the reference frame for the P frame after it, or for the B frames before and after it.
  6. Because a P frame is a reference frame, it can propagate decoding errors.
  7. Because only differences are transmitted, P frames compress well.

B frame

Bidirectionally predicted (interpolated) coded frame. A B frame is a bidirectional difference frame: it records the differences between this frame and both the preceding and the following frames (the details are more complicated, with four cases, but this is the simplified version). In other words, to decode a B frame you need not only the cached preceding picture but also the decoded following picture; the final image is obtained by combining the before and after pictures with this frame's data. B frames achieve a high compression ratio, but decoding them is CPU-intensive, which is why mobile clients generally do not use B frames.

Prediction and reconstruction of a B frame: the I or P frame before the B frame and the P frame after it serve as reference frames. The encoder finds the predicted value and two motion vectors for each point of the B frame, and transmits the prediction residual along with the motion vectors. Using the motion vectors, the receiver locates (computes) the predicted values in the two reference frames and sums them with the residual to obtain each point's sample value, thus reconstructing the complete B frame.
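
Continuing the toy model from the P-frame sketch, a B-frame pixel averages two motion-compensated references before the residual is added; again, this is purely illustrative:

```swift
// Same toy MotionVector type as in the P-frame sketch.
struct MotionVector { let dx: Int; let dy: Int }

// One B-frame sample: average the forward and backward predictions from
// the two reference frames, then correct with the transmitted residual.
func reconstructBPixel(prev: [[Int]], next: [[Int]],
                       mvPrev: MotionVector, mvNext: MotionVector,
                       x: Int, y: Int, residual: Int) -> Int {
    let forward  = prev[y + mvPrev.dy][x + mvPrev.dx]
    let backward = next[y + mvNext.dy][x + mvNext.dx]
    return (forward + backward) / 2 + residual
}
```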

B frame features:

  1. A B frame is predicted from the preceding I or P frame and the following P frame.
  2. A B frame transmits the prediction errors and motion vectors between itself and both the preceding I or P frame and the following P frame.
  3. B frames are bidirectional predictive-coded frames.
  4. B frames have the highest compression ratio, because they only encode how the moving subject changes between the two reference frames, so prediction is more accurate.
  5. B frames are not reference frames, so they do not propagate decoding errors.

VideoToolbox

Supported on iOS 8 and above; Apple's hardware-encoding library.

VideoToolbox: developer.apple.com/documentati…
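
As a sketch, here is how a VTCompressionSession might be set up for hardware H.264 encoding; the resolution, bitrate, and the callback body are illustrative choices, not from the article:

```swift
import VideoToolbox

var session: VTCompressionSession?

// Called once per encoded frame: the CMSampleBuffer holds H.264 NAL units,
// which a real pusher would packetize (e.g. into FLV tags for RTMP).
let callback: VTCompressionOutputCallback = { _, _, status, _, sampleBuffer in
    guard status == noErr, let sampleBuffer = sampleBuffer else { return }
    print("encoded frame, bytes:", CMSampleBufferGetTotalSampleSize(sampleBuffer))
}

let status = VTCompressionSessionCreate(allocator: kCFAllocatorDefault,
                                        width: 1280, height: 720,
                                        codecType: kCMVideoCodecType_H264,
                                        encoderSpecification: nil,
                                        imageBufferAttributes: nil,
                                        compressedDataAllocator: nil,
                                        outputCallback: callback,
                                        refcon: nil,
                                        compressionSessionOut: &session)
guard status == noErr, let session = session else {
    fatalError("failed to create compression session")
}

// Live streaming wants real-time behavior and a bounded bitrate.
VTSessionSetProperty(session, key: kVTCompressionPropertyKey_RealTime,
                     value: kCFBooleanTrue)
VTSessionSetProperty(session, key: kVTCompressionPropertyKey_AverageBitRate,
                     value: NSNumber(value: 1_000_000))

// For each captured CVPixelBuffer (e.g. from the capture delegate earlier):
// VTCompressionSessionEncodeFrame(session, imageBuffer: pixelBuffer,
//                                 presentationTimeStamp: pts, duration: .invalid,
//                                 frameProperties: nil, sourceFrameRefcon: nil,
//                                 infoFlagsOut: nil)
```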

FFmpeg

On iOS, FFmpeg only does software encoding for the time being, but it supports a wide variety of formats and device models.

Push-stream knowledge points

RTMP protocol

RTMP is an Adobe protocol, built on TCP, whose specification has not been fully published.

Advantages: low latency, normally around 3 s, and it supports encryption. Disadvantages: iOS and Android need a player that supports it, and H5 cannot play RTMP directly, so many H5 pages fall back to HLS. Also, because RTMP runs over TCP, delay accumulates over time, so the accumulated delay needs to be cleared periodically.

Protocol: www.adobe.com/devnet/rtmp…
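
One common way to clear that accumulated delay is to drop the unsent backlog back to the most recent key frame; here is a hypothetical sketch (the Frame type and the threshold are made up for illustration):

```swift
import Foundation

struct Frame { let isKeyFrame: Bool; let data: Data }

// Outgoing buffer for an RTMP pusher: when TCP backpressure makes the
// queue grow past a threshold, drop everything up to the latest key frame
// so the remote decoder can still resync from that I frame.
final class SendQueue {
    private var queue: [Frame] = []
    private let maxBuffered = 100   // tune against measured bitrate

    func enqueue(_ frame: Frame) {
        queue.append(frame)
        if queue.count > maxBuffered,
           let lastKey = queue.lastIndex(where: { $0.isKeyFrame }) {
            queue.removeSubrange(0..<lastKey)   // keep the key frame itself
        }
    }

    func dequeue() -> Frame? {
        queue.isEmpty ? nil : queue.removeFirst()
    }
}
```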

HLS protocol and HTTP protocol

Advantages: a simple protocol with good performance. Disadvantages: high latency, which is determined mainly by the segment size.
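
Back-of-the-envelope arithmetic for that point: a player typically buffers a few whole segments before it starts, so latency scales with segment duration. The numbers below are illustrative, not measurements:

```swift
// Rough HLS latency estimate: segments buffered before playback begins
// times the duration of each segment.
let segmentDuration = 6.0      // seconds per .ts segment
let bufferedSegments = 3.0     // segments a player holds before playing
let approximateLatency = segmentDuration * bufferedSegments
print("~\(approximateLatency)s end-to-end delay")   // ~18.0s
```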

Demo

GitHub: github.com/KoonChaoSo/…