1. Audio and Video Capture

(Omitted.)

2. Preprocessing

  • The image-processing library GPUImage (an Objective-C wrapper around OpenGL) provides a rich set of ready-made preprocessing effects, and you can also build custom filters on top of it.

3. Encoding

What you need to know about streaming
  • Streaming media development: the network layer (sockets) handles transmission, the protocol layer (RTMP or HLS) packages data for the network, the container layer (FLV, TS) encapsulates the encoded audio/video data, and the codec layer (H.264 and AAC) compresses the images and audio.
  • Frame: each frame is a single still image
  • GOP (Group of Pictures): a group of consecutive pictures; each picture is a frame, so a GOP is a collection of many frames
    • Live-stream data is really a sequence of picture groups containing I frames, P frames, and B frames. When a viewer first joins, playback must start from an I frame, so the player fetches the most recent I frame from the server's GOP cache. The GOP cache therefore adds end-to-end latency, because playback always has to start from that latest I frame
    • The longer the GOP cache, the better the picture quality (at the cost of more latency)
  • Bit rate: the amount of data produced per second after the image is compressed.
  • Frame rate: the number of images displayed per second. It determines how smooth the picture looks and is proportional to smoothness: the higher the frame rate, the smoother the picture; the lower the frame rate, the choppier the picture.
  • Resolution: the width and height of the (rectangular) picture, i.e., its dimensions in pixels
  • Data volume per second before compression: frame rate × resolution (the result is in bytes, assuming one byte per pixel)
  • Video container format: a container that stores the video information. Streaming containers include TS and FLV; indexed containers include MP4, MOV, and AVI.
    • Main function: a video file usually contains both images and audio, plus some configuration information (e.g., how the images and audio relate to each other and how to decode them); all of this must be organized and packed according to well-defined rules.
    • Note: the container format appears to be the same as the file format because a video file's extension is usually just the name of its container format, so in practice the video file format is the container format.
Encoding methods
  • Hardware encoding: uses the GPU or a dedicated DSP chip to encode (see the VideoToolbox sketch below)
  • Software encoding: uses the CPU to encode; on a phone this easily causes heat
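On iOS, hardware encoding is exposed through Apple's VideoToolbox framework. Below is a minimal sketch of creating an H.264 compression session; the dimensions, bitrate, and keyframe interval are placeholder values, and error handling is omitted.

    import Foundation
    import VideoToolbox

    var session: VTCompressionSession?
    let status = VTCompressionSessionCreate(
        allocator: kCFAllocatorDefault,
        width: 1280, height: 720,                  // placeholder dimensions
        codecType: kCMVideoCodecType_H264,
        encoderSpecification: nil,
        imageBufferAttributes: nil,
        compressedDataAllocator: nil,
        outputCallback: { _, _, status, _, sampleBuffer in
            // Called once per encoded frame; the H.264 data is in sampleBuffer.
            guard status == noErr, sampleBuffer != nil else { return }
            // ... hand the encoded frame to the packaging/streaming layer ...
        },
        refcon: nil,
        compressionSessionOut: &session)

    if status == noErr, let session = session {
        // Real-time mode, target bitrate, and keyframe (GOP) interval.
        VTSessionSetProperty(session, key: kVTCompressionPropertyKey_RealTime,
                             value: kCFBooleanTrue)
        VTSessionSetProperty(session, key: kVTCompressionPropertyKey_AverageBitRate,
                             value: NSNumber(value: 800_000))
        VTSessionSetProperty(session, key: kVTCompressionPropertyKey_MaxKeyFrameInterval,
                             value: NSNumber(value: 30))
        // Each captured CVPixelBuffer is then submitted with
        // VTCompressionSessionEncodeFrame(...).
    }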
Coding standards
  • Video coding: H.265, H.264, VP8, VP9, etc.
  • Audio encoding: AAC, Opus
How big is uncompressed video data?
  • Duration: 1 minute; resolution: 1280 × 720; frame rate: 16 fps. Video size: 1280 × 720 × 16 (frames) × 60 (seconds) / 1024 / 1024 = 843.75 MB (in bytes, assuming one byte per pixel; real pixel formats such as YUV 4:2:0 need 1.5× that, and RGB 3×)
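A quick sanity check of the arithmetic, with the bytes-per-pixel assumption made explicit (1 matches the simplified figure above; YUV 4:2:0 would be 1.5 and RGB 3):

    // Raw (uncompressed) video size in MB: width x height x fps x seconds x bytes/pixel.
    func rawVideoMB(width: Int, height: Int, fps: Int, seconds: Int,
                    bytesPerPixel: Double = 1) -> Double {
        Double(width * height * fps * seconds) * bytesPerPixel / 1024 / 1024
    }

    print(rawVideoMB(width: 1280, height: 720, fps: 16, seconds: 60))       // 843.75
    print(rawVideoMB(width: 1280, height: 720, fps: 16, seconds: 60,
                     bytesPerPixel: 1.5))                                   // 1265.625 (YUV 4:2:0)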
Why can video be compressed and encoded?
  • Redundant information exists
    • Spatial redundancy: adjacent pixels within an image are strongly correlated, i.e., many pixels in the same image carry the same information. For example, in a pure white image, storing every pixel separately wastes space
    • Temporal redundancy: adjacent images in a video sequence have similar content, i.e., there is strong correlation between neighboring pictures
    • Visual redundancy: the human visual system is insensitive to certain details. It is insensitive to high-frequency information (so high frequencies can be discarded and only low frequencies encoded), more sensitive to high contrast (so the subjective quality of edge information can be improved), more sensitive to luminance than to chrominance (so chroma resolution can be reduced), and more sensitive to motion (so regions of interest, ROI, receive special treatment)
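Spatial redundancy in miniature: run-length encoding collapses a pure-white scan line into a single (value, count) pair. This is only a toy illustration of the redundancy idea, not what H.264 actually does:

    // Run-length encode one row of 8-bit pixels.
    func runLengthEncode(_ row: [UInt8]) -> [(value: UInt8, count: Int)] {
        var runs: [(value: UInt8, count: Int)] = []
        for pixel in row {
            if let last = runs.last, last.value == pixel {
                runs[runs.count - 1].count += 1
            } else {
                runs.append((value: pixel, count: 1))
            }
        }
        return runs
    }

    let whiteRow = [UInt8](repeating: 255, count: 1280)
    print(runLengthEncode(whiteRow))   // [(value: 255, count: 1280)] — one run instead of 1280 bytes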

Compression coding standards

  • H.26x series (led by the ITU, the International Telecommunication Union)
    • H.261: mainly used in older video-conferencing and video-telephony products
    • H.263: mainly used for video conferencing, video calls, and network video
    • H.264: H.264/MPEG-4 Part 10, or AVC (Advanced Video Coding), is a video compression standard and a widely used high-precision format for recording, compressing, and distributing video
    • H.265: the direction development is heading, but not yet mature; no popular encoding software exists for it yet
  • MPEG series (developed by MPEG, the Moving Picture Experts Group, under ISO)
    • MPEG-1 Part 2: mainly used on VCDs
    • MPEG-2 Part 2: equivalent to H.262
    • MPEG-4 Part 2: used for network transmission, broadcasting, and media storage

H.264 (currently the most widely used)

  • The H.264 protocol defines three types of frames
    • I frame (keyframe): a fully encoded frame that retains a complete picture; decoding needs only this frame's data (because it contains the complete picture)
    • P frame (difference frame): encoded with reference to the preceding frame, it contains only the differences. During decoding, the cached previous picture is overlaid with the differences defined in this frame to produce the final picture. (A P frame holds no complete picture data, only what differs from the previous frame's picture)
    • B frame (bidirectional difference frame): retains the differences between this frame and both the preceding and following frames. Decoding a B frame requires not only the cached previous picture but also the decoded following picture; the final picture is obtained by combining both with this frame's data. B frames have the highest compression ratio, but the CPU works hardest when decoding them
  • The core algorithms used by H.264 are intraframe compression and interframe compression
    • Intraframe compression is the algorithm that generates I frames
    • Interframe compression is the algorithm that generates P frames and B frames
  • H.264 compression method
    • Grouping: several frames are grouped together (a GOP, i.e., a sequence); to cope with motion and scene changes, a group should not contain too many frames
    • Frame classification: each frame in a group is assigned one of the three types: I frame, P frame, or B frame
    • Frame prediction: the I frame serves as the base frame; the P frame is predicted from the I frame, and the B frame is then predicted from the I and P frames
    • Data transmission: finally, the I-frame data and the prediction difference information are stored and transmitted
  • H.264 layered design
    • Layered design:
      • Conceptually, the H.264 algorithm is divided into two layers: the Video Coding Layer (VCL), responsible for efficiently representing the video content, and the Network Abstraction Layer (NAL), responsible for packaging and delivering the data in the form the network requires.
      • Efficient coding and network friendliness are thus handled separately by the VCL and the NAL.
    • NAL encapsulation:
      • The NAL writes each frame's data into NAL units (NALUs) for transmission and storage
      • A NALU is divided into a NALU header and a NALU body
      • Each NALU is preceded by a start code, usually 00 00 00 01, which marks the beginning of a new NALU
      • The NALU body encapsulates the VCL-encoded data and other information
    • Encapsulation process (a byte-stream scanning sketch follows this list)
      • I frames, P frames, and B frames are packed into one or more NALUs for transmission and storage
      • An I frame is also preceded by non-VCL NAL units that hold extra configuration information, such as the SPS and PPS
      • PPS (Picture Parameter Set): parameters that apply to the coded picture
      • SPS (Sequence Parameter Set): parameters that apply to the whole coded video sequence
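A minimal sketch of scanning an H.264 Annex B byte stream for NALUs and classifying them. It assumes 4-byte start codes (00 00 00 01); real streams may also use 3-byte codes (00 00 01). The low 5 bits of the first byte after the start code are the nal_unit_type: 5 is an IDR (I-frame) slice, 1 a non-IDR (P/B) slice, 7 an SPS, 8 a PPS.

    // Scan an Annex B byte stream and print the type of each NAL unit.
    func dumpNALUs(_ stream: [UInt8]) {
        let startCode: [UInt8] = [0, 0, 0, 1]
        var i = 0
        while i + 4 < stream.count {
            if stream[i..<i + 4].elementsEqual(startCode) {
                let header = stream[i + 4]      // first byte after the start code
                let type = header & 0x1F        // low 5 bits = nal_unit_type
                switch type {
                case 5:  print("IDR slice (I frame)")
                case 1:  print("non-IDR slice (P/B frame)")
                case 7:  print("SPS")
                case 8:  print("PPS")
                default: print("NALU type \(type)")
                }
                i += 4
            } else {
                i += 1
            }
        }
    }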

iOS software encoding

Software encoding performs the encoding process on the CPU, typically using FFmpeg or x264.

  • FFmpeg
    • FFmpeg is a very powerful audio/video processing library covering video capture, format conversion, frame capture (screenshots), video watermarking, and more (a command-line example follows this list)
    • FFmpeg was developed on the Linux platform, but it can be compiled and run on other operating systems as well
  • x264
    • H.264 is the video coding standard developed by the ITU; x264 is an open-source H.264/MPEG-4 AVC video encoding library, widely regarded as one of the best lossy video encoders, integrating many excellent video coding algorithms. (http://blog.csdn.net/leixiaohua1020)
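In practice FFmpeg is usually driven from its command line. A typical invocation (the input file, bitrate, and RTMP URL are placeholders) that encodes with libx264 and AAC and pushes over RTMP:

    ffmpeg -re -i input.mp4 -c:v libx264 -preset veryfast -b:v 800k \
           -c:a aac -f flv rtmp://localhost/live/room1

-re reads the input at its native frame rate, which is what a live push needs, and -f flv selects the FLV packaging that RTMP expects.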

4. Transmission

  • From the publishing (push) end to the server
    • The publishing end captures and preprocesses the data, encodes it, and streams it to the server
    • Common streaming protocols include RTMP, RTSP, and HLS
  • Set up nginx + the RTMP module as the streaming server that receives the pushed stream (see the configuration sketch below)
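A minimal sketch of the nginx configuration this refers to, using the nginx-rtmp-module (the port is the RTMP default and the application name is a placeholder):

    rtmp {
        server {
            listen 1935;          # default RTMP port
            chunk_size 4096;
            application live {
                live on;          # accept live streams
                record off;       # do not record to disk
            }
        }
    }

Publishers then push to rtmp://<server>/live/<stream-name>, and players pull from the same URL (the module can also serve HLS if configured).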

5. Decoding and Playback

  • After the playback (pull) end receives the audio/video data, it must be decoded by a decoder before the player can render it.
  • Specific steps:
    • De-protocol: strip the protocol-layer information added for network transmission
    • Demux: separate the audio and video that are packed together in the container file
    • Audio/video decoding: the audio and video are compression-encoded and can only be played after decoding
    • Audio/video synchronization: video and audio must be played in sync
    • Audio/video playback: the decoded audio and video are played through the sound card and video card
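At the highest level, iOS's AVPlayer performs all five of these steps internally for an HLS stream. A minimal sketch (the URL is a placeholder):

    import AVFoundation

    // AVPlayer de-protocols, demuxes, decodes, synchronizes, and plays HLS.
    let url = URL(string: "https://example.com/live/stream.m3u8")!
    let player = AVPlayer(url: url)
    player.play()

    // To display video, attach the player to a layer in the view hierarchy:
    // let playerLayer = AVPlayerLayer(player: player)
    // playerLayer.frame = view.bounds
    // view.layer.addSublayer(playerLayer)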