1. Basic concepts of audio technology
1. Physical properties of sound
Sound is produced by the vibration of objects and travels as a pressure wave. When an instrument is played or a person speaks, the vibration makes the surrounding air vibrate rhythmically; the density of the air changes periodically, and a sound wave is produced.
Sound waves have three elements:
- Pitch (tone): determined by frequency. The higher the frequency, the shorter the wavelength; low-frequency sounds have longer wavelengths, bend around obstacles more easily, and lose less energy, so they travel farther;
- Volume (loudness): determined by the amplitude of the vibration. Striking a desktop with different strength produces sounds of different loudness. In everyday life we describe loudness in decibels;
- Timbre: different objects sound different even at the same frequency and loudness. The shape of the waveform determines the timbre; because different materials produce different waveforms, they produce different timbres.
2. Digital audio
In nature, sound is continuous; it is an analog signal. To preserve it, it must be digitized, i.e. converted into a digital signal. Since sound is a wave with its own amplitude and frequency, preserving a sound means preserving its amplitude at various points in time. A digital signal does not store the amplitude at every instant; in fact, a continuous signal does not need to be stored to reconstruct sound that is acceptable to the human ear. The digitization of an analog signal is the process of converting it into a digital signal, and it consists of sampling, quantization and coding. Digital audio acquisition therefore follows these steps: analog signal -> sampling -> quantization -> coding -> digital signal.
3. Sampling
Audio sampling converts sound from an analog signal to a digital signal; it can be understood as digitizing the signal along the time axis. The sampling rate is the number of samples taken from the continuous signal per second to form the discrete signal, expressed in hertz (Hz). The higher the sampling rate, the better the audio quality. Common audio sampling rates are 8 kHz, 16 kHz, 44.1 kHz, 48 kHz, etc. The human ear can generally hear sounds between 20 Hz and 20 kHz, and the sampling rate is chosen so that sound within a given frequency range can be digitized faithfully. For example, a sampling rate of 44.1 kHz means the sound is sampled 44,100 times per second.
- 8,000 Hz: the sampling rate used in telephony, sufficient for human speech.
- 11,025 Hz: sampling rate used for AM broadcasting.
- 22,050 Hz and 24,000 Hz: sampling rates used for FM broadcasting.
- 44,100 Hz: audio CD; also commonly used for MPEG-1 audio (VCD, SVCD, MP3).
- 47,250 Hz: sampling rate used by early commercial PCM recorders.
- 48,000 Hz: the sampling rate used for digital sound in miniDV, digital TV, DVD, DAT, film, and professional audio.
The standard sampling rate for CD music is 44.1 kHz, which is also the sampling rate most commonly used by sound cards and computers today. The sampling rate of current Blu-ray audio is considerably higher, reaching 192 kHz. Most current sound cards support 44.1 kHz, 48 kHz and 96 kHz, and high-end products support 192 kHz or even higher. In short, the higher the sampling rate, the better the quality of the resulting audio file, and the more storage space it occupies.
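To make the idea of sampling concrete, here is a minimal Python sketch (not from the original article) that samples a pure tone at a chosen sampling rate; the tone frequency, sampling rate and duration are arbitrary illustrative values.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Sample a pure tone of freq_hz at the given sampling rate.

    Returns a list of amplitude values in [-1.0, 1.0], one per sample.
    """
    num_samples = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

# A 440 Hz tone sampled at 44.1 kHz for 10 ms -> 441 samples.
samples = sample_sine(440, 44100, 0.01)
print(len(samples))     # 441
print(samples[:3])      # first few discrete amplitude values
```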
4. Quantization
After the audio is sampled, each sample must be quantized. Quantization digitizes the signal along the amplitude axis; the amount of data used to represent each sample is expressed in bits. The more bits, the finer the representation, the better the sound quality, and of course the more data. For example, 16-bit values are commonly used to store each sample; this is also called the quantization level (bit depth). Bit depth is an important indicator of digital sound quality, and a format is usually described as, say, 24 bit (bit depth) / 48 kHz (sampling rate).
- 8 bits (1 byte) can only record 256 values, i.e. the amplitude can only be divided into 256 levels.
- 16 bits (2 bytes) can distinguish 65,536 levels, which is the CD standard.
- 32 bits (4 bytes) can subdivide the amplitude into 4,294,967,296 levels, which is generally unnecessary.
The quantization depth is also called the sample size, bit depth or resolution; it refers to the number of levels into which the continuous intensity of the sound is divided. With N bits, the intensity is divided into 2^N levels; at 16 bits that is 65,536 levels. That is a huge number, and a human can hardly tell the difference of one level out of 65,536. It can also be seen as the resolution of the sound card: the larger the value, the higher the resolution and the more faithfully the intensity is captured. Note that bit depth describes the intensity (amplitude) characteristics of the signal, while the sampling rate describes its time (frequency) characteristics; they are two different concepts.
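As a rough illustration of quantization, the following Python sketch maps amplitudes in the range -1.0 to 1.0 onto signed 16-bit sample values; the rounding and clamping choices are just one reasonable convention, not a particular standard.

```python
def quantize(samples, bit_depth=16):
    """Map amplitudes in [-1.0, 1.0] to signed integers of the given bit depth.

    With 16 bits there are 2**16 = 65,536 possible levels.
    """
    max_level = 2 ** (bit_depth - 1) - 1   # 32767 for 16-bit signed PCM
    return [max(-max_level - 1, min(max_level, round(s * max_level)))
            for s in samples]

print(quantize([0.0, 0.5, -1.0, 1.0]))   # [0, 16384, -32767, 32767]
```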
5. Audio coding
Audio encoding means recording the sampled and quantized data in a certain format. Encoding is needed because the sampled and quantized data are very large, too large for storage or real-time network transmission. There are many audio encoding formats; raw audio data usually refers to pulse code modulation (PCM) data, which can be described by its bit rate. The captured raw PCM audio is generally not transmitted over the network directly. The main function of audio encoding is to compress the sampled data (PCM, etc.) into an audio stream, thus reducing the amount of audio data. The essence of compression coding is to remove redundant signals: signals that cannot be perceived by the human ear, including audio outside the audible range and audio that is masked by other sounds. Commonly used formats include WMA, MP3, AAC, OGG and so on. AAC is a newer lossy audio compression technology with a high compression ratio; most of the audio in MP4 video is in AAC format. Raw PCM audio is typically compressed into AAC by an encoder before network transmission, which improves transmission efficiency and saves bandwidth.
- The key index of compression encoding is the compression ratio, which is usually less than 1.
- There are two kinds of compression algorithms: lossy compression and lossless compression.
- Lossless compression: the decompressed data can be restored exactly. Among commonly used formats, however, lossy compression is by far the most common.
- Lossy compression: the decompressed data cannot be restored exactly; some information is lost. The smaller the compression ratio, the more information is lost and the greater the distortion when the signal is reconstructed.
We often speak of lossy and lossless audio, but the terms are relative. No matter how it is encoded, audio stored in a computer is lossy relative to the original sound in nature and cannot be restored completely. What people usually call lossless generally refers to two-channel audio with a sampling rate of 44.1 kHz and a sample size of 16 bit; lossy simply means lower-quality audio by comparison.
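As a quick worked example of why compression matters, the sketch below compares the bit rate of uncompressed CD-quality PCM with a typical 128 kbps MP3; the 128 kbps figure is just a common illustrative value.

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bit_depth * channels

cd_rate = pcm_bit_rate(44100, 16, 2)      # 1,411,200 bps ~= 1411.2 kbps
mp3_rate = 128_000                        # a typical lossy MP3 bit rate

print(cd_rate)                            # 1411200
print(round(mp3_rate / cd_rate, 2))       # ~0.09 compression ratio, i.e. roughly an 11x reduction
```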
Channels
A channel is an independent audio signal that is captured or played back at a distinct spatial position during recording or playback, so the number of channels is the number of sound sources during recording or the number of speakers during playback.
- Mono: 1 channel
- Dual channel: 2 channels
- Stereo: 2 channels by default
- Four-channel surround: 4 channels
BitRate
The bit rate is the number of bits of audio transmitted per second, in bits per second (bps), e.g. 705.6 kbps or 705,600 bps, where b stands for bit and ps for per second; it means 705,600 bits are carried per second. Compressed audio files are often described by their bit rate, e.g. 128 kbps / 44,100 Hz for a CD-quality MP3. Note that one byte equals eight bits: the bit is the smallest unit and is used to describe network and communication speeds, while the byte is used to measure the size of disks and memory. The higher the bit rate, the lower the compression ratio, the better the sound quality, and the larger the audio file.
- Mbps: megabits per second;
- kbps: kilobits per second;
- bps: bits per second.
6. The relationship among sampling rate, bit depth and bit rate
Example: calculate the playback length of a file from its size. The file "Windows XP boot.wav" is 424,644 bytes and is in "22050 Hz / 16 bit / stereo" format (as can be seen from its "Properties -> Summary"). Its bit rate is therefore 22050 × 16 × 2 = 705,600 bit/s, or 705,600 / 8 = 88,200 bytes per second, giving a playback time of 424,644 (total bytes) / 88,200 (bytes per second) ≈ 4.8145578 seconds. This is not quite precise: a standard PCM-format WAVE file (.wav) contains at least 42 bytes of header information, which should be subtracted when calculating the playback time, so (424,644 − 42) / (22050 × 16 × 2 / 8) ≈ 4.8140816 seconds, which is a little more accurate. That is: (total file size − header size) / (sampling rate × bit depth × number of channels / 8) {i.e. the byte rate} ≈ playback length in seconds.
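The same calculation can be written as a small Python helper; the 42-byte header value simply follows the figure used in the example above (real WAV headers may be longer).

```python
def wav_duration_seconds(file_size_bytes, sample_rate_hz, bit_depth,
                         channels, header_bytes=42):
    """Estimate the playback length of an uncompressed PCM WAV file.

    bytes_per_second = sample_rate * bit_depth * channels / 8
    duration         = (file size - header) / bytes_per_second
    """
    bytes_per_second = sample_rate_hz * bit_depth * channels / 8
    return (file_size_bytes - header_bytes) / bytes_per_second

# The "Windows XP boot.wav" example from the text:
print(wav_duration_seconds(424644, 22050, 16, 2))   # ~4.814 seconds
```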
The process above completes the digitization of the audio signal. Once converted into a digital signal, the data can be stored, played back, copied, retrieved and otherwise processed.
2. Basic concepts of video technology
Video is composed of content elements, encoding formats and packaging containers.
- Content elements: include Image, Audio, and Metadata.
- Encoding formats: e.g. the H.264 video encoding format and the AAC audio encoding format.
- Container: These are common file formats such as MP4, MOV, FLV, RMVB, AVI, etc.
An image is a reproduction of what human vision perceives. A three-dimensional image carries depth, texture and brightness information, while a two-dimensional image carries texture and brightness information; texture can be loosely understood as the picture content. A video is composed of many images, i.e. a group of consecutive images, and a basic digital video pipeline is essentially "acquisition - processing - display".
1. Physical phenomena of images
Red, green and blue cannot be decomposed into other colors, so they are called the primary colors. Each pixel consists of three sub-pixels, one red, one green and one blue, which together form a color.
2. Numerical representation of images
The two common representations are RGB and YUV.
RGB representation
Any image can be composed of RGB values, so how are the sub-pixels within a pixel represented? Usually in one of two ways:
- Floating point: each component is a value from 0.0 to 1.0.
- Integer: each component is an integer from 0 to 255 (0x00 to 0xFF).
YUV representation
Raw video frames are more often represented in a YUV format. YUV is mainly used to optimize the transmission of color video signals: compared with transmitting RGB, its biggest advantage is that it needs far less bandwidth (RGB requires three independent signals to be transmitted simultaneously). "Y" stands for luma (brightness), also called the gray level, while "U" and "V" stand for chroma, which describes the color and saturation of the image.
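For illustration, here is a small Python sketch converting one RGB pixel to YUV using the commonly quoted (approximate) BT.601 full-range weights; exact coefficients vary slightly between standards.

```python
def clamp(x):
    """Clamp to the valid 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))

def rgb_to_yuv(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YUV (approximate BT.601 weights).

    Y is luma (brightness); U and V are chroma components centred on 128.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.5 * b + 128
    v = 0.5 * r - 0.419 * g - 0.081 * b + 128
    return clamp(y), clamp(u), clamp(v)

print(rgb_to_yuv(255, 255, 255))   # white -> (255, 128, 128): full luma, neutral chroma
print(rgb_to_yuv(255, 0, 0))       # red   -> (76, 85, 255): low luma, strong V
```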
Image compression formats: JPEG, PNG and BMP
Once we have the raw image data (YUV or RGB), we need to store it. Without compression, the space needed is very large: an ~8-megapixel image in YUV420 format takes about 12 MB (3264 × 2448 × 3 bytes × 0.5). Different compression methods produce different image formats; this process is usually called encoding. Encoding YUV data to JPEG means compressing the YUV data according to the JPEG image compression standard. Compressing raw data is called encoding and restoring compressed data is called decoding, so the encoding and decoding involved in image or video processing is really the compression of raw data and the restoration of compressed data. The codec process is usually performed by a corresponding codec, and different codecs implement different compression and restoration algorithms; for example, a JPEG codec implements the JPEG compression algorithm.
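A quick sketch of the frame-size arithmetic above, assuming 3 bytes per pixel for RGB24 and an average of 1.5 bytes per pixel for YUV420:

```python
def frame_size_bytes(width, height, fmt="yuv420"):
    """Raw (uncompressed) size of a single frame.

    RGB24 uses 3 bytes per pixel; YUV420 halves the chroma data,
    which works out to 1.5 bytes per pixel on average.
    """
    bytes_per_pixel = {"rgb24": 3.0, "yuv420": 1.5}[fmt]
    return int(width * height * bytes_per_pixel)

# An ~8-megapixel image (3264 x 2448), as in the text:
print(frame_size_bytes(3264, 2448, "yuv420") / 1e6)   # ~12 MB
print(frame_size_bytes(3264, 2448, "rgb24") / 1e6)    # ~24 MB
```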
- JPEG
JPEG is also known as JPG (the 8.3 naming rules of early systems such as DOS and Windows 95 only supported extensions up to 3 characters long, so .jpg was used for compatibility). JPEG is one of the most widely used image formats. It uses a lossy compression algorithm that removes color detail the human eye cannot easily perceive.
- PNG
PNG uses a lossless data compression algorithm derived from LZ77. It supports transparency, which makes the edges and backgrounds of images appear smoother.
- BMP
BMP is the standard bitmap format on Windows. The format is uncompressed, so image files are large. SVG, GIF, WebP and so on are also image formats, which will not be covered here.
Although YUV is the raw data of a video frame, YUV data cannot be rendered directly; an image can only be rendered on the display after it has been converted to RGB. YUV is mainly used to optimize the transmission of color video signals.
3. Video-related concepts
Frame
The frame is a basic concept in video development; it can be thought of as one picture. A video is essentially made up of many pictures, i.e. many video frames: every picture in a video or animation is one frame. When video data is compressed, each frame represents one picture. Because consecutive frames are usually very similar, a frame can be compressed by referencing neighboring frames. Depending on the reference frames used, frames are divided into I frames, P frames and B frames.
- I frame (Intra Picture): an intra-coded picture, also known as a key frame, which can be decoded without reference to other frames.
- P frame (Predictive Frame): a forward-predicted frame; decoding it requires the previous frame.
- B frame (Bi-directional interpolated prediction frame): a bi-directionally predicted frame; decoding it requires both earlier frames and later frames in the sequence.
GOP (Group of Pictures) refers to the interval between two I frames.
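As a toy illustration of the GOP idea, the following sketch splits a sequence of frame types into groups that each start at an I frame; the input sequence is made up for the example.

```python
def split_into_gops(frame_types):
    """Split a frame-type sequence into GOPs.

    A GOP here runs from one I frame up to (not including) the next I frame,
    matching the definition above.
    """
    gops, current = [], []
    for t in frame_types:
        if t == "I" and current:
            gops.append(current)
            current = []
        current.append(t)
    if current:
        gops.append(current)
    return gops

print(split_into_gops(list("IPBBPBBIPBBP")))
# [['I', 'P', 'B', 'B', 'P', 'B', 'B'], ['I', 'P', 'B', 'B', 'P']]
```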
Frame rate
Frame rate is the number of frames per unit of time, measured in frames per second (FPS). The higher the frame rate, the more images are shown per unit of time, the smoother the picture and the more natural the transitions. For example, a video frame rate of 30 means 30 frames per second. Frame rate affects smoothness: the higher the frame rate, the smoother the picture. Of course, a higher frame rate also means more pictures, so the video file gets correspondingly larger and needs much more storage. Typical frame rates:
- 24/25 FPS: the typical movie frame rate, 24/25 frames per second.
- 30/60 FPS: 30/60 frames per second, typical game frame rates (e.g. 30 FPS in a game's normal mode, 60 FPS in its high frame rate mode).
The frame rate is related to the smoothness of the video: the higher the frame rate, the smoother the video; the lower the frame rate, the more the video stutters. 25 FPS is a common setting because above roughly 24 FPS the image appears continuous to the human eye. Frame rate = frames / time (f/s, frames per second, FPS). Frame rate measures the number of frames displayed, in frames per second (FPS) or hertz (Hz); FPS is generally used to describe how many frames per second a video, graphics application or game renders.
Refresh rate
The refresh rate is the number of times a screen (hardware) refreshes per second, measured in hertz (Hz). The higher the refresh rate, the more stable the image and the smoother the display. For example, a game may render more than 200 frames per second, but if the monitor's refresh rate is only 30 Hz, it can only "grab" and display 30 of them: you end up with 30 displayed frames and 170 wasted frames.
Resolution
Resolution = horizontal pixel count × vertical pixel count. A common example is 1280×720 (720p), i.e. 1280 pixels horizontally and 720 pixels vertically. For a display area of fixed size, the higher the resolution, the more detail and the sharper the image; the lower the resolution, the less detail, and if the display area is larger than the image resolution the image will look blurry when scaled up.
1280×720 (720p) and 1920×1080 (1080p) are video resolutions. Resolution affects image size and is proportional to it: the higher the resolution, the larger the image; the lower the resolution, the smaller the image. In theory, the higher the resolution, the clearer the video. However, actual 1080p movie files come in sizes of 1 GB, 3 GB, 4 GB or 10 GB, and their clarity certainly differs. To explain this, we need to introduce the concept of bit rate.
Bit rate
Bit rate (bps, bits per second) is the number of bits of data transmitted per second; common units are kbps (kilobits per second) and Mbps (megabits per second). The higher the frame rate, the more frames are transmitted per second; the larger the resolution, the larger each frame; so higher frame rates and higher resolutions mean higher bit rates. For continuous media, the bit rate is the number of bits played back per unit of time, usually given in kilobits per second. The higher the bit rate, the more bandwidth is consumed. Bit rate is directly proportional to file size: the larger the bit rate, the larger the file; the smaller the bit rate, the smaller the file. Bit rate also affects sharpness: at the same resolution, a higher bit rate gives a sharper picture, although beyond a certain point additional bit rate has little effect on image quality. Almost all encoding formats focus on achieving minimum distortion at the lowest possible bit rate. A video file generally consists of picture and sound, and the audio and video streams each have their own bit rate; the bit rate of a video file usually refers to the sum of the audio and video bit rates. The higher the bit rate, the more data is transmitted, and the better the sound and picture quality. In the video field, bit rate is often also called the code rate.
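Putting bit rate and duration together, here is a rough size estimate for a muxed file; the 3000 kbps / 128 kbps figures are arbitrary illustrative values and container overhead is ignored.

```python
def file_size_mb(video_kbps, audio_kbps, duration_s):
    """Approximate size of a muxed file from its audio + video bit rates.

    Container overhead is ignored; 1 byte = 8 bits, 1 MB = 10**6 bytes.
    """
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_s
    return total_bits / 8 / 1_000_000

# A 90-minute 1080p movie at 3000 kbps video + 128 kbps audio:
print(round(file_size_mb(3000, 128, 90 * 60)))   # ~2111 MB, i.e. roughly 2 GB
```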
4. Video coding
Why encode audio and video?
In the computer world everything is made up of 0s and 1s, and audio and video data are no exception. Because the amount of audio and video data is huge, storing it as a raw stream would consume enormous space and would be hard to transmit. In practice, audio and video contain a great deal of repeated data, so it can be compressed with suitable algorithms. This is especially true for video: because the picture changes gradually, a video contains a lot of repeated pictures and pixels, which leaves a great deal of room for compression. The significance of encoding is that it greatly reduces the size of audio and video data, making storage and transmission much easier. The core of encoding is removing redundant information from the video or audio.
Video codec refers to the compression or decompression of digital video. There are many compression coding standards; the most widely used today are the H.26x series and the MPEG series. Their advantages are low bit rate, high image quality, strong error resilience and good network adaptability, and they are widely used in real-time video applications.
Video encoding format
A video coding format (also called a video coding specification or video compression format) defines how video data is stored and transmitted. Common ones are H.264 and H.265. Because raw video and audio data are very large and inconvenient to store and transmit, they are compressed by means of compression coding. There are many video coding formats, such as the H.26x series and the MPEG series, each of which appeared to meet the needs of its time. The H.26x (1/2/3/4/5) series is led by the International Telecommunication Union (ITU), while the MPEG (1/2/4) series is led by the Moving Picture Experts Group (MPEG), an organization under ISO. They have also produced joint coding standards, including H.264, the dominant encoding format today. Some common video encoders:
- H.264/AVC
- HEVC/H.265
- VP8
- VP9
- FFmpeg (strictly a multimedia framework that wraps many encoders rather than a single codec)
Note: audio coding formats include MP3, AAC and so on.
H.264
The principle: frames are grouped, and frames within a group can reference each other to remove redundant information. In H.264 an image may be coded as a frame or as a top field and a bottom field; a complete image is one frame. When a video signal is captured with progressive scanning, each scan produces one image, i.e. one frame. With interlaced scanning (odd and even lines), each scanned frame is divided into two parts, each called a field: the top field (also known as the even field) and the bottom field (also known as the odd field). The concepts of frame and field lead to two coding modes: frame coding and field coding. Progressive scanning suits moving images, so frame coding is better for moving content; interlaced scanning suits relatively static images, so field coding is better there. In addition, each frame can be divided into multiple slices, each slice is made up of macroblocks, and each macroblock is made up of sub-blocks.
H.264 is a high-performance video coding technology, a digital video coding standard developed by the Joint Video Team formed by ITU and ISO. Whether video or audio is being encoded, the purpose is compression: the purpose of video coding is to remove redundant information, including spatial redundancy, temporal redundancy, coding redundancy, visual redundancy and knowledge redundancy. On this basis, H.264's compression techniques involve:
- Intra-frame predictive compression addresses spatial redundancy. Spatially redundant data are the color and luminance details spread across the width and height of a picture that the human eye can hardly perceive; such data can simply be compressed away.
Intra-frame compression corresponds to I frames, the key frames. A classic example from online tutorials: if a camera is pointed at you, very little actually changes within one second. A camera typically captures dozens of frames per second, for example 25 frames/s for animation and around 30 frames/s for ordinary video. For a group of frames that change very little, the first frame is saved in full so that the rest can be compressed against it. The I frame is particularly critical, because without this key frame the other frames cannot be decoded.
- Inter-frame predictive compression addresses temporal redundancy. In the example above, the data captured by the camera changes little over a period of time, so the identical data across that period can be compressed away; this is temporal compression.
Inter-frame compression corresponds to P frames and B frames. A P frame is a forward reference frame that references only the previous frame during compression. A B frame is a bi-directional reference frame that references both the previous frame and the following frame.
- Integer discrete cosine transform (DCT): transforms spatially correlated data into independent frequency-domain data, which is then quantized.
- CABAC (context-adaptive binary arithmetic coding): lossless entropy compression.
In addition to the key techniques mentioned above, there are several important concepts to know about H.264:
- GOF: a group of frames, i.e. the data from one I frame to the next I frame, including the B frames and P frames in between; this group is called a GOF.
- SPS and PPS: the parameter sets for a GOF. The SPS (sequence parameter set) stores things such as the number of stored frames, the number of reference frames, the decoded image size and the frame/field coding mode flag. The PPS (picture parameter set) stores the entropy coding mode flag, the number of slices, the initial quantization parameters and the deblocking filter coefficient adjustments.
When decoding video, we must receive the SPS/PPS data before receiving a group of frames (GOF); without these parameters the video cannot be decoded.
Therefore, if an error occurs during decoding, first check whether the SPS/PPS is present. If not, check whether the peer failed to send it or whether it was lost in transit.
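As a minimal sketch of how SPS/PPS can be spotted in practice, the following Python scans an Annex-B H.264 byte stream for start codes and reports the NAL unit types; it ignores emulation-prevention bytes and other real-world details, and the sample bytes are hand-made for the example.

```python
def scan_annexb_nals(data: bytes):
    """List NAL unit types found in an Annex-B H.264 byte stream.

    NAL units are separated by 0x000001 / 0x00000001 start codes; the low
    5 bits of the first byte after the start code give the type
    (7 = SPS, 8 = PPS, 5 = IDR/key-frame slice, 1 = non-IDR slice).
    """
    names = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}
    nal_types = []
    i = 0
    while i < len(data) - 3:
        if data[i:i + 3] == b"\x00\x00\x01":
            nal_type = data[i + 3] & 0x1F
            nal_types.append(names.get(nal_type, f"type {nal_type}"))
            i += 3
        else:
            i += 1
    return nal_types

# A tiny hand-made example: SPS, PPS, then an IDR slice header byte.
stream = b"\x00\x00\x00\x01\x67" + b"\x00\x00\x00\x01\x68" + b"\x00\x00\x01\x65"
print(scan_annexb_nals(stream))   # ['SPS', 'PPS', 'IDR slice']
```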
A more detailed explanation of H.264 coding is beyond the scope of this article; interested readers can look up material on macroblock grouping, macroblock search, intra-frame prediction, DCT compression and the H.264 bitstream structure.
In short, video coding removes redundant information using inter-frame and intra-frame coding techniques: intra-coded pictures can be decoded independently of other pictures, whereas inter-frame decompression requires a reference frame (key frame) and is essentially a difference operation.
Audio and video packaging
A video file usually contains both audio data and video data, i.e. two data streams: one video stream and one audio stream (a video with switchable Chinese and English audio has two audio streams). The process of putting compressed video and audio data into a file of a certain format is called encapsulation (muxing), and the format that carries the data is usually called a container. For example, an MP4 container may hold a video stream encoded with H.264 and an audio stream encoded with AAC. Common container formats include MP4, FLV and AVI. When people think of video formats, what usually comes to mind are the everyday file formats such as MP4, 3GP, AVI, FLV, RMVB and MKV; these are really just container (encapsulation) formats. Video formats actually come in two kinds: encapsulation formats and encoding formats.
1. Audio and video packaging format
Encapsulation is essentially a container that stores the video information. The video we watch includes a video part and an audio part (and sometimes subtitles). H.264 is a video encoding, while AAC and MP3 are audio encodings; when, say, H.264 video and MP3 audio are packaged according to the MKV standard, what we see is a video file in MKV format. The container format does not affect picture quality; it is only responsible for combining the internal video and audio tracks, and it does not alter them. Common container formats:
- AVI format (suffix .avi)
- DV-AVI format (suffix .avi)
- QuickTime File Format (suffix .mov)
- MPEG format (suffixes .mpg, .mpeg, .dat, .vob, .asf, .3gp, .mp4, etc.)
- WMV format (suffixes .wmv, .asf)
- RealVideo format (suffixes .rm, .rmvb)
- Flash Video format (suffix .flv)
- Matroska format (suffix .mkv)
- MPEG2-TS format (suffix .ts)
Package format:
A video container format can be regarded as a box containing video, audio and codec information. One container can support multiple codecs; for example, QuickTime (.mov) supports almost all video codecs, and MP4 supports most of them.
On PCs we frequently use .mov video files. From the introduction above we know that the extension .mov indicates the QuickTime File Format container, but the extension alone does not tell us which video codec is used. "H.264/MOV" means a video stored in the QuickTime File Format container (.mov) and encoded with H.264.
AVI
AVI, short for Audio Video Interleaved, is a container format introduced by Microsoft as part of its Video for Windows technology. An AVI file interleaves audio and video data in a single container, allowing audio and video to be played back synchronously.
MP4
MP4 (MPEG-4 Part 14) is part of the MPEG-4 standard. In principle almost any kind of data can be embedded in it, but most common MP4 files store AVC (H.264) or MPEG-4 Part 2 encoded video and AAC encoded audio.
FLV
FLV stands for Flash Video. An FLV file consists of an FLV header followed by an FLV body; the body is a sequence of PreviousTagSize fields and tags, where each tag is a video tag, an audio tag or a script tag and carries audio or video data.
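Here is a minimal sketch of reading the 9-byte FLV file header described above; the sample bytes are hand-crafted for illustration.

```python
import struct

def parse_flv_header(data: bytes):
    """Decode the 9-byte FLV file header.

    Layout: 'FLV' signature, 1-byte version, 1-byte flags
    (0x04 = has audio, 0x01 = has video), 4-byte big-endian header size.
    """
    if data[:3] != b"FLV":
        raise ValueError("not an FLV file")
    version = data[3]
    flags = data[4]
    header_size = struct.unpack(">I", data[5:9])[0]
    return {
        "version": version,
        "has_audio": bool(flags & 0x04),
        "has_video": bool(flags & 0x01),
        "header_size": header_size,
    }

# Typical header of an FLV that carries both audio and video:
print(parse_flv_header(b"FLV\x01\x05\x00\x00\x00\x09"))
# {'version': 1, 'has_audio': True, 'has_video': True, 'header_size': 9}
```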
TS
TS stands for Transport Stream and belongs to MPEG2-TS. The MPEG2-TS format is designed so that decoding can start independently from any segment of the stream. A typical TS file is therefore a small segment of a complete video, and all the TS files together form the complete video.
2. Streaming media protocol for audio and video data transmission
RTMP, HLS, and HTTP-FLV are live streaming protocols.
RTMP
RTMP stands for Real-Time Messaging Protocol. It works on top of TCP and uses port 1935 by default. The basic data units in the protocol are called messages, which are split into smaller chunks during transmission; the chunks are sent over TCP, and the receiver reassembles the received chunks to restore the streaming media data.
HLS
HLS is Apple's live streaming protocol, which works by slicing the video stream into small file segments. The first request returns an M3U8 file, which either lists streams at different bit rates or directly lists TS files, and the player then plays the listed TS files in order. During a live broadcast the M3U8 file is requested repeatedly to check the TS list for newly added slices. Simply put, the HLS protocol uses TS as its video container format and, in addition to the TS video files themselves, defines an M3U8 text file to control playback. M3U8 is the UTF-8-encoded version of the M3U format, and both are the basis of the HLS playlist format. An M3U8 file is essentially a playlist whose entries record a series of media segments; playing these segments in order reproduces the complete media resource, and the total duration of the video is the sum of the durations of all the .ts slices.
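As a minimal sketch, the following Python pulls the segment durations and URIs out of a simple M3U8 media playlist and sums the durations; the playlist text is made up for the example, and real playlists carry more tags.

```python
def parse_m3u8(text: str):
    """Extract (duration_seconds, uri) pairs from a simple media playlist.

    Each segment is announced by an '#EXTINF:<duration>,' line followed by
    the segment URI on the next non-comment line.
    """
    segments, duration = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF:"):
            duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#") and duration is not None:
            segments.append((duration, line))
            duration = None
    return segments

playlist = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:9.8,
seg0.ts
#EXTINF:10.0,
seg1.ts
#EXT-X-ENDLIST"""

print(parse_m3u8(playlist))                      # [(9.8, 'seg0.ts'), (10.0, 'seg1.ts')]
print(sum(d for d, _ in parse_m3u8(playlist)))   # total duration: 19.8 s
```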
HTTP-FLV
HTTP-FLV encapsulates the streaming media data in FLV format and transmits it over HTTP. It relies on MIME: based on the Content-Type in the protocol, the client selects the corresponding program to handle the content, which allows streaming media to be delivered over HTTP. A multimedia file or stream may contain several video and audio tracks, subtitles, synchronization information, chapter information and metadata; so MP4, AVI and RMVB files are container formats that encapsulate these data, not video encoding formats.