I. Background

Real-Time Messaging Protocol (RTMP) is the primary protocol for live streaming. It is a proprietary application-layer protocol designed by Adobe to carry audio and video data between the Flash player and a server. Today, RTMP is the common baseline push/pull protocol across the live-streaming services of the major cloud vendors. With the growth of the domestic live-streaming industry and the arrival of the 5G era, a basic understanding of RTMP has become a fundamental skill for programmers.

This article describes the basic ideas and core concepts of RTMP and, with the help of LiveGo source code analysis, takes a deep look at the protocol's most important knowledge points.

II. RTMP Protocol Characteristics

The main characteristics of the RTMP protocol are multiplexing, packetization, and the fact that it is an application-layer protocol. These features are described in detail below.

2.1 Multiplexing

Multiplexing means that the sender transmits multiple signals simultaneously over a single channel, and the receiver separates and reassembles the signals carried over that channel into independent, complete pieces of information, making more effective use of the communication line.

In short, a Message is divided into one or more chunks and sent over a TCP connection. A Chunk Stream consists of the chunks belonging to the same Message. The basic idea of multiplexing is to combine the chunks of a Chunk Stream and restore them to a single Message.

The figure above is a simple example. Suppose we need to send a 300-byte Message. We can split it into three chunks, each consisting of a Chunk Header and Chunk Data. In the Chunk Header we record basic information about the chunk, such as the Chunk Stream ID and Message Type; the Chunk Data carries the original payload. In the figure, the Message is split as 128 + 128 + 44 = 300 bytes, so the whole Message can be transmitted.
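The splitting described above can be sketched in a few lines of Go. This is an illustration of the 300 = 128 + 128 + 44 example only, not LiveGo's actual chunking code; 128 bytes is the RTMP default chunk size.

```go
package main

import "fmt"

// splitMessage splits a message payload into chunk-sized pieces,
// mirroring the 300 = 128 + 128 + 44 example in the text.
func splitMessage(payload []byte, chunkSize int) [][]byte {
	var chunks [][]byte
	for len(payload) > 0 {
		n := chunkSize
		if len(payload) < n {
			n = len(payload) // last chunk carries the remainder
		}
		chunks = append(chunks, payload[:n])
		payload = payload[n:]
	}
	return chunks
}

func main() {
	msg := make([]byte, 300)
	for i, c := range splitMessage(msg, 128) {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```

Running this prints three chunks of 128, 128, and 44 bytes; on the wire each piece would additionally be prefixed with its Chunk Header.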

The formats of the Chunk Header and Chunk Data are explained in more detail later in this article.

2.2 Packetization

The second major feature of the RTMP protocol is packetization, an advantage RTMP has over the RTSP protocol. Unlike common business-level application protocols (such as RPC protocols), multimedia transmission deals mostly with large audio and video packets. Sending large packets over a reliable transport protocol such as TCP is likely to block the connection and prevent higher-priority information from getting through; packetized transmission exists to solve exactly this problem. The specific packet formats are introduced below.

2.3 Application Layer Protocol

The final RTMP feature is that it is an application-layer protocol. RTMP is by default implemented on top of the transport-layer protocol TCP, but the official RTMP document only gives the standard data transmission format and some specific protocol format specifications; there is no complete official implementation. This has given rise to many industry implementations, such as RTMP over UDP and other privately adapted variants, which leaves more room for extension and makes it easier to solve problems such as the live-streaming latency inherent in native RTMP.

III. RTMP Protocol Analysis

As an application-layer protocol, RTMP, like other proprietary transport protocols (such as RPC), has several concrete implementations, for example nginx-rtmp, LiveGo, and SRS. This article uses LiveGo, an open-source live-streaming server written in Go, to analyze the main flow at the source level and study how the core push and pull processes of RTMP are implemented, so as to build an overall understanding of the protocol.

Before the source code analysis, let us build a basic understanding of the RTMP format by analogy with an RPC protocol. First, consider a relatively simple but practical RPC protocol format, as shown in the figure below:

This is a data transfer format used during RPC calls. Its basic purpose is to solve the TCP "sticky packet / split packet" problem.

Briefly, the format in the figure works as follows. First, two MAGIC bytes mark the protocol as one that both sides recognize; if the two bytes received are not 0xBABE, the packet is dropped immediately. Second, a one-byte flag field: its low four bits indicate the message kind (request, response, or heartbeat) and its high four bits indicate the serialization type (JSON, Hessian, Protobuf, Kryo, and so on). Third, a one-byte status field holds the status bits. Eight bytes then carry the requestId of the call; in practice the lower 48 bits (2^48 values) are more than enough. Finally, a fixed four-byte body size field gives the length of the body content. With this layout a complete RPC Message can be parsed quickly.
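The header layout just described can be made concrete with a small encoder/decoder. This is a sketch of the example RPC format from the figure, not any real framework's wire format; the function and field names are ours.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const magic = 0xBABE // 2-byte magic marking "our" protocol

// encodeHeader packs the 16-byte header: magic, flag (high 4 bits =
// serialization, low 4 bits = message kind), status, request ID, body length.
func encodeHeader(kind, serial, status byte, reqID uint64, bodyLen uint32) []byte {
	buf := make([]byte, 16)
	binary.BigEndian.PutUint16(buf[0:2], magic)
	buf[2] = serial<<4 | kind&0x0F
	buf[3] = status
	binary.BigEndian.PutUint64(buf[4:12], reqID)
	binary.BigEndian.PutUint32(buf[12:16], bodyLen)
	return buf
}

// decodeHeader reverses encodeHeader; ok is false when the magic does not
// match, in which case the packet would simply be dropped.
func decodeHeader(buf []byte) (kind, serial, status byte, reqID uint64, bodyLen uint32, ok bool) {
	if len(buf) < 16 || binary.BigEndian.Uint16(buf[0:2]) != magic {
		return 0, 0, 0, 0, 0, false
	}
	return buf[2] & 0x0F, buf[2] >> 4, buf[3],
		binary.BigEndian.Uint64(buf[4:12]), binary.BigEndian.Uint32(buf[12:16]), true
}

func main() {
	h := encodeHeader(1, 2, 0, 42, 128)
	kind, serial, status, id, n, ok := decodeHeader(h)
	fmt.Println(kind, serial, status, id, n, ok) // 1 2 0 42 128 true
}
```

Once the header is decoded, the receiver reads exactly bodyLen more bytes, which is precisely how the sticky/split packet problem is avoided.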

By analyzing this simple RPC protocol, we can spot a good idea: make the most efficient use of every byte, that is, carry the most information in the smallest byte array. A single byte can convey a great deal; after all, one byte has 256 possible values. If one byte is enough to convey the necessary information on the network, we can squeeze maximum utility out of extremely limited resources. The official RTMP document appeared in 2012. Although from today's perspective the protocol is complex and even a little bloated, having such a forward-looking design back then makes it a model worth learning from.

In today's WebRTC era we can still see the shadow of RTMP in WebRTC's design and implementation; the RPC protocol above can be thought of as a simplified design sharing the same concepts as RTMP.

3.1 RTMP Core Concepts

Before analyzing the LiveGo source code, let us first explain several core concepts of the RTMP protocol in detail, so that we have a basic macro-level understanding of the whole RTMP stack. During the source code analysis that follows, packet captures will also help us understand the relevant principles more intuitively.

First of all, the entity that RTMP actually transfers is the Chunk, much like the RPC protocol format above. A Chunk consists of two parts, the Chunk Header and the Chunk Body, as shown in the figure below.

3.1.1 Chunk Header

The Chunk Header differs from the RPC protocol discussed earlier, mainly in that its length is not fixed. Why not? Again, Adobe wanted to save transmission overhead. From the example of splitting a 300-byte Message into three chunks, we can see one obvious drawback of multiplexing: every chunk needs a Chunk Header to carry its basic information, which costs extra bytes on the wire. Therefore, to keep the number of transmitted bytes to a minimum, the RTMP header is squeezed as small as possible, so that transmission efficiency is as high as possible.

First let's look at the Basic Header part of the Chunk Header. The length of the Basic Header is variable: it can be 1, 2, or 3 bytes, depending on the Chunk Stream ID (abbreviated CSID).

The CSIDs supported by the RTMP protocol range from 2 to 65599; 0 and 1 are protocol-reserved values not available to the user. The Basic Header contains at least 1 byte (the low 8 bits), and this byte determines its total length, as shown in the figure below. The high 2 bits are reserved for FMT, which determines the format of the Message Header, discussed later. The low 6 bits of this byte carry the CSID value. When the low 6 bits are 0, the real CSID is too large to fit in 6 bits and one subsequent byte is needed (CSID = second byte + 64). When the low 6 bits are 1, the real CSID is too large even for one extra byte and two subsequent bytes are needed (CSID = third byte × 256 + second byte + 64). As a result, the length of the whole Basic Header is not fixed; it depends entirely on the value of the low 6 bits of the first byte.

In practice we don't use that many CSIDs, which means the Basic Header is usually one byte long, with the CSID ranging from 2 to 63.
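The Basic Header rules above can be sketched as a small decoder. This is an illustrative sketch of the three encodings rather than LiveGo's exact code; the function name is ours.

```go
package main

import "fmt"

// decodeBasicHeader parses the RTMP Basic Header: the high 2 bits of the
// first byte are FMT, and the low 6 bits select the 1-, 2-, or 3-byte form.
// It returns the FMT, the real CSID, and the number of bytes consumed.
func decodeBasicHeader(b []byte) (format byte, csid uint32, n int) {
	format = b[0] >> 6
	switch b[0] & 0x3F {
	case 0: // one extra byte: CSID 64..319
		return format, uint32(b[1]) + 64, 2
	case 1: // two extra bytes: CSID 64..65599
		return format, uint32(b[2])*256 + uint32(b[1]) + 64, 3
	default: // single-byte form: CSID 2..63
		return format, uint32(b[0] & 0x3F), 1
	}
}

func main() {
	fmt.Println(decodeBasicHeader([]byte{0x03}))             // 0 3 1
	fmt.Println(decodeBasicHeader([]byte{0x40, 0x00}))       // 1 64 2
	fmt.Println(decodeBasicHeader([]byte{0xC1, 0xFF, 0xFF})) // 3 65599 3
}
```

Note how the third example reaches exactly 65599, the maximum CSID quoted above: 255 × 256 + 255 + 64.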

The authors of the RTMP protocol, extremely frugal with bytes, designed the Chunk Message Header to be dynamically sized as well, again to save transmission space. Simply put, the Chunk Message Header has four possible lengths, determined by the FMT value mentioned above.

The four formats of the Message Header are shown below:

When FMT is 0, the Message Header occupies 11 bytes (not counting the length of the Basic Header): a 3-byte timestamp, a 3-byte message length, a 1-byte message type ID, and a 4-byte message stream ID.

Here, the timestamp is an absolute timestamp representing when the message was sent; the message length is the length of the Chunk Body; the message type ID indicates the message type, discussed later; and the message stream ID uniquely identifies the message. Note that if the absolute timestamp is greater than 0xFFFFFF, the time is too large to fit in 3 bytes and must be carried in the Extended Timestamp instead. The Extended Timestamp is 4 bytes long and is placed by default between the Chunk Header and the Chunk Body.

When FMT is 1, the Message Header occupies 7 bytes: the previous 11-byte header minus the 4-byte message stream ID. The chunk reuses the message stream ID of the preceding chunk. This format is typically used for variable-length message structures.

When FMT is 2, the Message Header occupies only 3 bytes, containing just the timestamp; both the message stream ID and the message length are omitted and reused from the preceding chunk. This format is generally used for fixed-length messages that still need per-message timing (such as audio data).

When FMT is 3, the Chunk Header contains no Message Header at all. In general, when a complete RTMP Message is split, its first chunk is given FMT 0 and the remaining chunks FMT 3; the subsequent chunk headers are smaller, which makes the implementation simpler and the compression better. If a first Message has been sent successfully and a second Message follows, the first chunk of the second Message is given FMT 1 and its subsequent chunks FMT 3, which allows the Messages to be separated.
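The four Message Header formats can be summarized in code, together with a decoder for the full type-0 header. This is a sketch following the field layout above (3-byte big-endian timestamp and length, little-endian message stream ID per the RTMP specification); the names are ours.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// messageHeaderLen maps the FMT value to the Message Header length:
// FMT 0 -> 11, FMT 1 -> 7, FMT 2 -> 3, FMT 3 -> 0 bytes.
func messageHeaderLen(format byte) int {
	return []int{11, 7, 3, 0}[format]
}

// readUint24 decodes the 3-byte big-endian fields (timestamp, message length).
func readUint24(b []byte) uint32 {
	return uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2])
}

// msgHeader holds a decoded type-0 Message Header.
type msgHeader struct {
	timestamp, length, streamID uint32
	typeID                      byte
}

// decodeType0 parses the 11-byte FMT-0 header: timestamp, message length,
// message type ID, and the (little-endian) message stream ID.
func decodeType0(b []byte) msgHeader {
	return msgHeader{
		timestamp: readUint24(b[0:3]),
		length:    readUint24(b[3:6]),
		typeID:    b[6],
		streamID:  binary.LittleEndian.Uint32(b[7:11]),
	}
}

func main() {
	// timestamp=1, length=300, type=20 (AMF0 command), stream ID=1
	h := decodeType0([]byte{0, 0, 1, 0, 1, 44, 20, 1, 0, 0, 0})
	fmt.Println(h.timestamp, h.length, h.typeID, h.streamID) // 1 300 20 1
}
```

A real reader would first decode the Basic Header, pick the length with messageHeaderLen, and fall back to the previous chunk's values for the fields that FMT 1-3 omit.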

3.1.2 Chunk Body

Having spent so long on the Chunk Header, the Chunk Body can be described briefly. Compared to the Chunk Header it is simple: there is less variable-length control and the structure is straightforward. The data inside is the data with real business meaning. Its default maximum length is 128 bytes (a different value can be negotiated with the Set Chunk Size command). The payload is generally organized as AMF data or as FLV-format audio/video data (without the FLV tag header). The composition of the AMF structure is shown in the figure below; the FLV format is not covered in depth in this article, and interested readers can consult the official FLV documentation.

3.1.3 AMF

AMF (Action Message Format) is a binary data serialization format similar in purpose to JSON and XML. Adobe Flash and remote servers communicate with each other using AMF data.

The AMF format closely resembles a map data structure: it is based on key-value pairs, with the length of the value inserted between them. The len field is sometimes absent, depending on the type. For example, when transferring AMF data of type number, the len field can be omitted, because a number always occupies 8 bytes and both sides can rely on that.

Similarly, when AMF transmits data of type 0x02 (string), len is fixed at 2 bytes, since 2 bytes is enough to represent the maximum string length. And by the same token, there are cases where both len and value are absent: for example, when transmitting 0x05 (null), neither len nor value is needed.
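The three cases just discussed can be written out as minimal AMF0 encoders. This is a sketch of the AMF0 wire format (number = marker 0x00 plus an 8-byte IEEE-754 double, string = marker 0x02 plus a 2-byte length, null = marker 0x05 alone); it handles only these three types.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// amfNumber encodes an AMF0 number: marker 0x00 + fixed 8-byte big-endian
// double, so no len field is needed.
func amfNumber(v float64) []byte {
	b := make([]byte, 9)
	b[0] = 0x00
	binary.BigEndian.PutUint64(b[1:], math.Float64bits(v))
	return b
}

// amfString encodes a short AMF0 string: marker 0x02 + 2-byte big-endian
// length + the UTF-8 bytes.
func amfString(s string) []byte {
	b := make([]byte, 3+len(s))
	b[0] = 0x02
	binary.BigEndian.PutUint16(b[1:3], uint16(len(s)))
	copy(b[3:], s)
	return b
}

// amfNull is just the marker: no len, no value.
func amfNull() []byte { return []byte{0x05} }

func main() {
	fmt.Println(len(amfNumber(1)), len(amfString("connect")), len(amfNull())) // 9 10 1
}
```

The "connect" string used in the example is the same command name we will meet again in the packet captures below.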

The table below lists some commonly used AMF types; for more information, consult the official documentation.

We can use Wireshark to capture packets and see the concrete AMF0 format for ourselves.

The figure above shows a very typical capture of an AMF0 string structure. There are two major versions of AMF, AMF0 and AMF3, and AMF0 is still dominant in practical usage today. What is the difference between AMF0 and AMF3, and when a client sends a chunk of AMF data to the server, how does the server know whether it is AMF0 or AMF3? RTMP uses the Message Type ID in the Chunk Header to distinguish them: it is 20 when the command message is encoded in AMF0 and 17 when it is encoded in AMF3.

3.1.4 Chunk & Message

A Message is composed of multiple chunks. Multiple chunks with the same CSID form a Chunk Stream, which the receiver can reassemble and parse into a complete Message. RTMP has many more message types than an RPC protocol: where the RPC messages above were basically request, response, and heartbeat, RTMP messages fall into three main families: protocol control messages, data messages, and command messages.

Protocol control messages: Message Type ID = 1~6, mainly used for control within the protocol.

Data messages: Message Type ID = 8, 9, or 18.

8: audio data;

9: video data;

18: metadata, including audio/video codec parameters, video width and height, and other media metadata.

Command messages (Message Type ID = 20 or 17): this family mainly comprises NetConnection and NetStream commands, each covering several functions. Invoking such a message can be understood as a remote function call.

The general picture is shown below; it will be covered in detail in the source code analysis chapter, where the colored parts are the common messages.
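The three families above can be captured in a small classifier, which is also a handy reference for the type IDs used throughout the rest of this article. A sketch with our own names; the ID values follow the text.

```go
package main

import "fmt"

// Message Type IDs for the families described above (a partial list).
const (
	typeSetChunkSize = 1  // protocol control messages span IDs 1..6
	typeAudio        = 8  // data message: audio
	typeVideo        = 9  // data message: video
	typeCommandAMF3  = 17 // command message encoded in AMF3
	typeMetadataAMF0 = 18 // data message: metadata
	typeCommandAMF0  = 20 // command message encoded in AMF0
)

// classify maps a Message Type ID to its family.
func classify(id byte) string {
	switch {
	case id >= 1 && id <= 6:
		return "protocol control"
	case id == typeAudio || id == typeVideo || id == typeMetadataAMF0:
		return "data"
	case id == typeCommandAMF0 || id == typeCommandAMF3:
		return "command"
	}
	return "other"
}

func main() {
	for _, id := range []byte{1, 8, 18, 20} {
		fmt.Printf("type %d: %s\n", id, classify(id))
	}
}
```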

3.2 Core implementation process

Learning a network protocol is a dry process. Here we combine the original RTMP specification with Wireshark packet captures and try to describe the core RTMP flows as vividly as possible: handshake, connect, createStream, publish (push), and play (pull). The packet-capture environment for this section is: LiveGo as the RTMP server (port 1935), OBS as the publishing client, and VLC as the playing client.

When analyzing an application-layer protocol, the first thing to grasp is the main flow. For an RTMP server, each push and each pull is, at the code level, a network connection, and each connection must be handled accordingly. In the LiveGo source there is a handleConn method which, as the name implies, handles each connection. The main flow has two parts: first the handshake, and second the core module that, following the RTMP chunking rules, parses the Chunk Header and Chunk Body and then processes whatever they resolve to.

As the code block shows, there are two core methods: HandShakeServer, which handles the handshake logic, and readMsg, which handles reading the Chunk Header and Body information.

3.2.1 Part 1: Handshake

Section 5.2.5 of the original protocol describes in detail the RTMP handshake process, as shown below:

At first glance this process may seem a little complicated, so let's use Wireshark to capture the packets and look at the actual exchange first.

Wireshark's Info column tells us what each RTMP packet means. As the figure below shows, the handshake involves three packets: packet 16 is the client sending C0 and C1 to the server, packet 18 is the server sending S0, S1, and S2 to the client, and packet 20 is the client sending C2 to the server. This completes the handshake between client and server.

From the capture, the handshake process is very simple, somewhat like the TCP three-way handshake. In practice it differs slightly from the diagram in Section 5.2.5 of the RTMP specification, and the overall flow is very simple.

Now let's look back at the more complicated handshake diagram above. The client and server pass through four states: uninitialized, version sent, ack sent, and handshake done.

Uninitialized: the client and server have not yet communicated;

Sent version number: Sent C0 or S0;

ACK sent: C2 or S2 sent;

Handshake completed: S2 or C2 received.

The RTMP protocol specification does not rigidly fix the order of C0, C1, C2 and S0, S1, S2, but it does impose the following rules:

The client must receive S1 from the server before sending C2;

The client can only send other data after receiving S2 from the server;

The server must receive the C0 sent by the client before sending S0 and S1;

The server must receive C1 from the client before sending S2;

The server must receive C2 from the client before it can send any additional data.

According to the Wireshark capture, the handshake does indeed follow these rules. The question now is: what exactly are the messages C0, C1, C2, S0, S1, and S2? Their data formats are clearly defined in the RTMP specification.

C0 and S0: 1 byte long. This message carries the RTMP version number, with values from 0 to 255; all we need to know is that 3 is the value we use. If you are interested in the other values, read the specification.

C1 and S1: 1536 bytes long, consisting of a timestamp + zero bytes + random data; these are the middle packets of the handshake.

C2 and S2: 1536 bytes long, consisting of a timestamp + a second timestamp + the random data echoed back; they are essentially echoes of C1 and S1. In practice, implementations typically set S2 = C1 and C2 = S1.

Let’s combine the LiveGo source code to enhance our understanding of the handshake process.

That concludes the simple handshake. As we can see, the whole process is quite clear, and the handling logic is simple and easy to understand.

3.2.2 Part 2: Information Exchange

3.2.2.1 Parsing RTMP Chunk Information

After the handshake come the connection and other related steps, followed by message processing. As the saying goes, to do a good job one must first sharpen one's tools.

We first need to parse the Chunk Header and Chunk Body according to the RTMP specification, converting the raw bytes from the network into information we can recognize, and then process that information accordingly. This is the core of the source code analysis and involves many of the points covered above; reading them together makes the core logic of ReadMsg easy to follow.

The logic of the code block above is clear: it reads from each conn, decodes each Chunk, and merges chunks with the same Chunk Stream ID back into the corresponding Chunk Stream. The final, complete Chunk Stream is the Message.

This code corresponds closely to the Chunk Stream ID concept introduced in the theory section, so read the two together. Keep in mind that one conn carries multiple messages, such as a connect message, a createStream message, and so on. Each message is a Chunk Stream, so the LiveGo authors use a map to store them: the key is the CSID and the value is the Chunk Stream. In this way all the information sent to the RTMP server can be kept.

The logic of the ReadChunk code breaks down into the following parts:

1) Determining the CSID; for the theory, refer to the discussion above. This part processes the Basic Header.

2) Processing the Chunk Header according to the Format (FMT) value, as introduced in the theory section and explained in the comments below. Two technical points deserve attention: the handling of the timestamp, and the chunk.new(pool) line of code; the code comments make both clear.

3) Reading the Chunk Body. As the theory section explained, when the chunk header's FMT is 0 there is a message length field that gives the size of the message body; using this field we can easily read the Chunk Body. The overall logic is as follows.

So far we have parsed the Chunk Header and read the Chunk Body. Note that we have only read the Chunk Body; we have not yet parsed it according to the AMF format. The handling of the Chunk Body is covered in detail below. Now that we have parsed a Chunk Stream sent over the connection, we can return to the analysis of the main flow.

As mentioned earlier, after the handshake completes and the Chunk Stream has been parsed, processing proceeds according to the Chunk Stream's type ID and the AMF data in the Chunk Body. The idea can be understood as follows: client A sends some command; the RTMP server parses that command from the type ID and AMF information and returns the corresponding response.

The handleCmdMsg code block shows that LiveGo supports both AMF0 and AMF3 command messages. The next step is to parse the AMF-formatted Chunk Body data; the results are stored as a slice.

With the type ID and AMF data resolved, processing the individual commands comes naturally.

Next comes the handling of each client command.
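The dispatch just described can be sketched as a switch on the command name (the first AMF0 value in the chunk body). The responses listed follow the flows analyzed in this article; real handlers would of course build and send the actual AMF replies, and the function name is ours, not LiveGo's.

```go
package main

import "fmt"

// dispatchCmd routes an RTMP command by name and returns a description of
// the response the server should produce.
func dispatchCmd(name string) (string, error) {
	switch name {
	case "connect":
		return "ack window size, set peer bandwidth, set chunk size, reply _result", nil
	case "createStream":
		return "reply _result with a new stream ID", nil
	case "publish":
		return "reply onStatus NetStream.Publish.Start", nil
	case "play":
		return "reply NetStream.Play.Reset, NetStream.Play.Start, NetStream.Play.PublishNotify", nil
	}
	return "", fmt.Errorf("unhandled command %q", name)
}

func main() {
	for _, cmd := range []string{"connect", "createStream", "publish", "play"} {
		action, _ := dispatchCmd(cmd)
		fmt.Printf("%s -> %s\n", cmd, action)
	}
}
```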

3.2.2.2 connection

Connect: during the connection process, the client and server agree on the window acknowledgement size, the chunk size, and the bandwidth. The RTMP specification describes the connect flow in detail, as shown below:

Again, we use Wireshark to capture and analyze the packets:

As the capture shows, the connect process completes in only three packets:

Packet 22: the client tells the server it wants to set the chunk size to 4096;

Packet 24: the client tells the server it wants to connect to the application named "live";

Packet 26: the server responds to the client's connect request, settles the window size, bandwidth, and chunk size, and returns "_result" to indicate success, all within a single TCP packet.

How do client and server know what these packets mean? The rules are laid down in the RTMP specification; we can read the specification to understand them, or let Wireshark parse them for us. Below is the detailed parsing of packet 22, focusing on the RTMP layer.

As the figure shows, the RTMP header contains the Format, Chunk Stream ID, Timestamp, Body size, Message Type ID, and Message Stream ID. The Type ID has the hexadecimal value 0x01, meaning Set Chunk Size, which belongs to the protocol control messages.

Section 5.4 of the RTMP specification states that for protocol control messages the Chunk Stream ID must be set to 2, the Message Stream ID must be set to 0, and the timestamp is ignored. From the information parsed by Wireshark, packet 22 is indeed RTMP-compliant.
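Those rules are easy to check by constructing such a message ourselves. A sketch building the exact Set Chunk Size message seen in packet 22: a type-0 chunk header with CSID 2 and message stream ID 0, message type ID 1, and a 4-byte body holding the new chunk size. The function name is ours.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// setChunkSizeMsg builds a Set Chunk Size protocol control message:
// FMT-0 basic header with CSID 2, zero timestamp (ignored), 4-byte body
// length, message type ID 1, message stream ID 0, then the new size.
func setChunkSizeMsg(size uint32) []byte {
	buf := make([]byte, 12+4)
	buf[0] = 0x02 // fmt=0 in the high 2 bits, CSID=2 in the low 6 bits
	// buf[1:4]: timestamp, left 0 (ignored for protocol control messages)
	buf[4], buf[5], buf[6] = 0, 0, 4 // message length = 4 (24-bit big-endian)
	buf[7] = 0x01                    // message type ID 1: Set Chunk Size
	// buf[8:12]: message stream ID = 0, as Section 5.4 requires
	binary.BigEndian.PutUint32(buf[12:], size)
	return buf
}

func main() {
	fmt.Printf("% x\n", setChunkSizeMsg(4096))
	// 02 00 00 00 00 00 04 01 00 00 00 00 00 00 10 00
}
```

The printed bytes should match what Wireshark shows for packet 22, apart from the negotiated size value.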

Now let’s look at the detailed parsing of packet 24.

Packet 24 is also sent by the client. You can see it sets the Message Stream ID to 0 and the Message Type ID to 0x14 (decimal 20), meaning an AMF0 command, i.e. an RTMP command message. The specification does not mandate a particular Chunk Stream ID for the connect process, because the Message Type ID is what really matters; the server responds based on it. The AMF0 command sent during connect carries data of type Object, which tells the server the application name and playback address to connect to.

The following code is how LiveGo handles a client connect request.

After receiving the client's request to connect to the application, the server must respond appropriately, which is the content of packet 26 captured by Wireshark. The details are shown in the figure below; note that the server does several things in a single packet.

We can learn more about this process in combination with the LiveGo source code.

3.2.2.3 createStream

Once the connection is complete, a stream can be created. Creating the stream is relatively simple, requiring only two packets, as shown below:

Packet 32 is the client's createStream request, and packet 34 is the server's response. Below is the LiveGo source that handles the client's createStream request.

3.2.2.4 Pushing the Stream

Once the stream is created, pushing or pulling can start, as described in Section 7.3.1 of the RTMP specification. The handshake, connect, and createStream steps have already been covered in detail, so let's focus on the Publishing Content part.

Before pushing a stream to LiveGo, you need to obtain the channel key for the push. We can get the channel key for "movie" with the following command; the data field in the response Content is the channel key needed for pushing.

$ curl http://localhost:8090/control/get?room=movie
StatusCode        : 200
StatusDescription : OK
Content           : {"status":200,"data":"rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575KLkIZ9PYk"}
RawContent        : HTTP/1.1 200 OK
                    Content-Length: 72
                    Content-Type: application/json
                    Date: Tue, 09 Feb 2021 09:19:34 GMT

                    {"status":200,"data":"rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575KLkIZ9PYk"}
Forms             : {}
Headers           : {[Content-Length, 72], [Content-Type, application/json], [Date, Tue, 09 Feb 2021 09:19:34 GMT]}
Images            : {}
InputFields       : {}
Links             : {}
ParsedHtml        : mshtml.HTMLDocumentClass
RawContentLength  : 72

Use OBS to push a stream to the movie channel of the live application on the LiveGo server. The push URL is: rtmp://localhost:1935/live/rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575KLkIZ9PYk. Again, let's look at the Wireshark capture.

At the start of the push, the client issues a publish request, which is the content of packet 36. The request must carry the channel key, which in this packet is "rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575KLkIZ9PYk".

The server first checks whether the channel key exists and whether it is already in use; if it does not exist or is in use, the push request is rejected. Since we generated the channel key before pushing and the client is using it legitimately, the server responds in packet 38 with "NetStream.Publish.Start", telling the client it can start pushing. Before pushing audio and video data, the client must send the audio/video metadata to the server, which is what packet 40 does. Looking at the packet details below, there is a lot of metadata to send, including key information such as the video resolution, frame rate, audio sample rate, and audio channel count.

Once the server has the audio/video metadata, the client can start sending the actual audio and video data, which the server receives until the client issues the FCUnpublish and deleteStream commands. The main logic of the TransStart() method in stream.go is to receive audio and video data from the pushing client, cache the latest packets locally, and finally send the data to each pulling client. The virReader.read() method in rtmp.go reads the individual audio and video packets from the pushing client; the relevant code and comments are shown below.

Attached below is the source analysis of the media-header parsing.

Parsing the audio header:

Parsing the video header:
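The media-header parsing boils down to unpacking bit fields from the first payload byte. A sketch following the FLV tag body layout (recall that RTMP audio/video payloads use the FLV body format without the tag header): the first audio byte packs sound format, rate, sample size, and channel count; the first video byte packs frame type and codec ID. Function names are ours.

```go
package main

import "fmt"

// parseAudioHeader unpacks the first byte of an FLV audio payload:
// high 4 bits sound format (10 = AAC), 2 bits sample rate (3 = 44 kHz),
// 1 bit sample size (1 = 16-bit), 1 bit channels (1 = stereo).
func parseAudioHeader(b byte) (soundFormat, soundRate, soundSize, channels byte) {
	return b >> 4, (b >> 2) & 0x03, (b >> 1) & 0x01, b & 0x01
}

// parseVideoHeader unpacks the first byte of an FLV video payload:
// high 4 bits frame type (1 = keyframe), low 4 bits codec ID (7 = AVC/H.264).
func parseVideoHeader(b byte) (frameType, codecID byte) {
	return b >> 4, b & 0x0F
}

func main() {
	// 0xAF: AAC, 44 kHz, 16-bit, stereo - typical for an OBS push.
	fmt.Println(parseAudioHeader(0xAF)) // 10 3 1 1
	// 0x17: keyframe, AVC - the first video packet of a stream.
	fmt.Println(parseVideoHeader(0x17)) // 1 7
}
```

For AAC and AVC there is additionally a packet-type byte after this one (sequence header vs. raw data), which is what lets the server recognize and cache the codec configuration for late-joining players.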

3.2.2.5 Pulling the Stream

With the pushing client publishing, a pulling client can fetch the audio and video data through the server. The pull flow is described in detail in Section 7.2.2.1 of the RTMP specification. The handshake, connect, and createStream steps have already been covered, so let's focus on the play command.

Again, let's analyze a Wireshark capture. The client tells the server through packet 640 that it wants to play the channel named "movie".

Why is it called "movie" here rather than the "rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575KLkIZ9PYk" used when pushing? In fact the two point to the same channel; one is used for pushing and the other for playing. We can verify this from the LiveGo source code.

After receiving the pulling client's play request, the server responds with "NetStream.Play.Reset", "NetStream.Play.Start", "NetStream.Play.PublishNotify", and the audio/video metadata. Once that is done, it can continuously send audio and video data to the pulling client. We can look at the LiveGo source code to deepen our understanding of this process.

The pushed data is read through a channel (chan) and sent on to the pulling clients.
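The fan-out just described maps naturally onto Go channels. A simplified sketch, not LiveGo's actual code: packets arriving from the publisher are copied to every subscribed player, and the player channels are closed when the publisher stops.

```go
package main

import "fmt"

// packet stands in for one audio/video message from the publisher.
type packet struct{ data []byte }

// broadcast copies each packet from the publisher channel to every player
// channel, closing the player channels when the publisher channel closes.
func broadcast(pub <-chan packet, players []chan packet) {
	for p := range pub {
		for _, pl := range players {
			pl <- p
		}
	}
	for _, pl := range players {
		close(pl)
	}
}

func main() {
	pub := make(chan packet)
	players := []chan packet{make(chan packet, 4), make(chan packet, 4)}
	go broadcast(pub, players)
	pub <- packet{data: []byte{1, 2, 3}}
	close(pub)
	for i, pl := range players {
		for p := range pl {
			fmt.Printf("player %d got %d bytes\n", i, len(p.data))
		}
	}
}
```

A production server would additionally drop or skip packets for slow players instead of blocking on them, which is one reason real implementations keep per-player caches.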

So far, that is the whole RTMP main flow. Note that there is no conversion to FLV, HLS, or other delivery protocols or formats here: the RTMP server distributes the audio/video packets it receives from the pushing client to the pulling clients as-is, with no extra processing. The major cloud vendors, however, support HTTP-FLV, HLS, and other delivery protocols on the playback side, as well as recording and VOD playback, and LiveGo in fact supports these too.

Due to space limits we will not introduce them here; we will have the chance to study and share LiveGo's handling of this logic separately in the future.

IV. Outlook

At present, RTMP is the benchmark protocol for live streaming in China and the one compatible across all cloud vendors. Its strengths, such as multiplexing and packetization, are an important reason the major vendors chose it. Beyond that, because it is an application-layer protocol, large cloud vendors such as Tencent, Alibaba, and Agora also adapt the protocol's details and source code to implement features such as mixing multiple audio/video streams and per-stream recording.

However, RTMP also has its drawbacks, and high latency is the biggest. In practice, even in a relatively healthy network environment, RTMP latency runs 3~8 s, quite different from the theoretical 1~3 s quoted by the cloud vendors. What problems does that latency cause? Imagine the following scenarios:

In online education, a student asks a question, but the teacher has already moved on to the next topic before seeing it.

In e-commerce live streaming, viewers ask about a product and the host seems to look right past them.

You send a tip and only belatedly hear the host thank you on air.

You learn the goal went in from the neighbors' cheers before your "live" stream shows it.

Especially now that live streaming has formed an industry chain, many hosts stream professionally, often from the same company network; when the company's outbound bandwidth is limited, the latency of RTMP and FLV becomes even worse. High latency hurts the real-time interaction between viewers and hosts and also blocks certain live scenarios from landing, such as e-commerce live selling and educational live streaming.

The following is a general solution using the RTMP protocol:

Depending on the actual network conditions and push settings such as the keyframe interval and push bitrate, the latency is generally around 8 seconds, and it comes mainly from two sources:

CDN link latency, which itself has two parts. One is network transmission latency: a CDN path involves four network hops, and with each hop adding around 20 ms the hops together contribute on the order of 100 ms. In addition, using the RTMP frame as the transmission unit means each node must receive a complete frame before it starts forwarding downstream; to improve concurrency, CDNs also optimize their routing strategy, which adds some latency. Under network jitter the delay becomes even less controllable: with a reliable transport protocol, once jitter occurs the subsequent transmission blocks while waiting for retransmission of earlier packets.

The player-side buffer is the main source of latency. Public network conditions vary widely, and jitter in any link, whether push, CDN transmission, or playback reception, affects the player. To absorb jitter from the upstream links, the usual player strategy is to keep about 6 s of media buffered.

From this we can clearly see that the largest share of live latency is on the pulling side (the playback buffer), so eliminating that latency quickly is the problem the major cloud vendors are racing to solve. Hence the new products aimed at cutting RTMP latency, such as Tencent Cloud's "fast" live streaming and Aliyun's ultra-low-latency RTS. These products in fact introduce WebRTC technology, which we will have the opportunity to study together in the future.

V. References

1. RTMP official documentation

2. AMF0 specification

3. AMF3 specification

4. FLV official documentation

5. FLV file format analysis

6. LiveGo source code

7. Hand-torn RTMP protocol series

Author: Vivo Internet Server Team - Xiong Langyu