An Overview of the RTMP Protocol

This document mainly introduces the RTMP protocol format, its signaling flow, and the FLV tag format. The document structure follows rtmp_specification_1.0.

The author's project background is an iOS publishing client, so this study focuses on the RTMP publishing flow; the playback side will be covered in a later update. Corrections are welcome.

Writing this up by hand was not easy; thank you for your support.

1. Introduction

Adobe's Real-Time Messaging Protocol (RTMP) provides two-way message multiplexing on top of a reliable stream transport such as TCP, letting both ends carry parallel audio, video, and data streams along with their timing information. Through the chunk stream abstraction, fine-grained packets can be sent and different channels allocated to different kinds of information, so that when transmission capacity is limited (for example, on a poor network), high-priority data can be sent first to reduce latency.

2. Terminology

  • Message Stream: a logical channel of messages within the connection.
  • Message Stream ID: each Message carries an MSID identifying the logical channel it belongs to.

The default is 0. The publisher updates this ID in the _result callback of createStream and uses the new MSID for subsequent messages.

  • Chunk: before transmission, a message is split into smaller packets according to the chunk size. The chunk is the most basic packet of the RTMP protocol layer and the smallest unit handed to the lower layer. Timestamps of audio and video chunks must increase monotonically (except for backward seeks).
  • Chunk Stream: a logical channel of chunks.
  • Chunk Stream ID: the CSID identifies the chunk stream a chunk belongs to.

Generally, the CSID of the control stream is 2 and that of the command stream is 3. In practice, the CSIDs of audio and video streams can be customized, for example 4 for audio and 6 for video.

  • Action Message Format (AMF): a binary format used to serialize ActionScript object graphs (it serves the same purpose as JSON but takes fewer bytes). It comes in two versions, AMF0 and AMF3.

3. Byte order, alignment, and time format

Unless otherwise specified, multi-byte values are in big-endian byte order. Fields are byte-aligned by default; for example, a value shorter than 16 bits stored in a 16-bit field is zero-padded. Timestamps are unsigned integers in milliseconds.

4. Handshake

Establishing an RTMP connection requires a handshake, in which each side sends three fixed-length packets (C0/C1/C2 and S0/S1/S2).

4.1 Sequence Requirements

  • The client must receive S1 before sending C2;
  • The client must receive S2 before sending other data, such as connect;
  • The server must receive C0 before sending S0 and S1;
  • The server must receive C1 before sending S2;
  • The server must receive C2 before sending other data.

As shown in the figure below

In practical development, to minimize the number of round trips, the sending sequence can be optimized into three steps:

4.2 c0 and s0 Format

Both c0 and s0 are 8-bit integer fields, as shown in the following figure

Version (8 bits): the version supported by the client or server; the value is usually 3.

4.3 c1 and s1 Format

C1 and S1 are both 1536 bytes, as shown below

Time (4 bytes): a timestamp, usually 0, which serves as the reference time for the chunk stream sent later.

Zero (4 bytes) : The value is always 0.

Random data (1528 bytes): random bytes.

4.4 c2 and s2 Format

C2 and S2 are also 1536 bytes, as shown below

Time (4 bytes): for c2, the timestamp from the peer's s1 packet; likewise for s2 (from c1).

Time2 (4 bytes): for c2, the timestamp at which the s1 packet was read; for s2, it may be 0.

Random echo (1528 bytes): for c2, the random bytes from the peer's s1 packet; likewise for s2 (from c1).

At this point the handshake is complete and messages can be sent.
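The handshake packets described above can be sketched as byte-building helpers; this is a minimal sketch, and actually writing the packets to a TCP socket is omitted:

```python
import os
import struct

RTMP_VERSION = 3
HANDSHAKE_SIZE = 1536  # c1/s1 and c2/s2 are all 1536 bytes

def build_c0() -> bytes:
    # c0 is a single version byte, normally 3
    return bytes([RTMP_VERSION])

def build_c1(timestamp: int = 0) -> bytes:
    # 4-byte time + 4-byte zero + 1528 random bytes
    return struct.pack(">II", timestamp, 0) + os.urandom(HANDSHAKE_SIZE - 8)

def build_c2(s1: bytes, read_time: int) -> bytes:
    # c2 echoes s1: the peer's time, the time s1 was read,
    # and the peer's 1528 random bytes
    return s1[0:4] + struct.pack(">I", read_time) + s1[8:]
```

A client would send c0 and c1 together, read s0/s1/s2, and then send c2 built from s1, matching the three-step optimization above.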

5. RTMP Chunk Stream

Chunk Stream is an abstraction over the stream of RTMP chunks. It can carry control, audio, video, and other streams simultaneously while keeping them logically distinct.

5.1 Message Format

A message that can be sent as chunks should contain the following necessary fields:

Timestamp (3 bytes) : Timestamp of message;

Length (3 bytes): the length of the message payload, not of the whole message. For control messages it is the length of the command; for audio and video messages it is the length of the FLV audio/video tag.

Type ID (1 byte): the message type. This covers the subdivided control messages reserved by the protocol as well as audio and video; IDs can also be customized for protocols layered above RTMP.

Message Stream ID (4 bytes): the unique identifier of the message stream (MSID). When several message streams are multiplexed over the same chunk stream, the MSID is used to demultiplex them, i.e. to tell which message a chunk belongs to.

5.2 Chunking

A message sent over RTMP is split into chunks according to the chunk size, and a chunk can only be sent after the previous one; chunks are therefore the smallest unit of packetization and transmission. Each chunk carries (directly or indirectly) an MSID, which the receiver uses to reassemble chunks into messages.

The advantage is that large, low-priority messages (such as video) do not block small, high-priority ones (such as audio and control), which to some extent avoids stuttering caused by network fluctuation in live-streaming scenarios. In addition, the chunk header length is variable: the compressed Type 0 to Type 3 headers can be chosen per scenario to reduce the overhead introduced by packetization.

The default chunk size is 128 bytes. During transmission, a Set Chunk Size message notifies the peer of the maximum size into which chunks are split. Larger chunks reduce CPU usage, but their longer send time tends to block later, more important messages on low-bandwidth networks. Smaller chunks alleviate blocking but introduce more overhead (chunk headers), and many small transfers may fail to fully exploit a high-bandwidth network. In practice, the chunk size should therefore be adjusted dynamically to the available bandwidth to reach the best trade-off between CPU utilization and network usage.

5.2.1 Chunk Format

Each chunk consists of header and data, as shown in the following figure

Basic Header (1 to 3 bytes): contains the chunk type (fmt) and the Chunk Stream ID (CSID). fmt is 2 bits long and determines the type of the chunk and hence the length of the Message Header. The length of the Basic Header depends on the CSID value and is at least 1 byte;

Message Header (0, 3, 7, or 11 bytes): its length depends on the chunk type (fmt) in the Basic Header; there are Type 0 to Type 3 headers. Note that the message length field refers to the length of the message body before splitting (excluding the header), not the chunk length;

Extended Timestamp (0 or 4 bytes): present when the timestamp or timestamp delta in the message header reaches 0xFFFFFF; it stores the full absolute or delta timestamp. Note that the real timestamp must be written here in that case.

Chunk Data (variable size): the payload of the chunk; its maximum size is the chunk size.

5.2.1.1 Chunk Basic Header

fmt (2 bits): the chunk type mentioned above. Its value ranges from 0 to 3, denoting the four types of chunk message header.

CSID: RTMP supports user-defined CSIDs in the range [3, 65599]. Values 0, 1, and 2 are reserved by the protocol; 0 and 1 signal the length of the Basic Header (with little-endian storage for the 3-byte form):

0 indicates a 2-byte Basic Header, with the CSID in [64, 319].

1 indicates a 3-byte Basic Header, with the CSID in [64, 65599].

2 indicates that the chunk carries control information: the CSID of Protocol Control Messages and User Control Messages must be 2.

CSIDs in [2, 63] use the 1-byte Basic Header, where the CSID is read directly.

CSIDs in [64, 319] can use the 2-byte header; the actual CSID is the second byte + 64.

CSIDs in [64, 65599] can use the 3-byte header, where the CSID occupies 2 bytes. Byte order matters here: storage is little-endian, so the actual CSID is (third byte) * 256 + (second byte) + 64.

Although the CSID can be freely customized, prefer smaller IDs to keep the header short.
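The CSID encoding rules above can be expressed as a small parser; this is a sketch, and the function name is illustrative:

```python
def parse_basic_header(data: bytes):
    """Return (fmt, csid, header_length) for a chunk basic header."""
    fmt = data[0] >> 6       # top 2 bits: chunk type
    csid = data[0] & 0x3F    # low 6 bits: csid or a length marker
    if csid == 0:            # 2-byte form: csid in [64, 319]
        return fmt, data[1] + 64, 2
    if csid == 1:            # 3-byte form: csid in [64, 65599], little-endian
        return fmt, data[2] * 256 + data[1] + 64, 3
    return fmt, csid, 1      # 1-byte form: csid in [2, 63]
```

For example, a first byte of 0x03 decodes to fmt 0 and CSID 3, the usual command stream.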

5.2.1.2 Chunk Message Header

5.2.1.2.1 Type 0

The Type 0 chunk message header is 11 bytes in total. Type 0 must be used for the first chunk of a chunk stream and whenever the timestamp goes backward (for example, a backward seek).

Timestamp (3 bytes): the absolute timestamp (as opposed to the delta timestamp described later). If the timestamp is greater than or equal to 16777215 (0xFFFFFF), this field must be set to 16777215 and the full value placed in the 4-byte Extended Timestamp field;

Message length (3 bytes) : Note that this is the length of the message payload before it is split into chunks, rather than the length of the data in the chunk.

Message Type ID (1 byte) : indicates the type of the message. For example, 8 represents audio data, 9 represents video, and other values represent subdivided control messages, as described later.

Message Stream ID (4 bytes): usually all messages in a chunk stream come from the same message stream. While different message streams can be multiplexed into one chunk stream, doing so defeats the header-compression benefit. Hence in live-streaming development, audio and video use different CSIDs but generally share the same MSID;

5.2.1.2.2 Type 1

The total length is 7 bytes. Compared with Type 0, the MSID field is omitted and shared with the previous chunk.

Timestamp delta (3 bytes): the delta timestamp, i.e. the difference from the previous chunk's timestamp; the absolute timestamp is obtained as the previous chunk's timestamp + delta.

5.2.1.2.3 Type 2

The MSID and message length are omitted and shared with the previous chunk; Type 2 is used after the first chunk when sending fixed-length data.

5.2.1.2.4 Type 3

Chunks of Type 3 do not have Message headers.

When it follows a Type 0 chunk, it shares that chunk's timestamp as well as its other fields. In other words, when one message is split into multiple chunks, the later chunks belong to the same message as the first, so no message header is needed.

When it follows a Type 1 or Type 2 chunk, it shares the previous timestamp delta. For example: if the first chunk is Type 0 with timestamp = 0, and the second chunk is Type 2 with timestamp delta = 20 (so its timestamp is 0 + 20 = 20), then a third chunk of Type 3 reuses delta = 20, giving a timestamp of 20 + 20 = 40.
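The timestamp arithmetic in this example can be sketched as follows; it covers only the delta-reuse case described here, not Type 3 chunks that continue a split message:

```python
def resolve_timestamps(chunks):
    """chunks: (fmt, value) pairs, where value is an absolute timestamp
    for Type 0, a delta for Type 1/2, and unused (None) for Type 3,
    which reuses the previous delta. Returns absolute timestamps."""
    out, ts, delta = [], 0, 0
    for fmt, value in chunks:
        if fmt == 0:
            ts, delta = value, 0   # absolute timestamp resets the delta
        elif fmt in (1, 2):
            delta = value
            ts += delta
        else:                      # Type 3: reuse the previous delta
            ts += delta
        out.append(ts)
    return out
```

Running it on the example above, [(0, 0), (2, 20), (3, None)] yields the timestamps 0, 20, 40.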

5.2.2 Examples

5.2.2.1 Example 1

The following figure shows a common Audio message flow

Analysis:

  1. The first message's chunk has chunk type 0 because there is no previous chunk to refer to. Its length is Basic Header (1 byte) + Message Header (11 bytes) + payload (32 bytes) = 44 bytes.
  2. The second message's chunk can share the length, type ID, and MSID of the previous one, but the timestamp delta cannot be omitted, so its type is 2 and its length is 36 bytes.
  3. The third message's chunk can additionally share the previous timestamp delta, so its type is 3 and its length is 33 bytes.

As shown in the figure below

5.2.2.2 Example 2

The following figure shows how to split an overly long message into chunks of 128-byte chunk size

Analysis:

  1. The payload length is 307 bytes > 128 bytes, so the message must be split into chunks; the first chunk uses Type 0.
  2. The subsequent chunks are split from the same message and share all its fields, so Type 3 is used directly.
  3. Note that the length field of the first chunk holds the payload length of the entire message (307).

As the two examples above show, a Type 3 chunk can be used in two scenarios:

  1. Non-first chunk of a single large Message split;
  2. A message that can share all these fields with the previous message may also use Type 3 for its first chunk.

Scenario 1 is common in development: for example, when a video frame is larger than the chunk size (128 bytes by default), Type 3 effectively reduces the chunk header overhead.
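Scenario 1 can be sketched as a splitting routine, assuming a prebuilt 11-byte Type 0 message header and a 1-byte CSID in [2, 63]:

```python
def chunk_message(payload: bytes, csid: int, msg_header: bytes,
                  chunk_size: int = 128):
    """Split one message into a Type 0 chunk followed by Type 3
    continuation chunks. msg_header is the 11-byte Type 0 header."""
    # first chunk: fmt = 0 in the top 2 bits, plus the full message header
    chunks = [bytes([(0 << 6) | csid]) + msg_header + payload[:chunk_size]]
    pos = chunk_size
    while pos < len(payload):
        # continuation chunks: fmt = 3, basic header only, no message header
        chunks.append(bytes([(3 << 6) | csid]) + payload[pos:pos + chunk_size])
        pos += chunk_size
    return chunks
```

For the 307-byte payload of Example 2, this produces three chunks of 140, 129, and 52 bytes.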

5.3 Protocol Control Messages

In the RTMP chunk stream, special type IDs (1, 2, 3, 5, 6) denote protocol control messages. Such messages must have MSID 0, CSID 2, and timestamp 0.

5.3.1 Set Chunk Size (Message Type ID = 1)

As mentioned above, RTMP messages are packed into chunks no larger than the chunk size, and the receiver must unpack them accordingly; both ends therefore need to record the peer's chunk size.

This message informs the peer that the local end's chunk size has changed. For example, suppose the client wants to send 131 bytes of audio data while the chunk size is 128 bytes: without updating the chunk size it would have to split the data into two chunks; instead it can update its chunk size to 131 and send a single chunk carrying all 131 bytes. After receiving Set Chunk Size, the server updates its record of the client's chunk size and can parse subsequent chunks correctly.

The chunk size of the two terminals is maintained independently and can be different.

The first bit of the payload is always 0.

Chunk size (31 bits): can represent the range [1, 0x7FFFFFFF], but since the chunk size never needs to exceed the message length, and the message length field is 3 bytes with a maximum of 0xFFFFFF, the effective range is [1, 0xFFFFFF].
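Building the payload is straightforward; this is a sketch, and the enclosing message header (type ID 1, CSID 2, MSID 0) is omitted:

```python
import struct

def build_set_chunk_size(size: int) -> bytes:
    """Payload of a Set Chunk Size protocol control message (type ID 1):
    a 32-bit big-endian integer whose top bit must be 0."""
    if not 1 <= size <= 0xFFFFFF:
        raise ValueError("chunk size must fit the 3-byte message length field")
    return struct.pack(">I", size & 0x7FFFFFFF)
```

For instance, build_set_chunk_size(4096) yields the 4-byte payload 00 00 10 00.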

5.3.2 Abort Message (2)

During sending, the sender can issue an Abort Message to tell the receiver to discard any partially received message and ignore subsequent chunks. The receiver discards all further data on the chunk stream identified by the CSID in the message. For example, when the sender is shutting down, it can send this message so the peer does not bother processing the data still in flight.

Chunk Stream ID (32 bits): the payload is a CSID field indicating the chunk stream to be discarded.

5.3.3 Acknowledge (3)

When the number of bytes received from the peer reaches the window size, the receiver sends an ACK telling the peer it may continue sending. The window size can be thought of as the receiver's read-buffer size, i.e. the maximum amount of data the receiver can accept before sending an ACK.

Sequence Number (32 bits): the total number of bytes received so far, counted from the handshake.

5.3.4 Window Acknowledge Size (5)

Corresponding to the ACK above, the sender sends this message to tell the peer to update its window size, usually before the audio and video data.

Unlike the chunk size, the window size is usually larger so the buffer can hold more data, and it is maintained jointly by both ends and kept the same.

5.3.5 Set Peer Bandwidth (6)

This limits the peer's output bandwidth: after receiving it, the peer limits the amount of data sent but not yet acknowledged to the given size. If this value differs from the window size previously sent to the sender, the receiver should reply with a Window Acknowledge Size message.

Limit type can be:

0, hard: the receiver immediately sets its window size to the given value.

1, soft: the receiver sets the window size to the given value or keeps its current one, whichever is smaller.

2, dynamic: if the previous Set Peer Bandwidth message had limit type hard, treat this one as hard too; otherwise ignore it.
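The three limit types can be sketched as a small state update; the function and parameter names are illustrative:

```python
def apply_set_peer_bandwidth(current_window: int, new_size: int,
                             limit_type: int, last_was_hard: bool):
    """Return (window, was_hard) after a Set Peer Bandwidth message.
    0 = hard: adopt the new size; 1 = soft: keep the smaller of the two;
    2 = dynamic: treat as hard only if the previous message was hard."""
    if limit_type == 0:
        return new_size, True
    if limit_type == 1:
        return min(current_window, new_size), False
    if limit_type == 2 and last_was_hard:
        return new_size, True
    return current_window, last_was_hard  # dynamic after non-hard: ignored
```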

6. RTMP Command Messages

The protocol control messages above manage the RTMP chunk stream layer; this chapter describes the command messages of the RTMP message stream layer.

6.1 Types of Messages

6.1.1 Command Message (Message Type ID = 20,17)

Command messages carry AMF-encoded commands; the type ID is 20 for AMF0 and 17 for AMF3. Examples include connect, createStream, publish, play, and pause; the peer replies with a status such as _result or _error. The exact format and how commands are matched to their callbacks are described later. A command message contains a command name, a transaction ID, and parameters.

6.1.2 Data Message (18,15)

Data messages carry audio/video metadata and other user data; the type ID is 18 for AMF0 and 15 for AMF3.

These are essential. For example, before sending the actual audio and video data, the @setDataFrame command must be sent with metadata such as video resolution, frame rate, bit rate, audio sample rate, and the audio/video codecs, so that the receiver can parse the subsequent data correctly.

6.1.3 Shared Object Message (19, 16)

Shared object messages, used when multiple clients need to share key-value pairs; the type ID is 19 for AMF0 and 16 for AMF3.

6.1.4 Audio Message (8)

6.1.5 Video Message (9)

6.1.6 Aggregate Message (22)

Aggregate messages combine multiple sub-messages into one message to reduce the number of chunks sent. Because the sub-messages are stored contiguously in memory, they are also more efficient to hand to system calls.

6.1.7 User Control Message Events (4)

User control messages are sent by both client and server to ask the peer to carry out a user control event. Such messages must have MSID 0 and CSID 2. The payload format (the chunk data) is shown below.

Event Type (16 bits) : specifies the Event Type.

Event Data (variable length): since the message is ultimately packed into chunks, make sure that Event Type + Event Data <= chunk size so the message is not split across multiple chunks.

The most common events are:

Set Buffer Length: before the server processes the data stream, the client informs it of the buffer length, in milliseconds, that it uses for the stream.

Ping Request & Response: if the server has not received data for a while, it sends a ping to check whether the client is reachable; a healthy client replies with a ping response to indicate the network is alive.

6.2 Types of Commands

There are many command messages; as noted above, all are AMF-encoded and contain a command name, a transaction ID, and related parameters. For example, the client sends the connect command with the name of the application to connect to as a parameter, and the server replies with a message carrying the same transaction ID. Reply command names include _result, _error, or other method names such as verifyClient or contactExternalServer. The CSID of command messages is usually 3.

Commands fall into two categories, NetConnection and NetStream.

6.2.1 NetConnection Commands

Connection commands maintain the connection state between the two ends and can also perform remote procedure calls (RPC). The receiver's callbacks are _result and _error, denoting success and failure respectively. There are four commands:

  • connect
  • call
  • close
  • createStream

6.2.1.1 connect

The client sends the connect command to request a connection to the application in the following format

The key value pairs used by the Command Object are as follows

The values of audioCodecs are as follows

VideoCodecs as follows

VideoFunction as follows

The encoding type is as follows

The message structure of the server reply is as follows

The connect process is as follows

  1. The client sends the connect command to connect to an application instance on the server.
  2. The server starts the corresponding application and sends a Window Acknowledge Size message telling the client to set its window size.
  3. The server sends Set Peer Bandwidth to limit the client's default output bandwidth; this value usually equals the window size from the previous step.
  4. After receiving Set Peer Bandwidth, the client replies with its own Window Acknowledge Size message, usually equal to the window size just received.
  5. The server sends the User Control Message StreamBegin to indicate that it is ready to receive the message stream.
  6. The server replies to step 1 with _result or _error, informing the client of the connection result. The reply carries the same transaction ID (i.e. 1), parameters such as fmsVer (Flash Media Server version) and capabilities, and status descriptors such as level, code, and description.
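The AMF0 encoding of such a command can be sketched with three primitives: a number is marker 0x00 plus an 8-byte big-endian double, a string is marker 0x02 plus a 16-bit length, and an object is marker 0x03 holding length-prefixed keys, terminated by 00 00 09. The app value "live" and the single-field command object below are illustrative, not the full field set:

```python
import struct

def amf0_number(n: float) -> bytes:
    return b"\x00" + struct.pack(">d", n)

def amf0_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return b"\x02" + struct.pack(">H", len(data)) + data

def amf0_object(props: dict) -> bytes:
    body = b"".join(
        struct.pack(">H", len(k.encode())) + k.encode() + v
        for k, v in props.items()
    )
    return b"\x03" + body + b"\x00\x00\x09"  # object end marker

# connect with transaction ID 1 and an illustrative command object
payload = amf0_string("connect") + amf0_number(1) + amf0_object(
    {"app": amf0_string("live")}
)
```

The resulting payload would be carried in a command message (type ID 20) on CSID 3.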

6.2.1.2 Call

Call performs a remote procedure call (RPC), also known as a remote method invocation, in the following format

The response format is as follows

6.2.1.3 createStream

The client sends this command to create a logical channel, i.e. a message stream, for publishing audio and video. As mentioned earlier, the MSID defaults to 0; when the client receives the createStream callback, it parses the returned MSID and updates its local value. All subsequent data is sent with that MSID.

The client sends messages in the following format

The format of the server reply is as follows

6.2.2 NetStream Commands

Stream commands manage the state of audio and video streams, for example FCPublish/publish for publishing and play for playback. The receiver's callback is onStatus. The commands are:

  • play
  • play2
  • deleteStream
  • closeStream
  • receiveAudio
  • receiveVideo
  • publish
  • seek
  • pause

The format of the onStatus callback is roughly the same, as shown below

6.2.2.1 play

The player requests that the server play an audio/video stream; the command can be issued multiple times to request several streams. Setting the reset field to true plays and switches between multiple streams; false clears the other streams and plays only the currently requested one. The structure is as follows

The play process is as follows

  1. After receiving the _result of createStream, the client sends the play command to request the video stream.
  2. On receiving play, the server sends Set Chunk Size; subsequent chunks are split according to that size.
  3. The server then sends User Control Messages such as StreamIsRecorded and StreamBegin to report the current stream state.
  4. The server sends onStatus replies for the play command: NetStream.Play.Start if the stream exists, NetStream.Play.StreamNotFound if it does not; if play set the reset flag, the server also replies NetStream.Play.Reset.
  5. The server then sends the audio and video data to be played.

6.2.2.2 play2

Like play, this command requests audio/video playback. The difference is that the server maintains the stream at multiple bit rates, and the client can call play2 to switch between them.

The structure is as follows, where NetStreamPlayOptions uses ActionScript 3 format.

The play2 process is similar to play, as follows

6.2.2.3 deleteStream

Corresponding to createStream, the client sends deleteStream when it finishes publishing. The server need not reply. The structure is shown below; the stream ID is the MSID to close.

6.2.2.4 receiveAudio

The client notifies the server whether to receive audio data

If flag is false, the server need not reply; if true, the server replies with NetStream.Seek.Notify and NetStream.Play.Start messages.

6.2.2.5 receiveVideo

The client notifies the server whether to receive video data

If flag is false, the server need not reply; if true, the server replies with NetStream.Seek.Notify and NetStream.Play.Start messages.

6.2.2.6 publish

After connecting, the client sends createStream to create the MSID for publishing; on its callback it sends publish to name the stream about to be pushed, and on that callback it starts sending the audio and video streams. A player can then connect to the server and pull the stream by that name. The structure is as follows

The publish process is as follows

6.2.2.7 seek

Seeks to a given point in the video, in milliseconds. The server replies NetStream.Seek.Notify on success and _error on failure, with the following structure

6.2.2.8 pause

The client sends pause to suspend or resume playback. The server replies NetStream.Pause.Notify when paused successfully, NetStream.Unpause.Notify when resumed successfully, and _error when pausing fails.

7. RTMP Video Payload

With the handshake, connect, and publish/play flows above in place, real audio and video data can be sent and received. The data is stored in the message payload, which is then split into chunks and placed in their data fields.

Taking FLV video as an example, the encoded audio and video frames are stored in the chunk data using the AUDIODATA and VIDEODATA formats of the FLV tag.

7.1.1 AUDIODATA

SoundFormat (4 bits): audio format, usually AAC;

SoundRate (2 bits): sample rate, always 3 for AAC;

SoundSize (1 bit): sample precision, always 1 for compressed formats;

SoundType (1 bit): channel count, always 1 for AAC;

SoundData (variable length): the audio frame data; for AAC this is AACAUDIODATA.

For AAC, the constant SoundRate and SoundType values do not mean the sample rate and channel count are fixed; rather, those two fields are ignored, because the player obtains them by parsing the AudioSpecificConfig in the AAC stream.
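For AAC, the four header fields therefore collapse to a constant first byte, which can be sketched as:

```python
def aac_audiodata_header() -> int:
    """First byte of an FLV AUDIODATA tag for AAC:
    SoundFormat = 10 (AAC), SoundRate = 3, SoundSize = 1, SoundType = 1."""
    return (10 << 4) | (3 << 2) | (1 << 1) | 1
```

Each AAC audio message thus starts with the byte 0xAF, followed by the AACPacketType described in the next section.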

7.1.2 AACAUDIODATA

An AAC stream requires transmitting two kinds of payloads:

AudioSpecificConfig: contains the three parameters necessary for decoding: the codec profile, the sample rate, and the channel count;

Raw frame data: the sequence of frames output by the AAC encoder.

7.1.3 AudioSpecificConfig

Object type (5 bits): codec profile, e.g. 1 denotes AAC Main and 2 denotes AAC LC (Low Complexity).

Frequency index (4 bits): sample rate, e.g. 3 denotes 48000 Hz and 4 denotes 44100 Hz.

Channel configuration (4 bits): channel layout, e.g. 1 denotes mono and 2 denotes stereo.

The fields total 5 + 4 + 4 = 13 bits; the remainder is zero-padded to align to 2 bytes.
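The bit packing above can be sketched as follows; the AAC LC / 44100 Hz / stereo values in the usage note are just an example:

```python
def build_audio_specific_config(object_type: int, freq_index: int,
                                channels: int) -> bytes:
    """Pack the 5 + 4 + 4 = 13 bits into 2 bytes, zero-padded on the right."""
    bits = (object_type << 11) | (freq_index << 7) | (channels << 3)
    return bits.to_bytes(2, "big")

def parse_audio_specific_config(data: bytes):
    """Recover (object_type, freq_index, channel_config) from 2 bytes."""
    bits = int.from_bytes(data[:2], "big")
    return bits >> 11, (bits >> 7) & 0x0F, (bits >> 3) & 0x0F
```

For AAC LC (2), 44100 Hz (index 4), stereo (2), this yields the two bytes 12 10.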

7.2.1 VIDEODATA

FrameType (4 bits): the frame type, e.g. 1 denotes a key frame and 2 denotes a non-key frame (P/B frame).

CodecID (4 bits): the codec, e.g. 7 denotes H.264 frames in avcC packaging;

H264 is an encoding format whose output is NAL units that can be transmitted over a network. Because NALUs vary in length, multiple NALUs cannot be split correctly without encapsulation; the mainstream packaging formats are avcC and Annex B.

VideoData (variable length): the video frame data; for H264 this is AVCVIDEOPACKET.

7.2.2 AVCVIDEOPACKET

Where:

AVCPacketType (8 bits): 0 denotes the data necessary for decoding, such as SPS/PPS; 1 denotes frames output by the encoder;

CompositionTime (3 bytes): the delayed-decoding offset for frames. B-frames introduce decoding delay; if there are no B-frames this value is generally 0.

Data (variable length): an AVCDecoderConfigurationRecord, or H264 frames in avcC packaging;

Different encoders output NALUs in different formats: for example, iOS VideoToolbox outputs avcC format, while Android MediaCodec outputs Annex B. In an RTMP scenario, convert between formats when necessary.
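Such a conversion can be sketched as follows, assuming a well-formed stream in which the start code 00 00 01 never occurs inside a NALU (emulation prevention already applied):

```python
import struct

def annexb_to_avcc(annexb: bytes) -> bytes:
    """Replace Annex B start codes (00 00 01 or 00 00 00 01) with
    4-byte big-endian NALU length prefixes, as avcC packaging expects."""
    # normalize 4-byte start codes to 3-byte, then split on the 3-byte code
    data = annexb.replace(b"\x00\x00\x00\x01", b"\x00\x00\x01")
    nalus = [part for part in data.split(b"\x00\x00\x01") if part]
    return b"".join(struct.pack(">I", len(n)) + n for n in nalus)
```

The reverse direction simply walks the length prefixes and re-inserts start codes.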

7.2.3 AVCDecoderConfigurationRecord

The following analyzes RTMP packets captured with Wireshark.

The highlighted part of the figure is the AVCVIDEOPACKET; AVCPacketType = 00 indicates the Data field stores an AVCDecoderConfigurationRecord, in which:

configurationVersion = 01: version 1;

AVCProfileIndication = 42: equal to SPS[1], the Baseline profile. H264 profiles include Baseline/Main/High; Baseline is the simplest and is widely used for real-time communication;

profile_compatibility = e0: equal to SPS[2];

AVCLevelIndication = 1f: equal to SPS[3], denoting level 3.1;

lengthSizeMinusOne = ff: the NALU length-prefix size minus one, i.e. (lengthSizeMinusOne & 0x03) + 1 = 4 bytes;

numOfSequenceParameterSets = e1: the number of SPS, generally 1: (numOfSequenceParameterSets & 0x1F) = 1;

sequenceParameterSetLength = 00 13: the SPS length (0x13 = 19 bytes);

sequenceParameterSetNALUnit = 27 42 e0 1f …: the SPS unit; note that its second through fourth bytes (42 e0 1f) match AVCProfileIndication, profile_compatibility, and AVCLevelIndication;

numOfPictureParameterSets = 01: the number of PPS, generally 1;

pictureParameterSetLength = 00 04: the PPS length;

pictureParameterSetNALUnit = 28 de 09 88: the PPS unit;

SPS and PPS can generally be obtained directly through encoder APIs; the meaning of their internal fields belongs to the encoder's domain and is left to interested readers 🙂
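The record layout above can be parsed with a short routine; this is a sketch assuming the common single-SPS/single-PPS layout:

```python
import struct

def parse_avcc(record: bytes):
    """Extract SPS and PPS NAL units from an AVCDecoderConfigurationRecord."""
    pos = 5  # skip version, profile, compatibility, level, lengthSizeMinusOne
    num_sps = record[pos] & 0x1F
    pos += 1
    sps_list = []
    for _ in range(num_sps):
        (length,) = struct.unpack_from(">H", record, pos)  # 16-bit SPS length
        pos += 2
        sps_list.append(record[pos:pos + length])
        pos += length
    num_pps = record[pos]
    pos += 1
    pps_list = []
    for _ in range(num_pps):
        (length,) = struct.unpack_from(">H", record, pos)  # 16-bit PPS length
        pos += 2
        pps_list.append(record[pos:pos + length])
        pos += length
    return sps_list, pps_list
```

The extracted SPS/PPS are what a decoder needs before it can process the NALU frames that follow.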

8. Appendix

8.1 RTMP Messages

RTMP has many message types, which makes them tedious to keep straight. The author summarizes them in the following structure to aid understanding.

8.2 Analyzing the Publishing Flow with Wireshark

The figure above shows the actual interaction flow of a publishing project. Although it uses fewer commands than the specification, it covers the main flow and works with RTMP servers on multiple platforms. The details are as follows:

8.2.1 Handshake

This is the simplified handshake flow described earlier; it is straightforward and not repeated here.

8.2.2 connect: C2S

The highlighted part in the figure below is the RTMP header, which consists of chunk Basic Header + Chunk Message Header.

fmt: 0 denotes a Type 0 chunk, since this is the first chunk;

CSID: Protocol Control Messages and User Control Messages use 2, and command messages usually use 3, so this is 3;

MSID: createStream creates the MSID for the message stream, so the MSID only takes a value after createStream; here it is still 0. transactionId: always 1 for connect, matched later by _result;

8.2.3 Window Acknowledge Size: S2C

The server informs the client that its receive window is 5000000 bytes.

CSID: this is a Protocol Control Message, so the CSID is 2.

8.2.4 Set Peer Bandwidth / Set Chunk Size / _result('NetConnection.Connect.Success'): S2C

Note that the CSIDs of these three messages match the rules above; the transactionId of _result is 1, corresponding to connect.

8.2.5 Window Acknowledgement Size: C2S

Each time the client receives a window's worth of data, it returns an ACK whose message body carries the number of bytes received so far.

8.2.6 Set Chunk Size: C2S

The client informs the server of the chunk size it will use when packetizing. Here the client splits its chunks at 8192 bytes and the server at 4096 bytes; the two ends maintain different chunk sizes.

8.2.7 createStream: C2S

The second command message sent by the client, transactionId 2.

8.2.8 _result of createStream: S2C

Both ends now have an explicit logical channel with MSID 1, and subsequent messages are sent with MSID 1.

Here, because the underlying TCP connection is full-duplex, a message sent immediately afterwards (such as FCPublish) may not yet carry the updated MSID of 1; this is allowed.

8.2.9 FCPublish: c2s

FCPublish does not appear in the official specification. It looks similar to the publish message and is presumably there for version compatibility; readers who know the details are welcome to share :)

8.2.10 onFCPublish: s2c

8.2.11 publish: c2s

This message carries the updated MSID; all messages from here on use MSID 1.

8.2.12 onStatus('NetStream.Publish.Start'): S2C

publish is a NetStream command message. As mentioned above, the callback for this class of message is onStatus with transactionId 0; on receiving it, the client cannot match it to the request by transaction ID and must use the code field instead.

From this point the stream is published successfully, and the audio and video data follows.

8.2.13 @setDataFrame: C2S

Sends the audio/video metadata, which the server can share with players pulling the stream.

8.2.14 Audio Data: C2S

Csid: Audio and video data use different chunk channels, that is, different CSIDs are used, and the CSIDs can be customized.

Format: The current chunk is the first chunk of the chunk channel, so type 0 chunk is used.

Note that the first byte of Audio data is 00, indicating that the chunk contains the AAC sequence header.

fmt: a Type 1 chunk, since this is not the first chunk of the chunk channel.

Note that the first byte of the audio data is 01, indicating that the chunk carries a raw AAC frame.

8.2.15 Video Data: C2S

Format: The current chunk is the first chunk of the chunk channel, so type 0 chunk is used.

Note that the first byte of Video data is 00, indicating that the chunk contains AVC sequence header.

fmt: a Type 1 chunk, since this is not the first chunk of the chunk channel.

Notice that the first byte of Video data is 01, indicating that the CHUNK contains NAL Unit.

If a video frame is larger than the chunk size, it is split into multiple chunks, and to reduce overhead the chunks after the first should use Type 3. Wireshark, however, does not display Type 3 chunks here; presumably its dissector merges them into the Type 1 display. Don't be puzzled when you capture packets yourself.

8.2.16 FCUnpublish: c2s

8.2.17 onFCUnpublish: s2c

8.2.18 closeStream: c2s

8.2.19 onStatus('NetStream.Unpublish.Success'): S2C

References

1. RTMP

rtmp_specification_1.0.pdf

2. Action Message Format

AMF 0

AMF 3

3. FLV

video_file_format_spec_v10.pdf

4. AAC

Understanding_AAC

AudioSpecificConfig

5. H.264

H.264

AVC