Before coding, we would like to briefly introduce RTP (Real-time Transport Protocol), which, as its name suggests, is a network protocol for transmitting audio and video over IP networks in real time.

Developed by the IETF Audio/Video Transport Working Group and first published in 1996 (as RFC 1889, later superseded by RFC 3550), its specification describes the following usage scenarios.

  1. Simple multicast audio conferencing

Voice communication uses the IP multicast service. A multicast group address and a pair of ports are obtained through some allocation mechanism. One port is used for audio data and the other for control (RTCP) packets. The address and port information is distributed to the intended participants. If privacy is required, the data and control packets can be encrypted in a format defined by the specification.

  2. Audio and video conference

If both audio and video are used in a conference, they are transmitted as separate RTP sessions. Each medium uses its own pair of UDP ports to carry its RTP and RTCP packets. The multicast address may be the same or different. The motivation for this separation is that participants who want only one medium can choose to receive just that one.

  3. The Mixer and the Translator

Consider a conference in which most participants are connected to a high-speed network, while a small number of participants in one location can only connect at a low rate. Instead of forcing everyone to use a low-bandwidth encoding, an RTP-level relay called a mixer can be placed near the low-bandwidth area. The mixer resynchronizes the incoming audio packets to reconstruct the constant 20 ms spacing generated by the sender, mixes the reconstructed audio into a single stream, re-encodes it with a lower-bandwidth encoding, and forwards the resulting packet stream over the low-speed link. A translator, by contrast, forwards each stream separately with its SSRC identifier intact, for example to convert an encoding or to pass traffic through a firewall.

  4. Layered coding

Multimedia applications should be able to adjust their transmission rate to match receiver capacity or to adapt to network congestion. Rate adjustment can be moved to the receivers by combining layered coding with a layered transmission system. In the context of RTP over IP multicast, the source spreads the layers of the signal across multiple RTP sessions, each carried on its own multicast group. A receiver can then adjust its reception bandwidth simply by joining an appropriate subset of the multicast groups.

RTP header fields

The first 12 octets are present in every RTP packet, while the list of CSRC identifiers is present only when inserted by a mixer. The fields have the following meanings.

  • Version (V): 2 bits

Identifies the RTP version. The version defined by the current specification is 2.

  • Padding (P): 1 bit

If the padding bit is set, the packet contains one or more additional padding octets at the end that are not part of the payload.

  • Extension (X): 1 bit

If the extension bit is set, the fixed header is followed by exactly one header extension.

  • CSRC count (CC): 4 bits

The number of CSRC identifiers that follow the fixed header.

  • Marker (M): 1 bit

The interpretation of the marker is defined by a profile. It is intended to flag significant events in the packet stream, such as frame boundaries.

  • Payload type (PT): 7 bits

This field identifies the format of the RTP payload and determines how the application interprets it. The receiver must ignore packets with payload types that it does not understand.

  • Sequence number: 16 bits

Incremented by one for each RTP data packet sent. The receiver may use it to detect packet loss and to restore packet order.

  • Timestamp: 32 bits

This field reflects the sampling instant of the first octet in the RTP data packet.

  • SSRC: 32 bits

Identifies the synchronization source. This identifier should be chosen randomly, so that no two synchronization sources within the same RTP session have the same SSRC identifier.

  • CSRC list: 0 to 15 entries, each 32 bits

This list identifies the contributing sources, by their SSRC identifiers, for the payload contained in the packet.

An implementation in Go

RTP has been implemented in many languages, but implementing it in Go has some benefits.

  • Easy to test

Tests are easy to write: tooling can generate a test function skeleton directly from a function in the source code. More importantly, the built-in testing support provides benchmarks, timing, parallel execution, and memory statistics, which developers can use to tune their code.

  • Strong web development capabilities at the language level

Fast JSON parsing and mapping onto struct fields come with the standard library. There is no need to introduce third-party libraries.

  • Excellent performance

Faster than interpreted languages like Python and Ruby, and easier to write than Node.js and Erlang. If a service needs concurrency, the built-in keyword go can quickly start multiple goroutines.

The Go community already has an RTP implementation, and its tests are fairly comprehensive. A brief introduction follows.

packet_test.go (basic test)

// Excerpt from the rtp package's tests. Packet, Header and Extension
// are types defined in that package.
package rtp

import (
	"fmt"
	"reflect"
	"testing"
)

func TestBasic(t *testing.T) {
	p := &Packet{}

	// An empty buffer must be rejected.
	if err := p.Unmarshal([]byte{}); err == nil {
		t.Fatal("Unmarshal did not error on zero length packet")
	}

	rawPkt := []byte{
		0x90, 0xe0, 0x69, 0x8f, 0xd9, 0xc2, 0x93, 0xda, 0x1c, 0x64,
		0x27, 0x82, 0x00, 0x01, 0x00, 0x01, 0xFF, 0xFF, 0xFF, 0xFF,
		0x98, 0x36, 0xbe, 0x88, 0x9e,
	}
	parsedPacket := &Packet{
		// Fixed header. The numeric values below are the big-endian
		// decodings of the corresponding bytes in rawPkt.
		Header: Header{
			Marker:           true,
			Extension:        true,
			ExtensionProfile: 1,
			Extensions: []Extension{
				{0, []byte{
					0xFF, 0xFF, 0xFF, 0xFF,
				}},
			},
			Version:        2,
			PayloadOffset:  20,
			PayloadType:    96,
			SequenceNumber: 27023,      // 0x698f
			Timestamp:      3653407706, // 0xd9c293da
			SSRC:           476325762,  // 0x1c642782
			CSRC:           []uint32{},
		},
		Payload: rawPkt[20:],
		Raw:     rawPkt,
	}

	// Unmarshal to the used Packet should work as well.
	for i := 0; i < 2; i++ {
		t.Run(fmt.Sprintf("Run%d", i+1), func(t *testing.T) {
			if err := p.Unmarshal(rawPkt); err != nil {
				t.Error(err)
			} else if !reflect.DeepEqual(p, parsedPacket) {
				t.Errorf("TestBasic unmarshal: got %#v, want %#v", p, parsedPacket)
			}

			if parsedPacket.Header.MarshalSize() != 20 {
				t.Errorf("wrong computed header marshal size")
			} else if parsedPacket.MarshalSize() != len(rawPkt) {
				t.Errorf("wrong computed marshal size")
			}

			if p.PayloadOffset != 20 {
				t.Errorf("wrong payload offset: %d != %d", p.PayloadOffset, 20)
			}

			// Marshalling the parsed packet must reproduce the raw bytes.
			raw, err := p.Marshal()
			if err != nil {
				t.Error(err)
			} else if !reflect.DeepEqual(raw, rawPkt) {
				t.Errorf("TestBasic marshal: got %#v, want %#v", raw, rawPkt)
			}

			if p.PayloadOffset != 20 {
				t.Errorf("wrong payload offset: %d != %d", p.PayloadOffset, 20)
			}
		})
	}
}

In the basic test, the packet's Unmarshal method quickly turns a byte slice into the corresponding structure, which reduces the amount of packing and unpacking code to write. For network transmission, conversion between big-endian and little-endian encodings can also be done directly with the standard library's encoding/binary package, which saves coding effort.

h.SequenceNumber = binary.BigEndian.Uint16(rawPacket[seqNumOffset : seqNumOffset+seqNumLength])
h.Timestamp = binary.BigEndian.Uint32(rawPacket[timestampOffset : timestampOffset+timestampLength])
h.SSRC = binary.BigEndian.Uint32(rawPacket[ssrcOffset : ssrcOffset+ssrcLength])

Slice operations are convenient and flexible: a slice expression selects any section of an array. When handling protocol data, a specific section of the buffer can be taken as a slice and processed on its own.

m := copy(buf[n:], p.Payload)
p.Raw = buf[:n+m]

Once the implementation is complete, Go's subtests can be nested under a parent test. This is especially useful for running specific test cases, and the parent test returns only after all of its subtests have completed.

func TestVP8PartitionHeadChecker_IsPartitionHead(t *testing.T) {
	checker := &VP8PartitionHeadChecker{}

	t.Run("SmallPacket", func(t *testing.T) {
		if checker.IsPartitionHead([]byte{0x00}) {
			t.Fatal("Small packet should not be the head of a new partition")
		}
	})
	t.Run("SFlagON", func(t *testing.T) {
		if !checker.IsPartitionHead([]byte{0x10, 0x00, 0x00, 0x00}) {
			t.Fatal("Packet with S flag should be the head of a new partition")
		}
	})
	t.Run("SFlagOFF", func(t *testing.T) {
		if checker.IsPartitionHead([]byte{0x00, 0x00, 0x00, 0x00}) {
			t.Fatal("Packet without S flag should not be the head of a new partition")
		}
	})
}

For more of the implementation, see the source code on GitHub (github.com/pion/rtp).

In closing

Focusing on low-level transport details can take a lot of time. There are many implementations available, both open source and commercial. In industry practice, Momo and Xiaomi have adopted the SDK of Sonnet for their related business, and some companies have even entrusted their core business to it, which speaks for its stability. I have tested their cloud classroom services. The playback and online presentation features are very convenient and can save a lot of development time.