SRTP: Secure Real-time Transport Protocol

After DTLS negotiation, the two parties of an RTC session have agreed on the MasterKey and MasterSalt. Next, we analyze how WebRTC uses the exchanged keys to encrypt RTP and RTCP so that data is transmitted securely. Along the way, this article answers questions that come up when using libsrtp: What is the ROC, and why is it 32 bits? Why are error_code=9 and error_code=10 returned? Do the exchanged keys have a lifetime, and if so, how long is it? It is recommended to read the article on DTLS negotiation before this one; the two work best together.


Verify | Ty 1

The problem to solve

The RTP/RTCP protocol does not provide any protection for its payload. If an attacker captures audio and video data with a packet capture tool such as Wireshark, the tool can play the audio and video stream back directly, which is a serious security risk.

In WebRTC, to prevent this from happening, instead of using RTP/RTCP directly, we use SRTP/SRTCP, the secure RTP/RTCP protocol. WebRTC uses the well-known libsrtp library to convert RTP/RTCP data into SRTP/SRTCP data.

Problems to be addressed by SRTP:

· Encrypt the RTP/RTCP payload to ensure data security;

· Guarantee the integrity of RTP/RTCP packets and prevent replay attacks.

SRTP/SRTCP structure

SRTP structure

As can be seen from the SRTP structure diagram:

1. The Encrypted Portion consists of the Payload, RTP Padding and RTP Pad Count. This is what we usually mean by encrypting only the RTP payload data.

2. The Authenticated Portion, which is integrity-checked, consists of the RTP Header, the RTP Header Extension and the Encrypted Portion.

Normally, only the RTP payload needs to be encrypted. If you also need to encrypt the RTP header extension, RFC6904 provides a detailed solution, which is also implemented in libsrtp.

SRTCP structure

As can be seen from the SRTCP structure diagram:

1. The Encrypted Portion is everything after the RTCP Header. The same applies to a Compound RTCP packet.

2. The E-flag explicitly indicates whether the RTCP packet is encrypted. (PS: How does an RTP packet indicate whether it is encrypted?)

3. The SRTCP Index is the sequence number of the SRTCP packet and is used to prevent replay attacks. (PS: Can the 16-bit sequence number of an RTP packet protect against replay attacks?)

4. The Authenticated Portion, which is integrity-checked, consists of the RTCP Header and the Encrypted Portion.

With an initial understanding of the structure of SRTP and SRTCP, let's look at how the Encrypted Portion and the Authentication Tag are obtained.

The Key management

In the SRTP/SRTCP protocol, a tuple identifies the SRTP/SRTCP Session of a communication participant.

A triple identifies a stream (RFC3711 uses the SSRC, the destination network address and the destination transport port), and an SRTP/SRTCP Session consists of multiple streams. The set of encryption and decryption parameters for each stream is called its Cryptographic Context.

The Cryptographic Context of each stream contains the following parameters:

· SSRC: SSRC used by Stream.

· Cipher Parameter: key, salt, algorithm description (type, Parameter, etc.) used for encryption and decryption.

· Authentication Parameter: key, salt, and algorithm description (type, parameters, etc.) used for integrity protection.

· Anti-replay Data: cached state used to prevent replay, such as the ROC and the maximum sequence number.

Within an SRTP/SRTCP Session, each stream uses its own encryption key and authentication key. Because these keys are only used within that Session, they are called Session Keys. Session Keys are derived from the Master Key with a KDF (Key Derivation Function).

The KDF is the function used to export Session Keys. By default, the KDF uses the same cipher that the Session Keys are used with for encryption and decryption. For example, after DTLS completes, the negotiated SRTP encryption Profile is:

         cipher: AES_128_CM
         cipher_key_length: 128
         cipher_salt_length: 112
         maximum_lifetime: 2^31
         auth_function: HMAC-SHA1
         auth_key_length: 160
         auth_tag_length: 80

The corresponding KDF is AES128_CM. The export process of the Session Key is shown in the figure below:

The export of the Session Key depends on the following parameters:

· key_label: depends on the type of key being exported. In the examples later in this section, the label is 0x00 for the cipher key, 0x01 for the auth key and 0x02 for the salt key.

· master_key: the key negotiated by DTLS.

· master_salt: the salt negotiated by DTLS.

· packet_index: the RTP/RTCP packet number. SRTP uses a 48-bit implicit packet index, whereas SRTCP uses a 31-bit packet sequence number. Refer to the serial number management sections.

· key_derivation_rate: the key export rate, denoted kdr. The default value is 0, which means the key export is performed only once. Otherwise the value is taken from {1, 2, 4, …, 2^24}. When key_derivation_rate > 0, a key export is performed before the first packet is encrypted, and then again whenever packet_index mod key_derivation_rate = 0.

r = packet_index / kdr
key_id = label || r
x = key_id XOR master_salt
key = KDF(master_key, x)

/: integer division. By convention, C = A / B = 0 when B = 0.

||: concatenation. A, B and C are expressed in network byte order; for C = A || B, the high-order bytes of C are A and the low-order bytes are B.

XOR: exclusive-or operation, with the operands aligned at the low-order bytes.

The following uses AES128_CM to illustrate the export process of the Session Key, assuming that DTLS negotiation produced:

master_key:  E1F97A0D3E018BE0D64FA32C06DE4139   // 128-bits
master_salt: 0EC675AD498AFEEBB6960B3AABE6           // 112-bits

Export cipher Key (cipher Key):

packet_index/kdr:              000000000000
label:                       00
master_salt:   0EC675AD498AFEEBB6960B3AABE6
xor:           0EC675AD498AFEEBB6960B3AABE6     (x, KDF input)
x*2^16:        0EC675AD498AFEEBB6960B3AABE60000 (AES-CM input)
cipher key:    C61E7A93744F39EE10734AFE3FF7A087 (AES-CM output)

Export SALT Key (cipher SALT):

packet_index/kdr:              000000000000
label:                       02
master_salt:   0EC675AD498AFEEBB6960B3AABE6
xor:           0EC675AD498AFEE9B6960B3AABE6     (x, KDF input)
x*2^16:        0EC675AD498AFEE9B6960B3AABE60000 (AES-CM input)
               30CBBC08863D8C85D49DB34A9AE17AC6 (AES-CM output)
cipher salt:   30CBBC08863D8C85D49DB34A9AE1

Export the authentication key (auth key). In this example the auth key length is 94 bytes:

packet_index/kdr:                000000000000
label:                         01
master salt:     0EC675AD498AFEEBB6960B3AABE6
xor:             0EC675AD498AFEEAB6960B3AABE6     (x, KDF input)
x*2^16:          0EC675AD498AFEEAB6960B3AABE60000 (AES-CM input)
auth key                           AES input blocks
CEBE321F6FF7716B6FD4AB49AF256A15   0EC675AD498AFEEAB6960B3AABE60000
6D38BAA48F0A0ACF3C34E2359E6CDBCE   0EC675AD498AFEEAB6960B3AABE60001
E049646C43D9327AD175578EF7227098   0EC675AD498AFEEAB6960B3AABE60002
6371C10C9A369AC2F94A8C5FBCDDDC25   0EC675AD498AFEEAB6960B3AABE60003
6D6E919A48B610EF17C2041E47403576   0EC675AD498AFEEAB6960B3AABE60004
6B68642C59BBFC2F34DB60DBDFB2       0EC675AD498AFEEAB6960B3AABE60005

For the introduction of AES-CM, refer to AES-CM.

So far, we have obtained the Session Key required for SRTP/SRTCP encryption and authentication: cipher Key, auth Key, salt Key.
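The XOR step of the derivation above can be reproduced with a few lines of Python. This is only a sketch of how the KDF input x is formed: the function names `kdf_input` and `aes_cm_input` are made up for illustration, and the AES-CM encryption itself is left out because the Python standard library has no AES implementation.

```python
def kdf_input(label: int, master_salt: bytes, packet_index: int = 0, kdr: int = 0) -> bytes:
    """Return x = (label || r) XOR master_salt, aligned at the low-order bytes."""
    # By convention the division yields 0 when kdr is 0 (keys are derived once).
    r = packet_index // kdr if kdr else 0
    # key_id = label || r: an 8-bit label followed by the 48-bit value r.
    key_id = (label << 48) | (r & 0xFFFFFFFFFFFF)
    x = key_id ^ int.from_bytes(master_salt, "big")
    return x.to_bytes(len(master_salt), "big")


def aes_cm_input(x: bytes) -> bytes:
    """Return x * 2^16, i.e. x followed by a 16-bit block counter starting at 0."""
    return x + b"\x00\x00"


MASTER_SALT = bytes.fromhex("0EC675AD498AFEEBB6960B3AABE6")

for name, label in (("cipher key", 0x00), ("auth key", 0x01), ("cipher salt", 0x02)):
    x = kdf_input(label, MASTER_SALT)
    print(f"{name:12s} x = {x.hex().upper()}  AES-CM input = {aes_cm_input(x).hex().upper()}")
```

The printed x values should match the xor rows of the three worked examples: only the byte that lines up with the label changes (EB, EA, E9).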

SRTP serial number management

The RTP packet format uses a 16-bit field for the sequence number. To support anti-replay protection, message integrity verification, data encryption and Session Key export, SRTP tracks a wider packet number implicitly: the packet_index, denoted i.

For the sender, i is calculated as follows:

i = 2^16 * ROC + SEQ

Here SEQ is the 16-bit sequence number carried in the RTP packet. ROC (rollover counter) counts how many times SEQ has wrapped around: whenever SEQ mod 2^16 = 0, i.e. SEQ wraps from 65535 back to 0, the ROC is increased by 1. The initial ROC value is 0.
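The sender side can be sketched as follows. The class `SenderIndex` is a hypothetical helper, not part of libsrtp. It also illustrates why the ROC is 32 bits wide: the packet index is 48 bits and SEQ supplies the low 16 of them.

```python
class SenderIndex:
    """Tracks ROC and SEQ on the sending side (hypothetical helper)."""

    def __init__(self, first_seq: int) -> None:
        self.roc = 0                      # initial ROC value is 0
        self.seq = first_seq & 0xFFFF     # 16-bit RTP sequence number

    def index(self) -> int:
        # i = 2^16 * ROC + SEQ
        return (self.roc << 16) + self.seq

    def advance(self) -> int:
        """Move to the next packet and return its index."""
        self.seq = (self.seq + 1) & 0xFFFF
        if self.seq == 0:
            # SEQ wrapped from 65535 to 0, so the rollover counter increases.
            self.roc = (self.roc + 1) & 0xFFFFFFFF
        return self.index()


sender = SenderIndex(first_seq=65535)
print(sender.index())    # 65535
print(sender.advance())  # 65536, ROC is now 1
```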

For the receiver, because of packet loss and reordering, it must maintain not only the ROC but also s_l, the highest sequence number received so far. When a new packet arrives, the receiver estimates the actual SRTP packet index of that packet. The initial value of ROC is 0, and the initial value of s_l is the SEQ of the first SRTP packet received. The index i of a received SRTP packet is then estimated by the following formula:

i = 2^16 * v + SEQ

Here v is one of {ROC-1, ROC, ROC+1}, ROC is the value maintained locally by the receiver, and SEQ is the sequence number of the received SRTP packet. Compute i for each of ROC-1, ROC and ROC+1, and pick the v whose i is closest to 2^16 * ROC + s_l. After SRTP decryption and the integrity check succeed, update ROC and s_l according to the following three cases:

1. v = ROC - 1: ROC and s_l are not updated.

2. v = ROC: if SEQ > s_l, update s_l = SEQ.

3. v = ROC + 1: update ROC = v = ROC + 1 and s_l = SEQ.

The estimation is described more intuitively in pseudocode:

if (s_l < 32768)
    if (SEQ - s_l > 32768)
        set v to (ROC-1) mod 2^32
    else
        set v to ROC
    endif
else
    if (s_l - 32768 > SEQ)
        set v to (ROC+1) mod 2^32
    else
        set v to ROC
    endif
endif
return SEQ + v*65536
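The estimation and the three update cases can also be sketched in Python. The function names `estimate_index` and `update_state` are illustrative only; the sketch assumes that `update_state` is called only after decryption and the integrity check have succeeded.

```python
def estimate_index(roc: int, s_l: int, seq: int) -> tuple[int, int]:
    """Return (v, i): the guessed rollover count and the packet index."""
    if s_l < 32768:
        # SEQ far ahead of s_l means the packet belongs to the previous cycle.
        v = (roc - 1) % 2**32 if seq - s_l > 32768 else roc
    else:
        # SEQ far behind s_l means SEQ has already wrapped into the next cycle.
        v = (roc + 1) % 2**32 if s_l - 32768 > seq else roc
    return v, seq + v * 65536


def update_state(roc: int, s_l: int, v: int, seq: int) -> tuple[int, int]:
    """Return the new (ROC, s_l) after a packet has been authenticated."""
    if v == (roc + 1) % 2**32:
        return v, seq          # case 3: rollover, take the new ROC and SEQ
    if v == roc and seq > s_l:
        return roc, seq        # case 2: newer packet in the same cycle
    return roc, s_l            # case 1, or an older packet in the same cycle


# s_l is close to the end of the cycle and a small SEQ arrives: rollover.
v, i = estimate_index(roc=0, s_l=65530, seq=3)
print(v, i)                              # 1 65539
print(update_state(0, 65530, v, 3))      # (1, 3)
```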

SRTCP serial number management

RTCP has no sequence number field. The SRTCP sequence number is carried explicitly in the SRTCP packet as a 31-bit field. See the SRTCP structure for details.
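As a small illustration, the E-flag and the 31-bit SRTCP index share one 32-bit word that follows the Encrypted Portion. The sketch below reads that word back out of a packet. It assumes no MKI field is present and an 80-bit (10-byte) Authentication Tag at the end of the packet; `parse_srtcp_trailer` is a made-up name.

```python
import struct


def parse_srtcp_trailer(packet: bytes, tag_len: int = 10) -> tuple[bool, int]:
    """Return (e_flag, srtcp_index) from an SRTCP packet without MKI."""
    # 8 bytes of RTCP header + sender SSRC, 4 bytes of E-flag/index, then the tag.
    if len(packet) < 8 + 4 + tag_len:
        raise ValueError("packet too short to be SRTCP")
    end = len(packet) - tag_len
    (word,) = struct.unpack("!I", packet[end - 4:end])
    e_flag = bool(word >> 31)          # most significant bit
    srtcp_index = word & 0x7FFFFFFF    # remaining 31 bits
    return e_flag, srtcp_index


# A fabricated packet: header, SSRC, 8 payload bytes, E=1 / index=5, dummy tag.
demo = b"\x80\xc8\x00\x06" + bytes(4) + b"payload!" + struct.pack("!I", 0x80000005) + bytes(10)
print(parse_srtcp_trailer(demo))   # (True, 5)
```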

Serial number and communication duration

SRTP has a sequence number space of 2^48 and SRTCP has one of 2^31. In most applications (assuming at least one RTCP packet for every 128000 RTP packets), the SRTCP sequence number reaches its upper limit first. At 200 SRTCP packets per second, SRTCP's 2^31 sequence number space is sufficient for approximately 4 months of communication.
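The two figures in this paragraph can be checked with simple arithmetic. The snippet below is only a back-of-the-envelope calculation; the variable names are chosen for illustration.

```python
SRTP_INDEX_BITS = 48
SRTCP_INDEX_BITS = 31

# How many RTP packets per RTCP packet before SRTP would run out first.
ratio = 2**SRTP_INDEX_BITS // 2**SRTCP_INDEX_BITS
print(ratio)   # 131072, i.e. 2^17, roughly the 128000 quoted above

# Lifetime of the SRTCP index space at 200 SRTCP packets per second.
seconds = 2**SRTCP_INDEX_BITS / 200
days = seconds / 86400
print(f"{days:.1f} days")   # a little over 124 days, about 4 months
```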

Prevent replay attacks

An attacker saves intercepted SRTP/SRTCP packets and later sends them back into the network, thereby replaying them. SRTP receivers protect against this attack by maintaining a ReplayList. In theory, the ReplayList should hold the index of every packet that has been received and verified. In practice, the ReplayList uses a sliding window, whose size is given by srtp-window-size.

SRTP protects against replay attacks

The serial number management section described how the receiver estimates the packet_index of an SRTP packet from its SEQ together with the ROC and s_l. Denote the highest packet index the receiver has accepted so far as local_packet_index, and calculate the difference delta:

delta =  packet_index - local_packet_index

There are three cases:

1. delta > 0: a new packet is received.

2. delta < -(srtp-window-size - 1): the index of the received packet is below the lower edge of the replay window. When libsrtp receives such a packet, it returns srtp_err_status_replay_old=10, indicating an old packet that falls outside the replay window.

3. -(srtp-window-size - 1) <= delta <= 0: the packet falls within the replay window. If its index is already recorded in the ReplayList, it is a replayed packet with a duplicate index, and libsrtp returns srtp_err_status_replay_fail=9. Otherwise, it is an out-of-order packet.

The following figure more visually illustrates the three areas of replay protection:

The minimum srtp-window-size is 64. The application can set a larger value as needed, and libsrtp rounds it up to an integer multiple of 32. For example, WebRTC uses srtp-window-size = 1024. Users can adjust the value as needed, as long as it still serves the purpose of preventing replay attacks.
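The three cases can be modelled with a short Python sketch. This is a simplified model, not libsrtp's implementation: it keeps the accepted indices in a set instead of a bitmask, and the class name `ReplayWindow` and its methods are invented for illustration. Only the numeric values 9 and 10 are taken from the libsrtp status codes mentioned above.

```python
OK = 0
REPLAY_FAIL = 9    # srtp_err_status_replay_fail
REPLAY_OLD = 10    # srtp_err_status_replay_old


class ReplayWindow:
    """Simplified sliding-window replay check for SRTP packet indices."""

    def __init__(self, window_size: int = 64) -> None:
        self.window_size = window_size
        self.local_packet_index = -1      # nothing accepted yet
        self.seen: set[int] = set()       # indices accepted inside the window

    def check(self, packet_index: int) -> int:
        delta = packet_index - self.local_packet_index
        if delta > 0:
            return OK                     # case 1: new packet
        if delta < -(self.window_size - 1):
            return REPLAY_OLD             # case 2: left of the window
        # case 3: inside the window, duplicate or merely out of order
        return REPLAY_FAIL if packet_index in self.seen else OK

    def add(self, packet_index: int) -> None:
        """Record a packet that passed check() and authentication."""
        self.seen.add(packet_index)
        self.local_packet_index = max(self.local_packet_index, packet_index)
        low = self.local_packet_index - (self.window_size - 1)
        self.seen = {i for i in self.seen if i >= low}   # slide the window


window = ReplayWindow(window_size=64)
for index in (100, 101, 103):
    window.add(index)
print(window.check(102))   # 0: out of order but not seen before
print(window.check(101))   # 9: duplicate inside the window
print(window.check(39))    # 10: older than the window
```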

SRTCP protects against replay attacks

In SRTCP, the packet index is carried explicitly. In libsrtp, the SRTCP replay window has a fixed size of 128, and window_start records the index at which the replay window begins. The replay check for SRTCP proceeds as follows:

1. index >= window_start + 128: a new SRTCP packet is received.

2. index < window_start: the index of the received packet lies to the left of the replay window, so the packet is considered old. When libsrtp receives such a packet, it returns srtp_err_status_replay_old=10.

3. Otherwise, replay_list_index = index - window_start. If the bit at replay_list_index in the ReplayList is 1, the packet has already been received, and libsrtp returns srtp_err_status_replay_fail=9. If the bit is 0, the packet is an out-of-order packet.
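A bitmask-based version of the SRTCP check might look like the sketch below. It is a model written for this article, not libsrtp's source: the class `SrtcpReplayDb` is hypothetical, a Python integer stands in for the 128-bit mask, and the way the window slides in `add` (the newest index ends up in the last bit) is an assumption of the sketch.

```python
OK = 0
REPLAY_FAIL = 9    # srtp_err_status_replay_fail
REPLAY_OLD = 10    # srtp_err_status_replay_old
WINDOW_BITS = 128  # SRTCP replay window size


class SrtcpReplayDb:
    """Bit k of the mask records whether index window_start + k was received."""

    def __init__(self) -> None:
        self.window_start = 0
        self.bitmask = 0

    def check(self, index: int) -> int:
        if index >= self.window_start + WINDOW_BITS:
            return OK                                  # right of the window: new
        if index < self.window_start:
            return REPLAY_OLD                          # left of the window: old
        replay_list_index = index - self.window_start
        if (self.bitmask >> replay_list_index) & 1:
            return REPLAY_FAIL                         # already received
        return OK                                      # out of order

    def add(self, index: int) -> None:
        """Record an index; call only after check() returned OK."""
        if index >= self.window_start + WINDOW_BITS:
            # Slide the window so that the new index lands in the last bit.
            shift = index - (self.window_start + WINDOW_BITS - 1)
            self.bitmask >>= shift
            self.window_start += shift
        self.bitmask |= 1 << (index - self.window_start)


db = SrtcpReplayDb()
db.add(0)
db.add(200)              # window now covers indices 73..200
print(db.check(5))       # 10
print(db.check(100))     # 0
print(db.check(200))     # 9
```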

Encryption and validation algorithms

SRTP uses the AES encryption algorithm in counter (CTR) mode. CTR mode generates a continuous key stream by encrypting successive values of a Counter. The Counter can be any function that is guaranteed not to repeat its output for a long time. The following two variants are used:

· AES-ICM: Integer Counter Mode, which counts using integer arithmetic.

· AES-GCM: Galois/Counter Mode, which combines counter-mode encryption with authentication computed in a Galois field.

By default, SRTP uses AES-ICM for encryption and HMAC-SHA1 to compute the MAC that verifies data integrity, so encryption and MAC calculation are two separate steps. AES-GCM is based on the idea of AEAD (Authenticated Encryption with Associated Data): it calculates the MAC while encrypting the data, completing encryption and authentication in a single step. The use of AES-ICM and AES-GCM is described below.


The figure above describes the encryption and decryption process of AES-ICM. K in the figure is the Session Key exported through the KDF. Encryption and decryption work the same way: the Counter is encrypted to produce a key stream, which is XORed with plaintext P to get ciphertext C or, conversely, XORed with ciphertext C to recover plaintext P. For security reasons, the Counter is generated from the Session Salt, the packet index and the SSRC of the packet. The Counter is 128 bits long and is generated as follows:

one byte
<-->
  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|00|00|00|00|   SSRC    |   packet index  | b_c |---+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+   |
                                                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+   v
|                  salt (k_s)             |00|00|->(+)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+   |
                                                    |
                                                    v
                                             +-------------+
                     encryption key (k_e) -> | AES encrypt |
                                             +-------------+
                                                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+   |
|                keystream block                |<--+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
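The layout above can be turned into a small sketch that builds the 128-bit Counter block. The function names are invented for illustration. The AES encryption of the block with k_e is intentionally left out, because it needs a crypto library; `xor_keystream` only shows the final XOR with whatever keystream that step produces.

```python
def icm_counter_block(session_salt: bytes, ssrc: int, packet_index: int, block: int = 0) -> bytes:
    """Build the 16-byte AES-ICM counter for keystream block number `block`."""
    if len(session_salt) != 14:
        raise ValueError("session salt must be 112 bits (14 bytes)")
    if not (0 <= ssrc < 2**32 and 0 <= packet_index < 2**48 and 0 <= block < 2**16):
        raise ValueError("field out of range")
    # Upper row: 4 zero bytes | SSRC (4 bytes) | packet index (6 bytes) | b_c (2 bytes)
    upper = (ssrc << 64) | (packet_index << 16) | block
    # Lower row: salt (14 bytes) followed by 2 zero bytes
    lower = int.from_bytes(session_salt, "big") << 16
    return (upper ^ lower).to_bytes(16, "big")


def xor_keystream(data: bytes, keystream: bytes) -> bytes:
    """Encrypt or decrypt: the same XOR is used in both directions."""
    return bytes(d ^ k for d, k in zip(data, keystream))


print(icm_counter_block(bytes(14), 0x01020304, 0xAABBCCDDEEFF, 1).hex())
# 0000000001020304aabbccddeeff0001
```

Because the same XOR is applied in both directions, calling `xor_keystream` twice with the same keystream returns the original data.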


Hash-based Message Authentication Code (HMAC) is a message authentication code (MAC) computed by combining a cryptographic hash function with a secret key. It ensures data integrity and can also be used to authenticate the origin of a message. HMAC mixes the key into the hash calculation in a standard way. It is computed as follows:

HMAC(K,M) = H ( (K XOR opad ) + H( (K XOR ipad ) + M ) )

· H: the hash algorithm, such as MD5, SHA-1 or SHA-256.

· B: the block length of the hash algorithm in bytes, which is the basic unit of the hash operation. B = 64 here.

· L: the length in bytes of the hash output (L = 16 for MD5, L = 20 for SHA-1).

· K: the shared key. K can be of any length, but for security reasons a length of at least L is recommended. When K is longer than B, K is first hashed with H and the L-byte result is used as the new shared key. When K is shorter than B, it is padded with 0x00 up to length B.

· M: the content to be authenticated.

· opad: the outer padding constant, 0x5C repeated B times.

· ipad: the inner padding constant, 0x36 repeated B times.

· XOR: exclusive-or operation.

· +: concatenation.

The calculation steps are as follows:

1. Append 0x00 to K until its length is equal to B.

2. XOR the result of Step 1 with ipad.

3. Append the message M to be authenticated to the result of Step 2.

4. Apply H to the result of Step 3.

5. XOR the result of Step 1 with opad.

6. Append the result of Step 4 to the result of Step 5.

7. Apply H to the result of Step 6.
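The seven steps above can be followed literally in Python using only `hashlib`. This is a teaching sketch with SHA-1 as H; in real code the standard `hmac` module should be used instead, and the last lines compare the two to show that they agree.

```python
import hashlib
import hmac


def hmac_sha1(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA1 written out step by step (B = 64, L = 20)."""
    block_size = 64
    if len(key) > block_size:
        key = hashlib.sha1(key).digest()           # long keys are hashed first
    key = key.ljust(block_size, b"\x00")           # step 1: pad K with 0x00 to B bytes
    inner_key = bytes(b ^ 0x36 for b in key)       # step 2: K XOR ipad
    inner = hashlib.sha1(inner_key + message).digest()   # steps 3 and 4
    outer_key = bytes(b ^ 0x5C for b in key)       # step 5: K XOR opad
    return hashlib.sha1(outer_key + inner).digest()      # steps 6 and 7


key = b"example auth key"
msg = b"example message"
manual = hmac_sha1(key, msg)
reference = hmac.new(key, msg, hashlib.sha1).digest()
print(manual == reference)   # True
```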

SRTP and SRTCP use HMAC to calculate the Authentication Tag. The key K is the RTP auth key or RTCP auth key described in the Key management section, the hash algorithm is SHA-1, and the Authentication Tag is 80 bits long.

When calculating SRTP, the content M to be verified is:

M = Authenticated Portion + ROC

Here + denotes concatenation, and the Authenticated Portion is the one given in the SRTP structure diagram.

When calculating SRTCP, the content M to be authenticated is:

M=Authenticated Portion

Among them, Authenticated Portion is given in the structure diagram of SRTCP.
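Put together, the tag calculation for both protocols can be sketched with the standard `hmac` module. The function names are illustrative. The sketch assumes that the ROC is appended as a 32-bit value in network byte order and that the 80-bit tag is the leftmost 10 bytes of the HMAC-SHA1 output.

```python
import hashlib
import hmac
import struct


def srtp_auth_tag(auth_key: bytes, authenticated_portion: bytes, roc: int, tag_len: int = 10) -> bytes:
    """SRTP: M = Authenticated Portion + ROC, tag = first tag_len bytes of HMAC-SHA1."""
    m = authenticated_portion + struct.pack("!I", roc)
    return hmac.new(auth_key, m, hashlib.sha1).digest()[:tag_len]


def srtcp_auth_tag(auth_key: bytes, authenticated_portion: bytes, tag_len: int = 10) -> bytes:
    """SRTCP: M = Authenticated Portion (the index is already inside the packet)."""
    return hmac.new(auth_key, authenticated_portion, hashlib.sha1).digest()[:tag_len]


key = bytes(20)                       # placeholder 160-bit auth key
packet = b"header+encrypted-portion"  # placeholder Authenticated Portion
tag = srtp_auth_tag(key, packet, roc=0)
print(len(tag) * 8)                   # 80

# The receiver recomputes the tag and compares in constant time.
print(hmac.compare_digest(tag, srtp_auth_tag(key, packet, roc=0)))   # True
```

Because the ROC is part of M for SRTP, a receiver whose ROC estimate is wrong computes a different tag and the integrity check fails.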

In SRTP/SRTCP, the Authentication Tag is calculated after encryption, over the Authenticated Portion that contains the Encrypted Portion.


AES-GCM uses the counter mode to encrypt data, which can be effectively pipelined, and GCM authentication uses operations that are particularly well suited for effective implementation in hardware. The theoretical knowledge of GCM is detailed in GCM-Spec, and the Hardware implementation is detailed in Section4.2 Hardware.

The application of AES-GCM to SRTP encryption is described in detail in RFC7714. Key management and serial number management are the same as described in this article, with the following caveats:

  1. AES-GCM is an AEAD (Authenticated Encryption with Associated Data) algorithm, so you need to understand how its inputs and outputs map onto the SRTP/SRTCP packet structure.
  2. The Counter is calculated differently from the AES-ICM method described above, which deserves particular attention.

libsrtp implements AES-GCM. If you are interested, you can read the above together with the code.

The use of libsrtp

libsrtp is a widely used open source library for SRTP/SRTCP encryption. The most commonly used APIs are as follows:

1. srtp_init: initializes the SRTP library and its internal encryption algorithms. It must be called before any other SRTP function.

2. srtp_create: creates an srtp_session.

3. srtp_protect / srtp_unprotect: encryption and decryption interfaces for RTP packets.

4. srtp_protect_rtcp / srtp_unprotect_rtcp: encryption and decryption interfaces for RTCP packets.

5. srtp_set_stream_roc / srtp_get_stream_roc: set and retrieve the ROC of a stream. These two interfaces were added in the latest version, 2.3.

The important structure srtp_policy_t, which is passed to srtp_create, initializes the encryption and decryption parameters. Pay attention to the following fields:

1. The MasterKey and MasterSalt obtained from DTLS negotiation are passed to libsrtp through this structure and used to generate the Session Keys.

2. window_size: the window size of the SRTP anti-replay protection described earlier.

3. allow_repeat_tx: whether to allow retransmission of packets with the same sequence number.

SRS is a new generation of real-time communication server. If you are interested in libsrtp, you can quickly set up a debugging environment on the machine, carry out relevant tests, and have a deeper understanding of the relevant algorithms.


Through an in-depth and detailed interpretation of the principles behind SRTP/SRTCP, this article answers the questions encountered when using libsrtp, and hopefully helps readers working in real-time audio and video communication.


RFC3711: SRTP

RFC6904: Encrypted SRTP Header Extensions

Integer Counter Mode

RFC6188: The Use of AES-192 and AES-256 in Secure RTP

RFC7714: AES-GCM for SRTP

RFC2104: HMAC

RFC2202: Test Cases for HMAC-MD5 and HMAC-SHA-1

