Through DTLS negotiation, the two parties in an RTC session complete the negotiation of the MasterKey and MasterSalt. Next, we analyze how WebRTC uses the exchanged keys to encrypt RTP and RTCP so that data is transmitted securely. Along the way, this article answers questions that come up when using libsrtp: What is the ROC, and why is it 32 bits? Why does libsrtp return error_code=9 and error_code=10? Do the exchanged keys have a life cycle, and if so, how long is it? We suggest reading the DTLS negotiation article first; the two are best read together.

The author is |

Verify | Ty 1

The problem to solve

The RTP/RTCP protocol does not provide any protection for its payload data. If an attacker captures audio and video data with a packet capture tool such as Wireshark, the tool can play the audio and video stream back directly, which is a serious security risk.

In WebRTC, to prevent this from happening, instead of using RTP/RTCP directly, we use SRTP/SRTCP, the secure RTP/RTCP protocol. WebRTC uses the well-known libsrtp library to convert RTP/RTCP data into SRTP/SRTCP data.

Problems addressed by SRTP:

  • Encrypt the payload of RTP/RTCP data to ensure data security;
  • Ensure RTP/RTCP packet integrity while protecting against replay attacks.

SRTP/SRTCP structure

SRTP structure

As can be seen from the SRTP structure diagram:

  1. The Encrypted Portion consists of the payload, the RTP padding, and the RTP pad count. This is what we usually mean by encrypting only the RTP payload data.
  2. The Authenticated Portion consists of the RTP header, the RTP header extension, and the Encrypted Portion.

Normally, only the RTP payload data needs to be encrypted. If you need to encrypt the RTP header extension, RFC6904 provides a detailed solution, which is also implemented in libsrtp.

SRTCP structure

As can be seen from the SRTCP structure diagram:

  1. The Encrypted Portion is everything that follows the RTCP header. The same applies to compound RTCP packets.
  2. The E-flag explicitly indicates whether the RTCP packet is encrypted. (PS: How can one tell whether an RTP packet is encrypted?)
  3. The SRTCP index explicitly gives the sequence number of the RTCP packet, which is used to protect against replay attacks. (PS: Can the 16-bit sequence number of an RTP packet protect against replay attacks?)
  4. The Authenticated Portion consists of the RTCP header, the Encrypted Portion, the E-flag, and the SRTCP index.

With an initial understanding of the structure of SRTP and SRTCP, let's look at how the Encrypted Portion and the Authenticated Portion are obtained.

Key management

In the SRTP/SRTCP protocol, the two-tuple <destination address, destination port> identifies the SRTP/SRTCP session of a communication participant, which is called the SRTP/SRTCP Session.

An SRTP/SRTCP Session consists of multiple streams, each identified by the triple <SSRC, destination address, destination port>. The description of the encryption and decryption parameters for each stream is called the Cryptographic Context.

The Cryptographic Context of each stream contains the following parameters:

  • SSRC: The SSRC used by Stream.
  • Cipher Parameter: Key, salt, algorithm description (type, Parameter, etc.) used for encryption and decryption.
  • Authentication Parameter: Key, salt, and algorithm description (type, parameter, etc.) used for the integrity check.
  • Anti-replay Data: Cached data used to prevent replay, such as the ROC, the maximum sequence number, etc.

Within an SRTP/SRTCP Session, each Stream uses its own encryption key and authentication key. Because these keys are used within a single Session, they are called Session Keys. Session Keys are derived from the Master Key by a KDF (Key Derivation Function).

By default, the KDF uses the same algorithm that is used to encrypt and decrypt with the Session Keys. For example, after DTLS completes, the negotiated SRTP protection profile is:

SRTP_AES128_CM_HMAC_SHA1_80
         cipher: AES_128_CM
         cipher_key_length: 128
         cipher_salt_length: 112
         maximum_lifetime: 2^31
         auth_function: HMAC-SHA1
         auth_key_length: 160
         auth_tag_length: 80

The corresponding KDF is AES128_CM. The export process of Session Key is shown in the figure below:

The export of the Session Key depends on the following parameters:

  • key_label: Depends on the type of key being derived. In RFC3711 the values are 0x00 (SRTP cipher key), 0x01 (SRTP auth key), 0x02 (SRTP salt), and 0x03 to 0x05 for the corresponding SRTCP keys.
  • master_key: The key obtained by negotiation after DTLS is completed.
  • master_salt: The salt obtained by negotiation after DTLS is completed.
  • packet_index: The packet index of RTP/RTCP. SRTP uses a 48-bit implicit packet index, whereas SRTCP uses a 31-bit packet index. Refer to the serial number management section.
  • key_derivation_rate: The key derivation rate, denoted kdr. The default value is 0, meaning the key derivation is performed only once. The value range is {1, 2, 4, ..., 2^24}. When key_derivation_rate > 0, a key derivation is performed before the first packet is encrypted, and again whenever packet_index mod key_derivation_rate = 0.

With these parameters, a Session Key is derived as follows:

r = packet_index / kdr
key_id = label || r
x = key_id XOR master_salt
key = KDF(master_key, x)

/ : integer division. For C = A / B the result is rounded down, with the convention that C = 0 when B = 0.

|| : concatenation. A, B, and C are in network byte order; for C = A || B, the high-order bytes of C are A and the low-order bytes are B.

XOR : exclusive-OR, with the operands aligned on their low-order bytes.

The following uses AES128_CM to illustrate the derivation of the Session Keys, assuming that DTLS negotiation yields:

master_key:  E1F97A0D3E018BE0D64FA32C06DE4139   // 128-bits
master_salt: 0EC675AD498AFEEBB6960B3AABE6           // 112-bits

Derive the cipher key:

packet_index/kdr:              000000000000
label:                       00
master_salt:   0EC675AD498AFEEBB6960B3AABE6
-----------------------------------------------
xor:           0EC675AD498AFEEBB6960B3AABE6     (x, KDF input)
x*2^16:        0EC675AD498AFEEBB6960B3AABE60000 (AES-CM input)
cipher key:    C61E7A93744F39EE10734AFE3FF7A087 (AES-CM output)

Derive the cipher salt:

packet_index/kdr:              000000000000
label:                       02
master_salt:   0EC675AD498AFEEBB6960B3AABE6
----------------------------------------------
xor:           0EC675AD498AFEE9B6960B3AABE6     (x, KDF input)
x*2^16:        0EC675AD498AFEE9B6960B3AABE60000 (AES-CM input)
               30CBBC08863D8C85D49DB34A9AE17AC6 (AES-CM output)
cipher salt:   30CBBC08863D8C85D49DB34A9AE1

Derive the auth key (in this example the auth key length is 94 bytes):

packet_index/kdr:                000000000000
label:                         01
master salt:     0EC675AD498AFEEBB6960B3AABE6
-----------------------------------------------
xor:             0EC675AD498AFEEAB6960B3AABE6     (x, KDF input)
x*2^16:          0EC675AD498AFEEAB6960B3AABE60000 (AES-CM input)

auth key                           AES input blocks
CEBE321F6FF7716B6FD4AB49AF256A15   0EC675AD498AFEEAB6960B3AABE60000
6D38BAA48F0A0ACF3C34E2359E6CDBCE   0EC675AD498AFEEAB6960B3AABE60001
E049646C43D9327AD175578EF7227098   0EC675AD498AFEEAB6960B3AABE60002
6371C10C9A369AC2F94A8C5FBCDDDC25   0EC675AD498AFEEAB6960B3AABE60003
6D6E919A48B610EF17C2041E47403576   0EC675AD498AFEEAB6960B3AABE60004
6B68642C59BBFC2F34DB60DBDFB2       0EC675AD498AFEEAB6960B3AABE60005
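
The KDF inputs in the three derivations above can be reproduced with a few lines of Python. This is only a minimal sketch, not libsrtp's implementation: the helper names kdf_input and first_aes_cm_block are hypothetical, the code assumes the SRTP case in which r is 48 bits wide, and it stops before the AES-CM encryption step, which needs an AES implementation that the Python standard library does not provide.

```python
MASTER_SALT = bytes.fromhex("0EC675AD498AFEEBB6960B3AABE6")  # 112-bit salt from the example

def kdf_input(label: int, master_salt: bytes, packet_index: int = 0, kdr: int = 0) -> bytes:
    """Return x = (label || r) XOR master_salt (hypothetical helper)."""
    r = packet_index // kdr if kdr else 0            # integer division, result 0 when kdr = 0
    key_id = (label << 48) | r                       # label || r, with r assumed to be 48 bits
    x = key_id ^ int.from_bytes(master_salt, "big")  # operands aligned on the low-order bytes
    return x.to_bytes(len(master_salt), "big")

def first_aes_cm_block(x: bytes) -> bytes:
    """Return x * 2^16, the first AES-CM input block."""
    return x + b"\x00\x00"

for name, label in (("cipher key", 0x00), ("auth key", 0x01), ("cipher salt", 0x02)):
    x = kdf_input(label, MASTER_SALT)
    print(f"{name:12s} x = {x.hex().upper()}  block = {first_aes_cm_block(x).hex().upper()}")
```

Only the byte that lines up with the label changes between the three inputs (EB, EA, E9), which is why the x values in the listings differ in a single byte.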

For an introduction to AES-CM, see AES-CM.

So far, we have obtained the Session Keys required for SRTP/SRTCP encryption and authentication: the cipher key, the auth key, and the salt.

Serial number management

SRTP serial number management

The RTP packet structure uses 16 bits for the sequence number. To support anti-replay protection, message integrity checking, data encryption, and Session Key derivation, the SRTP protocol keeps an implicit packet sequence number, packet_index, denoted i.

For the sender, I is calculated as follows:

i = 2^16 * ROC + SEQ

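Here SEQ is the 16-bit sequence number carried in the RTP packet. ROC (rollover counter) counts how many times the RTP sequence number (SEQ) has wrapped around: whenever SEQ wraps back to 0 (SEQ mod 2^16 = 0), the ROC is increased by 1. The initial ROC value is 0. Since the packet index is 48 bits and SEQ occupies 16 of them, the ROC is 32 bits.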

For the receiver, considering the influence of packet loss and out-of-order factors, in addition to maintaining ROC, it also needs to maintain a currently received maximum packet number S_L. When a new packet arrives, the receiver needs to estimate the actual SRTP packet number corresponding to the current packet. The initial value of ROC is 0, and the initial value of S_L is SEQ when the first SRTP packet is received. The received SRTP serial number I is then estimated by the following formula:

i = 2^16 * v + SEQ

Here the possible values of v are {ROC-1, ROC, ROC+1}, where ROC is the value maintained locally by the receiver and SEQ is the sequence number of the received SRTP packet. Compute i for each of ROC-1, ROC, and ROC+1, and choose the v whose i is closest to 2^16 * ROC + s_l. After SRTP decryption and the integrity check are completed, ROC and s_l are updated according to the following three cases:

  1. v = ROC - 1: ROC and s_l are not updated.
  2. v = ROC: if SEQ > s_l, then update s_l = SEQ.
  3. v = ROC + 1: update ROC = v and s_l = SEQ.

A more intuitive description in pseudocode:

if (s_l < 32768)
    if (SEQ - s_l > 32768)
        set v to (ROC-1) mod 2^32
    else
        set v to ROC
    endif
else
    if (s_l - 32768 > SEQ)
        set v to (ROC+1) mod 2^32
    else
        set v to ROC
    endif
endif
return SEQ + v*65536
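
The pseudocode above, together with the three update cases, can be written as runnable Python. This is a minimal sketch; the function names estimate_index and update_state are hypothetical, and in a real receiver the state would only be updated after the packet has passed decryption and the integrity check.

```python
def estimate_index(roc: int, s_l: int, seq: int) -> tuple:
    """Estimate (v, packet_index) for a received SEQ from the local ROC and s_l."""
    if s_l < 32768:
        # SEQ far ahead of s_l means the packet belongs to the previous rollover period
        v = (roc - 1) % 2**32 if seq - s_l > 32768 else roc
    else:
        # SEQ far behind s_l means the sequence number has wrapped around
        v = (roc + 1) % 2**32 if s_l - 32768 > seq else roc
    return v, seq + v * 65536

def update_state(roc: int, s_l: int, v: int, seq: int) -> tuple:
    """Return the new (ROC, s_l) after the packet has been verified."""
    if v == (roc + 1) % 2**32:
        return v, seq            # case 3: rollover, advance ROC and s_l
    if v == roc and seq > s_l:
        return roc, seq          # case 2: newer packet in the same period
    return roc, s_l              # case 1, or an older packet: no update

# SEQ wrapped from 65530 to 5, so the packet belongs to ROC + 1.
v, index = estimate_index(roc=0, s_l=65530, seq=5)
print(v, index, update_state(0, 65530, v, 5))
```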

SRTCP serial number management

RTCP has no field that describes a sequence number. The SRTCP sequence number is carried explicitly in the SRTCP packet, using 31 bits. See the SRTCP structure for details.

Serial number and communication duration

As shown above, SRTP has a maximum packet index of 2^48 and SRTCP has a maximum packet index of 2^31. In most applications (assuming at least one RTCP packet for every 128000 RTP packets), the SRTCP index reaches its upper limit first. At 200 SRTCP packets per second, SRTCP's 2^31 index space is sufficient for approximately 4 months of communication.
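
The numbers in this paragraph can be checked with a little arithmetic. The sketch below is illustrative only; the helper name lifetime_days is hypothetical, and the 200 packets per second rate is the one assumed in the text.

```python
SECONDS_PER_DAY = 86_400

def lifetime_days(index_bits: int, packets_per_second: float) -> float:
    """Days until an index space of 2^index_bits is exhausted at a constant packet rate."""
    return 2**index_bits / packets_per_second / SECONDS_PER_DAY

# SRTCP: 31-bit index at 200 packets per second
print(f"SRTCP index space lasts about {lifetime_days(31, 200):.0f} days")

# Ratio of the two index spaces: RTP packets per RTCP packet at which both
# would run out at the same time (close to the 128000 figure quoted above)
RTP_PER_RTCP = 2**48 // 2**31
print(RTP_PER_RTCP)
```

If RTCP packets are sent more often than once per 2^17 RTP packets, the SRTCP index is exhausted first, which is the case the paragraph describes.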

Prevent replay attacks

An attacker saves intercepted SRTP/SRTCP packets and later sends them back into the network, thereby replaying them. SRTP receivers protect against this attack by maintaining a ReplayList. In theory, the ReplayList should hold the index of every packet that has been received and verified. In practice, the ReplayList uses a sliding window, whose size is described by srtp-window-size.

SRTP protects against replay attacks

The serial number management section details how the receiver estimates the packet_index of an SRTP packet from the SEQ of the received packet together with the local ROC and s_l. Let local_packet_index denote the highest SRTP packet index that the receiver has received so far. Calculate the difference delta:

delta =  packet_index - local_packet_index

There are three cases as follows:

  1. delta > 0: A new packet has been received.
  2. delta < -(srtp-window-size - 1): The index of the received packet is lower than the smallest index in the replay window. When libsrtp receives such a packet, it returns srtp_err_status_replay_old=10, indicating that an old replayed packet was received.
  3. delta <= 0 and delta >= -(srtp-window-size - 1): The received packet falls inside the replay window. If the ReplayList already contains this index, the packet is a replay with a duplicate index, and libsrtp returns srtp_err_status_replay_fail=9. Otherwise, the packet was simply received out of order.
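
The three cases can be sketched as a small sliding-window check. This is an illustration under stated assumptions, not libsrtp's code: the class ReplayWindow and its method names are hypothetical, a Python set stands in for the bitmap a real implementation would normally use, and a delta of 0 is treated as a duplicate. The status values 9 and 10 are the ones quoted in this article.

```python
SRTP_OK = 0
REPLAY_FAIL = 9    # value quoted above for srtp_err_status_replay_fail
REPLAY_OLD = 10    # value quoted above for srtp_err_status_replay_old

class ReplayWindow:
    """Sliding-window replay check (illustrative only)."""

    def __init__(self, window_size: int = 64):
        self.window_size = window_size
        self.local_packet_index = None   # highest verified packet_index so far
        self.seen = set()                # verified indices still inside the window

    def check(self, packet_index: int) -> int:
        if self.local_packet_index is None:
            return SRTP_OK               # first packet, nothing to compare against
        delta = packet_index - self.local_packet_index
        if delta > 0:
            return SRTP_OK               # case 1: new packet
        if delta < -(self.window_size - 1):
            return REPLAY_OLD            # case 2: left of the window
        if packet_index in self.seen:
            return REPLAY_FAIL           # case 3: duplicate inside the window
        return SRTP_OK                   # case 3: out of order, not seen before

    def accept(self, packet_index: int) -> int:
        """Check a packet and record it if it passes (after authentication in a real stack)."""
        status = self.check(packet_index)
        if status == SRTP_OK:
            if self.local_packet_index is None or packet_index > self.local_packet_index:
                self.local_packet_index = packet_index
            self.seen.add(packet_index)
            oldest = self.local_packet_index - (self.window_size - 1)
            self.seen = {i for i in self.seen if i >= oldest}   # drop indices that left the window
        return status

window = ReplayWindow(window_size=64)
for index in (100, 100, 101, 99, 37):
    print(index, window.accept(index))
```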

The following figure more visually illustrates the three areas of replay protection:

srtp-window-size is at least 64. The application can set a larger value as needed, and libsrtp rounds it up to an integer multiple of 32. For example, WebRTC uses srtp-window-size = 1024. Users can adjust the value as needed, as long as it still serves the purpose of preventing replay attacks.

SRTCP protects against replay attacks

In SRTCP, the packet index is given explicitly. In libsrtp, the SRTCP anti-replay window size is 128, and window_start records the first index of the anti-replay window. The SRTCP replay check works as follows:

  1. index > window_start + 128: A new SRTCP packet has been received.
  2. index < window_start: The index of the received packet lies to the left of the replay window, so the packet is considered too old. When libsrtp receives such a packet, it returns srtp_err_status_replay_old=10.
  3. Otherwise, compute replay_list_index = index - window_start. If the flag bit for replay_list_index in the ReplayList is 1, the packet has already been received and libsrtp returns srtp_err_status_replay_fail=9. If the flag bit is 0, the packet was received out of order.

Encryption and validation algorithms

In SRTP, the AES encryption algorithm is used in counter (CTR) mode. CTR mode generates a continuous keystream by encrypting an incrementing Counter. The Counter can be any function whose output is guaranteed not to repeat for a long time. Depending on how the counting is done, there are two variants:

  • AES-ICM: ICM Mode (Integer Counter Mode), using Integer counting operation.
  • AES-GCM: GCM Mode (Galois Counter Mode, based on Galois field counting Mode), counting operation is defined in Galois field.

In SRTP, AES-ICM performs the encryption, while HMAC-SHA1 computes the MAC that verifies the integrity of the data, so encryption and MAC calculation take two separate steps. AES-GCM is based on the idea of AEAD (Authenticated Encryption with Associated Data): it calculates the MAC value while encrypting the data, so encryption and the integrity information are produced in a single step. The use of AES-ICM and AES-GCM is described below.

AES-ICM

The figure above describes the encryption and decryption process of AES-ICM. K in the figure is the Session Key derived through the KDF. Encryption and decryption work the same way: the Counter is encrypted and XORed with the plaintext P to get the ciphertext C, and conversely XORed with the ciphertext C to get the plaintext P. For security reasons, the Counter is generated from the session salt, the packet index, and the SSRC of the packet. The Counter is a 128-bit value, defined as follows:

one byte
<-->
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|00|00|00|00|   SSRC    |   packet index  | b_c |---+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+   |
                                                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+   v
|                  salt (k_s)             |00|00|->(+)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+   |
                                                    |
                                                    v
                                             +-------------+
       encryption key (k_e) ->               | AES encrypt |
                                             +-------------+
                                                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+   |
|                keystream block                |<--+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Here b_c is the block counter. Its initial value is 0, corresponding to Counter 0. For every 128 bits of encrypted data, b_c is increased by 1 to give the next Counter. The Counters computed from the packet index and SSRC of an RTP packet are encrypted to form the keystream, one keystream block per Counter. XORing the keystream with the RTP/RTCP payload yields the Encrypted Portion.
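
The layout in the diagram can be expressed as integer arithmetic: the salt shifted left by 16 bits, XORed with the SSRC in bytes 4 to 7 and the packet index in bytes 8 to 13, plus b_c in the last two bytes. The sketch below builds only the 128-bit Counter block; the function name icm_counter_block is hypothetical, and the AES encryption of the block is left out because the Python standard library has no AES implementation.

```python
def icm_counter_block(session_salt: bytes, ssrc: int, packet_index: int, b_c: int) -> bytes:
    """Build the 128-bit AES-ICM Counter for keystream block number b_c."""
    k_s = int.from_bytes(session_salt, "big")                   # 112-bit session salt
    iv = (k_s << 16) ^ (ssrc << 64) ^ (packet_index << 16)      # layout shown in the diagram
    return ((iv + b_c) % (1 << 128)).to_bytes(16, "big")

salt = bytes.fromhex("F0F1F2F3F4F5F6F7F8F9FAFBFCFD")            # illustrative 112-bit salt
for b_c in range(3):
    print(icm_counter_block(salt, ssrc=0, packet_index=0, b_c=b_c).hex().upper())
```

Because the SSRC and the packet index are mixed into every Counter, two packets never share a keystream as long as the packet index does not repeat under the same key.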

HMAC-SHA1

Hash-based Message Authentication Code (HMAC) is a message authentication code (MAC) computed with a cryptographic hash function in combination with a secret key. It ensures data integrity and can also be used to authenticate a message. HMAC uses a standard algorithm to mix the key into the hash calculation. HMAC is computed as follows:

HMAC(K,M) = H ( (K XOR opad ) + H( (K XOR ipad ) + M ) )
  • H: Hash algorithms, such as MD5, SHA-1, SHA-256.
  • B: The length of a block byte, which is the basic unit of hash operation. B = 64 here.
  • L: The length of bytes calculated by the hash algorithm. (L=16 for MD5, L=20 for sha-1).
  • K: Shared key. The length of K can be arbitrary, but for security reasons it is recommended that the length of K be at least L. When K is longer than B, the hash algorithm is first applied to K, and the resulting L-byte value is used as the new shared key. When K is shorter than B, it is padded with 0x00 until its length equals B.
  • M: The content to be authenticated.
  • opad: The outer padding constant, 0x5C repeated B times.
  • ipad: The inner padding constant, 0x36 repeated B times.
  • XOR: Exclusive-OR operation.
  • + : represents the “join” operation.

The calculation steps are as follows:

  1. Append 0x00 bytes to K until its length equals B.
  2. XOR the result of step 1 with ipad.
  3. Append the message M to be authenticated to the result of step 2.
  4. Call the H method.
  5. XOR the result of step 1 with opad.
  6. Attach the result of step 4 to the result of step 5.
  7. Call the H method.
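
The seven steps translate almost line by line into Python with hashlib. This is a minimal sketch for illustration; the function name hmac_manual is hypothetical, and production code should simply call the standard hmac module, which the example uses as a cross-check.

```python
import hashlib
import hmac

def hmac_manual(key: bytes, message: bytes, hash_name: str = "sha1") -> bytes:
    """Compute HMAC(K, M) = H((K XOR opad) + H((K XOR ipad) + M)) step by step."""
    def h(data: bytes) -> bytes:
        return hashlib.new(hash_name, data).digest()

    block_size = hashlib.new(hash_name).block_size   # B, 64 bytes for SHA-1
    if len(key) > block_size:
        key = h(key)                                 # keys longer than B are hashed first
    key = key.ljust(block_size, b"\x00")             # step 1: pad K with 0x00 up to B bytes
    inner_key = bytes(k ^ 0x36 for k in key)         # step 2: K XOR ipad
    outer_key = bytes(k ^ 0x5C for k in key)         # step 5: K XOR opad
    inner_hash = h(inner_key + message)              # steps 3 and 4
    return h(outer_key + inner_hash)                 # steps 6 and 7

key = b"\x0b" * 20
message = b"Hi There"
print(hmac_manual(key, message).hex())
print(hmac.new(key, message, hashlib.sha1).hexdigest())   # standard library result
```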

SRTP and SRTCP use HMAC to calculate the Authentication Tag. The K used corresponds to the RTP auth key and RTCP auth key described in the Key management section. The hash algorithm used is SHA-1, and the Authentication Tag is 80 bits long.

When calculating SRTP, the content M to be verified is:

M = Authenticated Portion + ROC

Among them, + represents the “join” operation, and Authenticated Portion is given in the structure diagram of SRTP.

When calculating SRTCP, the content M to be authenticated is:

M=Authenticated Portion

Among them, Authenticated Portion is given in the structure diagram of SRTCP.
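
Put together, the tag calculation for both protocols can be sketched as follows. The function names srtp_auth_tag and srtcp_auth_tag are hypothetical, and the sketch assumes that the ROC is appended as a 32-bit big-endian integer and that the 80-bit tag is the leftmost 10 bytes of the HMAC-SHA1 output. The key and packet bytes in the usage lines are dummy values.

```python
import hashlib
import hmac
import struct

def srtp_auth_tag(auth_key: bytes, authenticated_portion: bytes, roc: int, tag_len: int = 10) -> bytes:
    """Tag over M = Authenticated Portion + ROC, truncated to tag_len bytes."""
    m = authenticated_portion + struct.pack("!I", roc)   # ROC as 32 bits, network byte order
    return hmac.new(auth_key, m, hashlib.sha1).digest()[:tag_len]

def srtcp_auth_tag(auth_key: bytes, authenticated_portion: bytes, tag_len: int = 10) -> bytes:
    """Tag over M = Authenticated Portion, truncated to tag_len bytes."""
    return hmac.new(auth_key, authenticated_portion, hashlib.sha1).digest()[:tag_len]

auth_key = bytes(range(20))      # dummy 160-bit auth key
packet = b"\x80" * 24            # dummy Authenticated Portion
tag = srtp_auth_tag(auth_key, packet, roc=0)
print(tag.hex(), len(tag) * 8, "bits")

# A receiver recomputes the tag and compares it in constant time.
print(hmac.compare_digest(tag, srtp_auth_tag(auth_key, packet, roc=0)))
```

Including the ROC in M means that a packet replayed with the same SEQ after a rollover fails the integrity check, even though the ROC itself is never sent in the packet.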

The Authentication Tag is calculated only after the encryption algorithm has produced the Encrypted Portion of SRTP/SRTCP.

AES-GCM

AES-GCM uses the counter mode to encrypt data, which can be effectively pipelined, and GCM authentication uses operations that are particularly well suited for effective implementation in hardware. The theoretical knowledge of GCM is detailed in GCM-Spec, and the Hardware implementation is detailed in Section4.2 Hardware.

The application of AES-GCM in SRTP encryption is described in detail in RFC7714. Key management and serial number management are the same as described in this article, with the following caveats:

  1. AES-GCM is an AEAD (Authenticated Encryption with Associated Data) algorithm, so you need to understand how its inputs and outputs map onto the SRTP/SRTCP packet structure.
  2. The Counter is calculated differently from the method described for AES-ICM, which deserves particular attention.

libsrtp implements AES-GCM. If you are interested, you can read the code alongside this article.

The use of libsrtp

libsrtp is a widely used open source project for SRTP/SRTCP encryption. The most commonly used APIs are as follows:

  1. srtp_init: initializes the SRTP library and its internal encryption algorithms. It must be called before SRTP is used.
  2. srtp_create: creates an srtp_session. It can be understood together with the session, session key, and other concepts introduced in this article.
  3. srtp_protect/srtp_unprotect: the RTP packet encryption and decryption interfaces.
  4. srtp_protect_rtcp/srtp_unprotect_rtcp: the RTCP packet encryption and decryption interfaces.
  5. srtp_set_stream_roc/srtp_get_stream_roc: set and get the ROC of a stream. These two interfaces were added in the 2.3 release.

The important structure srtp_policy_t, which initializes the encryption and decryption parameters, is passed to srtp_create. The following parameters deserve attention:

  1. The MasterKey and MasterSalt obtained from DTLS negotiation are passed to libsrtp through this structure for session key generation.
  2. window_size: corresponds to the SRTP anti-replay window size described earlier.
  3. allow_repeat_tx: whether to allow retransmission of packets with the same sequence number.

SRS is a new-generation real-time communication server. If you are interested in libsrtp, you can use it to quickly set up a local debugging environment, run the relevant tests, and gain a deeper understanding of the algorithms.

Conclusion

By interpreting the principles of SRTP/SRTCP in depth, this article answers the questions encountered when using libsrtp, and we hope it helps those working in real-time audio and video communication.

References

  1. RFC3711: SRTP
  2. RFC6904: Encrypted SRTP Header Extensions
  3. Integer Counter Mode
  4. RFC6188: The Use of AES-192 and AES-256 in Secure RTP
  5. RFC7714: AES-GCM for SRTP
  6. RFC2104: HMAC
  7. RFC2202: Test Cases for HMAC-MD5 and HMAC-SHA-1
  8. GCM-SPEC: GCM
