UDP, short for User Datagram Protocol, is a connectionless transport layer protocol in the Open Systems Interconnection (OSI) reference model. It provides a simple, unreliable, transaction-oriented message transfer service and sits at the transport layer of the TCP/IP protocol model, as shown below:

That is, UDP is built on top of the IP protocol (network layer). IP is used to distinguish between different hosts on the network (see the article IP protocol source code analysis), while UDP is used to distinguish between the different processes on the same host that send or receive network data, as shown in the following figure:

As shown in the figure above, UDP distinguishes packets of different processes by port numbers.

UDP header

Let’s take a look at the UDP protocol header, as shown below:

The figure shows that the UDP header consists of four fields: source port, destination port, packet length, and checksum.

The source port is used to indicate the local process, while the destination port is used to indicate the remote process. The packet length indicates the total length of the UDP packet (including the UDP header and the data length), and the checksum is used to verify whether the packet is corrupted during transmission.
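The four fields can be illustrated with a small user-space sketch. The names udp_header, udp_header_pack() and udp_header_total_len() are hypothetical and are not kernel code; the sketch only assumes the standard 8-byte layout with every field stored in network byte order, and it leaves the checksum at 0 instead of computing it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons(), ntohs() */

/* Hypothetical user-space mirror of the 8-byte UDP header. */
struct udp_header {
    uint16_t source;  /* source port, network byte order */
    uint16_t dest;    /* destination port, network byte order */
    uint16_t len;     /* header + payload length, network byte order */
    uint16_t check;   /* checksum, left at 0 in this sketch */
};

/* Fill in a header for a payload of payload_len bytes and copy it to buf.
 * Returns the number of header bytes written. */
static size_t udp_header_pack(uint8_t *buf, uint16_t sport, uint16_t dport,
                              uint16_t payload_len)
{
    struct udp_header h;

    assert(buf != NULL);
    h.source = htons(sport);
    h.dest = htons(dport);
    h.len = htons((uint16_t)(sizeof(h) + payload_len));
    h.check = 0;
    memcpy(buf, &h, sizeof(h));
    return sizeof(h);
}

/* Read the total datagram length (header + payload) back from raw bytes. */
static uint16_t udp_header_total_len(const uint8_t *buf)
{
    struct udp_header h;

    assert(buf != NULL);
    memcpy(&h, buf, sizeof(h));
    return ntohs(h.len);
}
```

For example, packing a header for a 4-byte payload yields a length field of 12, because the 8 header bytes are counted as well.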

Let’s see how the UDP header is represented in the kernel, as follows:

struct udphdr {
    __u16 source; // Source port
    __u16 dest;   // Destination port
    __u16 len;    // Packet length
    __u16 check;  // Checksum
};

As you can see, the fields in the udphdr structure correspond one-to-one to the fields in the UDP header diagram. Finally, let’s look at the position of the UDP header in the packet, as shown below:

Below, we analyze how UDP is implemented in the kernel by following two processes: sending UDP packets and receiving UDP packets.

UDP Packet Sending

The application layer invokes the send() or write() system call to hand data to the transport layer protocol for processing, as shown in the following figure:

As you can see from the figure above, a user-mode application calling the send() system call triggers a call to the kernel-mode sys_send() kernel function, which ultimately calls the inet_sendmsg() function to send data.

The inet_sendmsg() function selects the data sending interface that matches the transport layer protocol used by the socket; for UDP, this is udp_sendmsg().
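This kind of per-protocol selection is commonly done through a function pointer stored with the socket. The sketch below illustrates that pattern only; the source does not show the body of inet_sendmsg(), so every name here (demo_sock, demo_proto, demo_inet_sendmsg() and so on) is hypothetical, and the return values are made up so that the two paths can be told apart.

```c
#include <assert.h>
#include <stddef.h>

struct demo_sock;

/* Signature shared by every protocol-specific send routine in this sketch. */
typedef int (*demo_sendmsg_fn)(struct demo_sock *sk, const void *data, int len);

/* Hypothetical per-protocol operations table. */
struct demo_proto {
    const char *name;
    demo_sendmsg_fn sendmsg;
};

/* Hypothetical socket: it only remembers which protocol it uses. */
struct demo_sock {
    const struct demo_proto *prot;
};

static int demo_udp_sendmsg(struct demo_sock *sk, const void *data, int len)
{
    (void)sk;
    (void)data;
    return len + 8;   /* pretend result: payload plus an 8-byte UDP header */
}

static int demo_tcp_sendmsg(struct demo_sock *sk, const void *data, int len)
{
    (void)sk;
    (void)data;
    return len;       /* pretend result: payload only */
}

static const struct demo_proto demo_udp_prot = { "UDP", demo_udp_sendmsg };
static const struct demo_proto demo_tcp_prot = { "TCP", demo_tcp_sendmsg };

/* Generic entry point: forwards to whatever the socket's protocol provides. */
static int demo_inet_sendmsg(struct demo_sock *sk, const void *data, int len)
{
    assert(sk != NULL && sk->prot != NULL && sk->prot->sendmsg != NULL);
    return sk->prot->sendmsg(sk, data, len);
}
```

The generic layer never needs to know which protocol it is talking to; adding a protocol only means adding another operations table.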

Let’s analyze the implementation of the udp_sendmsg() function of the UDP protocol. Because the implementation is fairly complicated, we analyze it in sections:

int udp_sendmsg(struct sock *sk, struct msghdr *msg, int len)
{
    int ulen = len + sizeof(struct udphdr);
    struct ipcm_cookie ipc;
    struct udpfakehdr ufh;
    struct rtable *rt = NULL;
    int free = 0;
    int connected = 0;
    u32 daddr;
    u8  tos;
    int err;

The udp_sendmsg() function has the following parameters:

  • sk: the Socket object.

  • msg: the data to be sent, described by an msghdr structure.

  • len: the length of the data to be sent.

The above code mainly defines some local variables such as:

  • The ulen variable is the total length of the data to be sent (the sum of the UDP header length and the data length).

  • The rt variable holds the routing information for the transmission; its type is struct rtable.

  • The ufh variable, of type struct udpfakehdr, is the context passed to the IP layer's ip_build_xmit() function. It is mainly used to build the UDP protocol header.

Let’s continue analyzing the udp_sendmsg() function as follows:

    if (msg->msg_name) {
        struct sockaddr_in *usin = (struct sockaddr_in *)msg->msg_name;

        if (msg->msg_namelen < sizeof(*usin))
            return -EINVAL;
        if (usin->sin_family != AF_INET) {
            if (usin->sin_family != AF_UNSPEC)
                return -EINVAL;
        }

        // Set the destination IP address and port for receiving data
        ufh.daddr = usin->sin_addr.s_addr;
        ufh.uh.dest = usin->sin_port;
        if (ufh.uh.dest == 0)
            return -EINVAL;
    } else {
        if (sk->state != TCP_ESTABLISHED)
            return -ENOTCONN;
        ufh.daddr = sk->daddr;
        ufh.uh.dest = sk->dport;
        connected = 1;
    }

The above code does the following:

  • If the user provides the destination IP address and port when sending data, the user-provided destination IP address and port are copied into the ufh variable.

  • Otherwise, the destination IP address and port bound to the Socket object are copied into the ufh variable and connected is set to 1. If the socket has no bound destination, -ENOTCONN is returned.
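The two cases above can be sketched in user space as a small decision function. The demo_sock and demo_dest types and the demo_pick_dest() function are hypothetical simplifications; only the decision order and the error codes follow the kernel excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical destination: an IPv4 address and a port. */
struct demo_dest {
    uint32_t addr;
    uint16_t port;
};

/* Hypothetical socket: records whether a peer was fixed beforehand. */
struct demo_sock {
    int established;         /* non-zero once a peer has been set */
    struct demo_dest peer;   /* only meaningful when established */
};

/* Choose the destination for one send operation.
 * explicit_dest is the caller-supplied address, or NULL if none was given.
 * On success, *out holds the destination and *connected tells whether the
 * socket's stored peer was used. Returns 0 or a negative errno value. */
static int demo_pick_dest(const struct demo_sock *sk,
                          const struct demo_dest *explicit_dest,
                          struct demo_dest *out, int *connected)
{
    assert(sk != NULL && out != NULL && connected != NULL);
    *connected = 0;

    if (explicit_dest != NULL) {
        if (explicit_dest->port == 0)
            return -EINVAL;          /* port 0 is not a valid destination */
        *out = *explicit_dest;
        return 0;
    }

    if (!sk->established)
        return -ENOTCONN;            /* no explicit address and no peer */

    *out = sk->peer;
    *connected = 1;
    return 0;
}
```

The connected flag matters later in the function: only sends that use the socket's stored peer can reuse a cached route.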

Let’s continue analyzing the udp_sendmsg() function as follows:

    if (connected)
        rt = (struct rtable*)sk_dst_check(sk, 0); // Use the cached routing information object, if any

    if (rt == NULL) { // No cached routing information object
        err = ip_route_output(&rt, daddr, ufh.saddr, tos, ipc.oif);
        if (err)
            goto out;

        err = -EACCES;
        if (rt->rt_flags & RTCF_BROADCAST && !sk->broadcast)
            goto out;
        if (connected)
            sk_dst_set(sk, dst_clone(&rt->u.dst)); // Set the routing information object cache
    }

The above code is relatively simple. For a connected socket, it first calls sk_dst_check() to see whether a routing information object is cached and, if so, uses it directly. Otherwise, it calls ip_route_output() to obtain the routing information object and, for a connected socket, calls sk_dst_set() to cache it.

The routing information object specifies the information about the next hop host (usually a gateway) during data transmission. With the information about the next hop host, the data can be forwarded to the next hop host, and then the next hop host can continue sending the data.
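The check-then-fill caching pattern described above can be sketched as follows. This is a user-space illustration with hypothetical names (demo_route, demo_route_output(), demo_get_route()); the lookup is faked with a single static entry and an arbitrary gateway address, and reference counting and cache validation are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing entry: destination and next-hop gateway. */
struct demo_route {
    uint32_t dst;
    uint32_t gateway;
};

/* Hypothetical socket with a one-entry route cache. */
struct demo_sock {
    struct demo_route *cached;
};

static int demo_lookups;               /* counts slow-path lookups */
static struct demo_route demo_entry;   /* the only "routing table" entry */

/* Slow path: pretend to consult the routing table. */
static struct demo_route *demo_route_output(uint32_t daddr)
{
    demo_lookups++;
    demo_entry.dst = daddr;
    demo_entry.gateway = 0xc0a80001u;  /* 192.168.0.1, arbitrary value */
    return &demo_entry;
}

/* Return the route for daddr, reusing the socket's cache when connected. */
static struct demo_route *demo_get_route(struct demo_sock *sk, uint32_t daddr,
                                         int connected)
{
    struct demo_route *rt = NULL;

    assert(sk != NULL);
    if (connected)
        rt = sk->cached;               /* fast path: cached route, if any */

    if (rt == NULL) {
        rt = demo_route_output(daddr); /* slow path: full lookup */
        if (connected)
            sk->cached = rt;           /* remember it for the next send */
    }
    return rt;
}
```

Two sends on a connected socket trigger a single lookup, while a send with an explicit destination always takes the slow path.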

Let’s continue analyzing the udp_sendmsg() function as follows:

    ufh.saddr = rt->rt_src; // Set the source IP address
    if (!ipc.addr) // If the destination IP address is not provided, use the one from the routing information
        ufh.daddr = ipc.addr = rt->rt_dst;
    ufh.uh.len = htons(ulen);
    ufh.uh.check = 0;
    ufh.iov = msg->msg_iov;
    ufh.wcheck = 0;

    err = ip_build_xmit(sk,
                        (sk->no_check == UDP_CSUM_NOXMIT ?
                         udp_getfrag_nosum : udp_getfrag),
                        &ufh, ulen, &ipc, rt, msg->msg_flags);

out:
    ip_rt_put(rt);
    ...
    return err;
}

The above code basically copies the source IP address of the routing information object into the ufh variable and then calls the ip_build_xmit() function to do the rest of the data sending. The second parameter of the ip_build_xmit() function is a function pointer (udp_getfrag() or udp_getfrag_nosum(), depending on whether a checksum is required). The IP layer calls this function to copy the UDP header and payload data into the packet.
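The callback arrangement can be sketched in user space like this. The callback signature, the demo_frag_ctx context and demo_ip_build() are hypothetical and much simpler than the kernel interfaces; the sketch only shows the division of labor, in which the lower layer owns the packet buffer and the transport layer supplies a function that fills it with the header followed by the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context handed to the callback (the role ufh plays above). */
struct demo_frag_ctx {
    uint8_t header[8];        /* prebuilt UDP header bytes */
    const uint8_t *payload;   /* user data */
    size_t payload_len;
};

/* Callback type: copy len bytes, starting at offset within
 * "header followed by payload", into the buffer at to. */
typedef int (*demo_getfrag_fn)(const void *ctx, uint8_t *to,
                               size_t offset, size_t len);

static int demo_udp_getfrag(const void *p, uint8_t *to,
                            size_t offset, size_t len)
{
    const struct demo_frag_ctx *ctx = p;
    size_t i;

    if (offset + len > sizeof(ctx->header) + ctx->payload_len)
        return -1;            /* request goes past the end of the datagram */

    for (i = 0; i < len; i++) {
        size_t pos = offset + i;
        to[i] = pos < sizeof(ctx->header)
                    ? ctx->header[pos]
                    : ctx->payload[pos - sizeof(ctx->header)];
    }
    return 0;
}

/* Stand-in for the IP layer: it owns the packet buffer and asks the
 * transport-layer callback to fill it. */
static int demo_ip_build(demo_getfrag_fn getfrag, const void *ctx,
                         uint8_t *packet, size_t total_len)
{
    assert(getfrag != NULL && packet != NULL);
    return getfrag(ctx, packet, 0, total_len);
}
```

Passing a different callback changes how the data is produced without touching the lower layer, which is the same idea as choosing between udp_getfrag() and udp_getfrag_nosum().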

The ip_build_xmit() function is the implementation of the IP protocol layer, which is not explained here. You can refer to this article: IP protocol source code analysis.

In general, the udp_sendmsg() function builds the UDP header for the data to be sent and then hands the packet to the IP layer for further processing, so the sending process of the UDP protocol is relatively simple.

UDP Packet Receiving

When the NIC device receives a packet, it submits it to the kernel protocol stack for processing. The kernel protocol stack processes packets from bottom to top, as shown in the figure below:

In other words, the physical layer processes the packet and sends it to the link layer for processing, which in turn sends it to the network layer for processing, and so on.

Therefore, after the network layer (IP protocol) processes a packet, the packet is passed to the transport layer for processing. In this article, the transport layer protocol is UDP, so this article mainly describes how UDP processes the packet.

After the IP protocol layer processes the packet, the udp_rcv() function is called if the protocol field in the IP header indicates UDP. The udp_rcv() function is implemented as follows:

int udp_rcv(struct sk_buff *skb, unsigned short len)
{
    struct sock *sk;
    struct udphdr *uh;
    unsigned short ulen;
    struct rtable *rt = (struct rtable*)skb->dst;
    u32 saddr = skb->nh.iph->saddr; // Remote IP address (source IP address)
    u32 daddr = skb->nh.iph->daddr; // Local IP address (destination IP address)

    uh = skb->h.uh; // UDP header
    ...
    sk = udp_v4_lookup(saddr, uh->source, daddr, uh->dest, skb->dev->ifindex);
    if (sk != NULL) {
        udp_queue_rcv_skb(sk, skb); // Add the packet to the receive_queue queue of the Socket object
        sock_put(sk);
        return 0;
    }
    ...
}

The udp_rcv() function does two things:

  • Call udp_v4_lookup() to get the Socket object corresponding to the target port.

  • Call udp_queue_rcv_skb() to add the packet to the Socket object’s receive_queue queue.

UDP uses a hash table called udp_hash to hold all Socket objects bound to ports. When an application calls the bind() system call to bind a port to a Socket object, the Socket object is added to the udp_hash table.
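A port-indexed hash table of this kind can be sketched as below. The demo_ names and the table size of 128 are assumptions made for the illustration; the bucket computation mirrors the "port & (size - 1)" indexing used by the lookup code later in this article, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed table size; it must be a power of two so that
 * "port & (size - 1)" selects a valid bucket. */
#define DEMO_HTABLE_SIZE 128

/* Hypothetical socket: just a local port and a chain pointer. */
struct demo_sock {
    uint16_t num;             /* local port, host byte order */
    struct demo_sock *next;   /* next socket in the same bucket */
};

static struct demo_sock *demo_hash[DEMO_HTABLE_SIZE];

static unsigned demo_bucket(uint16_t port)
{
    return port & (DEMO_HTABLE_SIZE - 1);
}

/* bind() side: link the socket into the bucket for its port. */
static void demo_hash_add(struct demo_sock *sk, uint16_t port)
{
    assert(sk != NULL);
    sk->num = port;
    sk->next = demo_hash[demo_bucket(port)];
    demo_hash[demo_bucket(port)] = sk;
}

/* Receive side: first socket bound to the port, or NULL if there is none. */
static struct demo_sock *demo_hash_find(uint16_t port)
{
    struct demo_sock *sk;

    for (sk = demo_hash[demo_bucket(port)]; sk != NULL; sk = sk->next) {
        if (sk->num == port)
            return sk;
    }
    return NULL;
}
```

Ports that differ by a multiple of the table size land in the same bucket, which is why the lookup still compares the full port number while walking the chain.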

The udp_v4_lookup() function obtains the corresponding Socket object from the udp_hash table based on the destination port. The udp_v4_lookup() function is implemented as follows:

__inline__ struct sock *
udp_v4_lookup(u32 saddr, u16 sport, u32 daddr, u16 dport, int dif)
{
    struct sock *sk;

    read_lock(&udp_hash_lock);
    sk = udp_v4_lookup_longway(saddr, sport, daddr, dport, dif);
    if (sk)
        sock_hold(sk);
    read_unlock(&udp_hash_lock);
    return sk;
}

The udp_v4_lookup() function first takes the read lock on the udp_hash table and then calls the udp_v4_lookup_longway() function to obtain the Socket object corresponding to the destination port from the udp_hash table. The udp_v4_lookup_longway() function is implemented as follows:

struct sock *
udp_v4_lookup_longway(u32 saddr, // Source IP address (remote IP address)
                      u16 sport, // Source port (remote port)
                      u32 daddr, // Destination IP address (local IP address)
                      u16 dport, // Destination port (local port)
                      int dif)   // Index of the receiving network device
{
    struct sock *sk, *result = NULL;
    unsigned short hnum = ntohs(dport);
    int badness = -1;

    for (sk = udp_hash[hnum & (UDP_HTABLE_SIZE - 1)]; sk != NULL; sk = sk->next) {
        if (sk->num == hnum) { // The destination port matches
            int score = 0;

            if (sk->rcv_saddr) { // If the Socket is bound to a local IP address
                if (sk->rcv_saddr != daddr) // Check whether the destination IP address matches
                    continue;
                score++;
            }
            if (sk->daddr) { // If the Socket is bound to a remote IP address
                if (sk->daddr != saddr) // Check whether the source IP address matches
                    continue;
                score++;
            }
            if (sk->dport) { // If the Socket is bound to a remote port
                if (sk->dport != sport) // Check whether the source port matches
                    continue;
                score++;
            }
            if (sk->bound_dev_if) { // If the Socket is set to receive from a fixed network device
                if (sk->bound_dev_if != dif) // Check whether the receiving device matches
                    continue;
                score++;
            }
            if (score == 4) { // All conditions match: this is the best possible result
                result = sk;
                break;
            } else if (score > badness) { // Remember the best match so far
                result = sk;
                badness = score;
            }
        }
    }
    return result;
}

The main logic of the udp_v4_lookup_longway() function is to get the Socket object from the udp_hash table based on the destination port.

Because multiple Socket objects can be bound to the same port, the udp_v4_lookup_longway() function uses best-match lookup when searching for a Socket object. In addition to the destination port, it may also match the destination IP address, source IP address, source port, and receiving network device, and it returns the Socket object that matches the most conditions.
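The best-match idea can be tried out with a simplified user-space sketch. The demo_sock type and both functions are hypothetical; the sketch scans a plain array instead of a hash bucket, keeps every value in host byte order, and omits the device check, so the highest possible score here is 3 and there is no early exit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical socket; a zero field means "accept any value". */
struct demo_sock {
    uint16_t num;         /* local port (always set) */
    uint32_t rcv_saddr;   /* local address, 0 = any */
    uint32_t daddr;       /* remote address, 0 = any */
    uint16_t dport;       /* remote port, 0 = any */
};

/* Return -1 if sk cannot accept the packet; otherwise return the number
 * of specific (non-wildcard) fields that matched. */
static int demo_score(const struct demo_sock *sk, uint32_t saddr,
                      uint16_t sport, uint32_t daddr, uint16_t dport)
{
    int score = 0;

    if (sk->num != dport)
        return -1;
    if (sk->rcv_saddr) {
        if (sk->rcv_saddr != daddr)
            return -1;
        score++;
    }
    if (sk->daddr) {
        if (sk->daddr != saddr)
            return -1;
        score++;
    }
    if (sk->dport) {
        if (sk->dport != sport)
            return -1;
        score++;
    }
    return score;
}

/* Pick the socket with the highest score, or NULL if none can accept. */
static const struct demo_sock *demo_best_match(const struct demo_sock *socks,
                                               size_t n, uint32_t saddr,
                                               uint16_t sport, uint32_t daddr,
                                               uint16_t dport)
{
    const struct demo_sock *result = NULL;
    int badness = -1;
    size_t i;

    assert(socks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int score = demo_score(&socks[i], saddr, sport, daddr, dport);
        if (score > badness) {
            result = &socks[i];
            badness = score;
        }
    }
    return result;
}
```

With one wildcard socket and one socket fixed to a particular peer on the same port, packets from that peer go to the second socket and everything else falls back to the wildcard socket.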

After finding the Socket object corresponding to the destination port, you can call the udp_queue_rcv_skb() function to add packets to the receive_queue queue of the Socket object. The udp_queue_rcv_skb() function is implemented as follows:

static int udp_queue_rcv_skb(struct sock * sk, struct sk_buff *skb)
{
    ...
    if (sock_queue_rcv_skb(sk, skb) < 0) {
        ...
        return -1;
    }
    return 0;
}

The udp_queue_rcv_skb() function mainly delegates the work to the sock_queue_rcv_skb() function, which is implemented as follows:

static inline int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
    ...
    skb_queue_tail(&sk->receive_queue, skb);
    if (!sk->dead)
        sk->data_ready(sk, skb->len); // Wake up the process waiting for the Socket object to be ready
    ...
}

The sock_queue_rcv_skb() function adds the packet (skb) to the tail of the receive_queue queue of the Socket object by calling the skb_queue_tail() function, and then wakes up the process waiting for the Socket object to become ready.

After the packets are added to the receive_queue queue of the Socket object, receiving UDP packets is complete.
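The enqueue-then-notify step can be sketched in user space as follows. All names are hypothetical, and the wake-up is reduced to a callback that counts notifications; a real implementation would wake a sleeping reader and protect the queue with a lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet buffer. */
struct demo_buf {
    int len;
    struct demo_buf *next;
};

/* Hypothetical socket with a FIFO receive queue and a readiness callback. */
struct demo_sock {
    struct demo_buf *head;
    struct demo_buf *tail;
    int dead;                                           /* set once closed */
    void (*data_ready)(struct demo_sock *sk, int len);  /* notification hook */
    int wakeups;                                        /* notifications seen */
};

/* Stand-in for waking a blocked reader: just count the notification. */
static void demo_data_ready(struct demo_sock *sk, int len)
{
    (void)len;
    sk->wakeups++;
}

/* Append buf to the tail of the queue, then notify any waiting reader. */
static int demo_queue_rcv(struct demo_sock *sk, struct demo_buf *buf)
{
    assert(sk != NULL && buf != NULL);
    buf->next = NULL;
    if (sk->tail != NULL)
        sk->tail->next = buf;
    else
        sk->head = buf;
    sk->tail = buf;

    if (!sk->dead && sk->data_ready != NULL)
        sk->data_ready(sk, buf->len);
    return 0;
}

/* Reader side: remove and return the oldest buffer, or NULL if empty. */
static struct demo_buf *demo_dequeue(struct demo_sock *sk)
{
    struct demo_buf *buf = sk->head;

    if (buf != NULL) {
        sk->head = buf->next;
        if (sk->head == NULL)
            sk->tail = NULL;
    }
    return buf;
}
```

Because packets are appended at the tail and read from the head, the application receives datagrams in the order in which they were queued.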