Linux IP protocol source code analysis

The IP protocol is one of the most important protocols in networking. It is no exaggeration to say that the Internet exists because of the IP protocol. The central concept in the IP protocol is the IP address. An IP address works like a home address: it lets others find us easily.

This article does not explain the principles of the IP protocol in depth; for those, refer to the classic book “TCP/IP Illustrated”. Instead, it focuses on how the Linux kernel implements the IP protocol.

IP Protocol Introduction

Although we won’t go into details about the IP protocol, let’s make a brief introduction, otherwise it would be a bit abrupt to analyze the code directly.

What if we have computer A and computer B, and computer A wants to communicate with computer B?

If computer A is directly connected to computer B, then computer A can send information directly to computer B, as shown below:

So, if the computers are directly connected, the problem is easy to solve. For the most part, however, computers on the Internet are not directly connected. If a computer in China wants to send a message to a computer in the United States, the two cannot simply be joined by a single network cable.

So, how do computers on the Internet connect to each other? The answer is routers, as shown in the diagram below:

Since computers are not directly connected to each other on the Internet, they cannot send messages directly to each other because the different computers do not know where the other is.

So, what is the solution? In real life, a house has a fixed address, such as No. 98 Linhexi Road, Tianhe District, Guangzhou City, Guangdong Province, and we can find the house through this address. In the same way, we can assign addresses to computers on the Internet. These addresses are called IP addresses.

An IP address is represented by a 32-bit integer. Those of you who have studied computer science will know that an unsigned 32-bit integer can represent the range from 0 to 4294967295. So, IP addresses can theoretically support 4294967296 computers (in practice far fewer, since many addresses are reserved for special purposes).

However, 32-bit integers are not easy for people to remember, so we split the 32-bit integer into four 8-bit integers and join them with dots, as shown in the figure below:

Therefore, the IP address represents the following range:

0.0.0.0 ~ 255.255.255.255
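
The split into four 8-bit parts can be sketched in a few lines of C. This is only an illustration, not kernel code; the helper names ip_to_dotted() and dotted_to_ip() are made up for this example, and the address is assumed to be held in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 32-bit IPv4 address (host byte order) as dotted decimal.
 * buf must hold at least 16 bytes ("255.255.255.255" plus NUL). */
static void ip_to_dotted(uint32_t ip, char *buf, size_t len)
{
    assert(buf != NULL && len >= 16);
    snprintf(buf, len, "%u.%u.%u.%u",
             (unsigned)((ip >> 24) & 0xff), (unsigned)((ip >> 16) & 0xff),
             (unsigned)((ip >> 8) & 0xff), (unsigned)(ip & 0xff));
}

/* Combine four 8-bit parts back into one 32-bit integer. */
static uint32_t dotted_to_ip(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}
```

For example, dotted_to_ip(11, 11, 1, 1) gives 0x0B0B0101, and passing that value to ip_to_dotted() writes the string 11.11.1.1 into the buffer.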

We can view the machine's IP address in the network settings of a Windows system, as shown in the figure below:

With IP addresses available, every computer on the Internet can be assigned one, as shown in the figure below:

In this way, once every computer has an IP address, computers can communicate with each other by IP address. For example, if computer A wants to communicate with computer D, it can send a message to the IP address 11.11.1.1.

The communication process is as follows:

First, computer A encapsulates its own IP address (the source IP address) and computer D's IP address (the target IP address) into a packet and sends it to router A.
Router A receives the packet from computer A and finds that the target IP address is not in the same network segment (that is, the target is connected to a different router). It looks up the appropriate next-hop router in its routing table and forwards the packet to it (router B).
After receiving the packet from router A, router B reads the target IP address 11.11.1.1 from the packet, recognizes it as the IP address of computer D, and forwards the data to computer D.

Each router knows the IP addresses of all the computers connected to it and records them in a routing table. Routers exchange routing table information with one another through routing protocols, so the next-hop router for a given IP address can be found in the routing table.
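
To make the next-hop lookup concrete, here is a small sketch of a routing table search. It assumes a flat array of entries and a longest-mask-wins rule; the struct route_entry type and route_lookup() function are hypothetical names for this example, and the kernel's real routing code is far more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One routing table entry: destinations matching prefix/mask go to next_hop.
 * next_hop == 0 means "directly connected, no further router needed". */
struct route_entry {
    uint32_t prefix;
    uint32_t mask;
    uint32_t next_hop;
};

/* Return the matching entry with the longest mask, or NULL if none matches. */
static const struct route_entry *
route_lookup(const struct route_entry *table, size_t n, uint32_t dst)
{
    const struct route_entry *best = NULL;

    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((dst & table[i].mask) != table[i].prefix)
            continue;
        /* Contiguous masks compare numerically: a longer prefix is a larger mask. */
        if (best == NULL || table[i].mask > best->mask)
            best = &table[i];
    }
    return best;
}
```

With a table holding an entry for 11.11.1.0/24 and a default entry (prefix 0, mask 0), a lookup for 11.11.1.1 returns the /24 entry, while any unrelated address falls back to the default entry.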

That concludes the introduction to the IP protocol. For a more detailed treatment of the principles, refer to other material.

The IP header

When sending data to a computer on the network, both the IP address of the other party (the destination IP address) and the IP address of the local machine (the source IP address) must be specified, so an IP protocol header has to be added to every packet that is sent. The format of the IP protocol header is as follows:

As you can see from the figure above, in addition to the target IP address and the source IP address, there are a number of other fields that are defined to implement the IP protocol. Here’s what each field in the IP header does:

Version: 4 bits. The version of the IP protocol. For IPv4 it is fixed at 4.
Header length: 4 bits. The length of the IP header, counted in 32-bit words (that is, units of 4 bytes). Since the maximum value is 15, the IP header is at most 60 bytes (15 * 4).
Type of service: 8 bits. Defines different service types so that IP packets can be given different treatment. This article does not use this field, so it is not covered in detail.
Total length: 16 bits. The length of the entire IP packet, including the data and the IP header, so the maximum length of an IP packet is 65535 bytes.
Identification (ID): 16 bits. Identifies different IP packets. This field is used together with the Flags and Fragment offset fields to fragment large IP packets. Fragmentation of IP packets is introduced later.
Flags: 3 bits. The first bit is not used. The second bit is the DF (Don't Fragment) bit; setting DF to 1 means that routers must not fragment the packet. The third bit is the MF (More Fragments) bit, which indicates whether the current fragment is the last fragment of the IP packet. For the last fragment it is set to 0; otherwise it is set to 1.
Fragment offset: 13 bits. The position of the current fragment within the original IP packet. The receiving end relies on it to reassemble the IP packet.
Time to live (TTL): 8 bits. When an IP packet is sent, a specific value is assigned to this field. Each router along the route reduces the TTL of the packet by 1. If the TTL reaches zero, the IP packet is discarded. This field prevents IP packets from being forwarded around the network forever because of routing loops.
Upper-layer protocol: 8 bits. Identifies the protocol used by the upper layer, such as the commonly used TCP and UDP.
Header checksum: 16 bits. Used to check the correctness of the IP header; it does not cover the data part. Because each router changes the TTL value, the router recalculates the checksum for each IP packet that passes through.
Source IP address and destination IP address: 32 bits each. Identify the source IP address and the target IP address of the IP packet.
IP options: variable length, up to 40 bytes. Options are rarely used, so they are not covered in this article.
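
The interaction between the TTL and the header checksum can be sketched as follows. This is a plain, unoptimized version of the standard Internet checksum (RFC 1071) working on raw header bytes; the names ip_checksum() and forward_hop() are invented for this example. Real implementations, such as the kernel's ip_fast_csum(), are optimized per architecture and may update the checksum incrementally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over an IP header of hdr_len bytes.
 * With the checksum field zeroed this returns the value to store;
 * over a header that already holds a valid checksum it returns 0. */
static uint16_t ip_checksum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;

    assert(hdr != NULL && hdr_len % 2 == 0);
    for (size_t i = 0; i < hdr_len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]); /* 16-bit big-endian words */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);            /* fold carries back in */
    return (uint16_t)~sum;
}

/* What a router does per hop: discard on TTL expiry, else decrement the TTL
 * and recompute the checksum. Returns 0 if the packet must be discarded. */
static int forward_hop(uint8_t *hdr, size_t hdr_len)
{
    uint16_t csum;

    if (hdr[8] <= 1)            /* byte 8 is the TTL */
        return 0;
    hdr[8]--;
    hdr[10] = hdr[11] = 0;      /* bytes 10-11 hold the checksum */
    csum = ip_checksum(hdr, hdr_len);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xff);
    return 1;
}
```

A receiver can verify a header by running ip_checksum() over it unchanged: a result of 0 means the header is intact.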

The IP header structure is defined in the kernel as follows:

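struct iphdr {
    /* Simplified: the kernel declares these two bit-fields in an
     * order that depends on the host's bit-field endianness. */
    __u8    version:4,
            ihl:4;
    __u8    tos;
    __u16   tot_len;
    __u16   id;
    __u16   frag_off;
    __u8    ttl;
    __u8    protocol;
    __u16   check;
    __u32   saddr;
    __u32   daddr;
    /* The options start here. */
};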

The fields of the IP header structure correspond to the fields shown in the figure above.
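
Because bit-fields and byte order are easy to misread, it can help to decode the same fields from raw bytes in user space. The sketch below does that without relying on struct layout; struct ip_fields and ip_parse() are hypothetical names for this example, and all multi-byte fields are read as big-endian (network byte order).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the fixed 20-byte part of an IPv4 header (host byte order). */
struct ip_fields {
    unsigned version, hdr_bytes, tos, tot_len, id;
    unsigned df, mf, frag_offset_bytes;
    unsigned ttl, protocol, check;
    uint32_t saddr, daddr;
};

static unsigned rd16(const uint8_t *p) { return ((unsigned)p[0] << 8) | p[1]; }

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)rd16(p) << 16) | rd16(p + 2);
}

/* Decode the header at buf; returns 0 on success, -1 if buf is too short. */
static int ip_parse(const uint8_t *buf, size_t len, struct ip_fields *f)
{
    unsigned frag;

    assert(buf != NULL && f != NULL);
    if (len < 20)
        return -1;
    f->version   = buf[0] >> 4;
    f->hdr_bytes = (buf[0] & 0x0f) * 4;         /* ihl counts 32-bit words */
    f->tos       = buf[1];
    f->tot_len   = rd16(buf + 2);
    f->id        = rd16(buf + 4);
    frag         = rd16(buf + 6);
    f->df        = (frag >> 14) & 1;            /* Don't Fragment */
    f->mf        = (frag >> 13) & 1;            /* More Fragments */
    f->frag_offset_bytes = (frag & 0x1fff) * 8; /* offset counts 8-byte units */
    f->ttl       = buf[8];
    f->protocol  = buf[9];
    f->check     = rd16(buf + 10);
    f->saddr     = rd32(buf + 12);
    f->daddr     = rd32(buf + 16);
    return 0;
}
```

Reading bytes explicitly sidesteps the endianness questions that come with overlaying a C struct on packet memory, at the cost of a little more code.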

The IP header may seem complicated, but it becomes clear once you look at each field in terms of the functionality it supports. A packet with the IP header added is shown in the figure below:

Of course, a network packet may carry other protocol headers besides the IP header, such as a TCP header or an Ethernet header. Since only the IP protocol is analyzed here, only the IP header is marked.

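Next, we analyze through the source code how the Linux kernel implements the IP protocol, focusing on how IP packets are sent and received. Note that the excerpts use older kernel structures such as sk->protinfo.af_inet and skb->nh.iph, so current kernels differ in detail.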

Sending IP packets

An IP packet can be sent through two interfaces: ip_queue_xmit() and ip_build_xmit(). The first is mainly used by TCP, while the second is mainly used by UDP.

This article focuses on ip_queue_xmit(). Its code is as follows:

int ip_queue_xmit(struct sk_buff *skb)
{
    struct sock *sk = skb->sk;
    struct ip_options *opt = sk->protinfo.af_inet.opt;
    struct rtable *rt;
    struct iphdr *iph;

    rt = (struct rtable *)__sk_dst_check(sk, 0); // is routing information already cached?
    if (rt == NULL) {                            // no cached routing information
        u32 daddr;
        u32 tos = RT_TOS(sk->protinfo.af_inet.tos)|RTO_CONN|sk->localroute;

        daddr = sk->daddr;
        if (opt && opt->srr)
            daddr = opt->faddr;

        // look up routing information for the target IP address
        if (ip_route_output(&rt, daddr, sk->saddr, tos, sk->bound_dev_if))
            goto no_route;                       // error path, omitted from this excerpt
        __sk_dst_set(sk, &rt->u.dst);            // cache the routing information
    }
    skb->dst = dst_clone(&rt->u.dst);            // bind the routing information to the packet
    ...
    // get a pointer to the IP header
    iph = (struct iphdr *)skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));

    // set version + header length + service type
    *((__u16 *)iph) = htons((4 << 12) | (5 << 8) | (sk->protinfo.af_inet.tos & 0xff));
    iph->tot_len  = htons(skb->len);             // total length
    iph->frag_off = 0;                           // fragment offset
    iph->ttl      = sk->protinfo.af_inet.ttl;    // time to live
    iph->protocol = sk->protocol;                // upper-layer protocol
    iph->saddr    = rt->rt_src;                  // source IP address
    iph->daddr    = rt->rt_dst;                  // target IP address
    skb->nh.iph   = iph;
    ...
    // call ip_queue_xmit2()
    return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev, ip_queue_xmit2);
}

The ip_queue_xmit() function takes the packet to be sent as an argument of type sk_buff. In the kernel protocol stack, all data to be sent is carried by the sk_buff structure. The ip_queue_xmit() function mainly does the following work:

First, it calls __sk_dst_check() to check whether routing information is already cached. If not, it calls ip_route_output() with the target IP address as a parameter to look up the routing information, and then caches it. Routing information typically includes the device object that sends the data (the network card device) and the IP address of the next-hop router.
It binds the routing information to the packet.
It gets a pointer to the packet's IP header and sets the values of the header fields, as shown in the code comments. These can be compared against the IP header structure diagram.
It calls ip_queue_xmit2(), through the NF_HOOK netfilter hook, to continue the send operation.
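
The line that writes through a __u16 pointer sets the version, header length, and service type in a single store. The user-space sketch below reproduces the arithmetic; ip_first_word() and put_be16() are names made up for this example, with put_be16() standing in for what htons() plus the 16-bit store achieve in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Build the first 16 bits of an IPv4 header as a host-order value:
 * version (4 bits) | header length in words (4 bits) | service type (8 bits). */
static uint16_t ip_first_word(unsigned version, unsigned ihl, unsigned tos)
{
    assert(version <= 0xf && ihl <= 0xf);
    return (uint16_t)((version << 12) | (ihl << 8) | (tos & 0xff));
}

/* Store a 16-bit value in network (big-endian) byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}
```

With version 4, a header length of 5 words, and a service type of 0x10, the value is 0x4510, so the first header byte on the wire is 0x45 and the second is 0x10.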

We then analyze the implementation of the ip_queue_xmit2() function, which is as follows:

static inline int ip_queue_xmit2(struct sk_buff *skb)
{
    struct sock *sk = skb->sk;
    struct rtable *rt = (struct rtable *)skb->dst;
    struct net_device *dev;
    struct iphdr *iph = skb->nh.iph;
    ...
    // is the packet longer than the maximum transmission unit?
    if (skb->len > rt->u.dst.pmtu)
        goto fragment;

    if (ip_dont_fragment(sk, &rt->u.dst))         // the packet must not be fragmented
        iph->frag_off |= __constant_htons(IP_DF); // set the DF flag bit to 1

    ip_select_ident(iph, &rt->u.dst);             // set the ID of the IP packet
    ip_send_check(iph);                           // compute the IP header checksum

    skb->priority = sk->priority;
    return skb->dst->output(skb);                 // send the data out (normally dev_queue_xmit())

fragment:
    ...
    ip_select_ident(iph, &rt->u.dst);
    return ip_fragment(skb, skb->dst->output);    // fragment the packet
}

The ip_queue_xmit2() function mainly completes the following work:

It checks whether the length of the packet is greater than the maximum transmission unit (MTU), which is the maximum length of a message allowed during transmission. If it is, ip_fragment() is called to fragment the packet.
If the packet must not be fragmented, it sets the DF (Don't Fragment) bit to 1.
It sets the ID (identifier) of the IP packet.
It computes the checksum of the IP header.
It sends the packet through the network card device, normally by calling dev_queue_xmit().

In short, ip_queue_xmit2() sets the values of the remaining fields in the IP header and then hands the packet to the output routine, which normally ends in dev_queue_xmit().

Before sending, the function has to check whether the packet is longer than the maximum transmission unit. If it is, the packet must be fragmented. Fragmentation means dividing the packet to be sent into several packets, each no longer than the maximum transmission unit, and then sending those packets out.
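
The arithmetic behind fragmentation can be sketched without any kernel structures. The example below only plans the fragments (offset, length, MF bit) for a payload; it does not copy data or build headers. struct frag_plan and plan_fragments() are hypothetical names for this example, not part of the kernel's ip_fragment().

```c
#include <assert.h>
#include <stddef.h>

/* One planned fragment: where its data starts in the original payload,
 * how many data bytes it carries, and whether more fragments follow. */
struct frag_plan {
    size_t offset;  /* byte offset into the original payload */
    size_t len;     /* payload bytes in this fragment */
    int more;       /* value of the MF bit */
};

/* Split payload_len bytes so each fragment (header + data) fits in mtu.
 * Returns the number of fragments written to out, or 0 if max_out is too
 * small or the MTU leaves no room for data. */
static size_t plan_fragments(size_t payload_len, size_t hdr_len, size_t mtu,
                             struct frag_plan *out, size_t max_out)
{
    size_t per_frag, off = 0, n = 0;

    assert(out != NULL);
    if (mtu < hdr_len + 8)
        return 0;
    /* Every fragment except the last must carry a multiple of 8 data bytes,
     * because the fragment offset field counts 8-byte units. */
    per_frag = (mtu - hdr_len) & ~(size_t)7;
    while (off < payload_len) {
        size_t left = payload_len - off;

        if (n == max_out)
            return 0;
        out[n].offset = off;
        out[n].len = left > per_frag ? per_frag : left;
        out[n].more = left > per_frag;
        off += out[n].len;
        n++;
    }
    return n;
}
```

For a 4000-byte payload, a 20-byte header, and an MTU of 1500, this plans three fragments carrying 1480, 1480, and 1040 bytes, with the MF bit set on the first two only.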

Receiving IP packets

The IP packet is received by the ip_rcv() function. After the packet is received by the network card, it is sent to the link layer of the kernel protocol stack, and the link layer parses the packet according to the link layer protocol (such as Ethernet protocol). Then the parsed packets are sent to the IP protocol of the network layer by calling the ip_rcv() function. The implementation of the ip_rcv() function is as follows:

int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt)
{
    struct iphdr *iph = skb->nh.iph;

    if (skb->pkt_type == PACKET_OTHERHOST) // not addressed to this host: drop the packet
        goto drop;
    ...
    // is the length of the packet valid?
    if (skb->len < sizeof(struct iphdr) || skb->len < (iph->ihl<<2))
        goto inhdr_error;

    // 1. is the header length valid?
    // 2. is the IP protocol version valid?
    // 3. is the IP header checksum correct?
    if (iph->ihl < 5 || iph->version != 4 || ip_fast_csum((u8 *)iph, iph->ihl) != 0)
        goto inhdr_error;

    {
        __u32 len = ntohs(iph->tot_len);
        if (skb->len < len || len < (iph->ihl<<2)) // is the total length valid?
            goto inhdr_error;
        __skb_trim(skb, len);
    }

    // continue processing the packet in ip_rcv_finish()
    return NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL, ip_rcv_finish);

inhdr_error:
    IP_INC_STATS_BH(IpInHdrErrors);
drop:
    kfree_skb(skb);
out:
    return NET_RX_DROP;
}

The main job of the ip_rcv() function is to verify that the values of each field in the IP header are valid. If not, the packet will be discarded. Otherwise, the ip_rcv_finish() function will be called to continue processing the packet.
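
The same sanity checks can be tried out in user space on a raw byte buffer. The sketch below mirrors the order of the checks in the ip_rcv() excerpt, but it is not kernel code: hdr_sum() and ip_validate() are names invented for this example, and the checksum is a simple portable loop rather than ip_fast_csum().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the header as 16-bit big-endian words and fold the carries;
 * the result is 0xffff when the stored checksum is correct. */
static unsigned hdr_sum(const uint8_t *h, size_t n)
{
    unsigned long sum = 0;

    for (size_t i = 0; i + 1 < n; i += 2)
        sum += ((unsigned)h[i] << 8) | h[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (unsigned)sum;
}

/* Returns the datagram length to keep (tot_len), or 0 if it must be dropped. */
static size_t ip_validate(const uint8_t *buf, size_t len)
{
    size_t hdr_len, tot_len;

    assert(buf != NULL);
    if (len < 20)                            /* shorter than a minimal header */
        return 0;
    hdr_len = (size_t)(buf[0] & 0x0f) * 4;
    if (len < hdr_len)                       /* header runs past the buffer */
        return 0;
    if (hdr_len < 20 || (buf[0] >> 4) != 4)  /* ihl < 5 or not IPv4 */
        return 0;
    if (hdr_sum(buf, hdr_len) != 0xffff)     /* bad header checksum */
        return 0;
    tot_len = ((size_t)buf[2] << 8) | buf[3];
    if (len < tot_len || tot_len < hdr_len)  /* truncated or inconsistent */
        return 0;
    return tot_len;                          /* caller trims anything beyond this */
}
```

Returning tot_len corresponds to the __skb_trim() call in the excerpt: any bytes received beyond the length stated in the header are not part of the IP packet.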

The ip_rcv_finish() function is implemented as follows:

static inline int ip_rcv_finish(struct sk_buff *skb)
{
    struct net_device *dev = skb->dev;
    struct iphdr *iph = skb->nh.iph;

    if (skb->dst == NULL) {
        // look up routing information by source IP address, target IP address and service type
        if (ip_route_input(skb, iph->daddr, iph->saddr, iph->tos, dev))
            goto drop; // error path, omitted from this excerpt
    }
    ...
    // if the packet is addressed to the local host, this calls ip_local_deliver()
    return skb->dst->input(skb);
}

The ip_rcv_finish() function is simple. It first calls ip_route_input() to look up the corresponding routing information, passing the source IP address, destination IP address, and service type as parameters.

It then processes the packet by calling the input() method of the routing information. If the packet is addressed to the local host, input() points to the ip_local_deliver() function.

We then examine the ip_local_deliver() function:

int ip_local_deliver(struct sk_buff *skb)
{
    struct iphdr *iph = skb->nh.iph;

    // is this packet a fragment of a larger IP packet?
    if (iph->frag_off & htons(IP_MF|IP_OFFSET)) {
        skb = ip_defrag(skb); // reassemble the fragments and return the complete packet
        if (!skb)
            return 0;
    }

    return NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL, ip_local_deliver_finish);
}

The ip_local_deliver() function first checks whether the packet is a fragment and, if so, calls ip_defrag() to reassemble the fragments. If reassembly succeeds, ip_defrag() returns the reassembled packet; otherwise it returns NULL and ip_local_deliver() returns immediately. With a complete packet in hand, the function calls ip_local_deliver_finish() to process it.

The ip_local_deliver_finish() function is implemented as follows:

static inline int ip_local_deliver_finish(struct sk_buff *skb)
{
    struct iphdr *iph = skb->nh.iph;

    skb->h.raw = skb->nh.raw + iph->ihl*4; // point at the TCP/UDP header

    {
        int hash = iph->protocol & (MAX_INET_PROTOS - 1); // table slot for the upper-layer protocol
        struct sock *raw_sk = raw_v4_htable[hash];
        struct inet_protocol *ipprot;
        int flag;
        ...
        ipprot = (struct inet_protocol *)inet_protos[hash];
        flag = 0;
        if (ipprot != NULL) {
            if (raw_sk == NULL &&
                ipprot->next == NULL &&
                ipprot->protocol == iph->protocol) {
                // call the transport layer's packet handler to process the packet
                return ipprot->handler(skb, (ntohs(iph->tot_len) - iph->ihl*4));
            } else {
                flag = ip_run_ipprot(skb, iph, ipprot, (raw_sk != NULL));
            }
        }
        ...
    }
    return 0;
}

The main job of the ip_local_deliver_finish() function is to find the corresponding packet handler from the inet_protos array based on the type of the upper layer (transport layer), and then process the data through this packet handler.

In other words, the IP layer will send the packet to the transport layer for processing after the correctness verification and reorganization of the packet is completed. For TCP, the packet handling function corresponds to tcp_v4_rcv().
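
The dispatch through inet_protos can be pictured as a small table of function pointers indexed by protocol number. The sketch below is a simplified user-space model under stated assumptions: one handler per slot (the kernel chains several entries through a next pointer), a table size of 32, and invented names such as proto_register(), proto_deliver() and demo_tcp_rcv().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROTOS 32   /* table size; must be a power of two for the mask below */

/* A transport-layer receive handler: gets the payload and its length. */
typedef int (*proto_handler)(const uint8_t *payload, size_t len);

struct proto_entry {
    uint8_t protocol;       /* IP protocol number, e.g. 6 for TCP, 17 for UDP */
    proto_handler handler;
};

static struct proto_entry proto_table[MAX_PROTOS];

/* Register a handler in the slot selected by the protocol number. */
static void proto_register(uint8_t protocol, proto_handler h)
{
    struct proto_entry *e = &proto_table[protocol & (MAX_PROTOS - 1)];

    assert(h != NULL);
    e->protocol = protocol;
    e->handler = h;
}

/* Hand the payload to the registered handler; -1 if nobody handles it. */
static int proto_deliver(uint8_t protocol, const uint8_t *payload, size_t len)
{
    const struct proto_entry *e = &proto_table[protocol & (MAX_PROTOS - 1)];

    if (e->handler == NULL || e->protocol != protocol)
        return -1;
    return e->handler(payload, len);
}

/* Hypothetical stand-in for a transport handler such as tcp_v4_rcv(). */
static int demo_tcp_rcv(const uint8_t *payload, size_t len)
{
    (void)payload;
    return (int)len;    /* pretend the whole payload was consumed */
}
```

After proto_register(6, demo_tcp_rcv), a call to proto_deliver(6, ...) reaches the handler, while protocol 38 maps to the same slot (38 & 31 is 6) and is rejected by the protocol comparison, which is why the kernel code also compares ipprot->protocol with iph->protocol.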
