Original link:
mp.weixin.qq.com/s/U9lmkrTba…

This article covers the following points:

  • Chapter overview of TCP/IP Volume 2

  • First network programming example

  • Network hierarchy

  • The descriptor

  • mbuf

  • Network output and input

  • Interrupts and concurrency

  • Source code organization structure

Chapter overview of TCP/IP Volume 2

The original book consists of 32 chapters (as shown in Figure 1), which describe the implementation of the TCP/IP protocol stack in detail. In terms of the network hierarchy, the material divides into the interface layer, the protocol layer, and the socket layer. The protocols covered include Ethernet, ARP, SLIP, IP, ICMP, IGMP, UDP, TCP, and so on.

Figure 1 Chapter summary of TCP/IP Volume 2


First network programming example

As is customary when learning a new programming language, one starts with a first “Hello World” program. Here is a first network program to get you started, shown in Figure 2.

Figure 2 Sample source code

1. Create a datagram socket

On lines 19-20, the socket function creates a UDP socket and returns a descriptor to the process, which is stored in the variable sockfd.

2. Save the server address in a sockaddr_in structure

Lines 21-24 set the server IP address (140.252.1.32) and port (13). htons converts a value from host byte order (which may be little-endian, low byte first) to network byte order (big-endian, high byte first); if this is unfamiliar, look up big-endian versus little-endian byte order.
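
As a quick illustration (a hypothetical check, not from the book), the following program prints the two bytes of htons(13) as they sit in memory; on a little-endian machine htons has swapped them relative to the host representation:

#include <stdio.h>
#include <arpa/inet.h>

int main(void)
{
        unsigned short port = 13;         /* 0x000D in host byte order */
        unsigned short net = htons(port); /* network byte order (big-endian) */
        unsigned char *p = (unsigned char *)&net;

        /* prints "00 0d": the high-order byte comes first in memory */
        printf("%02x %02x\n", p[0], p[1]);
        return 0;
}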

3. Send data packets to the server

On lines 25-27, the program calls sendto to send a 150-byte datagram to the server. The contents of those 150 bytes do not matter, because the daytime server returns the current time and date regardless of what data it receives.

4. Read the datagram returned from the server

The program reads the datagram sent back from the server by calling recvfrom, and then prints it.
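
Since Figure 2 itself is not reproduced here, below is a minimal sketch of the client just described, written against the standard BSD sockets API; the address 140.252.1.32 and port 13 come from the text above, and error handling is abbreviated:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
        struct sockaddr_in serv;
        char buff[150];         /* contents are irrelevant to the server */
        char recvline[1500];
        int sockfd;
        ssize_t n;

        /* 1. create a datagram (UDP) socket */
        if ((sockfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
                perror("socket");
                exit(1);
        }

        /* 2. fill in the server address: 140.252.1.32, port 13 */
        memset(&serv, 0, sizeof(serv));
        serv.sin_family = AF_INET;
        serv.sin_addr.s_addr = inet_addr("140.252.1.32");
        serv.sin_port = htons(13);      /* host to network byte order */

        /* 3. send a 150-byte datagram to the server */
        memset(buff, 0, sizeof(buff));
        if (sendto(sockfd, buff, sizeof(buff), 0,
            (struct sockaddr *)&serv, sizeof(serv)) < 0) {
                perror("sendto");
                exit(1);
        }

        /* 4. read the reply (the time and date) and print it */
        if ((n = recvfrom(sockfd, recvline, sizeof(recvline) - 1, 0,
            NULL, NULL)) < 0) {
                perror("recvfrom");
                exit(1);
        }
        recvline[n] = '\0';
        printf("%s", recvline);
        return 0;
}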


Network hierarchy

The kernel networking code is organized into three layers, as shown in Figure 3:

Figure 3. Network hierarchy

  • The socket layer

  • The protocol layer

  • The interface layer


The descriptor

The return value of the socket function is a descriptor that has all the features of other Unix descriptors: it can be passed to read and write; fcntl can be called to change its properties; and it can be used as the first argument to the sendto and recvfrom functions.
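
For instance, a socket descriptor can be manipulated with fcntl just like a pipe or file descriptor; a minimal sketch (the helper name set_nonblocking is ours, not from the book):

#include <fcntl.h>

/* Put a socket descriptor into non-blocking mode, exactly as one
 * would any other Unix descriptor (error checks omitted). */
static int set_nonblocking(int sockfd)
{
        int flags = fcntl(sockfd, F_GETFL, 0);      /* current flags */
        return fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);
}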

The following shows where a descriptor sits within a process and the data structures associated with it:

Figure 4 Socket position in the process

Each process has an entry in the process table that contains a table of open file descriptors. The p_fd member in Figure 4 points to the process's filedesc structure. This structure contains fd_ofileflags (a pointer to an array of characters, one descriptor flag per descriptor) and fd_ofiles (a pointer to an array of pointers to file table structures).

The data structure that fd_ofiles points to is shown as *file{}[]: an array of pointers to file structures, indexed by the descriptor itself (0, 1, 2, and so on).

The f_type member of the file structure indicates whether the descriptor is of type DTYPE_SOCKET or DTYPE_VNODE. The v-node is a general-purpose mechanism that allows the kernel to support different types of file systems; v-nodes are not of concern in this article, because TCP/IP sockets are always of type DTYPE_SOCKET.

The f_data member of the file structure points to either a socket structure or a vnode structure, depending on the descriptor type. The member f_ops points to a vector of five function pointers, which are used by the read, readv, write, writev, ioctl, select, and close system calls, whether the descriptor refers to a socket or not. Names such as fo_read inside fileops in Figure 4 are the structure member names, and the corresponding soo_read/vn_read functions on the left are the actual values of those members.
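
For reference, here is a pared-down sketch of the two structures just described; the member names follow the 4.4BSD/Net/3 headers, but members not mentioned in the text are omitted:

/* vector of five function pointers, reached through f_ops */
struct fileops {
        int (*fo_read)();    /* soo_read for sockets, vn_read for vnodes */
        int (*fo_write)();   /* soo_write or vn_write */
        int (*fo_ioctl)();   /* soo_ioctl or vn_ioctl */
        int (*fo_select)();  /* soo_select or vn_select */
        int (*fo_close)();   /* soo_close or vn_close */
};

struct file {
        short   f_type;          /* DTYPE_SOCKET or DTYPE_VNODE */
        struct  fileops *f_ops;  /* the function-pointer vector above */
        caddr_t f_data;          /* socket structure or vnode structure */
        /* other members omitted */
};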

Let’s look at the general lookup flow for a network system call (assuming f_type is DTYPE_SOCKET):

  1. When a process makes a system call such as sendto, the kernel starts with the descriptor value, indexes the fd_ofiles array to reach the file structure corresponding to the descriptor, and follows it to the socket structure. The socket structure holds a pointer to an inpcb structure (Internet protocol control block).

  2. When a UDP datagram arrives on a network interface, the kernel searches all UDP protocol control blocks for an appropriate match, based at least on the destination UDP port number, and possibly also on the destination IP address, source IP address, and source port number. Once the inpcb is located, the kernel finds the socket structure through its inp_socket pointer.

Members inp_faddr and inp_laddr contain the foreign (remote) and local IP addresses, and members inp_fport and inp_lport contain the remote and local port numbers. An IP address combined with a port number is often called a socket. The udp structure in Figure 4 is a global structure that serves as the head of a linked list of all UDP PCBs (protocol control blocks); the members inp_next and inp_prev link all UDP PCBs into a doubly linked circular list.
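
The inpcb members named above can be summarized in a pared-down sketch (based on Net/3's struct inpcb; other members omitted):

struct inpcb {
        struct inpcb   *inp_next, *inp_prev; /* doubly linked circular list */
        struct in_addr  inp_faddr;  /* foreign (remote) IP address */
        u_short         inp_fport;  /* foreign (remote) port number */
        struct in_addr  inp_laddr;  /* local IP address */
        u_short         inp_lport;  /* local port number */
        struct socket  *inp_socket; /* back pointer to the owning socket */
        /* other members omitted */
};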


mbuf

The mbuf (memory buffer) is used to hold information throughout the network code. It is a core data structure that we will cover in more detail in the next section; here we briefly look at its common uses.

1. mbuf containing a socket address structure

Figure 5 mbuf with socket address

In Figure 5, the first 20 bytes of the mbuf form its header, which holds bookkeeping information about the mbuf. The header consists of four 4-byte fields and two 2-byte fields, and the total mbuf is 128 bytes.

m_next and m_nextpkt join multiple mbufs together; m_data points to the data in the mbuf. In this case the data is a socket address (sockaddr_in{}), and m_len gives its length. m_type identifies the kind of data in the mbuf, here MT_SONAME (a socket name), and m_flags holds the flag bits.
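
The six header fields just listed can be sketched as follows (Net/3 actually declares them inside a nested m_hdr structure and accesses them through macros, but the layout is the same):

struct mbuf {
        struct mbuf *m_next;    /* next mbuf in the same chain */
        struct mbuf *m_nextpkt; /* next mbuf chain on a queue */
        int          m_len;     /* amount of data in this mbuf */
        caddr_t      m_data;    /* pointer to the start of the data */
        short        m_type;    /* type of data, e.g. MT_SONAME */
        short        m_flags;   /* flags, e.g. M_PKTHDR */
};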

2. mbuf containing data

Figure 6 mbuf containing data

The arrangement in Figure 6 consists of two mbufs, with the m_next pointer of the first pointing to the second; this is called an mbuf chain. The biggest difference from the mbuf in Figure 5 is that the first mbuf carries two extra members in its header: m_pkthdr.len and m_pkthdr.rcvif. Together they form the packet header, which is used only in the first mbuf of a chain. m_flags has the value M_PKTHDR, indicating that this mbuf contains a packet header. m_pkthdr.len holds the total length of the data in the entire mbuf chain (150 in this case), and m_pkthdr.rcvif holds a pointer to the interface structure of the receiving interface for a received packet.
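
The two extra members form the packet header, which in Net/3 is its own small structure:

struct pkthdr {
        int len;             /* total length of data in the mbuf chain */
        struct ifnet *rcvif; /* receiving interface (for input packets) */
};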

3. Add the IP and UDP headers

Figure 7 Adding IP and UDP headers

At the socket layer, the destination socket address structure is copied into an mbuf and the data is copied into an mbuf chain, and then the protocol layer corresponding to this socket descriptor is invoked. Specifically, the UDP output routine is called with a pointer to the mbuf chain as a parameter. The routine prepends an IP header and a UDP header to the 150 bytes of data, and then passes the mbuf chain to the IP output routine.

In Figure 7, the IP and UDP headers are placed at the end of the first mbuf, the one that holds m_pkthdr.len and m_pkthdr.rcvif and has the M_PKTHDR flag set. The space for the original packet header (now in the second mbuf) is unused, and that mbuf's M_PKTHDR flag is cleared.
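
In Net/3 this prepending is done with the M_PREPEND macro; a sketch, based on udp_output, of how room is reserved for the combined IP/UDP header (error path abbreviated):

/* prepend space for the IP and UDP headers to the chain in m;
 * M_PREPEND uses free space in the first mbuf when available,
 * otherwise it allocates a new mbuf at the front of the chain */
M_PREPEND(m, sizeof(struct udpiphdr), M_DONTWAIT);
if (m == 0)
        return (ENOBUFS);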


Network output and input

1. Output process

Figure 8 shows the processing when a process calls sendto to transmit a UDP datagram.

The IP output routine fills in the remaining fields of the IP header, including the IP checksum; determines the output interface on which to send the data; and fragments the IP packet if necessary. Assuming the output interface is an Ethernet interface, a pointer to the mbuf chain is again passed as a parameter, this time to the Ethernet output function.

The Ethernet output function first converts the 32-bit IP address into a 48-bit Ethernet address through ARP address resolution. The 14-byte Ethernet header (6-byte destination Ethernet address, 6-byte source Ethernet address, and 2-byte Ethernet frame type) is then prepended to the first mbuf in the chain, immediately before the IP header, and the mbuf chain is appended to the output queue of the interface for transmission.

Figure 8. Processing by the three layers for a simple UDP output
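
The 14-byte Ethernet header just described corresponds to the ether_header structure declared in the BSD headers:

struct ether_header {
        u_char  ether_dhost[6]; /* destination Ethernet (hardware) address */
        u_char  ether_shost[6]; /* source Ethernet (hardware) address */
        u_short ether_type;     /* frame type, e.g. ETHERTYPE_IP */
};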

2. Input process

Unlike output processing, which is driven by a system call from the process, input processing is asynchronous: packets are received through interrupts.

When data arrives at the interface (assume Ethernet), an interrupt occurs and the Ethernet device driver reads the data from the device into an mbuf chain, then hands the chain to the appropriate upper-layer protocol based on the frame type. Assuming the type field indicates an IP datagram, the mbuf chain is added to the IP input queue.

At this point, a software interrupt is posted to run the IP input routine, so IP input is also asynchronous. The IP input routine processes each IP datagram on its input queue in a loop and returns when the queue is empty.

Based on the protocol field in the IP header, the IP input routine passes the data on to the queue of the corresponding upper-layer protocol. That protocol's input routine is then called to process the data and finally deliver it to the application process.
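
In the Net/3 source this hand-off is a single indirect call in the IP input routine, dispatching through the protocol switch table on the header's protocol field (shown here out of context):

/* call the input routine of the protocol identified by ip->ip_p,
 * e.g. udp_input or tcp_input, passing the mbuf chain and header length */
(*inetsw[ip_protox[ip->ip_p]].pr_input)(m, hlen);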


Interrupts and concurrency

Input processing is asynchronous and interrupt-driven: a device interrupt first triggers the interface-layer code, which then posts a software interrupt that triggers the protocol-layer code. When the kernel finishes handling these interrupt levels, it executes the socket code.

Each hardware and software interrupt is assigned a priority, as shown in Figure 9:

Figure 9 Interrupt priority

With code running at different priorities, the concern is how to handle data structures shared between the levels. The data structures shared among the three priority levels (socket queues, interface queues, and protocol queues) are shown in Figure 3. For example, while the IP input routine is taking a received packet off its input queue, a device interrupt may occur, preempting the protocol layer, and the device driver may add another packet to the IP input queue. If access to the shared data is not coordinated, its integrity can be compromised.

So how is this problem solved? The Net/3 code frequently calls the functions splimp and splnet. These calls are always paired with splx, which returns the processor to its previous priority level.

As shown in the following code:

struct mbuf *m;
int s;

s = splimp();            /* raise CPU priority, blocking network device interrupts */
IF_DEQUEUE(&ipintrq, m); /* remove the packet at the head of the IP input queue */
splx(s);                 /* restore the previous CPU priority */
if (m == 0)
        return;

When the protocol layer's IP input function runs, it first calls splimp to raise the CPU priority to the level of the network device drivers, preventing any network device interrupt; the previous priority is saved in the variable s. The macro IF_DEQUEUE then removes the next packet from the head of ipintrq and stores a pointer to its mbuf chain in the variable m. Finally, splx restores the CPU to the priority saved in s. Because all network device interrupts are blocked between splimp and splx, as little code as possible should run between the two calls.


Source code organization structure

All the source code in this article series is based on the Net/3 release, and its organization is shown in Figure 10:

Figure 10 Source code organization

This series focuses on the netinet directory, which contains all of the TCP/IP source code.

For the latest articles, follow the WeChat public account: Baymax love mountain climbing. You are welcome to subscribe!