Anyone who has done network programming knows that applications call send to send data and recv to receive it. But how do send and recv actually work?

As mentioned in previous articles, TCP is a reliable, full-duplex protocol. Its flow control relies on the sliding window and its congestion control on the congestion window, and the sliding of these windows depends on two kernel buffers: the TCP socket's send buffer and receive buffer.

In this article, we first take a brief look at what the TCP send and receive buffers do (which is important for understanding send and recv), and then explain how TCP sends and receives data on Linux.

The buffer

A buffer can be thought of as a temporary cache.

On the sending side, send copies the data into the socket's send buffer and immediately returns to the application layer, which can go on with other work; getting the data to the peer is then the kernel's job, that is, TCP's business.

On the receiving end, the kernel copies data from the network into a buffer and waits for upper-layer applications to read it.

Send buffer

In the simplest (and most common) case, when a process calls send(), the data is copied into the socket's kernel send buffer and send() returns immediately.

In other words, when send() returns, the data has not necessarily reached the peer (much like writing to a file). send() simply copies data from the application buffer into the socket's kernel buffer.

A TCP socket can work in two modes: blocking and non-blocking.

  • In blocking mode, send() copies the application's data into the socket's send buffer and then returns. If the send buffer has more free space than the amount requested, send() returns immediately and the kernel transmits the data to the network in the background. Otherwise, send() blocks: the kernel keeps transmitting data already in the buffer, space is freed as the peer acknowledges it, and the rest of the application's data is copied in; send() returns only once all of it has been copied. (The receiver acknowledges data as soon as it arrives in its receive buffer; it does not wait for the application to call recv.)
  • In non-blocking mode, send() only copies data into the protocol stack's send buffer. If there is not enough space, it copies as much as fits and returns the number of bytes copied. If no space is available at all, it returns -1 and sets errno to EAGAIN.
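As a minimal sketch of the non-blocking behavior above (assuming Linux/POSIX), the code below switches a socket to non-blocking mode and calls send() until the kernel stops accepting data. The helper names set_nonblocking and fill_send_buffer are hypothetical. Because nothing reads from the other end, the send buffer eventually fills and send() fails with EAGAIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Call send() repeatedly until the kernel refuses more data.
 * Returns the total number of bytes copied into the kernel send buffer;
 * *last_errno receives errno from the failing call, which for a full
 * buffer is EAGAIN (or EWOULDBLOCK). */
static long fill_send_buffer(int fd, int *last_errno)
{
    char chunk[4096];
    long total = 0;

    memset(chunk, 'x', sizeof(chunk));
    for (;;) {
        ssize_t n = send(fd, chunk, sizeof(chunk), 0);
        if (n < 0) {
            *last_errno = errno;
            return total;
        }
        total += n; /* n may be smaller than sizeof(chunk): a partial copy */
    }
}
```

For a quick experiment, socketpair(AF_UNIX, SOCK_STREAM, 0, sv) can stand in for a TCP connection: the buffering and EAGAIN behavior is analogous, although the exact byte count differs from a real TCP socket.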

On Linux, there are two ways to check the TCP send buffer size:

1. Check the net.ipv4.tcp_wmem value in /etc/sysctl.conf

2. Run the cat /proc/sys/net/ipv4/tcp_wmem command.

cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 4194304

As shown above, on the author's server the TCP send buffer setting has three values: 4096 16384 4194304.

  • The first value is the minimum size, in bytes, of a socket's send buffer, which each TCP socket is guaranteed even under memory pressure;
  • The second value is the default size of a TCP socket's send buffer; it overrides the generic net.core.wmem_default setting;
  • The third value is the maximum size to which the kernel's automatic tuning can grow the send buffer; it does not override net.core.wmem_max, which instead caps sizes requested with SO_SNDBUF.
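These values can also be read from a program. The sketch below assumes the /proc layout shown above (three whitespace-separated integers); parse_tcp_mem and read_tcp_mem are hypothetical helper names. The same code works for /proc/sys/net/ipv4/tcp_rmem.

```c
#include <stdio.h>

/* Parse "min default max" as found in /proc/sys/net/ipv4/tcp_wmem or
 * tcp_rmem. Returns 0 on success, -1 if three integers are not present. */
static int parse_tcp_mem(const char *text, long out[3])
{
    return sscanf(text, "%ld %ld %ld", &out[0], &out[1], &out[2]) == 3 ? 0 : -1;
}

/* Read one of the tcp_[rw]mem files and parse it. Returns 0 on success. */
static int read_tcp_mem(const char *path, long out[3])
{
    char line[128];
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof(line), fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return parse_tcp_mem(line, out);
}
```

Usage: `long v[3]; read_tcp_mem("/proc/sys/net/ipv4/tcp_wmem", v);` then v[0], v[1], and v[2] hold the minimum, default, and maximum.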

You can also change the send buffer size programmatically. Note that the following code affects only the specific socket it is called on.

int buffer_len = 10240;
/* The last argument is the size of the option value, not the value itself. */
setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &buffer_len, sizeof(buffer_len));
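To see what the kernel actually applied, read the option back with getsockopt(). The sketch below is hedged (set_and_get_sndbuf is a hypothetical helper). According to socket(7), Linux doubles the value passed to SO_SNDBUF to leave room for bookkeeping overhead and caps the request at net.core.wmem_max, so requesting 10240 typically reads back as 20480. SO_RCVBUF behaves the same way.

```c
#include <sys/socket.h>

/* Request a send buffer of `requested` bytes on fd, then return the size
 * the kernel reports via getsockopt(), or -1 on error. On Linux the
 * reported value is normally double the (capped) request. */
static int set_and_get_sndbuf(int fd, int requested)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) != 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) != 0)
        return -1;
    return actual;
}
```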
Receive buffer

The receive buffer is used by TCP to cache data from the network until the application process reads it.

If the application has not read the data and the receive buffer fills up, the receiver tells the sender that its receive window has closed (win=0). This is how the sliding window works: it ensures that the TCP socket's receive buffer never overflows, which helps make TCP transmission reliable. The sender is not allowed to send more data than the advertised window allows; this is TCP flow control. If the peer ignores the window and sends data beyond it, the receiving TCP discards the excess.
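The window arithmetic can be illustrated with a deliberately simplified model. This is not the kernel's actual computation, which also involves window scaling, rounding, and buffer-overhead accounting: here the receiver advertises the free part of its receive buffer, and the sender may keep at most min(rwnd, cwnd) bytes unacknowledged, where cwnd is the congestion window mentioned at the start of the article. The function names are hypothetical.

```c
#include <stdint.h>

/* Simplified model: the window a receiver advertises is whatever remains
 * of its receive buffer after data the application has not read yet.
 * 0 means "window closed" (win=0). */
static uint32_t advertised_window(uint32_t rcvbuf_size, uint32_t unread_bytes)
{
    return unread_bytes >= rcvbuf_size ? 0 : rcvbuf_size - unread_bytes;
}

/* The sender may have at most min(rwnd, cwnd) bytes unacknowledged.
 * Returns how many new bytes it may transmit right now. */
static uint32_t sendable_bytes(uint32_t rwnd, uint32_t cwnd, uint32_t in_flight)
{
    uint32_t limit = rwnd < cwnd ? rwnd : cwnd;

    return in_flight >= limit ? 0 : limit - in_flight;
}
```

For example, a 64 KB receive buffer holding 16 KB of unread data advertises 49152 bytes; with cwnd = 20000 and 5000 bytes already in flight, the sender may transmit 15000 more bytes.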

The receive buffer size can be checked the same way as the send buffer size:

1. Check the net.ipv4.tcp_rmem value in /etc/sysctl.conf

2. Run the cat /proc/sys/net/ipv4/tcp_rmem command.

cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 4194304

The TCP receive buffer has three values: 4096 87380 4194304.

  • The first value is the minimum size, in bytes, of a socket's receive buffer; allocations below this size still succeed under memory pressure;
  • The second value is the default size of a TCP socket's receive buffer; it overrides the generic net.core.rmem_default setting;
  • The third value is the maximum size of the receive buffer used by each TCP socket; it does not override net.core.rmem_max, which instead caps sizes requested with SO_RCVBUF.

Similarly, you can change the size of the receive buffer with the following code.

int buffer_len = 10240;
/* The last argument is the size of the option value, not the value itself. */
setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &buffer_len, sizeof(buffer_len));

Implementation principles

To understand the whole TCP transmission process, let's look at how data flows through the four-layer TCP/IP model, and at what the send and recv functions do at each layer.

How send works
NAME
       send, sendto, sendmsg - send a message on a socket

SYNOPSIS
       #include <sys/types.h>
       #include <sys/socket.h>

       ssize_t send(int sockfd, const void *buf, size_t len, int flags);

DESCRIPTION
       The system calls send(), sendto(), and sendmsg() are used to transmit a message to another socket.

The send function compares len, the length of the data to be sent, with the size of the send buffer of socket sockfd:

  • If len is larger than the size of the send buffer, the data is sent in several rounds.

  • If len is less than or equal to the size of sockfd's send buffer, send checks whether the protocol is currently transmitting data from sockfd's send buffer:

    • If so, it waits for the protocol to finish sending that data.

    • Otherwise (the protocol has not yet started sending the data in sockfd's send buffer, or the buffer is empty), send compares the free space remaining in sockfd's send buffer with len:

      • If len is larger than the free space, send waits for the protocol to finish sending the data in sockfd's send buffer.
      • If len is smaller than the free space, send simply copies the data from buf into the free space. If the copy succeeds, send returns the number of bytes actually copied; if it fails, send returns SOCKET_ERROR (the Winsock name for the -1 that Linux returns). send also returns SOCKET_ERROR if the network is disconnected while it is waiting for the protocol to send data. Note that when send returns after successfully copying buf into the free space of sockfd's send buffer, the data has not necessarily been sent to the peer yet. If a network error occurs during the subsequent transmission, the next socket function call returns SOCKET_ERROR. (Every socket function except send must first wait for the data in the socket's send buffer to be sent before it proceeds; if a network error occurs while it is waiting, that function returns SOCKET_ERROR.)
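The description above uses Winsock terminology. On Linux, the practical consequence is that a single send() call may copy fewer bytes than requested, so code that must transmit a whole buffer usually loops. A minimal sketch for a blocking socket follows; send_all is a hypothetical helper, and a non-blocking socket would additionally need to wait (for example with select) whenever send() fails with EAGAIN.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until all len bytes have been copied into the
 * kernel send buffer. Even a blocking send() can return early (for
 * example when interrupted by a signal), so the loop resumes where the
 * previous call stopped. Returns 0 on success, -1 on error (errno set).
 * Success means the data reached the kernel, not that the peer has it. */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before copying anything: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```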

If you are not interested in the implementation details, you can skip this section.

The following analyzes the send implementation from the perspective of the four-layer model.

The application layer

With TCP, after creating the socket, the application calls the connect() function to establish a connection between the client and the server through the socket. You can then call the send function to send the data.

The transport layer

Data is processed at the transport layer. Taking TCP as an example, it has the following functions:

  • 1. Construct a TCP segment
  • 2. Calculate the checksum
  • 3. Send an ACK packet
  • 4. Sliding windows and other mechanisms that ensure reliability.
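Step 2, the checksum, uses the Internet checksum defined in RFC 1071: a 16-bit one's-complement sum. The sketch below computes it over an arbitrary buffer; a real TCP checksum also covers a pseudo-header (IP addresses, protocol number, TCP length), which is omitted here. The name inet_checksum is illustrative, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: the one's complement of the one's-complement
 * sum of 16-bit big-endian words. Intended for packet-sized buffers
 * (the 32-bit accumulator is only folded at the end). */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                    /* odd trailing byte is padded with zero */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)                /* fold carries back into the low 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

For the RFC 1071 sample bytes 00 01 f2 03 f4 f5 f6 f7, the folded sum is 0xddf2 and the checksum is 0x220d. Running the function over the data followed by its checksum yields 0, which is how a receiver verifies a segment.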

Different protocols use different send functions: TCP uses tcp_sendmsg and UDP uses udp_sendmsg; both are reached through the generic sock_sendmsg entry point.

The main job of tcp_sendmsg() is to copy user-space data into skbs (socket buffers). It then calls tcp_push() to send them; tcp_push() leads to tcp_write_xmit(), which in turn calls tcp_transmit_skb(). tcp_transmit_skb() adds the TCP header to the skb and then calls ip_queue_xmit().

The network layer

ip_queue_xmit(skb) performs route lookup and validation, builds the IP header and IP options, and finally sends the packet via ip_local_out.

Data link layer

The data link layer provides reliable transport over unreliable physical media. The functions of this layer include: physical address addressing, data framing, flow control, data error detection, retransmission, etc. The data units at this level are called frames.

The diagram for this part shows the call chain of the send function in the kernel source. If you are interested in the source code, you can find the implementation in net/ipv4/tcp.c.

How recv works
NAME
       recv, recvfrom, recvmsg - receive a message from a socket

SYNOPSIS
       #include <sys/types.h>
       #include <sys/socket.h>

       ssize_t recv(int sockfd, void *buf, size_t len, int flags);

       ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
                        struct sockaddr *src_addr, socklen_t *addrlen);

       ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags);

DESCRIPTION
       The recvfrom() and recvmsg() calls are used to receive messages from a socket, and may be used to receive data on a socket whether or not it is connection-oriented.

       If  src_addr  is not NULL, and the underlying protocol provides the source address, this source address is filled in.  When src_addr is NULL, nothing is filled in; in this case, addrlen is not used, and should also be NULL.  The argument
       addrlen is a value-result argument, which the caller should initialize before the call to the size of the buffer associated with src_addr, and modified on return to indicate the actual size of the source address.  The returned address is
       truncated if the buffer provided is too small; in this case, addrlen will return a value greater than was supplied to the call.

       The recv() call is normally used only on a connected socket (see connect(2)) and is identical to recvfrom() with a NULL src_addr argument.

When recv is called:

  • It first checks the receive buffer of socket sockfd.
  • If there is no data in sockfd's receive buffer, or the protocol is still receiving data, recv waits until the protocol has finished receiving the data.
  • Once the protocol has received the data, recv copies it from sockfd's receive buffer into buf and returns the number of bytes actually copied.
  • If the copy fails, recv returns SOCKET_ERROR.
  • If the network is disconnected while recv is waiting for the protocol to receive data, recv returns 0.
  • A graceful close by the peer does not prevent the local recv from receiving data normally:
  • If there is no data left in the protocol buffer, recv returns 0, indicating that the peer has closed;
  • If there is data in the protocol buffer, that data is returned (possibly over several recv calls), and the final recv returns 0 to indicate that the peer has closed.
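The last two points are the ones applications most often get wrong. Here is a minimal POSIX sketch (recv_until_eof is a hypothetical helper) that keeps reading until recv returns 0, which on Linux signals an orderly shutdown by the peer, and treats -1 as an error.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read into buf (capacity cap) until the peer shuts down its sending side
 * or the buffer is full. Returns the number of bytes stored, or -1 on a
 * receive error such as ECONNRESET. */
static long recv_until_eof(int fd, char *buf, size_t cap)
{
    size_t stored = 0;

    while (stored < cap) {
        ssize_t n = recv(fd, buf + stored, cap - stored, 0);
        if (n == 0)
            return (long)stored;   /* orderly shutdown: no more data will come */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        stored += (size_t)n;       /* the data may arrive over several calls */
    }
    return (long)stored;           /* buffer full before the peer closed */
}
```

Data the peer sent before shutting down is still delivered, possibly over several recv calls, before recv finally returns 0.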

If you are not interested in the implementation details, you can skip this section.

The following analyzes the recv implementation from the perspective of the four-layer model.

Data link layer

When a packet arrives at the machine's physical NIC, an interrupt is triggered. The interrupt handler allocates an sk_buff structure, copies the frame received from the NIC into the sk_buff, and sets the corresponding sk_buff fields.

It then raises a soft interrupt (softirq) to tell the kernel that a new frame has arrived. The softirq handler net_rx_action runs and eventually calls netif_receive_skb.

netif_receive_skb passes each packet to the receive function of the matching network-layer protocol (in the INET domain, mainly ip_rcv and arp_rcv), based on the packet types registered in the global structures ptype_all and ptype_base.

The network layer

ip_rcv is the entry function of the network layer. It first validates the packet and then calls ip_rcv_finish.

ip_rcv_finish calls ip_route_input to look up the route, which decides whether the packet should be delivered locally, forwarded, or dropped.

If the packet is destined for the local host, ip_local_deliver is called; it reassembles fragmented packets and then calls ip_local_deliver_finish, which finally hands the packet to the next layer's receive function: tcp_v4_rcv (TCP), udp_rcv (UDP), icmp_rcv (ICMP), or igmp_rcv (IGMP). If the packet is not destined for the local host, it is forwarded: ip_forward is called, and the packet eventually reaches dev_queue_xmit for link-layer processing.

The transport layer

At this layer, integrity checks are performed and packets with problems are dropped. For TCP, tcp_v4_do_rcv is then called.

If sk->sk_state == TCP_ESTABLISHED, tcp_rcv_established is called, which in turn calls tcp_data_queue to put the data into the socket's receive queue. Segments that arrive out of order are kept in the out-of-order queue, and tcp_ofo_queue moves them into the receive queue once the missing data has arrived.
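The idea behind the out-of-order queue can be shown with a toy model: segments that arrive ahead of the next expected sequence number (rcv_nxt) are held aside, and once the gap is filled they are moved, in order, to the stream the application reads. This is a simplified illustration under stated assumptions (fixed-size storage, no overlapping or duplicate segments, no sequence-number wraparound), not the kernel's tcp_ofo_queue implementation; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PARKED 8
#define STREAM_CAP 256

struct segment {
    uint32_t seq;          /* sequence number of the segment's first byte */
    size_t len;
    const char *data;
};

struct reassembler {
    uint32_t rcv_nxt;                  /* next in-order sequence number expected */
    struct segment parked[MAX_PARKED]; /* segments that arrived early */
    size_t nparked;
    char stream[STREAM_CAP];           /* in-order bytes ready for the application */
    size_t stream_len;
};

/* Append an in-order segment to the stream and advance rcv_nxt.
 * Returns -1 if the toy's fixed-size stream would overflow. */
static int deliver(struct reassembler *r, const struct segment *s)
{
    if (r->stream_len + s->len > STREAM_CAP)
        return -1;
    memcpy(r->stream + r->stream_len, s->data, s->len);
    r->stream_len += s->len;
    r->rcv_nxt += (uint32_t)s->len;
    return 0;
}

/* Handle one arriving segment. Returns 0 on success, -1 if storage is full. */
static int on_segment(struct reassembler *r, struct segment s)
{
    size_t i = 0;

    if (s.seq != r->rcv_nxt) {         /* not next in line (assumed ahead): park it */
        if (r->nparked == MAX_PARKED)
            return -1;
        r->parked[r->nparked++] = s;
        return 0;
    }
    if (deliver(r, &s) != 0)
        return -1;
    /* The gap may now be filled: move every parked segment that starts at
     * rcv_nxt, rescanning from the beginning after each move. */
    while (i < r->nparked) {
        if (r->parked[i].seq == r->rcv_nxt) {
            if (deliver(r, &r->parked[i]) != 0)
                return -1;
            r->parked[i] = r->parked[--r->nparked]; /* remove by swapping in the last */
            i = 0;
        } else {
            i++;
        }
    }
    return 0;
}
```

Feeding the segments for "gh" and "def" first leaves the stream empty; once "abc" (the segment at rcv_nxt) arrives, all three are delivered as "abcdefgh".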

The application layer

When an application calls recv, the call enters the sys_recv system call in net/socket.c, which then calls sock_recvmsg. (A read() on a socket takes a different entry point but also ends up in sock_recvmsg.)

For TCP, this leads to tcp_recvmsg, which copies data from the socket's receive queue into the user's buffer.

The process above can be summarized as follows:

  • When a frame arrives, the NIC notifies the CPU with a hardware interrupt.
  • The CPU responds to the hardware interrupt and raises a soft interrupt, which the softirq handler then processes.
  • Frames are taken from the Ring Buffer and stored in skbs.
  • The protocol layers process the frames and put the resulting data into the socket's receive buffer.

The figure above shows the complete function call path for receiving network data. On the receiving end, the arrival of data is signaled to the kernel through interrupts, and the data is finally passed up through callbacks to the system call.

Below are the complete function call chains of send and recv.

Q&A

In practice, suppose the sender uses blocking sends and, because of network congestion or a slow receiver, the following typical situation arises: the sending application appears to have sent 10 KB of data, but only 2 KB has reached the peer's buffer, while 8 KB is still in the local send buffer (either not yet sent or not yet acknowledged by the receiver). At this point, the receiving application can read 2 KB. Suppose it has called recv to fetch 1 KB and is processing it. One of the following situations may then occur:

  1. The sending application thinks it has finished sending 10 KB of data and closes the socket:

Because the sending host performs the active close, the connection enters the FIN_WAIT1 half-closed state (waiting for the peer's ACK). The 8 KB of data in the send buffer is not discarded; it is still sent to the peer. If the receiving application keeps calling recv, it receives the remaining 8 KB (the receiver gets it before the sender's FIN_WAIT1 state times out), followed by notice that the peer has closed its socket (recv returns 0). The receiver should then close its own socket.

  2. The sending application calls send again to send 8 KB of data:

If the send buffer is 20 KB, its free space is 20 - 8 = 12 KB, which is larger than the 8 KB requested, so send copies the data and immediately returns 8192.

If instead only 4 KB is free and the socket is non-blocking, send() returns 4096. When the return value is smaller than the requested size, the application should treat the send buffer as full and either block or wait for the next writable notification (for example, from select). If the program immediately calls send again, it returns -1 with errno set to EAGAIN on Linux.

  3. The receiving application closes its socket after processing 1 KB of data: the receiving host performs the active close, and the connection enters the FIN_WAIT1 half-closed state (waiting for the peer's ACK). The sending application then gets a readable notification on the socket (usually select reports the socket as readable), but when it reads, recv returns 0, so it should call close on its socket to complete the connection teardown.

If the sending application ignores this readable notification and keeps sending, there are two cases. If send is called after the sender has received an RST, send returns -1 with errno set to ECONNRESET, indicating that the peer has reset the connection. There are also reports that the process receives a SIGPIPE signal, whose default action is to terminate the process; if this signal is ignored, send returns -1 with errno set to EPIPE (unverified). If send is called before the sender receives the RST, it works as usual.

In the blocking case, if send is blocked when the peer socket closes (for example, because a single call passed a large buffer that exceeds the send buffer), send returns the number of bytes successfully sent. Calling send again then behaves as described above.
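Whatever the exact errno, a process that writes to a connection the peer has closed can be terminated by SIGPIPE. A common defense on Linux is to pass the MSG_NOSIGNAL flag to send(), so the call fails with EPIPE instead of raising the signal; ignoring SIGPIPE process-wide with signal(SIGPIPE, SIG_IGN) is the more portable alternative. A minimal sketch (send_nosignal and peer_gone are hypothetical helpers):

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* send() with MSG_NOSIGNAL (Linux): if the peer has gone away, the call
 * fails with errno EPIPE (or ECONNRESET after an RST) instead of raising
 * SIGPIPE. Retries if interrupted. Returns bytes sent, or -1 with errno set. */
static ssize_t send_nosignal(int fd, const void *buf, size_t len)
{
    ssize_t n;

    do {
        n = send(fd, buf, len, MSG_NOSIGNAL);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Nonzero if a send() errno means the connection is gone and the socket
 * should be closed. */
static int peer_gone(int err)
{
    return err == EPIPE || err == ECONNRESET;
}
```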

  4. The network is disconnected at a switch or router:

After processing the 1 KB it received, the receiving application reads the remaining 1 KB from its buffer; after that it behaves as if there is simply no data to read, so the application has to handle the timeout itself. The usual approach is to give select a maximum wait time: if no data becomes readable within that time, the socket is considered unusable.
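A minimal sketch of the select-with-timeout approach, assuming POSIX select() and an fd below FD_SETSIZE; recv_with_timeout and its -2 timeout return value are hypothetical conventions for this example:

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Wait up to timeout_ms for fd to become readable, then recv() into buf.
 * Returns the recv() result (bytes read, 0 if the peer closed, -1 on
 * error), -1 if select() fails (errno set, possibly EINTR), or -2 if the
 * timeout expires with no data. Requires fd < FD_SETSIZE. */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int ready;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ready = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return -2;      /* timed out: the caller may treat the socket as unusable */
    return recv(fd, buf, len, 0);
}
```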

The sending application keeps trying to send the rest of its data, but nothing is ever acknowledged, so the free space in its send buffer stays at zero; the application must handle this case as well.

If the application does not handle timeouts itself, it can rely on TCP's keepalive mechanism, which is controlled by the sysctl entries net.ipv4.tcp_keepalive_intvl, net.ipv4.tcp_keepalive_probes, and net.ipv4.tcp_keepalive_time. These settings only apply to sockets that have enabled the SO_KEEPALIVE option.
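These sysctls are system-wide defaults. Linux also lets a program enable keepalive on a single socket and override the three values with the TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT options described in tcp(7). A hedged sketch (enable_keepalive is a hypothetical helper, and the numbers in the usage are examples only):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive on one TCP socket and override the system-wide
 * net.ipv4.tcp_keepalive_* defaults for it (Linux-specific options).
 * Returns 0 on success, -1 on the first failing setsockopt(). */
static int enable_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return -1;
    /* idle seconds before the first probe (cf. tcp_keepalive_time) */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) != 0)
        return -1;
    /* seconds between probes (cf. tcp_keepalive_intvl) */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) != 0)
        return -1;
    /* unanswered probes before the connection is dropped (cf. tcp_keepalive_probes) */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) != 0)
        return -1;
    return 0;
}
```

Usage: `enable_keepalive(fd, 60, 10, 5)` starts probing after 60 idle seconds, probes every 10 seconds, and gives up after 5 unanswered probes.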

conclusion

  • TCP itself guarantees reliable transmission, but that does not mean an application sending data over TCP is automatically reliable; the application must still be fault-tolerant;
  • send() only copies data into the kernel, and returns as soon as the copy is complete;
  • An error caused by a given send() call may be reported either by that call or by the next network I/O call;
  • During TCP transmission, keep in mind that TCP is a byte stream: send and recv calls do not necessarily correspond one to one (although they often do). The data from one send may be read by one recv or by several, and the data from several sends may be read by a single recv. TCP guarantees that the data arrives complete and in order, but it is up to the developer to split it back into individual messages correctly and completely.

For example, suppose a server calls recv in a loop with a 100-byte buffer, and a client calls send in a loop with 6 bytes of data each time. Each recv may then return 6, 12, or 18 bytes; how the data is split is unpredictable.
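Since TCP preserves no message boundaries, the application has to define its own framing. One common approach, shown here as an assumption rather than anything the article prescribes, is to prefix each message with a 4-byte big-endian length and to read with a loop that tolerates short reads. recv_exact and recv_message are hypothetical helpers.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read exactly len bytes, looping because one recv may return fewer bytes
 * than requested. Returns 0 on success, -1 on error or if the peer closed
 * before len bytes arrived. */
static int recv_exact(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0)
            return -1;                 /* peer closed in the middle of a message */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical framing: a 4-byte big-endian length, then the payload.
 * Returns the payload length, or -1 on error or if it exceeds cap. */
static long recv_message(int fd, char *buf, size_t cap)
{
    uint32_t netlen;
    uint32_t len;

    if (recv_exact(fd, &netlen, sizeof(netlen)) != 0)
        return -1;
    len = ntohl(netlen);
    if (len > cap)
        return -1;
    if (recv_exact(fd, buf, len) != 0)
        return -1;
    return (long)len;
}
```

Even if two framed messages arrive in a single TCP segment, or one message is split across several, each recv_message call returns exactly one complete payload.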


For more content, follow the public account: High-performance Architecture Exploration