Concept:

TCP (Transmission Control Protocol) is one of the major network protocols. It lets two hosts establish a connection and exchange streams of data.

TCP ensures the delivery of data and maintains the order in which packets are sent.

TCP was designed in the 1970s by two DARPA scientists, Vint Cerf and Bob Kahn, who later came to be called the fathers of the Internet.

Source:

The bottom-layer Ethernet protocol specifies how electronic signals are grouped into packets, which solves point-to-point communication within a subnet. However, Ethernet cannot solve communication across multiple LANs. That problem is solved by the IP protocol.

(Image caption: The IP protocol can connect multiple LANs.)

The IP protocol defines its own set of address rules, called IP addresses. It implements routing, which allows host A on one LAN to send a message to host B on another LAN. Routers are based on the IP protocol, and LANs are connected to each other through routers.

The principle of routing is simple. Routers on the market have many network ports on the back for connecting multiple network cables. Inside, the router keeps a routing table specifying that IP addresses in segment A go out through port 1, IP addresses in segment B go out through port 2, and so on. Packet forwarding is implemented through this set of “signposts”.

(Image description: the host’s routing table shows which interface packets for each destination IP are sent to.)

The IP protocol is only an addressing protocol and does not guarantee that packets arrive intact. If a router drops packets (for example, when its buffer is full and incoming packets are discarded), something has to find out which packet is missing and resend it. That is the job of the TCP protocol.

In short, the function of TCP is to ensure the integrity and reliability of data communication and to prevent data from being lost.

TCP packet size

The maximum size of an Ethernet packet is fixed: initially 1518 bytes, later increased to 1522 bytes. Of these, 1500 bytes are payload and 22 bytes are header information.

IP packets are carried in the payload of Ethernet packets. An IP packet has its own header of at least 20 bytes, so its payload is at most 1480 bytes.

(Image description: IP packets sit inside Ethernet packets, and TCP packets sit inside IP packets.)

TCP packets are carried in the payload of IP packets. A TCP packet also needs at least 20 bytes of header, so its maximum payload is 1480 - 20 = 1460 bytes. Because IP and TCP headers often carry extra optional fields, the TCP payload is in practice around 1400 bytes.

Therefore, a 1500-byte message requires two TCP packets. One of the major improvements in HTTP/2 is the compression of HTTP headers, which lets an HTTP request fit into a single TCP packet instead of being split across several, and so increases speed.

(Image caption: The Ethernet packet payload is 1500 bytes, and the TCP packet payload is around 1400 bytes.)
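The header arithmetic above can be checked with a few lines of Python. This is only a sketch: the constant and function names are made up for illustration, and the 12 bytes of TCP options in the second call are a hypothetical value, not a figure from the text.

```python
# Sizes in bytes, taken from the figures quoted in this section.
ETHERNET_PAYLOAD = 1500  # maximum Ethernet payload
IP_HEADER_MIN = 20       # minimum IP header
TCP_HEADER_MIN = 20      # minimum TCP header


def max_tcp_payload(ip_options=0, tcp_options=0):
    """Return the largest TCP payload that fits in one Ethernet packet."""
    ip_payload = ETHERNET_PAYLOAD - IP_HEADER_MIN - ip_options
    return ip_payload - TCP_HEADER_MIN - tcp_options


print(max_tcp_payload())                # 1460
print(max_tcp_payload(tcp_options=12))  # 1448 (12 is a hypothetical option size)
```

Any optional header fields come straight out of the payload, which is why the practical figure is lower than 1460.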

TCP packet number (SEQ)

A packet carries about 1400 bytes, so a large amount of data sent at once has to be split into multiple packets. For example, a 10MB file requires more than 7100 packets.
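As a rough check of that figure, the packet count is a ceiling division. The sketch below assumes 10MB means 10,000,000 bytes and that every packet carries exactly 1400 bytes of payload. The function name is hypothetical.

```python
def packets_needed(total_bytes, payload_per_packet=1400):
    """Return how many packets are needed to carry total_bytes."""
    # Ceiling division using integers only.
    return (total_bytes + payload_per_packet - 1) // payload_per_packet


print(packets_needed(10 * 1000 * 1000))  # 7143
print(packets_needed(1500))              # 2
```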

When sending packets, TCP gives each packet a sequence number (SEQ) so that the receiver can restore them in order. If a packet is lost, the receiver can also tell which one is missing.

The number of the first packet is a random number. To make this easier to follow, call it packet number 1. If the payload of this packet is 100 bytes long, the next packet is numbered 101. In other words, each packet yields two numbers: its own number and the number of the next packet. From these, the receiver knows the order in which to reassemble the packets into the original file.

(Image caption: The current packet number is 45943 and the next packet number is 46183, so this packet’s payload is 240 bytes.)
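The numbering rule can be sketched as follows. This is an illustration, not how any real TCP stack is written: `number_packets` is a hypothetical helper, and the first number is fixed at 1 instead of being random.

```python
def number_packets(data, first_seq, payload_size):
    """Split data into packets and return (seq, next_seq, payload) tuples."""
    packets = []
    seq = first_seq
    for start in range(0, len(data), payload_size):
        payload = data[start:start + payload_size]
        next_seq = seq + len(payload)  # number of the following packet
        packets.append((seq, next_seq, payload))
        seq = next_seq
    return packets


for seq, next_seq, payload in number_packets(b"x" * 250, first_seq=1, payload_size=100):
    print(seq, next_seq, len(payload))
# 1 101 100
# 101 201 100
# 201 251 50
```

The difference between the two numbers is always the payload length, which is how the caption above arrives at 46183 - 45943 = 240 bytes.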

Assembly of TCP packets

After TCP packets are received, the operating system assembles them and restores the data. The application does not process TCP packets directly.

The application does not need to care about the details of data communication. Unless the line fails, it always receives complete data. The data an application needs is carried inside TCP packets and has its own format (for example, HTTP).

TCP does not provide any mechanism for indicating the size of the original file. That is specified by the application-layer protocol. For example, HTTP has a Content-Length header that gives the size of the message body. The operating system’s job is simply to keep receiving TCP packets and assemble them in sequence, without missing a single one.

The operating system does not process the data contained in TCP packets. Once the packets are assembled, the data is handed to the application. Each TCP packet carries a port field, which identifies the application, listening on that port, that should receive the data.

(Picture description: TCP packet format)

The application receives the assembled raw data. A browser, for example, reads the data piece by piece according to the Content-Length field of the HTTP protocol. This also means that a single TCP connection can carry multiple HTTP exchanges.
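To illustrate how an application might cut the reassembled byte stream into messages, here is a minimal sketch that uses only Content-Length. It assumes well-formed headers and ignores everything else in HTTP (chunked encoding, for instance). The function name `split_http_messages` is made up for this example.

```python
def split_http_messages(stream):
    """Split a reassembled byte stream into (headers, body) pairs."""
    messages = []
    pos = 0
    while pos < len(stream):
        header_end = stream.find(b"\r\n\r\n", pos)
        if header_end == -1:
            break  # headers incomplete: wait for more data
        header_block = stream[pos:header_end].decode("ascii")
        length = 0
        # Skip the first line (status or request line) and scan the headers.
        for line in header_block.split("\r\n")[1:]:
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value.strip())
        body_start = header_end + 4
        body_end = body_start + length
        if body_end > len(stream):
            break  # body incomplete: wait for more data
        messages.append((header_block, stream[body_start:body_end]))
        pos = body_end
    return messages


stream = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nbye"
)
for headers, body in split_http_messages(stream):
    print(body)
# b'hello'
# b'bye'
```

Nothing in the byte stream itself marks where one message ends. The boundary comes entirely from the application-layer header.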

Slow start and ACK

The server naturally wants to send packets as fast as possible, ideally all at once. However, if packets are sent too fast, they may be lost. Packet loss has many possible causes, such as low bandwidth, an overheated router, or a buffer overflow. On a bad line, the faster you send, the more you lose.

Ideally, the sender reaches the maximum rate the line allows. But how can it know the ideal rate for the other side’s line? The answer is to probe slowly.

To balance efficiency and reliability, TCP uses a slow start mechanism. It sends slowly at first and then adjusts the rate according to packet loss: if no packets are lost, it speeds up, and if packets are lost, it slows down.

The Linux kernel sets this with the constant TCP_INIT_CWND: in the first exchange, the sender sends 10 packets at once, that is, the “send window” size is 10. It then stops, waits for the receiver’s confirmation, and continues sending.

By default, the receiver sends one acknowledgement for every two TCP packets it receives. “Acknowledgement” is abbreviated as “ACK”.

An ACK carries two pieces of information.

  • The number of the next packet the receiver expects (Ack NUM = 361)
  • The remaining capacity of the receiver’s receive window (Window = 120)

With these two pieces of information, plus the number of the latest packet it has sent, the sender can estimate how fast the receiver is taking in data and lower or raise its sending rate accordingly. The amount it may send at once is called the “send window”, and the size of this window is variable.

(Image caption: Each ACK carries the number of the next packet and the remaining capacity of the receiving window. Both parties send an ACK.)
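One simple way to combine the two ACK fields with the sender’s own position is shown below. It is a sketch under stated assumptions: the ACK values 361 and 120 come from the example above, while the sender position 421 and the name `usable_window` are invented for illustration.

```python
def usable_window(ack_num, window, next_seq_to_send):
    """Return how many more bytes the sender may send right now."""
    # Bytes already sent but not yet acknowledged by the receiver.
    in_flight = next_seq_to_send - ack_num
    return max(0, window - in_flight)


# Ack NUM = 361 and Window = 120 are from the text; 421 is hypothetical.
print(usable_window(ack_num=361, window=120, next_seq_to_send=421))  # 60
```

If the sender has already put 60 unacknowledged bytes on the line, only 60 of the 120 bytes the receiver has room for are still available.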

Note that because TCP communication is bidirectional, both parties need to send ACKs, and the window sizes of the two sides are likely to differ. An ACK is just a few simple fields, so it is usually sent in the same packet as data.

(Image caption: the figure above shows four communications. In the first, the packet from host A to host B has number 1 and length 100 bytes, so in the second communication host B’s ACK number is 1 + 100 = 101, and in the third communication host A’s packet number is also 101. Similarly, in the second communication the packet from host B to host A has number 1 and length 200 bytes, so host A’s ACK in the third communication is 201, and host B’s packet number in the fourth communication is also 201.)
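The four communications in the caption can be replayed with a small model. This is a sketch only: the `Host` class is hypothetical, and the 50-byte lengths of the third and fourth packets are made-up values, since the caption does not give them.

```python
class Host:
    """Tracks the packet number and ACK number for one side of a connection."""

    def __init__(self, name, first_seq=1):
        self.name = name
        self.next_seq = first_seq  # number of the next packet this host sends
        self.ack = None            # number of the next packet expected from the peer

    def send(self, peer, length):
        packet = {"from": self.name, "seq": self.next_seq,
                  "len": length, "ack": self.ack}
        self.next_seq += length
        peer.ack = packet["seq"] + length  # the peer now expects what follows
        return packet


a, b = Host("A"), Host("B")
exchanges = [
    a.send(b, 100),  # first communication: A -> B, 100 bytes
    b.send(a, 200),  # second: B -> A, 200 bytes
    a.send(b, 50),   # third: the length 50 is a made-up value
    b.send(a, 50),   # fourth: the length 50 is a made-up value
]
for packet in exchanges:
    print(packet["from"], "seq", packet["seq"], "ack", packet["ack"])
# A seq 1 ack None
# B seq 1 ack 101
# A seq 101 ack 201
# B seq 201 ack 151
```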

Even on a high-bandwidth, good-quality line, TCP always starts with 10 packets and only reaches the highest transmission rate after some time. This is TCP’s slow start.
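The “speed up if nothing is lost, slow down if something is lost” idea can be simulated in a few lines. The text does not say by how much the window changes, so this sketch assumes it doubles after a loss-free round and halves after a lossy one. The `line_capacity` parameter is a hypothetical stand-in for what the line can carry per round.

```python
INITIAL_WINDOW = 10  # packets, matching TCP_INIT_CWND as described above


def next_window(window, loss_detected):
    """Assumed policy: halve the window on loss, double it otherwise."""
    if loss_detected:
        return max(1, window // 2)
    return window * 2


def simulate(rounds, line_capacity):
    """Return the window size used in each round."""
    window = INITIAL_WINDOW
    history = []
    for _ in range(rounds):
        history.append(window)
        # Hypothetical loss model: sending more than the line can carry loses packets.
        window = next_window(window, loss_detected=window > line_capacity)
    return history


print(simulate(rounds=6, line_capacity=100))
# [10, 20, 40, 80, 160, 80]
```

However large the capacity, the first round still uses a window of 10, which is the point made in the paragraph above.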

Packet loss processing

The TCP protocol can guarantee the integrity of data communication. How does this work?

As mentioned earlier, each packet carries the number of the next packet. If the next packet is not received, then the ACK number does not change.

For example, suppose packet 4 has been received but packet 5 has not. The ACK records that packet 5 is expected. If packet 5 arrives after a while, the next ACK updates the number. If packet 5 still has not arrived but packet 6 or 7 has, the number in the ACK does not change and keeps showing packet 5. This produces many duplicate ACKs.

If the sender receives three consecutive duplicate ACKs, or times out without receiving any ACK, it concludes that the packet, here packet 5, was lost and sends it again. Through this mechanism, TCP ensures that lost packets are resent, so no data ends up missing.

(Image description: Host B does not receive the packet numbered 100 and keeps sending the same ACK, which triggers Host A to resend the packet numbered 100.)
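The duplicate-ACK mechanism can be sketched with two small functions. Like the example with packets 4 to 7, this uses whole packet numbers rather than byte offsets. It assumes the first ACK for a number is the original and the three that follow are the duplicates. Both function names are hypothetical, and the timeout path is not modelled.

```python
def receiver_acks(arrivals, first_expected):
    """Return the ACK number the receiver sends after each arriving packet."""
    received = set()
    expected = first_expected
    acks = []
    for number in arrivals:
        received.add(number)
        # Advance past every packet that has arrived in order.
        while expected in received:
            expected += 1
        acks.append(expected)
    return acks


def detect_loss(acks, threshold=3):
    """Return the packet number to resend after `threshold` duplicate ACKs, else None."""
    previous = None
    duplicates = 0
    for ack in acks:
        if ack == previous:
            duplicates += 1
            if duplicates >= threshold:
                return ack
        else:
            previous = ack
            duplicates = 0
    return None


acks = receiver_acks([4, 6, 7, 8], first_expected=4)  # packet 5 never arrives
print(acks)               # [5, 5, 5, 5]
print(detect_loss(acks))  # 5
```

When every packet arrives in order, each ACK is different and `detect_loss` returns `None`, so nothing is resent.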