Preface

Nice to meet you

Basically every programmer has a rough grasp of TCP. It rarely feels useful in day-to-day development, but interviewers insist on asking about it.

With the New Year behind us, everyone has been preparing for spring recruitment, and I am no exception. After reading some OkHttp source code, I went back and relearned TCP and HTTP. OkHttp is essentially an implementation of these protocols, so understanding its source code rests on understanding TCP and HTTP.

After reviewing TCP, I organized my notes, and this article is the result.

Computer networking knowledge has one defining characteristic: it is trivia-heavy. Memorizing stock interview answers by rote means forgetting them just as quickly. TCP is a protocol of the transport layer, so we first need some understanding of the network's layered structure and of the transport layer itself. Then I'll cover the four key points of TCP: connection orientation, reliable transport, flow control, and congestion control, and finally add a little on sticky packets and packet splitting.

Network layered architecture

Consider the simplest case: communication between two hosts. Here, only a cable is needed to connect the two devices, along with agreed hardware specifications, such as a USB interface, a 10V voltage, or a 2.4GHz frequency. This layer is the physical layer, and these specifications are physical layer protocols.


We are certainly not satisfied with just two computers connected, so we can use switches to connect multiple computers, as shown below:

The network connected this way is called a LAN (Ethernet is one type of LAN). In this network, we need to identify each machine so that we can specify which one to communicate with. This identifier is the hardware (MAC) address. Hardware addresses are fixed when the machine is manufactured and are permanently unique. On a LAN, when we need to communicate with another machine, we only need to know its hardware address, and the switch will deliver our message to the corresponding machine.

At this point we can split off the physical layer and build a new layer on top of it: the data link layer, which no longer cares how the bits are actually transmitted by the hardware underneath.


We are still not satisfied with the size of a LAN, so we connect LANs together, which requires a router to join two LANs:

However, if we keep using hardware addresses as the unique identifier of communication targets, it becomes unrealistic to remember the hardware addresses of all machines as the network grows larger and larger. At the same time, a network participant may change devices frequently, which makes a hardware address table hard to maintain. So a new address is used to identify a network participant: the IP address.

Consider a simple example of sending a letter to understand IP addresses.

I live in Beijing and my friend A lives in Shanghai. I want to write to friend A:

  1. After writing the letter, I will write my friend A’s address on the letter and put it in the Beijing Post Office (attach the target IP address to the message and send it to the router).
  2. The post office will help me transport the letter to the local post office in Shanghai (the information will be routed to the router on the target IP LAN).
  3. The local router in Shanghai will help me deliver the letter to my friend A (Intranet communication)

So an IP address is a network access address (friend A's address): I only need to know the destination IP address, and the routers can deliver my message. Within a LAN, hosts dynamically maintain a mapping between MAC addresses and IP addresses and resolve the destination IP address to a MAC address before sending.

So we no longer need to manage how machines are located at the lower layers; we just need an IP address to communicate with our target. This layer is the network layer. The core function of the network layer is to provide logical communication between hosts: all the hosts in the network are logically connected, and the upper layer only needs to provide the target IP address and the data, and the network layer can deliver the message to the corresponding host.


A host has multiple processes communicating at the same time, such as chatting on WeChat with your girlfriend while playing a mobile game with your friends. My phone is then communicating with two different machines simultaneously. So when my phone receives data, how does it tell the WeChat data from the game data? We must add another layer on top of the network layer: the transport layer:

The transport layer further separates network traffic through sockets, so different application processes can make network requests independently without interfering with each other. This is the essence of the transport layer: it provides logical communication between processes. The processes can be on different hosts or on the same host, which is why in Android, socket communication is also a form of inter-process communication.
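As a rough sketch of "logical communication between processes," the snippet below runs a tiny TCP server and client on the loopback interface (the server in a thread standing in for a second process; all names here are my own illustration). The transport layer routes each byte stream to the right socket, so this exchange does not interfere with any other connection on the host:

```python
import socket
import threading

def run_server(server: socket.socket, results: list) -> None:
    conn, _addr = server.accept()          # block until a client connects
    with conn:
        results.append(conn.recv(1024))    # read what the client sent
        conn.sendall(b"pong")

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))              # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

results: list = []
t = threading.Thread(target=run_server, args=(server, results))
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"ping")
reply = client.recv(1024)
client.close()
t.join()
server.close()

print(results[0], reply)
```

Each endpoint only names the other's socket; everything below the socket API (IP routing, framing, hardware) is handled by the lower layers.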


Now that application processes on different machines can communicate independently, we can develop various applications on the computer network: HTTP for web pages, FTP for file transfer and so on. This layer is called the application layer.

The application layer can be further split into a presentation layer and a session layer, but their essence is the same: fulfilling specific business requirements. Unlike the lower four layers, they are not strictly required and can be folded into the application layer.


Finally, a summary of the network layers:

  1. The physical layer, at the bottom, handles direct hardware communication between two machines;
  2. The data link layer uses hardware addresses to implement communication within a LAN;
  3. The network layer implements logical communication between hosts through abstract IP addresses;
  4. On top of the network layer, the transport layer separates the data streams to give application processes independent network communication;
  5. On top of the transport layer, the application layer implements whatever functionality specific needs require.

It is important to note that layering is not physical layering, but logical layering. Through the encapsulation of the underlying logic, the development of the upper layer can directly rely on the underlying functions without paying attention to the specific implementation, which facilitates the development.

This layered approach is essentially the chain-of-responsibility design pattern: separating responsibilities through layers of encapsulation makes development and maintenance easier. The interceptor design in OkHttp is the same chain-of-responsibility pattern.

Transport layer

This article focuses on TCP, but first we need some background on the transport layer in general.

Essence: providing process-to-process communication

The network layer, below the transport layer, does not know which process a packet belongs to; it is only responsible for receiving and sending packets. The transport layer gathers data from different processes and hands it to the network layer, and splits data arriving from the network layer among the different processes. Converging downward to the network layer is called multiplexing, and splitting upward is called demultiplexing.

The transport layer's performance is limited by the network layer, which makes sense: the network layer is its underlying support. The transport layer therefore cannot raise the upper limit on bandwidth, delay, and so on, but it can build additional features on top of the network layer: reliable transport, for example. The network layer is only responsible for trying its best to get packets from one end to the other, without guaranteeing that data arrives intact and complete.

Underlying implementation: Socket

As mentioned earlier, even the simplest transport-layer protocol provides independent communication between processes, but the underlying implementation is independent communication between sockets. At the network layer, an IP address is a host's logical address, while at the transport layer, a socket is a process's logical address. Of course, a process can own multiple sockets, and an application process listens on a socket to obtain the messages that socket receives.

An example helps to understand sockets, as shown below:

Each host can create multiple sockets to receive messages. For example, if the WeChat process on host A wants to message WeChat on host B, it only needs to send to socket C on host B, and host B's WeChat will read the message from socket C. (The real flow is different, of course: messages go through WeChat's backend servers. This is just an illustration.)

Similarly, if host B's QQ wants to message host A's QQ, it only needs to send to socket B, and host A's QQ will get the message.

A socket is not a physical thing but an abstraction created by the transport layer, which introduces the concept of ports to distinguish sockets. A port is a numbered network communication endpoint on a host; each port has a port number, and how port numbers are used is defined by the transport-layer protocol.

Different transport-layer protocols define sockets differently. In UDP, a socket is identified by the destination IP address plus the destination port number. TCP identifies a socket by the four-tuple of source IP address, source port number, destination IP address, and destination port number. We just attach this information to the transport-layer packet header, and the destination host knows which socket we are sending to; the process listening on that socket then receives the data.
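The four-tuple is easy to observe from a real connection. In this sketch (Python standard library, loopback only), `getsockname()` and `getpeername()` show that each end's (local, remote) address pair is the mirror image of the other's:

```python
import socket

# A TCP connection is identified by the 4-tuple
# (source IP, source port, destination IP, destination port).

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # port 0: OS picks a free port
server.listen(1)
server_addr = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server_addr)
conn, _ = server.accept()

# Each side's (local, remote) pair mirrors the other side's.
client_view = (client.getsockname(), client.getpeername())
server_view = (conn.getsockname(), conn.getpeername())

client.close(); conn.close(); server.close()
print(client_view, server_view)
```

The client's local address is the server's peer address and vice versa; together the four values name exactly one connection.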

Transport layer protocols

The well-known transport-layer protocols are TCP and UDP. UDP is the most minimal transport-layer protocol, implementing only inter-process communication. On top of what UDP offers, TCP adds reliable transmission, flow control, congestion control, connection orientation, and other features, at the cost of greater complexity.

Of course, there are other excellent transport-layer protocols, but the most widely used are TCP and UDP. UDP is covered later as well.

TCP header

In terms of packet format, the TCP protocol prepends a TCP header to the data handed down from the application layer; this header carries TCP's control information. Let's look at the structure of this header as a whole:

This picture comes from my university teacher's courseware; it is very clear, so I have kept using it to study. The lowest part shows the relationship between the packets at adjacent layers; the TCP data part is the data passed down from the application layer.

The TCP header has a fixed 20-byte part, followed by a variable-length options part (padded to a multiple of 4 bytes). There is a lot here, but some of it is already familiar: source port, destination port. Wait, doesn't a socket also need an IP address? The IP addresses are attached at the network layer. The rest of the fields will be explained gradually; as a summary for the article, the full table is given here for review:

| Header field | Size | Role |
| --- | --- | --- |
| Source port and destination port | 2 bytes each | Together with the IP addresses, they define the sockets: the host/port sending the message and the host/port receiving it |
| Sequence number | 4 bytes | Every byte in the stream sent over a TCP connection is numbered. This field holds the number of the first data byte in the segment. With 4 bytes, the range is 0 to 2^32 - 1 |
| Acknowledgement number | 4 bytes | The sequence number of the first byte of data expected next from the peer |
| Data offset (header length) | 4 bits | How far the start of the data is from the start of the TCP segment, in units of 32-bit words (4 bytes) |
| Reserved | 6 bits | Reserved for future use; set to 0 for now |
| Urgent flag URG | 1 bit | When URG = 1, the urgent pointer field is valid: the segment carries urgent data that should be transmitted as soon as possible (high-priority data) |
| Acknowledge flag ACK | 1 bit | The acknowledgement number field is valid only when ACK = 1; when ACK = 0 it is invalid. Set to 1 when acknowledging data received from the sender |
| Push flag PSH | 1 bit | On receiving a segment with PSH = 1, TCP delivers it to the receiving application as soon as possible rather than waiting for the buffer to fill |
| Reset flag RST | 1 bit | RST = 1 means a serious error occurred on the connection (e.g., a host crash); the connection must be released and then re-established |
| Synchronize flag SYN | 1 bit | SYN = 1 indicates a connection request or connection-accept segment |
| Finish flag FIN | 1 bit | Used to release a connection. FIN = 1 means the sender has finished sending data and requests to release the connection |
| Window | 2 bytes | The remaining space, in bytes, of the segment sender's receive buffer |
| Checksum | 2 bytes | Covers both the header and the data. When computing it, a 12-byte pseudo-header is prepended to the segment. Detects transmission errors, e.g., a 1 flipped to a 0 |
| Urgent pointer | 2 bytes | The number of bytes of urgent data in the segment (urgent data is placed at the front of the data) |
| Options | Variable length | TCP originally defined only one option, the maximum segment size (MSS), which tells the peer: "the maximum data field my buffer can accept in one segment is MSS bytes" |
| Padding | Variable length | Pads the header to a multiple of 4 bytes |

The options field contains the following additional options:

| Option | Role |
| --- | --- |
| Window scale | Takes 3 bytes, one of which is the shift count S. The window field is treated as (16 + S) bits wide; equivalently, the advertised window value is shifted left by S bits to obtain the actual window size |
| Timestamps | Takes 10 bytes; the key fields are the timestamp value (4 bytes) and the timestamp echo reply (4 bytes), mainly used to measure the round-trip time of segments in the network |
| Selective acknowledgement (SACK) | When the receiver holds byte blocks that are not contiguous with the earlier stream, it tells the sender exactly which ranges it has received. Each range needs two boundaries of 4 bytes each, and the options field is at most 40 bytes long, so at most 4 received ranges can be reported |

Come back to this table later and the fields will feel familiar.
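The fixed 20-byte layout in the table maps directly onto a `struct` format string. As an illustrative sketch (the function name and the hand-built SYN segment are my own), this parses the fixed header fields and flag bits:

```python
import struct

# Unpack the 20-byte fixed TCP header; all fields are big-endian ("!").
def parse_tcp_header(segment: bytes) -> dict:
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        # top 4 bits: header length in 32-bit words -> convert to bytes
        "data_offset": (offset_flags >> 12) * 4,
        # low 6 bits: the URG/ACK/PSH/RST/SYN/FIN flags
        "flags": {
            "URG": bool(offset_flags & 0x20),
            "ACK": bool(offset_flags & 0x10),
            "PSH": bool(offset_flags & 0x08),
            "RST": bool(offset_flags & 0x04),
            "SYN": bool(offset_flags & 0x02),
            "FIN": bool(offset_flags & 0x01),
        },
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
    }

# A hand-built SYN segment: ports 12345 -> 80, seq 1000, offset 5 words, SYN set.
syn = struct.pack("!HHIIHHHH", 12345, 80, 1000, 0, (5 << 12) | 0x002, 65535, 0, 0)
header = parse_tcp_header(syn)
print(header)
```

A data offset of 5 words is exactly the 20-byte fixed header with no options.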

TCP's byte-stream-oriented nature

TCP does not simply prepend a header to whatever the application layer hands down and send it as-is; it treats the data as a stream of bytes, numbers them, and sends them in pieces. This is TCP's byte-stream orientation:

  • TCP reads data from the application layer as a stream and stores it in its send cache, numbering the bytes
  • TCP selects an appropriate number of bytes from the sender buffer to compose TCP packets and sends them to the destination through the network layer
  • The target reads the bytes and stores them in its own receiver buffer and delivers them to the application layer when appropriate

The advantage of being byte-stream oriented is that there is no need to buffer an entire block of data at once, which would use too much memory. The downside is that TCP cannot know what the bytes mean: whether the application layer sent an audio file or a text file, to TCP it is just a string of bytes with no boundaries, which leads to sticky packets and packet splitting, discussed later.
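Because the stream carries no message boundaries, two application "messages" can arrive glued together. A common application-layer fix is length-prefix framing; this sketch (names `frame`/`unframe` are my own) shows the idea:

```python
import struct

# TCP delivers a byte stream with no message boundaries, so two messages
# can arrive glued together (sticky packets). Length-prefix framing fixes
# this: prepend each message with its 4-byte big-endian length.

def frame(message: bytes) -> bytes:
    return struct.pack("!I", len(message)) + message

def unframe(stream: bytes) -> list:
    messages = []
    while len(stream) >= 4:
        (length,) = struct.unpack("!I", stream[:4])
        if len(stream) < 4 + length:
            break                      # incomplete message: wait for more bytes
        messages.append(stream[4:4 + length])
        stream = stream[4 + length:]
    return messages

# Two messages sent separately arrive as one undifferentiated byte stream...
stream = frame(b"hello") + frame(b"world")
# ...but the receiver can still split them on the length prefixes.
messages = unframe(stream)
print(messages)
```

HTTP's `Content-Length` header and many RPC protocols are variations of this same idea.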

Principle of reliable transmission

As mentioned, TCP is a reliable transport protocol: hand it a piece of data, and it will get that data to the destination intact, unless the network itself collapses. The model it implements is as follows:

To the application layer, TCP offers a reliable transmission service; internally, the transport layer rides on the unreliable transmission of the network layer. Reliability could instead be guaranteed at the network layer or even the data link layer, but that would make the network design more complex and less efficient. Placing the reliability guarantee at the transport layer is the more appropriate choice.

The principles of reliable transmission can be summarized as: sliding windows, timeout retransmission, cumulative acknowledgement, selective acknowledgement, and continuous ARQ.

Stop-and-wait protocol

The simplest way to achieve reliable transmission: I send you one packet, you reply "received", and then I send the next. The transmission model is as follows:

This back-and-forth way of ensuring reliable transmission is known as stop-and-wait. The TCP header has an ACK flag; when it is set to 1, the segment is an acknowledgement.

Now consider packet loss. The network is unreliable, so a packet sent by machine A may be lost; if it is, machine B never receives the data and machine A waits forever. The solution is timeout retransmission: when machine A sends a packet, it starts a timer, and if no acknowledgement arrives in time, it assumes the packet was lost and sends it again.

Retransmission creates another problem: if the original packet was not lost but merely delayed in the network, machine B receives two copies. How can machine B tell whether the two packets carry the same data or different data? This is where the byte numbering described earlier comes in: the receiver can tell from the byte numbers whether the data is new or retransmitted.

The TCP header has two fields for this: the sequence number and the acknowledgement number, which hold the number of the first byte of the sender's data and the number of the first byte the receiver expects next. TCP is byte oriented, but it does not send byte by byte: it cuts off a whole segment at a time. The segment length is influenced by many factors, such as the amount of buffered data and the frame size limit of the data link layer.
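The stop-and-wait loop (send, wait for ACK, retransmit on timeout) can be sketched in a few lines. Here `lossy_send` is a stand-in of my own for a network that drops packets with some probability; it is a simulation, not real networking code:

```python
import random

# Stop-and-wait: send one packet, wait for the ACK, retransmit on timeout.

random.seed(42)   # deterministic "network" for the demo

def lossy_send(packet, loss_rate=0.3):
    """Return an ACK for the packet, or None if the packet was 'lost'."""
    if random.random() < loss_rate:
        return None                     # packet lost in the network
    return ("ACK", packet[0])           # receiver acknowledges the seq number

def stop_and_wait(data: bytes):
    delivered, retransmissions = [], 0
    for seq, byte in enumerate(data):
        while True:                     # keep sending until acknowledged
            ack = lossy_send((seq, byte))
            if ack == ("ACK", seq):     # confirmation arrived: next packet
                delivered.append(byte)
                break
            retransmissions += 1        # timeout: send the same packet again
    return bytes(delivered), retransmissions

delivered, retries = stop_and_wait(b"TCP")
print(delivered, retries)
```

However lossy the channel, the data eventually arrives complete and in order; the cost is paid in retransmissions and waiting.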

Continuous ARQ protocol

The stop-and-wait protocol achieves reliable transmission but has a fatal flaw: low efficiency. The sender transmits one packet and then sits idle waiting, wasting resources. The solution is to send packets continuously. The model is as follows:

The biggest difference from stop-and-wait is that the sender transmits continuously, and the receiver acknowledges the arriving data one by one. This greatly improves efficiency, but it also raises some new questions:

Can the sender transmit indefinitely until its buffer is empty? No, because it must consider the receiver's buffer and its ability to read the data. If packets are sent faster than the receiver can accept them, packets get dropped and retransmitted frequently, wasting network resources. So the range of data the sender transmits must take the receiver's buffer into account. This is TCP flow control, and the solution is the sliding window. The basic model is as follows:

  • The sender needs to set the size of the window that can be sent according to the buffer size of the receiver. Data in the window can be sent, and data outside the window cannot be sent.
  • When the data in the window receives an acknowledgement reply, the entire window moves forward until all the data has been sent

The TCP header's window field reports the size of the receiver's remaining buffer, allowing the sender to adjust its own send-window size. Through sliding windows, TCP achieves flow control: it never sends more than the receiver can hold.
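The two bullet points above can be sketched as a tiny send-window class (a simulation of my own; real TCP tracks this in the kernel): bytes are sendable only inside `[base, base + size)`, and the window slides forward as cumulative ACKs arrive:

```python
# The sender's sliding window: only bytes inside the window may be sent;
# the window slides forward as acknowledgements arrive, and its size
# tracks the receiver's advertised buffer space.

class SendWindow:
    def __init__(self, size: int):
        self.base = 0          # first unacknowledged byte number
        self.size = size       # receiver's advertised window

    def sendable(self, seq: int) -> bool:
        # a byte may be sent only if it falls inside [base, base + size)
        return self.base <= seq < self.base + self.size

    def on_ack(self, ack_no: int) -> None:
        # cumulative ACK: everything before ack_no is confirmed
        self.base = max(self.base, ack_no)

    def on_window_update(self, advertised: int) -> None:
        # flow control: the receiver can shrink or grow our window
        self.size = advertised

w = SendWindow(size=4)
before = [w.sendable(i) for i in range(6)]   # bytes 0-3 sendable, 4-5 not
w.on_ack(2)                                  # receiver confirmed bytes 0 and 1
after = [w.sendable(i) for i in range(6)]    # window slid: 2-5 now sendable
print(before, after)
```

Notice how the ACK both retires old bytes and admits new ones: that is the "slide".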


The second problem with continuous ARQ: the network fills with acknowledgements of the same volume as the data packets, because every packet gets its own acknowledgement. The way to improve network efficiency is cumulative acknowledgement: the receiver does not reply to every packet, but, after accumulating several, tells the sender that everything up to a given packet has been received. For example, on receiving packets 1 through 4, the receiver only needs to acknowledge 4, and the sender knows that 1 through 4 all arrived.

The third problem is how to handle packet loss. Under stop-and-wait it was simple: a timeout retransmission solved it. With continuous ARQ it is different. Suppose the receiver gets six bytes numbered 1, 2, 3, 5, 6, 7, with byte number 4 missing. With cumulative acknowledgement, it can only acknowledge 3, and it must discard 5, 6, 7, because the sender will retransmit everything from 4 onward. That is the Go-Back-N (GBN) approach.

But notice that only 4 actually needs retransmitting, so discarding 5, 6, 7 wastes resources. Hence selective acknowledgement (SACK): in the options field of a TCP segment, the receiver records the ranges it has received (each range needs two boundaries), so the sender can retransmit only the lost data.

Summary of reliable transmission

So far, the principle of TCP reliable transmission has been introduced. Finally, a summary:

  • Continuous ARQ plus the send-acknowledge pattern ensures every packet reaches the receiver
  • Byte numbering indicates whether a piece of data is new or retransmitted
  • Timeout retransmission solves the packet-loss problem
  • Sliding windows implement flow control
  • Cumulative acknowledgement plus selective acknowledgement improves the efficiency of acknowledgements and retransmissions

Of course, this is just the tip of the iceberg of reliable transmission, and you can dig deeper (though this much is pretty much enough for chatting with an interviewer).

Congestion control

Congestion control addresses another problem: avoiding the severe packet loss and reduced network efficiency caused by an overcrowded network.

Take road traffic as an analogy:

The number of cars an expressway can carry at once is fixed, and during holidays there are serious traffic jams. In TCP, packets time out and get retransmitted; in other words, more cars pour in and traffic gets worse. The result is a spiral: packet loss, retransmission, more packet loss, more retransmission, until the whole network grinds to a halt.

Congestion control is not the same thing as the flow control above, though flow control is one of its means: to avoid congestion, traffic must be limited. Congestion control aims to limit the amount of data each host sends so the network does not clog, much like cities such as Guangzhou restricting which cars may drive each day: otherwise everyone is stuck in traffic and nobody gets anywhere.

Congestion control acts through flow control, and flow control is implemented with the sliding window, so congestion control ultimately limits traffic by limiting the size of the sender's sliding window. Of course, traffic is not the only factor: congestion also involves router buffers, bandwidth, processor speed, and so on. Upgrading hardware (widening a 4-lane road to 8 lanes) is one approach, but hardware has a bottleneck and cannot be improved indefinitely, so TCP itself needs algorithms to relieve congestion.

Congestion control centers on four mechanisms: slow start, congestion avoidance, fast retransmit, and fast recovery. Again, a slide from my university teacher:

The y-axis represents the size of the sender's window, and the x-axis represents the transmission round (not the byte number).

  • At the start the window is set to a small value and then doubles each round. This is slow start.
  • When the window reaches the ssthresh value (a threshold set according to real-time network conditions), the sender enters congestion avoidance, increasing the window by 1 each round, slowly probing the network's limit.
  • If a timeout occurs, congestion is very likely; the sender goes back to slow start and repeats the steps above.
  • If three duplicate acknowledgements arrive, the network is degraded but not collapsed: ssthresh is set to half the current value and congestion avoidance continues from there. This is fast recovery.
  • When loss is detected this way, the missing packet is retransmitted immediately rather than waiting for a timeout. This is fast retransmit.
  • Of course, the window cannot grow indefinitely: it can never exceed the receiver's buffer size.

With this algorithm, network congestion can be avoided to a large extent.

In addition, routers can notify senders when their buffers are about to fill, instead of waiting for a timeout to occur: this is active queue management (AQM). There are many other techniques, but the algorithm above is the main one.
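The sawtooth in the figure can be reproduced with a few lines of simulation. This is a simplified sketch of my own (real TCP implementations differ in many details); units are "segments per round", matching the figure:

```python
# cwnd evolution: exponential growth during slow start, linear growth during
# congestion avoidance, and ssthresh halving on a loss event (fast recovery).

def simulate_cwnd(rounds, ssthresh, loss_rounds):
    cwnd, history = 1, []
    for r in range(rounds):
        history.append(cwnd)
        if r in loss_rounds:             # three duplicate ACKs: fast recovery
            ssthresh = max(cwnd // 2, 1)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd *= 2                    # slow start: double each round
        else:
            cwnd += 1                    # congestion avoidance: +1 each round
    return history

# Loss detected in round 6: watch the window halve, then climb linearly.
history = simulate_cwnd(rounds=10, ssthresh=8, loss_rounds={6})
print(history)
```

The output climbs 1, 2, 4, 8, then creeps up by 1, drops to half at the loss, and resumes the linear probe: the classic TCP sawtooth.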

Connection-oriented

This section covers the famous TCP three-way handshake and four-way wave. After the previous sections, it is actually easy to understand.

TCP is connection-oriented; so what is a connection? It is not a physical link but a record kept by both communicating parties. TCP is full-duplex, meaning either side can send data to the other, so both sides must record information about each other. Following the principles of reliable transmission, each TCP endpoint prepares a receive buffer for the peer's data, remembers the peer's socket so it knows where to send, remembers the peer's buffer so it can adjust its window size, and so on. These records together constitute the connection.

As noted in the transport-layer section, communication endpoints are identified by sockets, and TCP is no exception. A TCP connection has exactly two endpoints, that is, two sockets, never three. This is why a TCP socket must be defined by source IP address, source port number, destination IP address, and destination port number, to avoid confusion.

If TCP, like UDP, defined a socket by only the destination IP address and destination port number, multiple senders could send to the same destination socket simultaneously, and TCP could not tell which sender each piece of data came from, causing errors.

Since there is a connection, there are two key events: establishing it and releasing it.

Establish a connection

The purpose of establishing a connection is to exchange information and remember each other's details, so both parties need to send their information to each other:

However, the principles of reliable transmission tell us that transmission over the network is unreliable; each side needs an acknowledgement from the other to be sure its message arrived, as in the diagram below:

Machine B's acknowledgement and machine B's own information can be combined into one message to reduce the round trips; and the very act of machine B replying to machine A implies that machine B received A's message. The final diagram is therefore:

The steps are as follows:

  1. Machine A sends a SYN packet to machine B to request a TCP connection, attaching its own receive-buffer information, then enters the SYN_SENT state, meaning the request is sent and A is waiting for a reply.
  2. On receiving the request, machine B records machine A's information, creates its own receive buffer, and sends back a combined SYN+ACK packet, entering the SYN_RCVD state, meaning it is ready and can send data to A once A's reply arrives.
  3. On receiving the reply, machine A records machine B's information and sends an ACK, entering the ESTABLISHED state, meaning it is ready to send and receive.
  4. Machine B enters the ESTABLISHED state once it receives the ACK.

These three messages are the three-way handshake.
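The four numbered steps above can be sketched as a tiny state machine (a toy model of my own, not real kernel code): each event is (sender, flags), and the states follow the ones named in the steps:

```python
# The three-way handshake as a state machine. A starts CLOSED, B is LISTENing.

def handshake(events):
    state = {"A": "CLOSED", "B": "LISTEN"}
    for sender, flags in events:
        if sender == "A" and flags == "SYN":
            state["A"] = "SYN_SENT"       # step 1: A requests a connection
            state["B"] = "SYN_RCVD"       # B receives the SYN
        elif sender == "B" and flags == "SYN+ACK":
            state["A"] = "ESTABLISHED"    # step 2-3: A gets SYN+ACK, replies ACK
        elif sender == "A" and flags == "ACK":
            state["B"] = "ESTABLISHED"    # step 4: B gets the ACK
    return state

final = handshake([("A", "SYN"), ("B", "SYN+ACK"), ("A", "ACK")])
print(final)
```

Dropping the last event leaves B stuck in SYN_RCVD, which is exactly why the third message exists.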

Disconnect a connection

Disconnecting works much like the three-way handshake:

  1. After machine A finishes sending data, it requests to disconnect from machine B and enters the FIN_WAIT_1 state, meaning its data is sent and a FIN packet (FIN flag set to 1) has gone out.

  2. On receiving the FIN packet, machine B replies with an ACK to confirm receipt. Machine B may still have data to send, so it enters the CLOSE_WAIT state, meaning the peer has finished sending and requested to close; machine B can close its side once it finishes sending its remaining data.

  3. After machine B finishes sending its data, it sends a FIN packet to machine A and enters the LAST_ACK state, meaning it will close the connection once it receives an ACK.

  4. After machine A receives the FIN packet, it replies with an ACK and enters the TIME_WAIT state.

    The TIME_WAIT state is special. Once machine A receives machine B's FIN, it could, in an ideal world, close the connection immediately. But:

    1. We know that the network is unstable, maybe machine B sent some data that has not arrived (slower than FIN packet);
    2. At the same time, the ACK packet may be lost, and machine B retransmits the FIN packet.

    If machine A closed the connection immediately, data could be incomplete and machine B could not release the connection. So machine A waits for twice the maximum segment lifetime (2MSL) to ensure no packets from this connection remain in the network before closing.

  5. Finally, machine B closes its connection as soon as it receives the ACK, and machine A closes after waiting 2MSL; both then enter the CLOSED state.

The two parties exchange four messages to disconnect, hence the name four-way wave.

With this, questions like why three handshakes and four waves, whether it must be three/four, and why wait 2MSL before closing are all answered.
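TIME_WAIT has a practical consequence: a restarted server can fail to rebind its port with "address already in use" while an old connection lingers for 2MSL. The conventional remedy is the SO_REUSEADDR socket option, sketched below (the helper name is my own; with no live connections in this toy run, the rebind succeeds regardless, so this only illustrates how the option is set):

```python
import socket

# A socket left in TIME_WAIT holds its (IP, port) pair for 2MSL, so
# restarting a server on the same port can fail with "address in use".
# SO_REUSEADDR tells the kernel to allow rebinding anyway.

def make_listener(port: int = 0) -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))
    s.listen(1)
    return s

first = make_listener()
port = first.getsockname()[1]   # remember the OS-assigned port
first.close()

second = make_listener(port)    # rebind the same port after closing
rebound = second.getsockname()[1]
second.close()
print(port, rebound)
```

Most production servers set this option unconditionally for exactly this reason.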

UDP protocol

Besides TCP, the transport layer has the well-known UDP. If TCP stands out for its complete, robust feature set, UDP is the master of minimalism.

UDP implements the bare minimum of transport-layer functionality: inter-process communication. For data passed down from the application layer, UDP simply adds a header and hands it straight to the network layer. The UDP header is very simple, with only three parts:

  • Source port and destination port: the port numbers distinguish different processes on the host
  • Checksum: verifies that the packet was not corrupted in transit, e.g., a 1 flipped to a 0
  • Length: the length of the packet

So UDP has only two functions: checking datagrams for errors and distinguishing between different processes.
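That minimalism shows in code: a complete UDP exchange fits in a dozen lines, with no listen/accept and no handshake. A loopback sketch (standard library only; on a real network this datagram could simply be lost):

```python
import socket

# UDP's whole job: attach a port number and hand the datagram to the
# network layer. No connection, no handshake, no acknowledgement.

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))          # port 0: OS picks a free port
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello over UDP", addr)   # no connect() needed

data, from_addr = receiver.recvfrom(1024)
sender.close(); receiver.close()
print(data)
```

Compare this with the TCP example earlier: no connection state exists on either side, which is exactly where UDP's advantages below come from.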

TCP's many features come at a price. Connection orientation, for example, costs overhead in establishing and releasing connections; congestion control caps the transmission rate; and so on. Here are UDP's pros and cons:

Disadvantages of UDP

  • UDP is an unreliable transport protocol: it cannot guarantee that messages arrive completely and correctly
  • It lacks congestion control, so competing UDP flows can overwhelm the network

Advantages of UDP

  • Higher efficiency: no connection setup and no congestion control
  • Serves more clients: with no connection state, the server needs no per-client buffers
  • Small header, low overhead: TCP's fixed header is 20 bytes while UDP's is only 8, so a larger share of each packet is data
  • Fits scenarios that demand efficiency and tolerate some loss: in live streaming, for example, every packet need not arrive and a certain loss rate is acceptable, so TCP's reliability becomes a liability and the leaner UDP is the better choice
  • Supports broadcast: being connectionless, UDP can send packets to many receivers at once
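To make the connectionless point concrete, here is a minimal sketch using Python's standard socket module: the sender fires off a datagram with no handshake at all. The message is arbitrary, and the loopback interface is used so it runs on a single machine:

```python
import socket

# Receiver: bind a UDP socket to an ephemeral loopback port
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
port = receiver.getsockname()[1]

# Sender: no connect(), no handshake -- just send a datagram at the address
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello, udp", ("127.0.0.1", port))

data, addr = receiver.recvfrom(1024)  # one recvfrom returns one whole datagram
print(data)

sender.close()
receiver.close()
```

Note that `recvfrom` returns exactly one datagram: unlike TCP's byte stream, UDP preserves message boundaries.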

UDP Application Scenarios

UDP suits scenarios where the transmission model must be heavily customized at the application layer, packet loss is acceptable, high efficiency is required, or broadcasting is needed. For example:

  • Live video
  • DNS
  • RIP Routing protocol

Other topics

Segmented transmission

Notice that when transmitting data, the transport layer does not simply prepend one header to the entire message and send it; it splits the data into multiple segments and sends them separately. Why?

As mentioned earlier, the data link layer limits each segment's data to 1460 bytes (Ethernet's 1500-byte MTU minus 40 bytes of IP and TCP headers). Why the limit? The root cause is that the network is unreliable. The longer a packet is, the more likely its transmission is interrupted partway, and then the entire payload must be resent, hurting efficiency. Splitting the data into multiple segments means that when one segment is lost, only that segment needs to be retransmitted.

Should we then split as finely as possible? No: if each packet's data field is too small, the header becomes the dominant cost of transmission. Suppose headers total 40 bytes per packet. Splitting 1000 bytes of data into 10 packets costs 400 bytes of headers; splitting it into 1000 packets costs 40,000 bytes of headers, a huge loss of efficiency.
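The arithmetic above can be sketched in a few lines. The 1460-byte limit and 40-byte header are the illustrative figures from the text, not constants taken from any library:

```python
import math

MSS = 1460    # max data per segment (1500-byte Ethernet MTU - 40 bytes of headers)
HEADER = 40   # combined IP + TCP header size used in the example above

def segment(data: bytes, mss: int = MSS) -> list:
    """Split a payload into mss-sized chunks, as the transport layer does."""
    return [data[i:i + mss] for i in range(0, len(data), mss)]

def header_overhead(data_len: int, mss: int) -> int:
    """Total header bytes spent to carry data_len bytes at a given chunk size."""
    return HEADER * math.ceil(data_len / mss)

chunks = segment(b"x" * 4000)
print([len(c) for c in chunks])      # [1460, 1460, 1080]
print(header_overhead(1000, 100))    # 10 segments -> 400 header bytes
print(header_overhead(1000, 1))      # 1000 segments -> 40000 header bytes
```

The trade-off is visible directly: smaller chunks shrink the retransmission unit but inflate the fixed per-packet header cost.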

Routing and forwarding

Take a look at the picture below:

  • Normally, host A's packets travel the path 1-3-6-7
  • If router 3 fails, they can travel 1-4-6-7
  • If router 4 also fails, only 2-5-6-7 remains
  • If router 5 fails too, the line is cut

So the benefit of routing and forwarding is improved fault tolerance; the root cause, once again, is that the network is unreliable. Even with a few routers down, the network stays connected. However, if router 6 fails, host A and host B cannot communicate at all, so a core router like this, a single point of failure, should be avoided.

Routing has another benefit: path selection. If one line is too congested, traffic can be diverted through another route to improve efficiency.
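The failover behavior described above can be sketched with a breadth-first search over the topology. The adjacency list below is my reconstruction of the figure (not reproduced here), so treat the exact wiring as an assumption:

```python
from collections import deque

# Assumed topology: host A reaches routers 1 and 2; every path to host B
# converges on router 6, the single point of failure noted in the text.
GRAPH = {
    "A": ["1", "2"],
    "1": ["3", "4"],
    "2": ["5"],
    "3": ["6"], "4": ["6"], "5": ["6"],
    "6": ["7"],
    "7": ["B"],
    "B": [],
}

def find_path(graph, src, dst, down=()):
    """BFS for a path from src to dst, skipping failed routers in `down`."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        for nxt in graph.get(path[-1], []):
            if nxt in down or nxt in seen:
                continue
            if nxt == dst:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None  # no route left

print(find_path(GRAPH, "A", "B"))                        # A-1-3-6-7-B
print(find_path(GRAPH, "A", "B", down={"3"}))            # A-1-4-6-7-B
print(find_path(GRAPH, "A", "B", down={"3", "4"}))       # A-2-5-6-7-B
print(find_path(GRAPH, "A", "B", down={"3", "4", "5"}))  # None: line is cut
print(find_path(GRAPH, "A", "B", down={"6"}))            # None: 6 is the choke point
```

Real routers run distributed protocols (RIP, OSPF) rather than a global BFS, but the fault-tolerance idea is the same: as long as some path survives, traffic gets through.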

Sticky packets and unpacking

As described in the byte-stream section, TCP does not understand the meaning of the data it carries. It only knows to take the stream handed down by the application layer, cut it into segments, and send them to the target. Problems arise when the application layer sends two pieces of data:

  • The application layer needs to send two files, one audio and one text, to the target process
  • TCP only sees one stream, which it splits into, say, four segments to send
  • A middle segment then contains the tail of the first file and the head of the second, so the two files' bytes are mixed together: this is the sticky packet problem
  • After receiving the data, the target application must split the stream back into the two original files: this is unpacking

Sticky packets and unpacking are problems for the application layer to solve. One option is to append a special delimiter byte, such as a newline, to the end of each file; another is to make each packet carry exactly one file's data, padding with zeros when it falls short.
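Here is a minimal sketch of the delimiter approach just mentioned. The newline delimiter and message contents are arbitrary; a real protocol must guarantee the delimiter cannot occur inside the payload, or use a length prefix instead:

```python
DELIM = b"\n"  # delimiter appended after each application-layer message

def frame(messages) -> bytes:
    """Sender side: terminate each message with the delimiter, then
    concatenate into the single byte stream that TCP will see."""
    return b"".join(m + DELIM for m in messages)

def unframe(stream: bytes) -> list:
    """Receiver side ("unpacking"): split the stream back into messages."""
    parts = stream.split(DELIM)
    return parts[:-1]  # drop the empty piece after the final delimiter

# Two "files" leave the sender as one undifferentiated byte stream...
stream = frame([b"audio-bytes", b"text-bytes"])
# ...and however TCP chops that stream into segments, splitting on the
# delimiter recovers the original boundaries on the receiving side.
print(unframe(stream))  # [b'audio-bytes', b'text-bytes']
```

Since binary data like audio can contain any byte value, length-prefix framing (send each message's size first, then its bytes) is usually the more robust choice in practice.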

Malicious attacks

TCP's connection-oriented design can be exploited to attack servers.

As we know, when a server receives a SYN packet asking to create a connection, it allocates a buffer and other resources, then replies with a SYN + ACK packet. If an attacker forges source IPs and ports and floods the server with SYN packets, the server accumulates a large number of half-open TCP connections, leaving it unable to respond to legitimate requests: the server effectively crashes. This is the SYN flood attack.

Mitigations include limiting the number of connections created per IP, expiring half-open connections after a shorter timeout, deferring allocation of the receive buffer, and so on.
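As an illustration of the first mitigation, here is a toy per-IP limiter for half-open connections. The threshold, timeout, and function names are all invented for this sketch; a real implementation lives in the kernel's TCP stack, not in application code:

```python
import time
from collections import defaultdict

MAX_HALF_OPEN_PER_IP = 3   # invented threshold for the sketch
HALF_OPEN_TIMEOUT = 5.0    # seconds before a half-open connection is dropped

half_open = defaultdict(list)  # ip -> arrival times of pending SYNs

def accept_syn(ip, now=None):
    """Return True if a new SYN from `ip` may allocate a half-open slot."""
    now = time.monotonic() if now is None else now
    # expire half-open connections that never completed the handshake
    half_open[ip] = [t for t in half_open[ip] if now - t < HALF_OPEN_TIMEOUT]
    if len(half_open[ip]) >= MAX_HALF_OPEN_PER_IP:
        return False  # this IP already holds too many half-open slots
    half_open[ip].append(now)
    return True

print(accept_syn("10.0.0.1", now=0.0))  # True
print(accept_syn("10.0.0.1", now=0.1))  # True
print(accept_syn("10.0.0.1", now=0.2))  # True
print(accept_syn("10.0.0.1", now=0.3))  # False: limit reached
print(accept_syn("10.0.0.1", now=6.0))  # True: earlier entries timed out
```

The second mitigation from the list is visible in the same code: shrinking `HALF_OPEN_TIMEOUT` frees slots faster, at the cost of dropping slow but legitimate clients.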

Long connections

By default, each request to the server creates a TCP connection that is closed once the server returns its data. When many requests arrive in a short period, frequently opening and closing TCP connections wastes resources, so we can instead keep one connection open and send subsequent requests over it: a long (persistent) connection, which improves efficiency.
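A minimal sketch of the idea, reusing one TCP connection for two request/response exchanges over the loopback interface (the echo "protocol" here is invented purely for illustration):

```python
import socket
import threading

def serve_two_requests(server_sock):
    """Accept one client and answer two requests on the same connection."""
    conn, _ = server_sock.accept()
    for _ in range(2):
        request = conn.recv(1024)
        conn.sendall(b"echo:" + request)
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
worker = threading.Thread(target=serve_two_requests, args=(server,))
worker.start()

# One connect() -- i.e. one three-way handshake -- then two round trips
client = socket.create_connection(("127.0.0.1", port))
replies = []
for message in (b"first", b"second"):
    client.sendall(message)
    replies.append(client.recv(1024))
client.close()
worker.join()
server.close()

print(replies)  # [b'echo:first', b'echo:second']
```

The second request pays no handshake or teardown cost, which is exactly the saving that HTTP keep-alive and connection pools (as in OkHttp) exploit.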

Pay attention to the lifetime and creation conditions of long connections, to prevent a large number of maliciously held connections from exhausting server resources.

Closing thoughts

When I first studied this material, I felt it was useless, something learned only for exams. In truth, it is hard to understand this knowledge deeply before you have applied it; back then, reading summaries like the one above gave me only surface-level cognition, without the real meaning behind it.

But as my learning grew broader and deeper, my understanding of this material deepened too. There have been moments when I thought: oh, so that is how this thing is used, so learning it really was worthwhile.

You may feel nothing right after learning it, but when you later use it or study the applications built on it, things will click, and you will gain a lot in an instant.

If this helped, please leave a like to encourage the author.

That is the whole article. Writing is not easy; likes, bookmarks, comments, and shares are appreciated. The author's knowledge is limited, so corrections and differing viewpoints are welcome in the comments. For reprints, please let me know in the comments or by private message.

And welcome to my blog: Portal