Recently I have been catching up on computer networking. I used to be vague about TCP's three-way handshake and four-way breakup (connection teardown) and knew nothing about the details. Lately I have read a lot on the topic, and I am studying computer networks systematically to strengthen my CS fundamentals. Here I reorganize some good material I read, together with my own understanding, to share it so we can learn and progress together. This topic also seems to come up often in interviews.

The original post is on my GitHub blog (github.com/jawil/blog). If you like it, feel free to follow it for updates. Let's exchange ideas and make progress together; I write as a learner, recording what I learn bit by bit.

A plain-language understanding:

Why does TCP need three handshakes to make sure the connection is duplex? Why not one? Can't two do the job? Let's simulate the three-way handshake with a real-life example: a verbal exchange between two people.

The examples below are easy-to-understand ones quoted from the Internet. They are not entirely accurate (I will point out where later), but they don't get in the way of understanding and convey the general idea.

First conversation:

A's wife sends him out to buy soy sauce. Halfway there he runs into his friend B, and A asks (message 1): "Hey brother, have you eaten yet?"

But B is listening to music on his headphones, doesn't hear, and doesn't respond. A thinks: "I talk to you and you don't make a sound, so forget it." Communication fails. If B cannot receive A's message, communication is bound to fail.

If B hears what A says, the first exchange succeeds, and the second one follows.

Second dialogue:

B hears A, but he is a foreigner whose Chinese is poor. He doesn't understand what A said and doesn't know how to answer, so he replies with a Chinese phrase he happens to have learned: "I'm going to the toilet." A bursts out laughing: "Going to the toilet to eat?" They are clearly not on the same page, so A walks away. Communication fails. Communication fails when B cannot respond correctly.

If B hears A, responds correctly, and also asks, "I've eaten. And you?", then the second handshake succeeds.

The first two exchanges prove that B can understand what A says and respond correctly. Now on to the third exchange.

Dialogue 3:

A has just greeted B when his wife suddenly yells: "You lazybones, how can buying soy sauce take so long? Just wait until I get you home!" A is henpecked; frightened, he runs home without another word, leaving B standing there. B thinks: "What kind of person is this? Fine, I'm going home too." Communication fails. Communication fails when A cannot respond.

If A also responds correctly ("I've eaten too"), the third exchange succeeds: the two have established a smooth channel of communication and can keep chatting.

The second and third exchanges prove that A can understand what B says and respond correctly.

So three exchanges are needed for effective verbal communication between two people.

The first two handshakes (the first and second) ensure that the server can receive the client's message and respond correctly. The last two (the second and third) ensure that the client can receive the server's message and respond correctly.

That's a good example. However, I think the real reason for the third exchange is different. It is not to prove that A can understand and respond to B (B's correct response in the second exchange already shows that communication between them works), but to avoid wasted feelings in the following situation (the example is a little contrived). A greets B on the road, but the words are blown away by the wind, so A says hello again; this time B hears and responds. Whether they use three handshakes or two, the two can now chat happily. 0.1 seconds later they say goodbye with four waves. At that moment, the first greeting that the wind carried off finally reaches B's ears, and B thinks A wants to talk again, so he responds. (Here is the problem.) With a two-way handshake, B assumes A wants to talk to him and keeps waiting, wasting his feelings. With a three-way handshake, B waits a while, notices that A never responds, concludes that A has left, and leaves too.

In fact, the third step exists to keep B from wasting time waiting, rather than to confirm that A can respond correctly to B's message. We'll come back to this later.

From another angle, here is an answer someone quoted on Zhihu:

There was an interesting post on the TopLanguage Google Group discussing TCP's "three-way handshake". It asked, "Why does TCP use a three-way handshake?" The answer: "The essence of the problem is that the channel is unreliable, yet the two communicating parties need to agree on something. To solve this, three messages are the theoretical minimum, no matter what information you put in them. So the three-way handshake is not a requirement of TCP itself but a consequence of needing to transmit information reliably over an unreliable channel. Note the essential requirement here: the channel is unreliable, but the data transfer must be reliable. Once three messages have been exchanged, nothing more needs to be done about this problem, whether or not further handshakes or data follow. If the channel were reliable, meaning every message sent is received, or if you don't care whether messages arrive, you could just send directly, as UDP does." This offers another explanation of the purpose of the three-way handshake.

All of the above is a light, plain-language explanation; it may be imprecise, and the examples may be a bit off. Before we can really understand TCP's three handshakes and four breakups, we need some basic concepts, and at the end we will compare them with this simple example.

HTTP connection

HTTP (Hypertext Transfer Protocol) is the foundation of the Web and one of the protocols commonly used in mobile networking. HTTP is an application-layer protocol built on top of TCP. The most notable feature of HTTP connections is that every request sent by the client requires a response from the server, and when the request is complete, the connection is released. The process from establishing a connection to closing it is called one connection.

1) In HTTP 1.0, each client request required a separate connection, which was released automatically after the request was processed.

2) In HTTP 1.1, multiple requests can be processed in a single connection, and multiple requests can be overlapped without waiting for one request to finish before sending the next.

Because HTTP actively releases the connection at the end of each request, an HTTP connection is a "short connection", and the client must keep sending requests to the server to stay online. The usual practice is for the client, even when it needs no data, to send a "stay connected" request to the server at regular intervals; the server replies on receiving it, showing that it knows the client is "online". If the server receives no request from the client for a long time, the client is considered offline. If the client receives no reply from the server for a long time, the network is considered disconnected.
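The heartbeat scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular server does it: the `HeartbeatTracker` class, its method names and the 30-second timeout are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatTracker:
    """Hypothetical server-side record of each client's last "stay connected" request."""

    timeout: float  # seconds of silence after which a client is treated as offline
    last_seen: Dict[str, float] = field(default_factory=dict)

    def on_heartbeat(self, client_id: str, now: float) -> str:
        # Remember when this client was last heard from, then reply so the
        # client knows the server (and the network path) is still alive.
        self.last_seen[client_id] = now
        return "pong"

    def offline_clients(self, now: float) -> List[str]:
        # A client that has been silent for longer than `timeout` is considered offline.
        return sorted(cid for cid, t in self.last_seen.items() if now - t > self.timeout)


tracker = HeartbeatTracker(timeout=30.0)
tracker.on_heartbeat("alice", now=0.0)
tracker.on_heartbeat("bob", now=20.0)
print(tracker.offline_clients(now=40.0))  # ['alice']: silent for 40 s, bob only for 20 s
```

The client side would mirror this: if no "pong" arrives within its own timeout, it treats the network as disconnected.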

Socket principles

Socket concept

A socket is the cornerstone of communication: the basic unit of network communication that supports the TCP/IP protocols. It is an abstract representation of an endpoint in network communication and contains the five pieces of information needed for network communication: the protocol used for the connection, the IP address of the local host, the protocol port of the local process, the IP address of the remote host, and the protocol port of the remote process. When the application layer communicates through the transport layer, TCP faces the problem of serving multiple application processes concurrently: several TCP connections or application processes may need to transfer data over the same TCP port. To distinguish between different application processes and connections, many operating systems provide a socket interface through which applications interact with TCP/IP. Through this interface, the application layer and the transport layer can tell apart the communication of different application processes or network connections, providing concurrent data-transfer service.
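To make the five pieces of information concrete, here is a hypothetical sketch of how incoming segments could be dispatched by their 5-tuple. The names `FiveTuple`, `connections` and `demultiplex` and the 10.0.0.x addresses are made up for illustration, and a real kernel also matches listening sockets with wildcard entries, which this sketch leaves out.

```python
from typing import Dict, NamedTuple


class FiveTuple(NamedTuple):
    """The five pieces of information that identify one connection."""
    protocol: str
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


# Hypothetical table mapping each connection to the handler that owns it.
connections: Dict[FiveTuple, str] = {
    FiveTuple("tcp", "10.0.0.1", 80, "10.0.0.2", 50001): "worker-1",
    FiveTuple("tcp", "10.0.0.1", 80, "10.0.0.3", 50001): "worker-2",
    FiveTuple("udp", "10.0.0.1", 53, "10.0.0.2", 50001): "dns-server",
}


def demultiplex(protocol: str, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    # An incoming segment's source is our "remote" side and its destination is our "local" side.
    key = FiveTuple(protocol, dst_ip, dst_port, src_ip, src_port)
    return connections.get(key, "no matching socket")


print(demultiplex("tcp", "10.0.0.3", 50001, "10.0.0.1", 80))  # worker-2
```

The two TCP connections share local port 80 but differ in the remote address, which is how a single server port can serve many clients at once.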

Establishing a Socket Connection

Establishing a socket connection requires at least one pair of sockets: one runs on the client (the ClientSocket) and the other on the server (the ServerSocket). Connecting them takes three steps: server listening, client request, and connection confirmation.

  • Server listening: the server socket is not tied to any particular client socket. It waits for connections, monitoring the network in real time for client connection requests.
  • Client request: the client socket sends a connection request to the server socket. To do this, the client socket must first describe the server socket it wants to connect to, specifying its address and port number, and then send the request.
  • Connection confirmation: when the server socket hears or receives a connection request from a client socket, it responds to the request, creates a new thread to handle the connection, and sends the server socket's description to the client. Once the client confirms this description, the two sides have formally established a connection. The server socket keeps listening and accepts connection requests from other client sockets.
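The three steps above map onto Python's standard `socket` module. This is a minimal loopback sketch assuming a single client; the `handle` function and the echo behaviour are illustrative choices, and a real server would call `accept()` in a loop.

```python
import socket
import threading


def handle(conn: socket.socket) -> None:
    # Runs in its own thread and serves one client on the socket returned by accept().
    with conn:
        request = b""
        while chunk := conn.recv(1024):  # recv() returns b"" once the client's FIN arrives
            request += chunk
        conn.sendall(b"echo: " + request)


# 1) Server listening: bind to an address and wait for connection requests.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
server.listen()
host, port = server.getsockname()

# 2) Client request: the client must name the server's address and port.
#    On typical systems the kernel completes the handshake before accept() is called.
client = socket.create_connection((host, port))
local_addr, peer_addr = client.getsockname(), client.getpeername()

# 3) Connection confirmation: accept() returns a new socket for this client,
#    and the listening socket stays open for other clients.
conn, client_addr = server.accept()
worker = threading.Thread(target=handle, args=(conn,))
worker.start()

client.sendall(b"hello")
client.shutdown(socket.SHUT_WR)  # send FIN: "I have no more data" (a half-close)
reply = b""
while chunk := client.recv(1024):
    reply += chunk

worker.join()
client.close()
server.close()
print(reply)  # b'echo: hello'
```

Because `accept()` hands back a separate socket per client, the listening socket can keep accepting new clients while worker threads serve existing ones.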

SOCKET connection and TCP connection

When creating a Socket connection, you can specify the transport layer protocol to be used. The Socket supports different transport layer protocols (TCP or UDP). When TCP is used for connection, the Socket connection is a TCP connection.
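In Python's `socket` module, for example, the transport protocol is chosen by the socket type. This small loopback sketch (the address and payload are arbitrary) also shows that a UDP socket can send a datagram without any connection setup.

```python
import socket

# The socket type selects the transport protocol: SOCK_STREAM -> TCP, SOCK_DGRAM -> UDP.
tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tcp_kind, udp_kind = tcp_sock.type, udp_sock.type
print(tcp_kind, udp_kind)
tcp_sock.close()

# A UDP socket needs no connection (and no handshake): it simply sends a
# datagram to the receiver's address.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
udp_sock.sendto(b"ping", receiver.getsockname())
datagram, sender_addr = receiver.recvfrom(1024)
print(datagram)  # b'ping'
udp_sock.close()
receiver.close()
```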

Socket connection and HTTP connection

Generally, a socket connection is a TCP connection. Once it is established, the two parties can send data to each other until the connection is closed. In real networks, however, traffic between client and server often passes through several intermediate nodes such as routers, gateways and firewalls. Most firewalls by default close connections that have been inactive for a long time, which breaks the socket connection, so the application has to poll to tell the network that the connection is still active.

HTTP, by contrast, uses a "request-response" model: a connection is established for each request, and the server can reply only after the client sends a request. In many cases, though, the server needs to push data to the client proactively to keep the two in real-time sync. With a socket connection, the server can send data to the client directly; with HTTP, the server must wait for the client's request before sending data back. Therefore the client sends requests to the server regularly, not only to stay online but also to "ask" whether the server has new data, which the server then sends back.

Transmission Control Protocol (TCP)

TCP is a host-to-host transmission control protocol that provides reliable, connection-oriented service and uses the three-way handshake to establish a connection.

The "bit codes" are the TCP flag bits. There are six flags: SYN (synchronize, used to establish a connection), ACK (acknowledgement), PSH (push), FIN (finish), RST (reset) and URG (urgent). Two numbers are also involved: the Sequence number and the Acknowledge number.

What is TCP?

Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based transport-layer communication protocol.

I won't go into detail about what TCP is. If you are reading this article, I assume you already know the concept and want to learn more about how TCP works, so let's move on. TCP is an extremely intricate protocol and the foundation of the Internet, and understanding it is a basic skill every programmer should have. Let's start with the OSI seven-layer model:
(Figure: the OSI seven-layer model.)



In the OSI seven-layer model, TCP works at the fourth layer (the transport layer), IP at the third (the network layer), and ARP at the second (the data link layer). Data at layer 2 is called a frame, at layer 3 a packet, and at layer 4 a segment. We also need a rough picture of how data is sent: starting from the application layer, each layer wraps the data with its own header before it is sent to the receiver. The basic point is that every piece of data is encapsulated on the way down and decapsulated on the way up. The role of each OSI layer and its corresponding protocols are as follows:
(Figure: the role of each OSI layer and its protocols.)



TCP is a protocol, so how is it defined, and what does its data format look like? To go deeper, you need to understand, or even memorize, the meaning of each TCP header field. Come on, you can do it.

The TCP header

(Figure: the TCP header format.)

The ACK flag, the SYN flag and the sequence number are the three parts used in the discussion below, and they are introduced here as well.

The figure above shows the format of the TCP header. Since it is essential for understanding everything else, each field is described in detail below.

  • Source Port and Destination Port: 16 bits each. IP addresses distinguish hosts, while ports distinguish processes on a host. The source and destination ports, combined with the source and destination IP addresses in the IP header, uniquely identify a TCP connection.

  • Sequence Number: numbers the byte stream sent from the TCP sender to the TCP receiver; it gives the position in the data stream of the first byte in this segment. It is mainly used to handle out-of-order delivery.

  • Acknowledgment Number: the sequence number of the next byte that the sender of this acknowledgment expects to receive, i.e., the sequence number of the last successfully received byte plus 1. This field is valid only when the ACK flag (described below) is 1. It is mainly used to detect lost data.

  • Offset (data offset): the length of the header in 32-bit words. It is needed because the options field has variable length. The field is 4 bits wide, so its largest value is 15 words and a TCP header is at most 60 bytes. Without options, the header is the usual 20 bytes.

  • TCP Flags: the TCP header has 6 flag bits, several of which may be 1 at the same time. They drive the TCP state machine: URG, ACK, PSH, RST, SYN and FIN. Their meanings are:

URG: indicates that the segment's Urgent Pointer field is valid, marking urgent data that the receiver should process as soon as possible.

ACK: indicates that the acknowledgment field is valid, i.e., the segment carries the acknowledgment number described above. It is either 0 or 1: 1 means the acknowledgment field is valid, 0 means it is not.

PSH: This flag bit represents the Push operation. The Push operation means that the data packet is sent to the application immediately after it arrives at the receiving end, instead of queuing in the buffer.

RST: This flag indicates a connection reset request. It is used to reset faulty connections and reject faulty and invalid packets.

SYN: synchronize sequence numbers, used to establish a connection. SYN is used together with ACK: a connection request has SYN=1 and ACK=0, and the reply accepting it has SYN=1 and ACK=1. Segments with this flag are often used for port scanning: the scanner sends a segment with only SYN set, and if the host replies with SYN+ACK, the port is open on that host. Such a scan performs only the first step of the three-way handshake, so if it succeeds, the scanned machine is not very secure; a secure host enforces a strict three-way handshake for every connection.

FIN: indicates that the sender has finished sending data and has no more to send. Sending a FIN closes the sender's direction of the connection (see the four breakups below). Segments with this flag are also often used for port scanning.

  • Window: the window size, used for sliding-window flow control. This is a complex topic and will not be covered in this post.
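To see how these fields sit in the 20-byte fixed header, here is a sketch that packs and unpacks one with Python's `struct` module. `build_header` and `parse_header` are hypothetical helpers; options are not handled, the checksum is left as 0 rather than computed, and only the six flag bits listed above are decoded.

```python
import struct
from typing import Dict

# Flag bits in the low 6 bits of the offset/flags field.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20
FLAG_NAMES = [(FIN, "FIN"), (SYN, "SYN"), (RST, "RST"), (PSH, "PSH"), (ACK, "ACK"), (URG, "URG")]

# Fixed 20-byte header in network byte order: source port, destination port,
# sequence number, acknowledgment number, offset+flags, window, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")


def build_header(src_port: int, dst_port: int, seq: int, ack: int, flags: int,
                 window: int = 65535) -> bytes:
    offset_words = 5  # 5 x 32-bit words = 20 bytes, i.e. no options
    offset_flags = (offset_words << 12) | flags  # offset occupies the top 4 bits
    return TCP_HEADER.pack(src_port, dst_port, seq, ack, offset_flags, window, 0, 0)


def parse_header(raw: bytes) -> Dict[str, object]:
    src, dst, seq, ack, offset_flags, window, _checksum, _urgent = TCP_HEADER.unpack(raw[:20])
    flags = offset_flags & 0x3F
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        # The acknowledgment number is only meaningful when the ACK flag is 1.
        "ack": ack if flags & ACK else None,
        # Offset counts 32-bit words: 4 bits allow at most 15 * 4 = 60 bytes.
        "header_len": (offset_flags >> 12) * 4,
        "flags": [name for bit, name in FLAG_NAMES if flags & bit],
        "window": window,
    }


syn_ack = build_header(7788, 3337, seq=1739326486, ack=3626544837, flags=SYN | ACK)
print(parse_header(syn_ack)["flags"])  # ['SYN', 'ACK']
```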

The information needed for the time being is:

ACK: TCP specifies that the acknowledgment number field is valid only when ACK=1. After the connection is established, every segment sent must have ACK=1.

SYN(SYNchronization) : used to synchronize the sequence number when a connection is set up. When SYN=1 and ACK=0, it indicates that this is a connection request packet. If the peer agrees to establish a connection, SYN=1 and ACK=1 should be set in the response packet. Therefore, a SYN value of 1 indicates that this is a connection request or connection accept message.

FIN (finis) means to finish, to terminate, to release a connection. If the FIN value is 1, the sender finishes sending data and requests to release the connection.

The three-way handshake:

(Figure: the TCP three-way handshake. The diagram is quoted from elsewhere, not drawn by me.)

  1. First handshake: establish the connection. The client sends a connection request segment with the SYN flag set to 1 and the Sequence Number set to x. The client then enters the SYN_SENT state and waits for the server's confirmation.
  2. Second handshake: the server receives the SYN segment. It must acknowledge the client's SYN by setting the Acknowledgment Number to x+1 (Sequence Number + 1), and it also sets the SYN flag to 1 and its own Sequence Number to y. The server puts all of this into one segment (the SYN+ACK segment), sends it to the client, and enters the SYN_RECV state.
  3. Third handshake: the client receives the server's SYN+ACK segment, sets the Acknowledgment Number to y+1, and sends an ACK segment to the server. Once this segment is sent, the client enters the ESTABLISHED state, and the server does the same when it receives the segment, completing the TCP three-way handshake. The client and server can then start transferring data. That is an overview of the TCP three-way handshake.
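The three steps can be simulated to check the seq/ack arithmetic. This is a sketch under simplifying assumptions: no packet loss and no retransmission, and the `Segment`/`Endpoint` classes and `three_way_handshake` function are hypothetical names for illustration. The final ACK carries seq = x+1 because the SYN itself consumes one sequence number.

```python
import random
from dataclasses import dataclass
from typing import List

MOD = 2**32  # sequence numbers are 32 bits and wrap around


@dataclass
class Segment:
    syn: bool
    ack_flag: bool
    seq: int
    ack: int = 0  # acknowledgment number, meaningful only when ack_flag is True


class Endpoint:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "CLOSED"
        self.isn = random.randrange(MOD)  # randomly chosen initial sequence number


def three_way_handshake(client: Endpoint, server: Endpoint) -> List[Segment]:
    server.state = "LISTEN"

    # 1) Client -> server: SYN=1, seq=x.
    syn = Segment(syn=True, ack_flag=False, seq=client.isn)
    client.state = "SYN_SENT"

    # 2) Server -> client: SYN=1, ACK=1, seq=y, ack=x+1.
    syn_ack = Segment(syn=True, ack_flag=True, seq=server.isn, ack=(syn.seq + 1) % MOD)
    server.state = "SYN_RECV"

    # 3) Client checks ack == x+1, then sends ACK=1, seq=x+1, ack=y+1.
    if not (syn_ack.ack_flag and syn_ack.ack == (client.isn + 1) % MOD):
        raise ValueError("SYN+ACK does not acknowledge our SYN")
    final_ack = Segment(syn=False, ack_flag=True, seq=syn_ack.ack, ack=(syn_ack.seq + 1) % MOD)
    client.state = "ESTABLISHED"

    # The server enters ESTABLISHED once it sees ack == y+1.
    if final_ack.ack_flag and final_ack.ack == (server.isn + 1) % MOD:
        server.state = "ESTABLISHED"
    return [syn, syn_ack, final_ack]


client, server = Endpoint("client"), Endpoint("server")
for seg in three_way_handshake(client, server):
    print(seg)
print(client.state, server.state)  # ESTABLISHED ESTABLISHED
```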

What about the four breakups?

Once a client and a server have established a TCP connection through the three-way handshake, the connection must be torn down when data transfer is complete. Tearing down a TCP connection involves the mysterious "four breakups".

  1. First breakup: host 1 (either the client or the server) sets its Sequence Number and Acknowledgment Number and sends a FIN segment to host 2, entering the FIN_WAIT_1 state. This means host 1 has no more data to send to host 2.
  2. Second breakup: host 2 receives the FIN segment and sends an ACK segment back to host 1, with the Acknowledgment Number set to the received Sequence Number plus 1, and enters the CLOSE_WAIT state. Host 1 enters the FIN_WAIT_2 state. Host 2 is telling host 1: "I agree to your close request."
  3. Third breakup: host 2 sends a FIN segment to host 1 to close its side of the connection and enters the LAST_ACK state.
  4. Fourth breakup: host 1 receives host 2's FIN segment, sends an ACK segment to host 2, and enters the TIME_WAIT state. Host 2 closes the connection once it receives this ACK. If host 1 waits 2MSL without receiving anything more, it concludes that host 2 has closed normally, and host 1 then closes its connection as well.
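The sequence and acknowledgment numbers in the four steps can be replayed in the same way. The hypothetical `four_way_close` function below assumes host 2 sends no more data between the second and third steps, and it labels segments only "FIN" and "ACK" as the text does; in a real capture the FIN segments also have the ACK bit set, since every segment after connection setup does.

```python
from typing import Dict, List, Tuple

MOD = 2**32


def four_way_close(seq1: int, seq2: int) -> Tuple[List[Tuple[str, str, int, int]], Dict[str, str]]:
    """Replay a normal close. seq1/seq2 are each host's next sequence number.

    Each log entry is (sender, segment, seq, ack).
    """
    state = {"host1": "ESTABLISHED", "host2": "ESTABLISHED"}
    log = []

    # 1) Host 1 sends FIN and enters FIN_WAIT_1. A FIN uses up one sequence number.
    log.append(("host1", "FIN", seq1, seq2))
    state["host1"] = "FIN_WAIT_1"

    # 2) Host 2 acknowledges with ack = seq1 + 1; it may still send data (half-close).
    log.append(("host2", "ACK", seq2, (seq1 + 1) % MOD))
    state["host2"], state["host1"] = "CLOSE_WAIT", "FIN_WAIT_2"

    # 3) Host 2 has finished sending, so it sends its own FIN and enters LAST_ACK.
    log.append(("host2", "FIN", seq2, (seq1 + 1) % MOD))
    state["host2"] = "LAST_ACK"

    # 4) Host 1 acknowledges with ack = seq2 + 1, then waits 2MSL in TIME_WAIT.
    log.append(("host1", "ACK", (seq1 + 1) % MOD, (seq2 + 1) % MOD))
    state["host1"], state["host2"] = "TIME_WAIT", "CLOSED"
    return log, state


log, state = four_way_close(seq1=1000, seq2=5000)
for entry in log:
    print(entry)
print(state)  # {'host1': 'TIME_WAIT', 'host2': 'CLOSED'}
```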

At this point, TCP's four breakups are happily complete. By now you probably have lots of questions, and things may feel messy; don't worry, let's keep summarizing.

Why three handshakes

In the fourth edition of Computer Networks by Xie Xiren, the purpose of the three-way handshake is given as "to prevent an invalid connection request segment from suddenly reaching the server and causing errors". In another classic book, Computer Networks, the three-way handshake is said to solve the problem of "delayed duplicate packets in the network".

In his book Computer Networks, Xie Xiren gives the following example:

"An invalid connection request segment arises like this: the client's first connection request segment is not lost but is held up at some network node for a long time, so it reaches the server only some time after the connection has been released. It is now an invalid segment, but when the server receives it, the server mistakes it for a new connection request from the client and sends the client a confirmation agreeing to establish a connection. Without the three-way handshake, a new connection would be established as soon as the server sent its confirmation. Since the client did not request a connection, it ignores the server's confirmation and sends no data, yet the server believes a new transport connection has been established and keeps waiting for the client's data. Many of the server's resources are wasted this way. The three-way handshake prevents this: in the scenario above, the client does not acknowledge the server's confirmation, and when the server receives no acknowledgment, it knows the client did not request a connection."

This makes sense and prevents the server side from wasting resources by waiting.
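The stale-request scenario can be simulated to compare a two-way and a three-way handshake. This is a toy model, not real TCP: the `Client`/`Server` classes, the server's fixed ISN of 7000 and the stale ISN of 1000 are hypothetical, and the client simply ignores an unexpected SYN+ACK as in the book's example (a real TCP stack would answer with RST).

```python
from dataclasses import dataclass, field
from typing import Optional, Set

MOD = 2**32


@dataclass
class Client:
    outstanding_isns: Set[int] = field(default_factory=set)  # SYNs this client really has pending

    def on_syn_ack(self, ack: int, server_isn: int) -> Optional[int]:
        # Reply with an ACK only if this SYN+ACK answers a SYN we actually sent.
        if (ack - 1) % MOD in self.outstanding_isns:
            return (server_isn + 1) % MOD
        return None  # stale request: ignore it, as in the book's example


@dataclass
class Server:
    handshake_steps: int  # 2 or 3
    isn: int = 7000
    established: int = 0  # connections the server holds resources for

    def on_syn(self, client: Client, client_isn: int) -> None:
        ack = (client_isn + 1) % MOD
        if self.handshake_steps == 2:
            # Two-way: the connection counts as established once the server replies.
            self.established += 1
            client.on_syn_ack(ack, self.isn)  # the client's reaction no longer matters
            return
        # Three-way: commit only when the client's ACK (ack = isn + 1) comes back.
        if client.on_syn_ack(ack, self.isn) == (self.isn + 1) % MOD:
            self.established += 1


STALE_ISN = 1000  # a SYN delayed in the network from a connection already released
for steps in (2, 3):
    server = Server(handshake_steps=steps)
    server.on_syn(Client(), STALE_ISN)  # the client has no outstanding SYN any more
    print(steps, server.established)  # prints "2 1", then "3 0"
```

With two steps, the stale SYN leaves the server holding a connection nobody will use; with three steps, the server commits nothing because the client's ACK never arrives.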

Why break up four times

And what about the four breakups? TCP is a connection-oriented, reliable, byte-stream-based transport-layer protocol, and it is full duplex. When host 1 sends a FIN segment, this only means host 1 has no more data to send: host 1 is telling host 2 that all its data has been sent, but host 1 can still receive data from host 2. When host 2 replies with an ACK, it means host 2 knows that host 1 has no more data, but host 2 can still send data to host 1. When host 2 also sends a FIN segment, it has no more data either and tells host 1 so, and both sides then happily close the TCP connection. To understand the four breakups correctly, we need to understand the state changes during the process.

  • FIN_WAIT_1: FIN_WAIT_1 and FIN_WAIT_2 both mean waiting for the peer's FIN. The difference: when a socket in the ESTABLISHED state wants to close the connection, it sends a FIN to the peer and enters FIN_WAIT_1; once the peer replies with an ACK, it enters FIN_WAIT_2. Normally the peer replies with the ACK immediately, so FIN_WAIT_1 is hard to observe, whereas FIN_WAIT_2 can sometimes be seen with netstat. (Active side)
  • FIN_WAIT_2: as explained above, a socket in FIN_WAIT_2 is in a half-closed connection: one side wants to close, while the other side has acknowledged this (with its ACK) but may still have data to send and will close later. (Active side)
  • CLOSE_WAIT: this state means waiting to close. How should you understand it? When the peer closes its socket and sends you a FIN, your system replies with an ACK and enters CLOSE_WAIT. What you need to consider next is whether you still have data to send to the peer. If not, you can close your socket, which sends a FIN to the peer and closes the connection. So in CLOSE_WAIT, what remains is for you to close the connection. (Passive side)
  • LAST_ACK: this state is easy to understand: the passive closer has sent its FIN and is waiting for the peer's ACK. When the ACK arrives, it enters the CLOSED state. (Passive side)
  • TIME_WAIT: the socket has received the peer's FIN and sent an ACK; after 2MSL it returns to the CLOSED state. If a socket in FIN_WAIT_1 receives a segment with both the FIN and ACK flags set, it goes straight to TIME_WAIT without passing through FIN_WAIT_2. (Active side)
  • CLOSED: The connection is CLOSED.
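The states above form a small transition table. The sketch below encodes the closing transitions described in this section; the event labels are illustrative strings, not API calls, and the simultaneous-close case (the CLOSING state) is left out because the text does not cover it.

```python
from typing import Dict, List, Tuple

# (state, event) -> next state for the connection-closing part of TCP.
CLOSE_TRANSITIONS: Dict[Tuple[str, str], str] = {
    # Active closer
    ("ESTABLISHED", "close: send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv FIN+ACK: send ACK"): "TIME_WAIT",  # skips FIN_WAIT_2
    ("FIN_WAIT_2", "recv FIN: send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
    # Passive closer
    ("ESTABLISHED", "recv FIN: send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "close: send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def walk(events: List[str], state: str = "ESTABLISHED") -> List[str]:
    """Apply events in order and return every state visited."""
    path = [state]
    for event in events:
        if (state, event) not in CLOSE_TRANSITIONS:
            raise ValueError(f"event {event!r} is not valid in state {state}")
        state = CLOSE_TRANSITIONS[(state, event)]
        path.append(state)
    return path


# ['ESTABLISHED', 'FIN_WAIT_1', 'FIN_WAIT_2', 'TIME_WAIT', 'CLOSED']
print(walk(["close: send FIN", "recv ACK", "recv FIN: send ACK", "2MSL timeout"]))
# ['ESTABLISHED', 'CLOSE_WAIT', 'LAST_ACK', 'CLOSED']
print(walk(["recv FIN: send ACK", "close: send FIN", "recv ACK"]))
```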

Example:

TCP controls the transmission of data streams. Here is an example of browsing the web, explained from my own understanding. (Note: lowercase "ack" below is the acknowledgment number, while uppercase "ACK" is the ACK flag bit.)


First handshake: the client sends a SYN segment (seq = j) to the server and enters the SYN_SENT state.
Second handshake: on receiving the SYN, the server must acknowledge the client's SYN (ack = j+1) and also send its own SYN (seq = k), i.e., a SYN+ACK segment. The server enters the SYN_RECV state.
Third handshake: on receiving the server's SYN+ACK, the client sends an ACK segment (ack = k+1) to the server. Once it is sent, the client and the server enter the ESTABLISHED state, completing the three-way handshake.
The segments exchanged during the handshake carry no data. After the three handshakes, the client and server start transferring data. Ideally, once a TCP connection is established, it is maintained until one side actively closes it. Either the server or the client can initiate the teardown, which takes "four handshakes" (not detailed here: the server and client exchange segments until both agree to disconnect).

A corresponding capture

IP 192.168.1.116.3337 > 192.168.1.123.7788: S 3626544836:3626544836
IP 192.168.1.123.7788 > 192.168.1.116.3337: S 1739326486:1739326486 ACK 3626544837
IP 192.168.1.116.3337 > 192.168.1.123.7788: ACK 1739326487, ACK 1

First handshake: 192.168.1.116 sends 192.168.1.123 a segment with SYN=1 and a randomly generated seq number 3626544836. From SYN=1, 192.168.1.123 knows that 192.168.1.116 wants to establish a connection.

Second handshake: after receiving the request, 192.168.1.123 confirms the connection information and sends 192.168.1.116 a segment with ack number=3626544837, SYN=1, ACK=1 and a randomly generated seq=1739326486.

Third handshake: on receiving this segment, 192.168.1.116 checks whether the ack number is correct, i.e., equal to the seq number it sent in the first handshake plus 1, and whether the ACK bit is 1. If so, 192.168.1.116 sends ack number=1739326487 with ACK=1. When 192.168.1.123 receives it and finds that the ack number equals its own seq+1 and ACK=1, the connection is established.
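The checks in this capture boil down to one rule: the acknowledgment number is the peer's sequence number plus the sequence space the acknowledged segment consumed, modulo 2^32. The helper `expected_ack` is a hypothetical name; the two ISNs are the values from the capture above.

```python
MOD = 2**32  # sequence numbers are 32-bit and wrap around


def expected_ack(seq: int, payload_len: int = 0, syn_or_fin: bool = True) -> int:
    """Acknowledgment number that acknowledges a segment starting at `seq`.

    Each payload byte consumes one sequence number, and so does a SYN or FIN flag.
    """
    return (seq + payload_len + (1 if syn_or_fin else 0)) % MOD


client_isn = 3626544836  # seq in the first SYN of the capture
server_isn = 1739326486  # seq in the SYN+ACK

print(expected_ack(client_isn))  # 3626544837, the ack carried by the SYN+ACK
print(expected_ack(server_isn))  # 1739326487, the ack carried by the final ACK
print(expected_ack(MOD - 1))     # 0: the number wraps around past 2**32 - 1
```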

I think you get the idea

That's the end of this post, but learning about TCP is far from over. TCP is a very complex protocol; this is only a small summary of how TCP connections are established and torn down, and it leaves many "holes" that we'll fill in later. All right, that's all!

Reference articles

  • TCP three-way handshake explained, and the connection-release process
  • A brief analysis of the TCP three-way handshake
  • Why does TCP use a three-way handshake, and why not two or four?
