Preface

Welcome to an article from "Programmer Cxuan". From now on, you are one of my readers.

My GitHub repository bestJavaer includes this article in its catalog:

Github.com/crisxuan/be…

I hope you'll give it a star!

This article is the fourth in my series on computer networking. You can catch up on the previous articles:

Accidentally drew 24 diagrams analyzing application layer protocols!

Basic TCP/IP knowledge

Summary of basic computer network knowledge

Let’s begin this article.

The transport layer, located between the application layer and the network layer, is the fourth layer in the OSI model and an important part of the network architecture. The transport layer is responsible for end-to-end communication on the network.

The transport layer plays a critical role in communication between applications running on different hosts. Now let's discuss the protocols at this layer.

Transport Layer Overview

The transport layer of a computer network is much like a highway: a highway moves people or goods from one end to the other, and the transport layer moves messages from one end system to the other. In a computer network, any device that can exchange information can be called an end system: a mobile phone, a networked media device, a computer, a carrier's server, and so on.

When the transport layer transports packets, it complies with certain protocol specifications, such as the limit on how much data can be sent at one time and which transport protocol to use. The transport layer implements logical communication between two otherwise unrelated hosts, as if the two hosts were directly connected.

Transport layer protocols are implemented in end systems, not in routers. A router only reads addresses and forwards packets, much like a courier who delivers a package simply by looking at the recipient's address: building XXX, unit XXX, room XXX.

So how does TCP determine which port a segment belongs to?

Remember the structure of a packet? Here's a quick review.

As a packet passes down through each layer, that layer's protocol attaches its own header to the packet. The diagram above shows a complete set of packet headers.

When data reaches the transport layer, a TCP header is attached to it. The header contains the source port number and the destination port number.

At the sending end, the transport layer converts the messages received from the sending application process into transport-layer packets, known as segments in computer networking. The transport layer generally splits the message into smaller pieces, adds a transport-layer header to each piece, and sends them toward the destination.

During sending, the transport layer protocols (the "transport vehicles", so to speak) are mainly TCP and UDP. How to choose between these two protocols, and what characterizes each, is the focus of our discussion.

Background knowledge of TCP and UDP

Among the TCP/IP protocols, TCP and UDP are the most representative protocols implementing transport-layer functions. To talk about TCP and UDP, we need to start with their definitions.

TCP stands for Transmission Control Protocol. From its name alone we can roughly tell that TCP has a transmission control function, mainly reflected in being controllable, and controllable means reliable. Indeed, TCP provides the application layer with a reliable, connection-oriented service that can dependably deliver data to the other end.

UDP stands for User Datagram Protocol. As its name suggests, UDP focuses on datagrams and provides the application layer with a way to send datagrams directly, without establishing a connection.

Why does computer networking jargon have so many names for a single piece of data?

In a computer network, different layers use different terms. As mentioned above, transport-layer packets are called segments. More precisely, TCP's packets are called segments, UDP's packets are called datagrams, and network-layer packets are also called datagrams.

However, for consistency, both TCP and UDP packets are generally referred to as segments in computer networking. This is just a convention; there is no need to agonize over what to call them.

The socket

Before a TCP or UDP packet can be sent, it must pass through a door: the socket. The socket connects upward to the application layer and downward to the network layer. Just as the operating system provides application programming interfaces between applications and hardware, in a computer network the socket is also an interface, and it has its own API.

When TCP or UDP is used for communication, the socket API is widely used to set the IP address and port number and send and receive data.

Now we know that sockets and TCP/IP are not inherently tied together; sockets simply make TCP/IP convenient to use. How do they do that? Through the methods of the socket API listed below.

Method      Description
create()    Create a socket
bind()      Bind a name to the socket, commonly a port number
listen()    Prepare to receive connections
connect()   Prepare to act as a sender
accept()    Prepare to act as a receiver
write()     Send data
read()      Receive data
close()     Close the connection

Socket types

There are three main types of sockets, which are described below

  • Datagram Sockets: A datagram socket provides a connectionless service and cannot guarantee reliable data transmission. Data may be lost or duplicated in transit, and in-order arrival cannot be guaranteed. Datagram sockets use the User Datagram Protocol (UDP) for data transmission. Because a datagram socket cannot guarantee reliability, the program itself must handle possible data loss.
  • Stream Sockets: A stream socket provides a connection-oriented, reliable data transfer service that guarantees the reliability and ordering of data. Stream sockets provide reliable delivery because they use the Transmission Control Protocol (TCP).
  • Raw Sockets: A raw socket allows IP packets to be sent and received directly, without any protocol-specific transport-layer formatting. With raw sockets, you can read and write IP packets that have not been processed by the kernel's transport protocols.

Socket processing

In a computer network, communication requires at least two end systems and at least one pair of sockets. The communication process of sockets goes as follows.

  1. A socket API call creates an endpoint on a communication link. When the endpoint is created, a description of the socket, the socket descriptor, is returned.

Just as file descriptors are used to access files, socket descriptors are used to access sockets.

  2. Once an application has a socket descriptor, it can bind a unique name to the socket; a server must bind a name so that it is reachable on the network.
  3. After a name has been bound to the socket with bind, the server calls the listen API to indicate its willingness to wait for client connections. listen must be called before the accept API.
  4. The client application calls connect on a stream socket (based on TCP) to initiate a connection request to the server.
  5. The server application uses the accept API to accept the client's connection request; the server must have successfully called bind and listen before calling accept.
  6. Once a connection is established between the stream sockets, the client and server can make read/write API calls.
  7. When the server or client wants to stop, it calls the close API to release all system resources acquired by the socket.
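The flow above maps almost one-to-one onto Python's socket module. Below is a minimal sketch of both sides; the loopback address and port 12345 are arbitrary choices for illustration, and real code would add error handling and loops.

```python
import socket

# Server side: create -> bind -> listen -> accept -> read/write -> close
def run_server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create()
    srv.bind(("127.0.0.1", 12345))                           # bind(): attach a name (IP, port)
    srv.listen(5)                                            # listen(): willing to accept connections
    conn, peer = srv.accept()                                # accept(): block until a client connects
    data = conn.recv(1024)                                   # read()
    conn.sendall(data)                                       # write(): echo the data back
    conn.close()                                             # close()
    srv.close()

# Client side: create -> connect -> write/read -> close
def run_client():
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create()
    cli.connect(("127.0.0.1", 12345))                        # connect(): initiate the request
    cli.sendall(b"hello")                                    # write()
    print(cli.recv(1024))                                    # read(): prints b'hello'
    cli.close()                                              # close()
```

Run run_server() in one process and run_client() in another; the client prints the echoed b'hello'.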

Although the socket API sits between the application layer and the transport layer in the communication model, it is not itself part of that model. The socket API simply lets applications interact with the transport and network layers.

Before we move on, let’s play a brief interlude to talk about IP.

Talk about IP

IP stands for Internet Protocol. It is the network layer protocol in the TCP/IP suite. IP was designed to solve two main problems:

  • Improve network scalability: Achieve large-scale network interconnection
  • Decouple the application layer and link layer to make them develop independently.

IP is the core of the entire TCP/IP protocol family and constitutes the foundation of the Internet. In order to realize the interconnection of large-scale networks, IP pays more attention to adaptability, simplicity and operability, and sacrifices some reliability. IP does not guarantee the delivery time and reliability of packets. Packets transmitted may be lost, repeated, delayed or out of order.

We know that the layer directly below TCP is the IP layer. Since IP is unreliable, how can we ensure that data arrives accurately?

This brings us to the TCP transport mechanism, which we’ll talk about later.

The port number

Before we talk about port numbers, let's talk about file descriptors and the relationship between sockets and port numbers.

To make resources convenient to use and to improve the machine's performance, efficiency, and stability, computers have a layer of software called the operating system, which manages the computer's available resources. When a program wants to use a resource, it asks the operating system, and the operating system allocates and manages the resource on the program's behalf. Typically, when we want to access a kernel device or file, the program calls a system function; the system opens the device or file for us and returns a file descriptor, fd (an integer ID). From then on, we can access the device or file only through this file descriptor. The number can be thought of as corresponding to one open file or device.

Similarly, we can ask the operating system to create a socket for us, and the system returns a socket ID. From then on, our program uses network resources by operating on that socket ID. Each network communication process corresponds to at least one socket. Writing data to the socket ID is equivalent to sending data to the network, and reading from it is equivalent to receiving data. Each of these sockets also has a unique identifier: the file descriptor fd.

The port number is a 16-bit non-negative integer, ranging from 0 to 65535. This range is divided into three segments, assigned by the Internet Assigned Numbers Authority (IANA):

  • Well-known (standard) port numbers: 0 to 1023
  • Registered port numbers: 1024 to 49151
  • Private (dynamic) port numbers: 49152 to 65535

Multiple applications can run on one computer. When a segment arrives at the host, which application should it be delivered to? How do you know the segment is destined for the HTTP server and not the SSH server?

By port number? Indeed, when a packet arrives at the server, the port number is what distinguishes different applications, so it seems the port number alone should suffice.

But here is a counterexample for Cxuan: if two flows arriving at the server are both destined for port 80, how can you tell them apart? In other words, the two flows target the same port on the server. How do we distinguish them?

Therefore, it is not enough to identify a packet only by the port number.

On the Internet, the source IP address, destination IP address, source port number, and destination port number together distinguish communications. If any one of these differs, the segments are considered different. These four values are also the basis of multiplexing and demultiplexing.

Determining the port number

Before actual communication, you need to determine the port number. There are two methods to determine the port number:

  • Standard, statically assigned port numbers

The standard set of port numbers is statically assigned: each well-known program has its own port number, and each port number has a dedicated purpose. A port number is a 16-bit value ranging from 0 to 65535, and ports in the range 0 to 1023 are statically assigned. For example, HTTP uses port 80, FTP uses port 21, and SSH uses port 22. Port numbers of this type have a special name: well-known port numbers.

  • Dynamically assigned port numbers

The second way of assigning port numbers is dynamic allocation: the client application does not need to set a port number itself at all; the operating system assigns a non-conflicting port number to each application. This dynamic allocation mechanism makes it possible to distinguish different TCP connections even when they are initiated by the same client.
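You can watch dynamic assignment happen with Python's socket API: binding to port 0 asks the operating system to pick a free ephemeral port (calling connect without bind triggers the same implicit assignment).

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))     # port 0 = let the OS choose a free ephemeral port
print(s.getsockname())       # e.g. ('127.0.0.1', 52814), the port the OS chose
s.close()
```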

Multiplexing and demultiplexing

We talked about how each socket on the host is assigned a port number. When a segment arrives at the host, the transport layer checks the destination port number in the segment and directs it to the appropriate socket; the segment's data then passes through the socket into the attached process. Let's talk about the concepts of multiplexing and demultiplexing.

There are two variants: connectionless multiplexing and demultiplexing, and connection-oriented multiplexing and demultiplexing.

Connectionless multiplexing and demultiplexing

The developer's code determines whether the port number is a well-known port or a dynamically assigned one. Suppose port 10637 on host A wants to send data to port 45438 on host B, using UDP at the transport layer. After the data is produced at the application layer, it is processed at the transport layer and then encapsulated into an IP datagram at the network layer. The IP datagram is delivered to host B via the link layer on a best-effort basis. Host B then checks the port number in the segment to determine which socket it belongs to, as shown in the figure below.

A UDP socket is identified by a two-tuple: the destination IP address and the destination port number.

Therefore, if two UDP segments have different source IP addresses and/or source port numbers, but the same destination IP address and destination port number, the two segments will be directed through the same socket to the same destination process.

Host A sends a message to host B; why does B need to know the source port number? For example, if I send a message to a girl saying "I'm interested in you", does the girl need to know which organ my message came from? Isn't it enough to know it came from me? Actually no: if the girl wants to show she's interested in you too, perhaps with a kiss, she needs to know where to send the reply!

In a segment from A to B, the source port number serves as part of the return address: when B needs to send a segment back to A, B takes the source port number from the A-to-B segment and uses it as the destination port, as shown in the following figure.

Connection-oriented multiplexing and demultiplexing

If connectionless multiplexing and demultiplexing describe UDP, then connection-oriented multiplexing and demultiplexing describe TCP. The difference between TCP and UDP here is how sockets are identified: a UDP socket is identified by a two-tuple, while a TCP socket is identified by a four-tuple: source IP address, source port number, destination IP address, destination port number, as mentioned above. When a TCP segment arrives at a host from the network, the host demultiplexes it to the corresponding socket based on these four values.

The figure above shows connection-oriented multiplexing and demultiplexing. In the figure, host C sends two HTTP requests to host B, and host A sends one HTTP request to host B. Hosts A, B, and C each have a unique IP address. Host B can separate host C's two HTTP connections because the two requests carry different source ports, so to host B these are two distinct connections. And since hosts A and C have different IP addresses, host B can separate their requests as well.
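Here is a conceptual sketch of the two demultiplexing rules. It is not real kernel code, and all the IP addresses, port numbers, and socket names are hypothetical; it just shows why host B can separate host C's two requests even though their destination (IP, port) is identical.

```python
# Conceptual sketch, not kernel code: all addresses, ports, and socket
# names below are hypothetical.

# UDP: a socket is located by the two-tuple (destination IP, destination port).
udp_sockets = {
    ("10.0.0.2", 45438): "udp-socket-1",
}

# TCP: a socket is located by the four-tuple
# (source IP, source port, destination IP, destination port).
tcp_sockets = {
    ("10.0.0.1", 26145, "10.0.0.2", 80): "tcp-socket-A",  # first request from host C
    ("10.0.0.1", 7532,  "10.0.0.2", 80): "tcp-socket-B",  # second request from host C
}

def demux_udp(dst_ip, dst_port):
    return udp_sockets.get((dst_ip, dst_port))

def demux_tcp(src_ip, src_port, dst_ip, dst_port):
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))

# Two segments with the same destination (IP, port) but different source
# ports land on different TCP sockets:
print(demux_tcp("10.0.0.1", 26145, "10.0.0.2", 80))  # tcp-socket-A
print(demux_tcp("10.0.0.1", 7532,  "10.0.0.2", 80))  # tcp-socket-B
```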

UDP

Finally, we’re starting to talk about UDP.

UDP, the User Datagram Protocol, gives applications a way to send data in encapsulated IP packets without establishing a connection. If the application developer chooses UDP instead of TCP, the application is almost talking directly to IP.

Data from the application is given source and destination port number fields for multiplexing/demultiplexing, along with a couple of other fields, and the resulting segment is passed to the network layer. The network layer encapsulates the transport-layer segment into an IP packet and makes a best-effort attempt to deliver it to the target host. The key point: when a datagram is sent with UDP, there is no handshake between the sending and receiving transport-layer entities. That is why UDP is called a connectionless protocol.

UDP characteristics

UDP is the transport-layer protocol used by streaming media applications, voice calls, and video conferencing; the DNS protocol also uses UDP. These applications and protocols choose UDP for the following reasons.

  • Speed: With UDP, as soon as the application process passes data to UDP, UDP packs it into a UDP segment and immediately hands it to the network layer. TCP, by contrast, has congestion control: it gauges the congestion of the Internet before sending, and if the network is heavily congested, the TCP sender is throttled. The point of using UDP is real-time performance.
  • No connection setup: TCP requires a three-way handshake before data transmission, whereas UDP transmits with no preparation at all, so UDP has no connection-establishment delay. Comparing the two to developers: TCP is the engineer who designs everything up front and will not write code without a design, considering every factor before starting, so TCP is reliable; UDP is the type who starts coding the moment the requirements arrive, with no design and no technology selection. That kind of developer is unreliable, but great for fast iteration, because you can get going immediately!
  • No connection state: TCP must maintain connection state in the end systems. That state includes receive and send buffers, congestion control parameters, and sequence and acknowledgment numbers. UDP keeps none of these parameters and no such buffers for connection state. For this reason, a server dedicated to a particular application can generally support more active users when the application runs over UDP.
  • Low header overhead: Every TCP segment carries 20 bytes of header overhead, whereas UDP's header is only 8 bytes.

It is important to note that not every application using UDP is unreliable: an application can implement reliable data transfer itself by adding acknowledgment and retransmission mechanisms. So the defining characteristic of UDP is speed.
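A minimal UDP exchange in Python makes the "no connection setup" point concrete: there is no listen, accept, or connect, just sendto and recvfrom (the loopback address and port 9999 are arbitrary choices for illustration).

```python
import socket

# Receiver: bind and wait for datagrams; there is no listen() or accept().
def udp_receiver():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 9999))
    data, addr = sock.recvfrom(2048)       # one datagram per call
    print(f"got {data!r} from {addr}")
    sock.close()

# Sender: no handshake, just send; delivery is not guaranteed.
def udp_sender():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(b"hello over UDP", ("127.0.0.1", 9999))
    sock.close()
```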

UDP Packet Structure

The UDP packet structure is shown below. Each UDP packet consists of a UDP header and a UDP data area. The header consists of four 16-bit (2-byte) fields, which describe the source port, destination port, packet length, and checksum of the packet.

  • Source port: This field occupies the first 16 bits of the UDP header and usually holds the UDP port used by the sending application. The receiving application uses this value as the destination port for its replies. The field is optional; when no source port is set, it defaults to 0, which is typically used in communications where no reply is expected.
  • Destination port: Indicates the receiving port. The field is 16 bits long.
  • Length: This 16-bit field gives the length of the UDP packet, including the UDP header and the UDP data. Since the UDP header is 8 bytes long, the value ranges from 8 to 65535 bytes.
  • Checksum: UDP uses the checksum to protect data integrity; it provides error detection, checking whether the data was altered on its way from the source to the destination host. The sender's UDP sums the 16-bit words in the segment and takes the one's complement; any overflow bit produced during the summation is wrapped around and added back. For example, suppose we add the following three 16-bit numbers.

The sum of the first two 16-bit numbers is shown above.

Then that result is added to the third 16-bit number.

This last addition overflows; the overflow bit 1 is wrapped back around into the sum. Then a one's-complement operation is performed, turning every 1 into a 0 and every 0 into a 1. So the complement of 1000 0100 1001 0101 is 0111 1011 0110 1010, which is the checksum. On the receiving end, if the data is intact, adding all four 16-bit numbers, checksum included, yields 1111 1111 1111 1111; if the result is anything else, an error occurred in transmission.
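The same computation in code: a sketch of the 16-bit one's-complement checksum described above, with the overflow bit wrapped back into the sum.

```python
def internet_checksum(data: bytes) -> int:
    """Sum the data as 16-bit words with end-around carry, then complement."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length data
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # wrap the overflow bit around
    return ~total & 0xFFFF                        # one's complement = checksum

# Receiver-side check: summing every word plus the checksum must give 0xFFFF.
```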

Let’s consider a question. Why does UDP provide error detection?

This is an end-to-end design principle that aims to reduce the probability of various errors in transmission to an acceptable level.

Consider sending a file from host A to host B, that is, communication between hosts A and B. It passes through three stages: first, host A reads the file from disk and splits the data into packets; then the packets travel across the network connecting host A and host B; finally, host B receives the packets and writes them to disk. In this seemingly simple but actually complicated process, various things can disrupt normal communication: disk read/write errors, buffer overflow, memory errors, network congestion, and other factors can corrupt or lose packets. This shows that the network used for communication is not reliable.

Since communication passes through just these three stages, we would like to add an error detection and correction mechanism at one of them to check the data.

Network layer certainly can’t do it, because the network layer of the main purpose is to increase the rate of data transmission, network layer does not need to consider the data integrity, data completeness and correctness of the system to detect line to the end, so in the transmission of data, for data transmission network layer can only ask its to provide the best possible service, The network layer cannot be expected to provide data integrity services.

UDP is unreliable because, although it provides error detection, it has no way to recover from an error and no retransmission mechanism.

TCP

UDP is a protocol that provides connectionless communication without elaborate control mechanisms. In other words, it leaves part of the control work to the application program, providing only the most basic functions of a transport-layer protocol.

TCP is different: as a transport-layer protocol, it does far more than UDP.

TCP stands for Transmission Control Protocol. It is called a connection-oriented protocol because before one application can send data to another, the two processes must perform a handshake. The handshake establishes a logical connection; it is not an actual handshake between two hosts.

A connection is a private, virtual communication link, also known as a virtual circuit, between two applications; whatever devices, lines, or networks it crosses, its purpose is to let the two sides pass messages to each other.

Once a connection is established between hosts A and B, the communicating applications use only that virtual link to send and receive data, and delivery is assured. TCP controls the establishment, disconnection, and maintenance of the connection.

A TCP connection is a full-duplex service. What does full-duplex mean? It means that if host A and host B have a TCP connection, application data can flow from B to A at the same time as it flows from A to B.

TCP provides only point-to-point connections, so there is no such thing as TCP multicast, where one host sends a message to many receivers. A TCP connection connects exactly one pair of hosts.

Establishing a TCP connection requires a three-way handshake, which we'll cover below. Once the connection is established, the hosts can send data to each other: the client process pushes a stream of data through its socket, and once the data passes through the socket, it is in the hands of the TCP running in the client.

TCP first places the data in the connection's send buffer, one of the buffers set aside during the three-way handshake. TCP then, at an appropriate time, sends data from the send buffer to the target host's receive buffer. In fact, each side has both a send buffer and a receive buffer, as shown below.

Transmission between hosts happens in segments. So what exactly is a segment?

TCP splits the data stream into chunks and adds a TCP header to each chunk, forming a TCP segment, also called a packet segment. The amount of data each segment can carry is limited; it cannot exceed the Maximum Segment Size (MSS). On its way down the stack, a segment passes through the link layer, which has a Maximum Transmission Unit (MTU): the largest packet size that can pass through the data link layer. The MTU is typically determined by the communication interface.

So what does MSS have to do with MTU?

The relationship matters because computer networks are layered, and each layer has its own name for its data unit: segments at the transport layer, IP packets at the network layer. The MSS is a transport-layer concept: the maximum amount of application data TCP can carry in a single segment. It is derived from the MTU: roughly, MSS = MTU minus the IP and TCP header sizes (for a typical Ethernet MTU of 1500 bytes, 1500 - 20 - 20 = 1460 bytes).

TCP packet segment structure

After a brief talk about TCP connections, let’s talk about the TCP packet segment structure, as shown in the figure below

The TCP segment structure has much more in it than the UDP structure, but the first 32 bits are the same: the 16-bit source port number and the 16-bit destination port number, which, as we know, are used for multiplexing and demultiplexing. Like UDP, TCP also contains a checksum field. The remaining fields are as follows.

  • Next come the 32-bit sequence number field and the 32-bit acknowledgment number field. These fields are used by the TCP sender and receiver to implement reliable data transfer.

  • The 4-bit header length field gives the length of the TCP header in 32-bit words. The TCP header is variable-length, but the options field is usually empty, so the typical TCP header is 20 bytes.

  • The 16-bit receive window field is used for flow control. It indicates the number of bytes the receiver is able or willing to accept.

  • The variable-length options field is used when sender and receiver negotiate parameters such as the maximum segment size (MSS).

  • The 6-bit flags field. The ACK flag indicates that the value in the acknowledgment field is valid, i.e., the segment carries an acknowledgment of successfully received data. The RST, SYN, and FIN flags are used to establish and close connections. CWR and ECE are used for congestion notification. The PSH flag asks the receiver to hand the data to the upper layer immediately. The URG flag indicates urgent data that the upper layer should process; the last byte of the urgent data is indicated by the 16-bit urgent data pointer field. In practice, PSH and URG are rarely used.
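To make the layout concrete, here is a sketch that unpacks the fixed 20-byte TCP header just described using Python's struct module. It assumes no options and does not validate the checksum.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the fixed 20-byte TCP header (assumes len(segment) >= 20, no options)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": (offset_flags >> 12) * 4,   # 4-bit data offset, in 32-bit words
        "URG": bool(offset_flags & 0x20), "ACK": bool(offset_flags & 0x10),
        "PSH": bool(offset_flags & 0x08), "RST": bool(offset_flags & 0x04),
        "SYN": bool(offset_flags & 0x02), "FIN": bool(offset_flags & 0x01),
        "window": window, "checksum": checksum, "urgent_ptr": urg_ptr,
    }

# Build a SYN header (data offset 5 words = 20 bytes, SYN flag set), then parse it back:
hdr = struct.pack("!HHIIHHHH", 12345, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
print(parse_tcp_header(hdr))  # shows SYN=True, header_len=20, seq=1000, ...
```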

The various functions and features of TCP are reflected through the TCP packet structure. After talking about the TCP packet structure, we will talk about the functions and features of TCP.

Sequence numbers and acknowledgment numbers ensure reliable transmission

The two most important fields in the TCP header are the sequence number and the acknowledgment number. They are the foundation of TCP's reliability. To understand how, we first need to know what these two fields hold.

A segment's sequence number is the byte-stream number of its first byte. TCP treats data as an ordered stream of bytes, and because the stream is ordered, numbering each byte identifies its position in the stream. For example, suppose host A wants to send data to host B. After the application produces the data, TCP splits the stream according to the MSS. If the data is 10,000 bytes and the MSS is 2,000 bytes, TCP splits it into segments covering bytes 0 to 1999, 2000 to 3999, and so on.

Therefore, the first byte number of the first segment (bytes 0-1999) is 0, the first byte number of the segment 2000-3999 is 2000, and so on.

Each of these numbers is then written into the sequence number field of the corresponding TCP segment's header.
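The example above can be written as a few lines of Python. It is a simplified model that numbers each segment with the stream offset of its first byte; real TCP starts from a random initial sequence number rather than 0.

```python
MSS = 2000  # from the example above

def segment_stream(data: bytes, isn: int = 0):
    """Split a byte stream into MSS-sized segments, numbering each segment
    with the stream offset of its first byte."""
    return [(isn + off, data[off:off + MSS]) for off in range(0, len(data), MSS)]

for seq, chunk in segment_stream(bytes(10000)):   # the 10,000-byte message
    print(seq, len(chunk))   # prints 0, 2000, 4000, 6000, 8000, each 2000 bytes
```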

The acknowledgment number is a bit trickier than the sequence number. Before explaining it, let's introduce a few communication models.

  • Simplex communication: Data flows in one direction only. At any moment only one party can send while the other receives; two-way communication is impossible. Radio and television broadcasts are examples.
  • Duplex communication: A point-to-point system in which two or more connected parties or devices can communicate in both directions. There are two duplex models: full duplex (FDX) and half duplex (HDX).
    • Full duplex: In a full-duplex system, both connected parties can transmit to each other at the same time; the telephone is the most familiar example. Full-duplex communication is the combination of two simplex channels, and it requires that both devices can transmit and receive independently.
    • Half duplex: In a half-duplex system, both parties can communicate with each other, but not at the same time. On a walkie-talkie, for instance, only the person holding the button down can speak, and the other can reply only after the first has finished.

Simplex, half-duplex, and full-duplex communication are shown in the following figure

TCP is a full-duplex protocol, so host A can receive data from host B while sending messages to host B. The acknowledgment number host A places in its segment is the sequence number of the next byte A expects to receive from B. That sounds convoluted, so here is an example: host A receives a segment from host B carrying bytes 0 through 999, with that range reflected in the sequence number. Host A then expects byte 1000 and everything after it from host B, so the segment host A sends back to host B carries acknowledgment number 1000.

Cumulative acknowledgment

Here is another example. Suppose host A has received the segment carrying bytes 0 to 999 and expects the segment starting at byte 1000, but host B instead sends a segment starting at byte 1500. Will host A keep waiting?

The answer is obviously yes, because TCP only acknowledges bytes in the stream up to the first missing byte. Byte 1500 comes after byte 1000, but host B has not delivered bytes 1000 through 1499 to host A, so host A continues to wait for them.

Now that we understand sequence numbers and acknowledgment numbers, let's look at the TCP sending process. Below is a normal exchange.

TCP achieves reliable transmission through positive acknowledgments (ACKs). When host A sends data, it waits for host B's response. If an ACK arrives, the data reached the other end; otherwise, the data was probably lost.

As shown in the following figure, if host A does not receive an acknowledgment within a certain period, it considers the segment it sent lost and resends it.

Host A may also fail to receive host B's response because of network jitter; in that case, host A resends the segment after the specified interval.

Host A may miss host B's response because the acknowledgment itself was lost on its way from host B to host A.

As the figure above shows, the acknowledgment returned by host B can be lost in transit due to network congestion or other causes and never reach host A. Host A waits for a period of time, and if no response from host B arrives within that time, host A resends the segment.

Now here is a problem. Host A sends a segment to host B; host B receives it and sends an acknowledgment, but the acknowledgment is delayed in the network. After a while, host A retransmits the segment, and host B's original acknowledgment arrives at host A only after that second transmission. Host B, meanwhile, receives the same segment twice, out of order. What should be done?

The TCP RFCs do not mandate anything here; it is up to the implementation to decide how to handle out-of-order segments. There are two common approaches:

  • The receiver immediately discards out-of-order segments
  • The receiver keeps the out-of-order data and waits for the missing segments to fill the gap

In general, the second approach is usually taken.
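Here is a sketch of a receiver taking the second approach: it buffers out-of-order segments and always acknowledges the first byte it is still missing, which is exactly the cumulative acknowledgment behavior from the 0/1500/1000 example earlier.

```python
class Receiver:
    def __init__(self):
        self.expected = 0        # next in-order byte we are waiting for
        self.buffer = {}         # out-of-order segments, keyed by sequence number

    def on_segment(self, seq: int, data: bytes) -> int:
        if seq == self.expected:
            self.expected += len(data)                 # deliver in-order data
            # drain any buffered segments that are now contiguous
            while self.expected in self.buffer:
                self.expected += len(self.buffer.pop(self.expected))
        elif seq > self.expected:
            self.buffer[seq] = data                    # hold it until the gap fills
        return self.expected                           # cumulative ACK: first missing byte

r = Receiver()
print(r.on_segment(0, bytes(1000)))     # 1000: in order
print(r.on_segment(1500, bytes(500)))   # still 1000: gap at 1000..1499
print(r.on_segment(1000, bytes(500)))   # 2000: gap filled, buffered data drained
```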

Transmission control

Using window control to increase speed

So far we have described TCP as sending one segment at a time: host A sends a segment, waits for host B's response, and only after receiving it sends the next segment. This question-and-answer style has many failure modes, for example a lost response, or a response still in flight, during which everything stalls. For the performance-obsessed Internet, this form of transmission cannot perform well.

So how do you improve performance?

To solve this problem, TCP introduced the concept of a window. Even with long round-trip times and frequent exchanges, the window keeps network performance from degrading. Sounds great, so how does it work?

As shown in the figure below

Previously, each exchange carried a single segment. With a window, each exchange can carry multiple segments: one window's worth of segments goes out together. The window size is the maximum amount of data that can remain in flight without waiting for an acknowledgment.

This windowing mechanism makes heavy use of buffers, since a single acknowledgment can confirm multiple segments at once.

As shown in the figure below, the highlighted segments form the window we mentioned: within the window, segments can be sent even if no acknowledgment has arrived yet. However, if part of the window's data is lost before the acknowledgments for the whole window arrive, host A must still retransmit it. For this, host A keeps a buffer holding the segments that may need retransmission until their acknowledgments arrive.

Outside the sliding window lie the segments not yet sent and the segments already sent and acknowledged. Once a segment has been acknowledged, it will not need to be resent, so it can be removed from the buffer.

When an acknowledgment arrives, the window slides forward to the position of the acknowledgment number in the reply, as shown in the figure above. This allows multiple segments to be in flight at once, improving communication performance. This kind of window is called a sliding window.
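Below is a sketch of the sender-side bookkeeping for a sliding window, with the window measured in bytes. The data size, MSS, and window values are just example numbers.

```python
class WindowedSender:
    def __init__(self, data: bytes, mss: int, window: int):
        self.data, self.mss, self.window = data, mss, window
        self.base = 0          # oldest unacknowledged byte (left edge of window)
        self.next_seq = 0      # next byte to send

    def sendable(self):
        """Segments that may be sent now without waiting for any ACK."""
        out = []
        while self.next_seq < min(self.base + self.window, len(self.data)):
            end = min(self.next_seq + self.mss, self.base + self.window, len(self.data))
            out.append((self.next_seq, self.data[self.next_seq:end]))
            self.next_seq = end
        return out

    def on_ack(self, ack: int):
        self.base = max(self.base, ack)   # window slides right to the ACKed byte

s = WindowedSender(bytes(10000), mss=1000, window=4000)
print([seq for seq, _ in s.sendable()])   # [0, 1000, 2000, 3000]: four in flight
s.on_ack(2000)                            # first two segments acknowledged
print([seq for seq, _ in s.sendable()])   # [4000, 5000]: window slid by 2000
```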

Window control and retransmission

Sending and receiving segments inevitably involves loss and retransmission, and the same is true with a window. What happens if a segment is lost while a window's worth of data is being sent?

First, let’s consider the case where the confirmation reply is not returned. In this case, the packet segment sent by host A reaches host B without resending the packet. This is different from sending a single packet segment. If a single packet segment is sent, the packet is resent even if no reply is returned.

Once the window is reasonably large, the loss of a small fraction of acknowledgments does not force any retransmission: a later cumulative acknowledgment confirms the earlier data as well.

As we saw earlier, if the receiving host never gets a segment, or the receiver's response never reaches the sender, the data is retransmitted after a timeout. So with a window, what happens when a segment in the middle is lost?

As shown in the figure below, suppose the segment carrying bytes 1000-1999 is lost. Host A does not stop and wait; it keeps sending the rest of the window, and host B keeps returning acknowledgments carrying the same number, 1000, for every segment that arrives. If the sender receives the same acknowledgment three times in a row, it retransmits the corresponding segment. This is more efficient than the timeout-based retransmission described earlier. The mechanism is known as fast retransmission, and the repeated acknowledgment is known as a redundant (duplicate) ACK.

In other words, when host B does not receive the segment with the sequence number it expects, it keeps acknowledging the data it has received so far. Once the sender sees the same acknowledgment three times in a row, it considers the segment lost and retransmits it. This mechanism provides faster retransmission than waiting for a timeout.

Flow control

We've covered transmission control; now Cxuan will walk you through flow control. We know that each side of a TCP connection has socket buffers: a receive buffer and a send buffer are set up for every connection when it is established. Data arriving from the network is placed in the receive buffer, but the receiving application may not read it immediately; it waits for the operating system to allocate it a time slice. If the sending application produces data too fast while the receiver reads from its buffer relatively slowly, the receiver's buffer will overflow.

Fortunately, TCP provides a flow control service to eliminate buffer overflow. Flow control is a speed-matching service: it matches the sender's sending rate to the reading rate of the receiving application.

TCP provides flow control through the receive window variable. The receive window tells the sender how much buffer space is still available at the receiver, and the sender limits the amount of data it sends according to the receiver's actual capacity.

The receiving host tells the sending host how much data it can currently accept, and the sender sends no more than that limit: the window size. Remember the receive window field at the start of the TCP header? That is the field used for flow control; it indicates the number of bytes the receiver is able or willing to accept.

So we know this field is used for flow control, but how does the control actually work?

The sending host periodically sends a window probe segment to check whether the receiving host is able to accept data again. Whenever the receiver's buffer is at risk of overflowing, it advertises a smaller window value to the sender, and the sender throttles the amount of data it sends accordingly.

Below is a diagram of flow control

The sending host regulates its traffic according to the receiving host's window size. This also prevents the sender from overwhelming the receiver with a burst of data it cannot process.

As shown in the figure above, when host B receives the segment carrying bytes 2000-2999, its buffer is full and it has to pause receiving. Host A then sends window probe segments; each probe is tiny, just one byte. When host B frees up buffer space, it updates its receive window size and sends host A a window update notification, and host A resumes sending segments.

In this exchange, a window update notification can itself be lost, and if it were, the sender would never resume sending. That is exactly why the sender keeps sending window probe segments from time to time.

Connection management

Before moving on to the next interesting feature, let's focus on TCP connection management, because without a TCP connection, none of the remaining TCP features matter. Suppose a process on one host wants to establish a TCP connection with a process on another host. The client-side TCP establishes a connection with the server-side TCP through the following steps.

  • First, the client sends a special TCP segment to the server. This segment carries no application-layer data, but the SYN flag in its header is set to 1, so this special segment is also called a SYN segment. The client then chooses a random initial sequence number (client_isn), places it in the sequence number field of this initial TCP SYN segment, and the segment is wrapped in an IP datagram and sent to the server.

  • When the IP datagram reaches the server, the server extracts the TCP SYN segment from it, allocates TCP buffers and variables for the connection, and sends the client a connection-granted segment. This segment also carries no application-layer data, but it does carry three important pieces of information.

    These buffer and variable allocations make TCP vulnerable to a denial-of-service attack called SYN flooding.

    • First, the SYN bit is set to 1.
    • Second, the acknowledgment number field of the TCP header is set to client_isn + 1.
    • Finally, the server chooses its own initial sequence number (server_isn) and puts it in the sequence number field of the TCP header.

    In plain English: "I received your SYN segment carrying your initial sequence number client_isn. I agree to establish the connection, and my own initial sequence number is server_isn." The connection-granted segment is called a SYNACK segment.

  • Third, after receiving the SYNACK segment, the client also allocates buffers and variables for the connection, and then sends one more segment to the server, acknowledging the server's response. The SYN bit is set to 0 because the connection is now established. These three exchanges make up TCP connection establishment, commonly known as the three-way handshake.

Once these three steps are complete, the client and server hosts can send segments to each other; in every subsequent segment, the SYN bit is set to 0, as illustrated in the following figure.

After the connection is established between the client host and the server host, either of the two processes in the TCP connection can terminate it. When the connection ends, the buffers and variables in the hosts are released. Suppose the client decides to terminate the connection; the process goes as follows.

The client application process issues a close command, and the client's TCP sends a special segment to the server whose FIN flag is set to 1. When the server receives the segment, it sends back an acknowledgment segment. The server then sends its own termination segment, with its FIN bit set to 1, and the client acknowledges it. At that point, all the resources both hosts used for the connection are released, as shown below.

During the lifetime of a TCP connection, the TCP running on each host moves through various TCP states: LISTEN, SYN-SENT, SYN-RECEIVED, ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT, and CLOSED. These states are explained below.

  • LISTEN: waiting for a connection request from any remote TCP and port.
  • SYN-SENT: a connection request has been sent; waiting for a matching request in reply.
  • SYN-RECEIVED: the server's state after the second step of the TCP three-way handshake.
  • ESTABLISHED: the connection is established, and application data can be exchanged with the other host.

These four states are involved in the TCP three-way handshake.

  • FIN-WAIT-1: waiting for a connection termination request from the remote TCP, or for acknowledgment of the termination request previously sent.

  • FIN-WAIT-2: waiting for a connection termination request from the remote TCP.

  • CLOSE-WAIT: waiting for a connection termination request from the local user.

  • CLOSING: waiting for acknowledgment of the connection termination request from the remote TCP.

  • LAST-ACK: waiting for acknowledgment of the connection termination request previously sent to the remote TCP (which includes an acknowledgment of its termination request).

  • TIME-WAIT: waiting long enough to be sure the remote TCP received the acknowledgment of its connection termination request.

  • CLOSED: the connection is closed.

The seven states above relate to TCP's four-way wave, that is, disconnection.

TCP connection states switch based on events invoked by the user (OPEN, SEND, RECEIVE, CLOSE, ABORT, and STATUS), on incoming segment flags (SYN, ACK, RST, and FIN), and, of course, on timeouts.
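The states and events can be summarized as a transition table. This sketch covers only the common paths through the handshake and the wave; it omits the CLOSING state and the less common simultaneous open/close paths of the full RFC 793 diagram.

```python
# Common (state, event) -> next-state transitions; simultaneous open/close
# and the CLOSING state are omitted.
TRANSITIONS = {
    ("CLOSED",       "passive OPEN"):            "LISTEN",
    ("CLOSED",       "active OPEN, send SYN"):   "SYN-SENT",
    ("LISTEN",       "recv SYN, send SYN+ACK"):  "SYN-RECEIVED",
    ("SYN-SENT",     "recv SYN+ACK, send ACK"):  "ESTABLISHED",
    ("SYN-RECEIVED", "recv ACK"):                "ESTABLISHED",
    ("ESTABLISHED",  "CLOSE, send FIN"):         "FIN-WAIT-1",
    ("ESTABLISHED",  "recv FIN, send ACK"):      "CLOSE-WAIT",
    ("FIN-WAIT-1",   "recv ACK"):                "FIN-WAIT-2",
    ("FIN-WAIT-2",   "recv FIN, send ACK"):      "TIME-WAIT",
    ("CLOSE-WAIT",   "CLOSE, send FIN"):         "LAST-ACK",
    ("LAST-ACK",     "recv ACK"):                "CLOSED",
    ("TIME-WAIT",    "2MSL timer expires"):      "CLOSED",
}
```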

Let’s add the TCP connection status and look at the three-way handshake and four-way wave.

Three-way handshake to establish a connection

The following figure shows the process of establishing a TCP connection. Assume that the left end is the client host and the right end is the server host. Initially, both ends are in the CLOSED state.

  1. The server process prepares to accept TCP connections from outside, typically by calling the socket, bind, and listen functions. This way of opening is called a passive open. The server process then sits in the LISTEN state, waiting for client connection requests.
  2. The client performs an active open by calling connect, sending the server a connection request with SYN = 1 in the header and an initial sequence number (seq = x). The SYN segment carries no data but consumes one sequence number. The client then enters the SYN-SENT state.
  3. On receiving the client's connection request, the server must acknowledge the client's segment. In the acknowledgment segment it sets both the SYN and ACK bits to 1, sets the acknowledgment number to ack = x + 1, and picks its own initial sequence number seq = y. Note that this segment also carries no data, yet it too consumes one sequence number. The server-side TCP then enters the SYN-RECEIVED state.
  4. On receiving the server's response, the client must acknowledge the connection as well: ACK is set to 1, the sequence number is seq = x + 1, and the acknowledgment number is ack = y + 1. TCP allows this segment to carry data or not; if it carries no data, the next data segment still uses seq = x + 1. The client then enters the ESTABLISHED state.
  5. On receiving the client's acknowledgment, the server also enters the ESTABLISHED state.

TCP requires three packet segments to establish a connection, but four to release a connection.

Four-way wave

After the data transfer is complete, the communicating parties can release the connection. At that point, both the client and server hosts are in the ESTABLISHED state; then the release begins.

The procedure for TCP disconnection is as follows

  1. The client application issues a connection-release request, stops sending data, and actively closes the TCP connection. The client host sends a release segment with the FIN bit in the header set to 1. It carries no data, and its sequence number is seq = u. The client host then enters the FIN-WAIT-1 state.

  2. On receiving the client's segment, the server host replies with an acknowledgment segment: ACK = 1, its own sequence number seq = v, and ack = u + 1. The server host then enters the CLOSE-WAIT state. The connection in the client-to-server direction is now released: the client has no more data to send, and the connection is half-closed, but the server host can still send data.

  3. On receiving the server host's acknowledgment, the client host enters the FIN-WAIT-2 state, waiting for the server to send its own connection-release segment.

  4. When the server host has no more data to send, its application process tells TCP to release the connection. The server sends a release segment with ACK = 1 and sequence number seq = w (since the server may have sent more data in the half-closed state, seq is not necessarily v + 1), and ack = u + 1. Having sent the release request, the server host enters the LAST-ACK state.

  5. On receiving the server's release request, the client must respond with an acknowledgment: ACK = 1, sequence number seq = u + 1 (the client has sent no data since its own release request), and ack = w + 1. The client then enters the TIME-WAIT state. Note that the TCP connection is not yet released: the client enters the CLOSED state only after 2MSL, twice the Maximum Segment Lifetime, has passed.

  6. The server enters the CLOSED state as soon as it receives the client's final acknowledgment, so the server finishes the TCP connection earlier than the client. Four segments are exchanged during the whole release, which is why the process is also called the four-way wave.

What is TIME-WAIT?

We just briefly mentioned TIME-WAIT and 2MSL; let's discuss them properly now.

MSL (Maximum Segment Lifetime) is the longest time a TCP segment can live or reside in the network. RFC 793 defines the MSL as two minutes, but implementations may choose their own value; some use a maximum lifetime of 30 seconds.

So why wait for 2MSL?

There are two main reasons

  • To ensure that the final acknowledgment reaches the server. The last ACK segment can be lost in the network, leaving the server in the LAST-ACK state waiting for it. The server then retransmits its FIN-ACK segment; the client, still within 2MSL, receives it, re-acknowledges, and restarts the timer. If the client closed immediately after sending its ACK and that ACK were lost, neither host could ever enter the CLOSED state.
  • To let stale segments die out. After sending the last ACK, waiting 2MSL ensures that every segment generated during the lifetime of this connection disappears from the network, so that no leftover segments from the closed connection can disturb the server later.

A note here: the server starts its retransmission timeout timer immediately after sending the FIN-ACK, and the client starts the TIME-WAIT timer immediately after sending the last ACK.
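TIME-WAIT is easy to run into in practice: the side that closes first holds the (IP, port) pair for 2MSL, so restarting a server on the same port right away can fail with "Address already in use". The usual remedy in socket code is the SO_REUSEADDR option, sketched below with an arbitrary port.

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # tolerate TIME-WAIT remnants
srv.bind(("127.0.0.1", 12345))  # could raise EADDRINUSE without the option
srv.listen(5)
```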

RST as promised

We said the RST, SYN, and FIN flags are used to establish and close connections. What we discussed above is the ideal case, where both client and server accept the transmitted segments. Another case is that a host receives a TCP segment whose IP address and port number do not match any of its sockets. If the client sends a request and the server, after checking the destination IP address and port number, finds that the request is not meant for it, the server sends the client a special RST segment.

So when a server sends a special RST segment to a client, it is telling the client: there is no matching socket for this connection, please stop sending.

That was the case with TCP, but what about UDP?

When UDP is the transport protocol and no socket matches, the host sends a special ICMP datagram instead.

SYN flood attack

Let’s discuss SYN flooding attacks.

As we have seen in TCP’s three-way handshake, the server allocates and initializes variable connections and caches in response to an incoming SYN. The server then sends a SYNACK in response and waits for an ACK packet from the client. If the client does not send an ACK to complete the last step, the connection is in a pending state, or half-connected state.

An attacker exploits this by sending a large number of TCP SYN segments. The server keeps responding, but none of the connections complete the three-way handshake. As the SYNs pile up, the server keeps allocating resources for half-open connections until it runs out. This is a form of DoS (denial-of-service) attack.

One way to defend against this attack is SYN cookies. Here's how they work:

  • When the server receives a SYN segment, it does not know whether it came from an attacker or a legitimate client (the attacker is also a client, which makes them hard to tell apart). So instead of creating a half-open connection for the segment, the server computes an initial TCP sequence number from the SYN segment's source and destination IP addresses and port numbers, using a carefully constructed hash function over this four-tuple. The result is called a SYN cookie. The server then sends the client a SYNACK segment carrying the SYN cookie. Crucially, the server does not remember the cookie or any other state for this SYN.
  • If the client is legitimate, it returns an ACK segment. On receiving the ACK, the server must verify that it corresponds to some earlier SYN. The criterion: the acknowledgment number in the client's ACK should equal the hash of the same four-tuple (source and destination IP addresses and port numbers) plus 1. (Roughly speaking; please correct me if I've got a detail wrong, and read up on it yourself if you're interested.) If the check passes, the server creates a fully open connection with a socket.
  • If the client never returns an ACK, that is, it was an attacker, no harm is done: the server allocated no variables or buffers for the original SYN.
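The cookie idea fits in a few lines. This is a simplified sketch, not a real kernel implementation: production SYN cookies also fold in a coarse timestamp and encode the MSS in the low bits, and the secret, addresses, and ports here are hypothetical.

```python
import hashlib

SECRET = b"server-local-secret"   # hypothetical per-server secret

def syn_cookie(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> int:
    """Derive the initial sequence number from the connection four-tuple."""
    material = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}".encode() + SECRET
    return int.from_bytes(hashlib.sha256(material).digest()[:4], "big")

# On SYN: reply with seq = cookie, and remember nothing.
cookie = syn_cookie("10.0.0.1", 52814, "10.0.0.2", 80)

# On ACK: a legitimate client acknowledges cookie + 1; recompute and compare.
def ack_is_valid(src_ip, src_port, dst_ip, dst_port, ack: int) -> bool:
    expected = (syn_cookie(src_ip, src_port, dst_ip, dst_port) + 1) % (1 << 32)
    return ack == expected

assert ack_is_valid("10.0.0.1", 52814, "10.0.0.2", 80, cookie + 1)
```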

Congestion control

With TCP window control, two hosts no longer exchange single segments one at a time; they can send large numbers of packets continuously. But the flood of packets brings other problems, such as network load and network congestion. To prevent them, TCP uses a congestion control mechanism, which restrains the sender's data flow when the network is congested.

There are two main methods of congestion control

  • End-to-end congestion control: The network layer provides no explicit support to the transport layer for congestion control, so even when the network is congested, the end systems have to infer it by observing network behavior. TCP uses end-to-end congestion control; the IP layer gives the end systems no feedback about congestion. How, then, does TCP infer congestion? It treats timeouts and triple duplicate acknowledgments as signs of congestion and responds by shrinking its window size.
  • Network-assisted congestion control: Routers provide the sender explicit feedback about the congestion state of the network. This feedback can be as simple as a single bit indicating congestion on a link.

The following diagram depicts both congestion control methods

TCP congestion control

If you've read this far, I'll assume you understand the basics of how TCP implements reliability: sequence numbers and acknowledgment numbers. Another fundamental part of TCP is its congestion control.

TCP's approach is to have each sender limit its sending rate according to the level of congestion it perceives in the network. If a TCP sender perceives no congestion, it increases its sending rate; if it perceives congestion along the path, it slows down.

But there are three problems with this approach

  1. How does a TCP sender limit the rate at which it sends segments into its connection?
  2. How does a TCP sender sense network congestion?
  3. When the sender senses end-to-end congestion, what algorithm should it use to change its transmission rate?

Let's start with the first question: how does a TCP sender limit the rate at which it sends segments into its connection?

We know a TCP connection consists of a receive buffer, a send buffer, and variables (LastByteRead, rwnd, and so on). The sender's TCP congestion control mechanism tracks an additional variable, the congestion window, written cwnd, which limits the amount of data TCP can push into the network before receiving an ACK. The receive window (rwnd) tells the sender how much data the receiver can accept.

In general, the amount of unacknowledged data at the sender must not exceed the minimum of cwnd and rwnd, that is:

LastByteSent - LastByteAcked <= min(cwnd, rwnd)

Since each packet's round trip takes RTT, and assuming the receiver has enough buffer space that we can ignore rwnd and focus on cwnd, the sender's rate is roughly cwnd/RTT bytes per second. By tuning cwnd, the sender can therefore adjust the rate at which it sends data into its connection.

How does a TCP sender sense network congestion?

As we discussed above, TCP senses congestion through timeouts or three duplicate (redundant) ACKs.

When the sender senses end-to-end congestion, what algorithm is used to change its transmission rate?

This is a complex issue. Broadly, TCP follows these guiding principles:

  • A lost segment means the network is congested, so the TCP sender's rate should be lowered when loss occurs.
  • An acknowledged segment means the network is delivering the sender's segments to the receiver, so the sender's rate can be increased when an ACK for previously unacknowledged data arrives. Why? Because a segment reaching the receiver shows the path is not congested and can deliver data successfully, the sender's congestion window can grow, and sending can speed up.
  • Bandwidth probing: TCP adjusts its rate based on the arrival of ACKs, backing off when loss occurs and then increasing again. To find the rate at which congestion sets in, the TCP sender increases its rate, backs off from that rate on loss, and then begins probing again to see whether the congestion onset rate has changed.

Now that we know what TCP congestion control is trying to do, let's look at the TCP congestion control algorithm. It has three main parts: slow start, congestion avoidance, and fast recovery.

Slow start

When a TCP connection is established, cwnd is initialized to a small value of 1 MSS, which gives an initial sending rate of roughly MSS/RTT bytes per second. For example, with 1,000-byte segments and an RTT of 200 ms, the initial rate is only about 40 kbps (5,000 bytes per second). In practice, the available bandwidth is usually much larger than MSS/RTT, so TCP uses slow start to find a good rate: starting from cwnd = 1 MSS, cwnd grows by 1 MSS each time a sent segment is acknowledged. Send one segment, receive its ACK, and cwnd becomes 2 MSS; after those two segments are acknowledged, cwnd grows to 4 MSS; and so on, doubling cwnd after each successful round, as shown in the figure below.

The sending rate can't grow forever; the growth has to end somewhere. When? Slow start ends its rate increase in one of the following ways.

  • If packet loss occurs during slow start, TCP sets the sender's cwnd to 1 MSS and restarts the slow start process. At this point the concept of ssthresh (the slow start threshold) is introduced: its value is set to half of the cwnd at which the loss was detected.

  • When cwnd reaches ssthresh, continuing to double might cause loss, so the safe choice is to stop doubling once cwnd = ssthresh. TCP then switches to congestion avoidance mode, ending slow start.

  • Finally, slow start ends if three duplicate ACKs are detected: TCP performs a fast retransmission and enters the fast recovery state.

Congestion avoidance

When TCP enters the congestion avoidance state, cwnd equals half the value it had when congestion was detected, i.e., the ssthresh value. So cwnd can no longer double every round trip; instead, TCP takes a more conservative approach and adds just one MSS per round trip. For example, after acknowledgments arrive for all ten of ten outstanding segments, cwnd grows by only one MSS. If loss is detected by a timeout, cwnd is reset to one MSS and ssthresh is set to half the cwnd value. Linear growth can also end when three duplicate ACKs are received: in that case TCP halves cwnd, records ssthresh as half of the old cwnd value, and enters the fast recovery state.

Fast recovery

In fast recovery, cwnd increases by one MSS for each duplicate ACK received for the missing segment that put TCP into the fast recovery state. When an ACK for the lost segment finally arrives, TCP deflates cwnd and moves into congestion avoidance. If a timeout occurs in this state, TCP migrates to slow start: cwnd is set to 1 MSS and ssthresh to half of the cwnd value.
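The three phases can be tied together in a toy simulation. This sketch follows the rules described above, with cwnd and ssthresh counted in MSS units; it is a simplified model (for instance, the per-duplicate-ACK inflation of cwnd during fast recovery is omitted).

```python
def react(event, cwnd, ssthresh):
    """Return (new_cwnd, new_ssthresh) after one event.
    Events: 'ack' (a round of ACKs), 'timeout', '3dupacks'."""
    if event == "timeout":                 # severe loss: back to slow start
        return 1, max(cwnd // 2, 2)
    if event == "3dupacks":                # mild loss: fast retransmit / recovery
        return max(cwnd // 2, 2), max(cwnd // 2, 2)
    if cwnd < ssthresh:                    # slow start: double each round trip
        return min(cwnd * 2, ssthresh), ssthresh
    return cwnd + 1, ssthresh              # congestion avoidance: +1 MSS per round trip

cwnd, ssthresh = 1, 16
for event in ["ack"] * 5 + ["3dupacks", "ack", "ack", "timeout", "ack"]:
    cwnd, ssthresh = react(event, cwnd, ssthresh)
    print(f"{event:9s} cwnd={cwnd:3d} ssthresh={ssthresh}")
```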

Afterword

If you've read this far attentively, I'm sure you've gained something.

This article took a long time to write, and many of the patterns and color schemes in the images were carefully chosen, so if you read closely, you'll see the care that went into it.

If you think the article is good, please help spread the word about Cxuan; that's what motivates me to keep writing. Don't just read and run: good articles are worth sharing.

Please remember to give me a thumbs up!

In addition, I have put together six PDFs myself. Follow my WeChat official account "Programmer cxuan" and reply "cxuan" in the background to get all of them.