To be clear, TCP is not reliable at all in the strict communications sense. It is an abstract protocol, so it cannot manipulate the medium to make it reliable. And no communication through any medium is infallible!

It is just like insurance in real life. Insurance cannot prevent anything: it cannot guarantee that a risk will not materialize, that the plane will not crash, or that people will not get sick. In that sense, TCP is the insurance industry of communications.

How was TCP designed? By extension, how is this kind of communication protocol designed? If you were asked to run a reliable protocol on an unreliable medium, what would you do? This article will introduce some of the causes and effects.

How to build a reliable communication protocol

This starts with the classic Two Generals’ Problem.

First, let me introduce the Two Generals’ Problem. The Wikipedia explanation is one of the best: Two Generals’ Problem: en.wikipedia.org/wiki/Two_Ge… The Two Generals’ Problem is essentially a consistency verification problem: both sides of the communication, not just one, need to be sure that they hold the same information (this is important for understanding TCP). Concretely, suppose A and B communicate and A sends a message M to B. So-called reliability must then satisfy the following conditions:

  • The channel is unreliable, and any message can be lost with some probability
  • Crucially, a message must not be resent, whether or not you can be sure it reached the other side. In the classic Two Generals’ Problem, messages are carried by couriers, and couriers are people: an army’s most important resource in a fight, but also its least reliable one (they may mutiny, for example). Therefore each message or acknowledgement can be entrusted to only one messenger. In communication terms, the message cannot be retransmitted!
  • For A: A must know that B has received M
  • For B: B must be sure that A knows B has received M

Mathematically, it is easy to prove by contradiction that the Two Generals’ Problem above has no solution at all; that is, completely reliable, consistent communication cannot be expected. Let me try to derive it.

Suppose that at some point in time T the two ends A and B achieve complete information consistency, and that the packets exchanged to get there are, in chronological order, $P_1, P_2, \dots, P_n$. All of these packets are indispensable for achieving consistency. Now look at the last packet, $P_n$. As we know, the channel is unreliable, so $P_n$ may be lost, and once it is lost the whole exchange loses consistency. This contradicts the hypothesis, so complete consistency is impossible.


This conclusion seems to tear apart the very foundations of communication technology. So what is the point of communication technology?

In fact,

  • First, communication protocols were never designed to satisfy complete consistency requirements

The purpose of communication is to complete one-way message deliveries, one after another in time. The essence of communication is to get messages delivered, not to maintain consistency. Consistency should be the responsibility of the application itself; communication merely provides the infrastructure for messaging.

  • Second, communication carries bytes as electrical pulses rather than human couriers, so messages can be retransmitted

This greatly relaxes the strong constraint of the Two Generals’ Problem. Based on the above assumptions, let’s derive step by step why TCP is designed the way it is.

If you take a closer look, you can see that even message delivery cannot be mathematically guaranteed over an unreliable channel. So instead we ask ourselves: just how unreliable are channels?

Is it 100% unreliable? If it were, the channel would be disconnected and the two sides unreachable; no matter how many times we sent a packet it would be lost, and we could end this meaningless discussion in a second. So unreliability just means probabilistic packet loss: the packet loss probability $p$ must lie somewhere in the open interval $(0, 1)$!

This is very significant. It means that as long as we retry a particular message M enough times, we are sure to receive the peer’s acknowledgement of M. More precisely, the probability of never receiving an acknowledgement shrinks toward zero as the number of retries grows. It’s a dead cert. No one would disagree.
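To put a rough number on “enough times”, here is a back-of-the-envelope calculation. It rests on assumptions the article does not state: every transmission is lost independently, and the loss probability $p$ is the same in both directions. One attempt succeeds only if both the message and its acknowledgement get through:

\[
q = (1 - p)^2 > 0, \qquad
\Pr[\text{all } n \text{ attempts fail}] = (1 - q)^n \longrightarrow 0 \quad (n \to \infty), \qquad
\mathbb{E}[\text{attempts}] = \frac{1}{q}.
\]

For example, with $p = 0.5$ we get $q = 0.25$, so an acknowledgement arrives after 4 attempts on average, and the chance that 20 attempts in a row all fail is $0.75^{20} \approx 0.003$.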

This leads naturally to the first principle of reliable communication:

  • 1. Timeout retransmission

This principle ensures that messages have a chance to reach the peer. Whenever a packet is sent and no acknowledgement of its arrival is received within the expected time, it is retransmitted. The details of timeout retransmission will be discussed later in this article, but for now, let’s look at another issue.
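As a minimal sketch of this first principle, the simulation below sends one message over a lossy channel and retransmits until an acknowledgement comes back. The function name `send_with_retransmit` and its parameters are hypothetical, and the model assumes independent losses with the same probability in both directions; a real timer is replaced by simply moving on to the next loop iteration.

```python
import random


def send_with_retransmit(loss_prob, max_attempts=1000, rng=None):
    """Deliver one message over a simulated lossy channel.

    Returns the number of attempts needed until an acknowledgement
    arrives, or None if max_attempts is exhausted.
    """
    rng = rng or random.Random()
    for attempt in range(1, max_attempts + 1):
        # The message survives the channel with probability 1 - loss_prob.
        message_arrived = rng.random() >= loss_prob
        # The acknowledgement travels back over the same lossy channel.
        ack_arrived = message_arrived and rng.random() >= loss_prob
        if ack_arrived:
            return attempt
        # No acknowledgement within the expected time: retransmit.
    return None


if __name__ == "__main__":
    print(send_with_retransmit(0.5))
```

With `loss_prob=0.0` the first attempt always succeeds, and with `loss_prob=1.0` (a disconnected channel) no number of retries helps, which matches the two extremes discussed above.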


How do we know that a one-way message delivery is complete?

In other words, completing a one-way message delivery requires a marker signal that reveals the fact that the message has been received by the peer. The obvious choice is for the peer to send an acknowledgement of that specific message, which the local end then receives.

Once A receives from B the acknowledgement of M, then as far as A is concerned, it knows that B has definitely received M; and as far as B is concerned, it has indeed received M, otherwise it would not have sent the acknowledgement. This acknowledgement can itself be lost over the unreliable channel, but there is no need to panic: for any message, if we repeat it enough times, the message is bound to reach the other end. Under this inference, the timeout retransmission principle can be applied here as well.

It now appears that the following measures we have derived have solved almost all the problems:

1. A timeout retransmission mechanism for the target message M

2. A timeout retransmission mechanism for the acknowledgement of the target message M

But is this the optimal solution?

Not quite! This is one feasible solution, but it is neither the only one nor the best one. Deriving the optimal solution requires us to look deeper into the nature of communication networks. See an earlier article first: “Matthew effect / the essence of power law distribution and its mathematical expression”: blog.csdn.net/dog250/arti… Note that a communication network is a connected graph, and both the number of connections per node and the traffic per node follow a power law. On a log-log plot, network scale and each node attribute are linearly related. Network scale comes from exponential growth by replication, while the attributes of a single node come from that node’s behavior. It follows that in a communication network that is linear in logarithmic coordinates, the behavior of individual nodes must be controlled exponentially if the network is to keep scaling without collapsing (the logarithmic coordinates can be mapped back to Cartesian coordinates).

In fact, if we expand the line in the logarithmic coordinate system (the result of solving the differential equation) to the corresponding Cartesian coordinate system, it becomes an exponential curve.

Here is another way to look at it: if a packet is lost in transit, what factors are involved? In network communication the transmission medium can certainly be responsible, but compared with the number of nodes, i.e. the size of the network, the medium’s contribution is negligible. In other words, the more nodes there are, the more likely transmissions are to conflict, and the less likely the data is to reach the peer. That is, packet loss is related to network scale. The network is a linear system, so retransmission after packet loss must have exponential timing.

Medium-related problems grow linearly with network scale, while transmission conflicts grow exponentially with network scale.

The same is true if you look back at the early days of Ethernet, the bus-based CSMA/CD Ethernet, which also backs off exponentially after a collision.

Therefore, since the equilibrium of a linear system is maintained by individual nodes behaving exponentially, the timeout rule for timeout retransmission must be:

  • 2. Timeout retransmission – Exponential backoff
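A small sketch of what this second principle means in practice: each time the retransmission timer fires, the next timeout is twice as long as the previous one. The function name `backoff_schedule`, the 1-second starting value, and the 60-second cap are illustrative assumptions, not values taken from the article.

```python
def backoff_schedule(initial_rto, retries, max_rto=60.0):
    """Return the timeout used before each retransmission.

    The timeout doubles after every expiry and is capped at max_rto
    (the cap is an assumption of this sketch).
    """
    timeouts = []
    rto = initial_rto
    for _ in range(retries):
        timeouts.append(rto)
        rto = min(rto * 2, max_rto)  # exponential backoff with a ceiling
    return timeouts


if __name__ == "__main__":
    print(backoff_schedule(1.0, 5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

The doubling means that a node which keeps losing packets rapidly reduces the load it puts on the network, which is the exponential per-node behavior argued for above.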

With this principle in mind, let’s go back to how to implement timeout retransmission of messages and of their acknowledgements. The message and its acknowledgement belong to the same exchange, so timeout retransmission of the message already covers the acknowledgement implicitly: a retransmitted message triggers a fresh acknowledgement. Retransmitting the acknowledgement on its own timer would destroy the exponential characteristic of the single node’s behavior. Therefore, we can infer the third characteristic of reliable communication:

  • 3. Do not retransmit acknowledgements on timeout

Since we only want to ensure the reliability of one-way message delivery, that is, to ensure that the peer receives the message sent by the local end, without the peer needing to learn that its acknowledgement arrived, the fourth characteristic follows:

  • 4. Do not acknowledge acknowledgements
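The four rules can be traced with a scripted exchange in which the losses are given up front. This is only a sketch: `run_exchange` and its arguments are hypothetical, and the receiver’s duplicate detection (the `seen` flag) is an assumption added so that a retransmitted message is handed to the application only once.

```python
def run_exchange(drop_message, drop_ack):
    """Replay one message delivery with scripted losses.

    drop_message[i] is True if the i-th transmission of M is lost;
    drop_ack[j] is True if the j-th acknowledgement is lost.
    Returns (transmissions, acks_sent, deliveries).
    """
    transmissions = 0  # how often the sender put M on the wire
    acks_sent = 0      # how often the receiver answered
    deliveries = 0     # how often M reached the application
    seen = False       # receiver-side duplicate detection (assumed)
    ack_losses = iter(drop_ack)
    for lost in drop_message:
        transmissions += 1
        if lost:
            continue  # rule 1: the sender's timer fires and M is resent
        if not seen:
            seen = True
            deliveries += 1
        # Rule 3: exactly one ACK per arrival, never resent on a timer.
        acks_sent += 1
        if not next(ack_losses, False):
            break  # rule 4: the sender simply stops; it does not ACK the ACK
    return transmissions, acks_sent, deliveries


if __name__ == "__main__":
    # 1st copy of M lost, 2nd arrives but its ACK is lost, 3rd succeeds.
    print(run_exchange([True, False, False], [True, False]))  # (3, 2, 1)
```

Note that the lost acknowledgement is never repaired directly; it is the sender’s retransmission of M that causes a new acknowledgement to be sent.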

With that, the infrastructure is in place. Given that communication tends to be two-way, we need to build a reliable two-way communication protocol on top of it.

Easy: just do the same thing again from the other end, B! So we observe that the Two Generals’ Problem becomes much simpler if two-way messaging and acknowledgement is split into two one-way messaging-and-acknowledgement exchanges, each under timeout retransmission.

[Figure: the original solution to the Two Generals’ Problem]

[Figure: the solution after transformation]

The transformed solution is the most basic form of the TCP protocol that we’re familiar with. Now it’s time for TCP!


TCP handshake, teardown, and consistency issues

People often ask why TCP uses a three-way handshake, not two, not four, not five. The most common answer simply describes the details of the TCP handshake and then says that doing it this way works, even though doing it another way would work too.

Most common wrong answers: 1. It’s a tradeoff, since no number of handshakes can be completely reliable; 2. A description of the protocol details of the handshake; 3….

This question should be very easy to answer given the discussion above. The so-called TCP handshake essentially establishes a two-way reliable communication connection: one exchange in each direction, with each side relying on timeouts to ensure reliability (not on the number of handshakes). TCP’s three-way handshake is an optimized four-way handshake. Since the connection is established from scratch, the passive opener’s ACK of the peer’s SYN and its own SYN are merged into a single SYN-ACK, and that’s it.

The role of the handshake is to determine the initial sequence number for both directions. TCP uses sequence numbers, and since the connection runs in both directions, two initial sequence numbers are required. No data bytes are transferred during the handshake; it only confirms the initial sequence numbers.
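The merge from four segments to three can be written out explicitly. The sketch below is a simplified model, not a packet-accurate one: `Segment`, `four_step_handshake`, and `three_way_handshake` are hypothetical names, only the fields relevant to the argument are modeled, and it relies on the fact that a SYN consumes one sequence number, so the acknowledgement number is the initial sequence number (ISN) plus one, modulo 2^32.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: Optional[int] = None     # sender's ISN, set on SYN segments
    ack_no: Optional[int] = None  # next sequence number expected from peer


def four_step_handshake(isn_a, isn_b):
    """Two independent one-way exchanges: A announces its ISN, then B does."""
    return [
        Segment(syn=True, seq=isn_a),                       # A -> B
        Segment(ack=True, ack_no=(isn_a + 1) % SEQ_SPACE),  # B -> A
        Segment(syn=True, seq=isn_b),                       # B -> A
        Segment(ack=True, ack_no=(isn_b + 1) % SEQ_SPACE),  # A -> B
    ]


def three_way_handshake(isn_a, isn_b):
    """Merge B's ACK and B's SYN, which travel in the same direction."""
    first, b_ack, b_syn, last = four_step_handshake(isn_a, isn_b)
    syn_ack = Segment(syn=True, ack=True, seq=b_syn.seq, ack_no=b_ack.ack_no)
    return [first, syn_ack, last]


if __name__ == "__main__":
    for segment in three_way_handshake(100, 300):
        print(segment)
```

The two middle segments can be merged only because they are sent by the same side back to back, with nothing else to wait for in between.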

Having covered the three-way handshake, here is its sister question: why does TCP tear down a connection with four waves instead of three?

In other words, why can’t the passive closer’s ACK of the active closer’s FIN be merged with the passive closer’s own FIN?

The answer is very simple. TCP is a bidirectional transmission control protocol built on reliable one-way communication. During the handshake the SYN and ACK can be merged, because neither end has a connection before the handshake. During teardown, however, one end deciding that it can disconnect does not mean the other end agrees: the other end may still have data to send. So the passive closer cannot merge them; it has to handle the ACK for the FIN and its own FIN separately, and that’s it.
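The asymmetry with the handshake shows up as soon as the passive side still has something to say. The sketch below is a deliberately simplified model: `close_sequence` is a hypothetical name, segments are plain tuples, and the acknowledgements that the active side would send for the remaining data are left out.

```python
def close_sequence(pending_from_passive):
    """List the segments exchanged when the active side closes first.

    pending_from_passive holds data chunks the passive side still wants
    to send after it has seen the peer's FIN.
    """
    segments = [("active", "FIN"), ("passive", "ACK")]
    # Only the active -> passive direction is closed so far; the passive
    # side may keep sending on its own direction.
    segments += [("passive", f"DATA:{chunk}") for chunk in pending_from_passive]
    segments += [("passive", "FIN"), ("active", "ACK")]
    return segments


if __name__ == "__main__":
    for sender, kind in close_sequence(["x", "y"]):
        print(sender, kind)
```

The ACK has to go out promptly so that the active closer stops retransmitting its FIN, while the passive closer’s FIN has to wait until its own data is finished, so in general the two cannot share a segment.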

One more question: does TCP ensure consistency? In other words, is TCP a solution to the Two Generals’ Problem?

Far from it! TCP does not ensure consistency.

At any point in time, TCP cannot fully confirm the current state of the connection at both ends, where the state includes the data transmitted by each end. Consistency is message-based, not connection-based! That is to say, only when TCP receives the next packet does it learn how the previous packet fared; it cannot settle the matter once and for all. The benefit of TCP is simply that it implements a pipelined approach to consistency verification over a stream of information.

To understand this pipelining, we should not think about sliding windows, which can be hard to reason about; we should think only about single-byte stop-and-wait. In fact, the sliding window mechanism was introduced only for flow control and because single-byte stop-and-wait is inefficient, so it does not change the argument. Just replace bytes with windows, i.e. single-window stop-and-wait.

If we generalize consistency to the level of the connection, then at that level consistency is maintained by the four-way teardown.

As we can see, the state machine of the four-way teardown is very complicated for a reason, and even with the introduction of TIME_WAIT there is no way to guarantee complete consistency. This simply follows from the fact that the Two Generals’ Problem is inherently unsolvable, nothing more.


TCP in 1974

By now you should have a general idea of how TCP ensures reliability. Furthermore, if you want to know why the TCP header is the way it is and how it all works, you have to read an old paper: “A Protocol for Packet Network Intercommunication”: www.cs.princeton.edu/courses/arc… Let me give you an overview of this landmark paper.

It is no exaggeration to say that this paper laid the foundation of the Internet with TCP/IP at its core. Today we can scroll Douyin, chat on WeChat, watch movies online… None of this would have happened without this paper.

The focus of this paper is not on TCP alone, but on how TCP/IP as a whole works. Back in 1974 the layered model was not yet mature, so when we talk about TCP/IP it is important to understand that the two protocols were tightly bound together until UDP was added for compatibility with plain IP forwarding. Only then did people realize the need for a layered model, and so the ISO/OSI model was abstracted.

The paper has two main topics:

  • The concept and role of gateways – what eventually became the IP protocol
  • Transmission control for interprocess communication – what eventually became the TCP protocol

Note that in its original form TCP was proposed as a means of communication between processes, with a focus on processes running on different hosts. That is why its API so closely resembles the file I/O API, and it is also why a socket can be used as a file descriptor.

In addition, it is worth noting that the TCP ACK number is defined as the sequence number of the next byte requested, which gives a simple and complete byte pipeline and saves space in the protocol header. It did bring a lot of problems, such as not being able to measure the RTT accurately and not being able to send selective acknowledgements, and thus not being able to do good congestion control. Still, I have to say that in the 1970s, when space was more important than time, this was an absolute innovation. After all, congestion control was not a concern at the time; it was introduced in 1988.
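To make the “next byte requested” definition concrete, here is a minimal sketch of how a receiver could compute that number from the segments it has buffered. The name `cumulative_ack` and the `(seq, length)` representation are assumptions of this example, and sequence-number wraparound is ignored for brevity.

```python
def cumulative_ack(initial_seq, received):
    """Return the sequence number of the next byte the receiver expects.

    received is an iterable of (seq, length) pairs describing the byte
    ranges that have arrived, in any order and possibly overlapping.
    """
    next_expected = initial_seq
    for seq, length in sorted(received):
        if seq > next_expected:
            break  # a hole: everything after it cannot be acknowledged yet
        next_expected = max(next_expected, seq + length)
    return next_expected


if __name__ == "__main__":
    # Bytes 0-199 arrived, 200-299 are missing, 300-399 arrived early.
    print(cumulative_ack(0, [(0, 100), (100, 100), (300, 100)]))  # 200
```

The single number says nothing about the range 300-399 that did arrive, which is exactly the limitation mentioned above: without extra header space the receiver cannot report what lies beyond the first hole.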


The Internet in 1974

After that 1974 paper, Vinton Cerf, one of its authors, together with Yogen Dalal and Carl Sunshine, wrote RFC 675, “Specification of Internet Transmission Control Program”: tools.ietf.org/html/rfc675 This epoch-making RFC formally introduced the term “Internet”; what we call the Internet is short for internetworking.

TCP/IP was really not a protocol stack at first; initially the two were just a single protocol, nothing more. …


So, what’s next?

Next, Skinshoe Wu arrives, carrying his fancy leather shoes, and his fancy suits.