The birth of TCP/IP

Preface

This post briefly summarizes “A Protocol for Packet Network Intercommunication”, published by Vinton G. Cerf and Robert E. Kahn in 1974. The paper mainly discusses how different systems can communicate with each other. The inter-machine addressing scheme and transmission control protocol it proposes can be said to have laid the foundation of the whole Internet, with TCP/IP at its core.

Problem statement and key concepts

In the early days of computing, scientists developed network protocols for exchanging information between computers and connected multiple computers into the same internal network. In the process, many different network protocols and internal networks of various sizes emerged. The goal of the paper’s two authors is to propose a system that lets internal networks using different protocols exchange information with each other.

The concept and function of the gateway

Introducing an intermediate layer is a common design practice for letting different systems interact. The paper proposes connecting different computer networks through a “gateway”, which has two main functions:

Compatibility with network protocols at both ends: In order to exchange and transfer information between networks running different protocols, there must be a “translator” role that understands the protocols at both ends and supports mutual conversion.
Internetwork addressing: To deliver a message from a host on one network to a host on another, the gateway uses the destination address carried with the message to determine the target host’s network and the host’s address within that network. If the target network is directly connected to the gateway, the gateway delivers the message using that in-network address. If it is not, the gateway reformats the data into a form the downstream network understands and passes it on to the next gateway, which repeats the process.
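
The forwarding decision described above can be sketched as follows. This is only an illustration under assumptions, not the paper’s algorithm: the `Gateway` class, the `(network, host)` address tuple, and the `routes` table are hypothetical names introduced here, and protocol translation is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Hypothetical gateway attached to some networks, with routes to others."""
    name: str
    attached: set                                # networks directly connected to this gateway
    routes: dict = field(default_factory=dict)   # remote network -> next-hop Gateway

    def forward(self, dest, path=None):
        """Return the list of hops a message takes to reach dest = (network, host)."""
        path = (path or []) + [self.name]
        network, host = dest
        if network in self.attached:
            # The target network is directly attached: deliver inside that network.
            return path + [f"{network}/{host}"]
        next_hop = self.routes.get(network)
        if next_hop is None:
            raise LookupError(f"{self.name} has no route to network {network}")
        # Otherwise pass the message on; the next gateway repeats the decision.
        return next_hop.forward(dest, path)


g2 = Gateway("G2", attached={"B", "C"})
g1 = Gateway("G1", attached={"A", "B"}, routes={"C": g2})
print(g1.forward(("C", "h7")))   # ['G1', 'G2', 'C/h7']
```

Here network C is not attached to G1, so the message crosses G1 and G2 before being delivered locally.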

Header format

So that a gateway can identify the source host and target host of a message, address information must travel with the data. The paper therefore adds extra “control” information to each transmission, called the “Internetwork header”, which includes:

Source address and destination address
Sequence number and byte count
Flags used to carry specific control information, such as SYN and FIN

These fields are the prototypes of the header fields in TCP/IP; later RFCs define and specify them more precisely.
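
As a rough sketch of such a header, the fields listed above can be packed in front of the data. The field widths, the flag bit values, and the name `InternetworkHeader` are assumptions made for this example; they are not the layout from the paper or from any RFC.

```python
import struct
from dataclasses import dataclass

SYN = 0x1   # illustrative bit values, not taken from the paper
FIN = 0x2

HEADER_FORMAT = "!IIIHH"   # network byte order; all field widths are assumptions
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


@dataclass
class InternetworkHeader:
    source: int
    destination: int
    sequence_number: int
    byte_count: int
    flags: int = 0

    def pack(self) -> bytes:
        """Serialize the header fields into bytes."""
        return struct.pack(HEADER_FORMAT, self.source, self.destination,
                           self.sequence_number, self.byte_count, self.flags)

    @classmethod
    def unpack(cls, raw: bytes) -> "InternetworkHeader":
        """Parse a header from the front of a packet."""
        return cls(*struct.unpack(HEADER_FORMAT, raw[:HEADER_SIZE]))


header = InternetworkHeader(source=1, destination=2,
                            sequence_number=0, byte_count=5, flags=SYN)
packet = header.pack() + b"hello"
print(InternetworkHeader.unpack(packet), packet[HEADER_SIZE:])
```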

Data fragmentation

A foreseeable problem is that a gateway may have to split a complete packet into several smaller packets when passing data between networks, because different networks may support different maximum packet sizes. The target host must therefore be able to reassemble the fragments it receives.

Once data is fragmented, more problems arise, and additional mechanisms are needed to ensure reliable transmission.
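
A minimal sketch of splitting and reassembly, assuming each fragment is tagged with the offset of its first byte. The functions `fragment` and `reassemble` are hypothetical helpers written for this post, not the paper’s procedure.

```python
def fragment(offset, data, max_size):
    """Split data into (offset, chunk) pieces no larger than max_size bytes."""
    return [(offset + i, data[i:i + max_size])
            for i in range(0, len(data), max_size)]


def reassemble(fragments):
    """Rebuild the data from fragments that may arrive out of order or twice."""
    pieces = dict(fragments)          # fragments with the same offset collapse into one
    expected = min(pieces)
    out = bytearray()
    for offset in sorted(pieces):
        if offset != expected:
            raise ValueError(f"gap before offset {offset}")
        out += pieces[offset]
        expected += len(pieces[offset])
    return bytes(out)


frags = fragment(100, b"abcdefghij", max_size=4)
shuffled = [frags[2], frags[0], frags[1], frags[0]]   # reordered, one duplicate
print(reassemble(shuffled))   # b'abcdefghij'
```

A missing fragment shows up as a gap, which is exactly the case where a retransmission mechanism is needed.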

From interprocess communication to TCP

The concept of the Transmission Control Program (TCP) is introduced in the context of interprocess communication. Assume that each host runs a TCP and that processes send and receive data through it. A process hands complete messages to its TCP, while the TCP may internally divide the data into several segments, because the receiver may limit the maximum size of a single transfer (similar to fragmentation at a gateway).

In this scenario, TCP must support multiplexing: it accepts data from different sending processes and delivers each segment to the right receiving process. To distinguish the senders and receivers on the same host, each segment also carries extra control information, called the “Process header” in the paper; this information eventually evolved into the port.
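
Demultiplexing by port can be sketched as below. `HostTCP` and its `inboxes` mapping are hypothetical names for this example; a real TCP keeps far more state per connection.

```python
from collections import defaultdict


class HostTCP:
    """Hypothetical per-host TCP that hands segments to processes by port."""

    def __init__(self):
        self.inboxes = defaultdict(list)   # destination port -> [(source port, data)]

    def deliver(self, src_port, dst_port, data):
        # The destination port selects the receiving process on this host;
        # the source port tells that process which sender the data came from.
        self.inboxes[dst_port].append((src_port, data))


tcp = HostTCP()
tcp.deliver(src_port=5000, dst_port=1, data=b"for process one")
tcp.deliver(src_port=5001, dst_port=2, data=b"for process two")
tcp.deliver(src_port=5002, dst_port=1, data=b"also for process one")
print(tcp.inboxes[1])
```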

Address format

To locate a process across different networks, a TCP address should include three parts:

Network identifier: Identifies the network on which the host resides. Based on this information, the gateway can decide whether to send data directly to the target host or to continue forwarding to other networks
Host identifier: Identifies a host on a network
Port identifier: Identifies a process within a host
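
The three-part address can be illustrated by packing the parts into one integer. The bit widths below are assumptions picked for the example, and `pack_address` and `unpack_address` are hypothetical helpers.

```python
NETWORK_BITS, HOST_BITS, PORT_BITS = 8, 16, 16   # assumed widths, for illustration only


def pack_address(network, host, port):
    """Combine network, host, and port identifiers into a single integer."""
    for name, value, bits in (("network", network, NETWORK_BITS),
                              ("host", host, HOST_BITS),
                              ("port", port, PORT_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} identifier {value} does not fit in {bits} bits")
    return (network << (HOST_BITS + PORT_BITS)) | (host << PORT_BITS) | port


def unpack_address(address):
    """Split a packed address back into (network, host, port)."""
    port = address & ((1 << PORT_BITS) - 1)
    host = (address >> PORT_BITS) & ((1 << HOST_BITS) - 1)
    network = address >> (HOST_BITS + PORT_BITS)
    return network, host, port


address = pack_address(network=3, host=1025, port=80)
print(unpack_address(address))   # (3, 1025, 80)
```

A gateway only needs the network part to make its forwarding decision; the host and port parts matter once the data reaches the target network and host.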

Packet format

Because messages passed between processes may be split into several segments in transit, a segment should contain the following information:

Source port and destination port
Window size and ACK

Sequence number mechanism

The TCP receiver needs the sequence number of each segment to reassemble the segments. Sequence numbers must be monotonically increasing (or decreasing), because the receiver uses them to determine whether a received segment is out of order, duplicated, or missing. Sequence numbers cannot grow without bound, however, and with a finite sequence space the receiver may be unable to tell whether a segment is a retransmission or new data. A receive window (sliding window) solves this problem.
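
One way to see how a window helps with a finite sequence space is modular arithmetic: a sequence number counts as new only if it falls inside the window, even when the window wraps around. The sequence space size and the helper `in_window` are assumptions for this sketch.

```python
SEQ_SPACE = 2 ** 16   # assumed size of the sequence number space


def in_window(seq, window_start, window_size):
    """True if seq lies in [window_start, window_start + window_size) modulo SEQ_SPACE."""
    return (seq - window_start) % SEQ_SPACE < window_size


# The window starts near the top of the sequence space and wraps past zero.
print(in_window(5, window_start=65530, window_size=20))       # True: new data after wraparound
print(in_window(65529, window_start=65530, window_size=20))   # False: already received
```

This only works if the window is much smaller than the sequence space, so that old and new numbers cannot be confused.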

Each segment therefore needs a sequence number. To assign them, the paper treats the data exchanged between the processes at the two ends as an infinitely long stream of bytes (which is why a TCP connection is byte-stream oriented) and gives each byte a sequence number equal to its position relative to the start of the stream. When TCP creates a new segment, the sequence number of the first byte it carries serves as the sequence number of the whole segment, and the number of bytes carried is also recorded in the header.
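
The numbering rule can be sketched like this, assuming a starting sequence number chosen by the sender. `make_segments` and the dictionary keys are hypothetical names for the example.

```python
def make_segments(stream, first_seq, max_payload):
    """Cut a byte stream into segments numbered by the position of their first byte."""
    segments = []
    for i in range(0, len(stream), max_payload):
        payload = stream[i:i + max_payload]
        segments.append({
            "seq": first_seq + i,          # sequence number of the first byte carried
            "byte_count": len(payload),    # number of bytes carried
            "data": payload,
        })
    return segments


for segment in make_segments(b"abcdefghij", first_seq=1000, max_payload=4):
    print(segment["seq"], segment["byte_count"], segment["data"])
```

Because every byte has its own number, the sequence number of the next segment is always the current one plus its byte count, no matter how the stream is cut.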

Segment retransmission and duplicate detection

To ensure reliable transmission, the paper introduces timeout-based retransmission and an acknowledgement mechanism. If no acknowledgement arrives within a certain period after a segment is sent, the sender resends the segment.
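
A simplified sketch of retransmission on timeout, with no real timers: the channel is modeled as a function that returns an acknowledgement or `None` when the timeout expires. `send_reliably`, `lossy_channel`, and `max_attempts` are assumptions introduced for this example.

```python
def lossy_channel(drops):
    """Hypothetical channel that loses the first `drops` transmissions."""
    def channel(segment):
        nonlocal drops
        if drops > 0:
            drops -= 1
            return None                                   # models a timeout: no acknowledgement
        return segment["seq"] + len(segment["data"])      # ACK = next expected sequence number
    return channel


def send_reliably(segment, channel, max_attempts=5):
    """Resend the segment until it is acknowledged; return (attempts, ack)."""
    for attempt in range(1, max_attempts + 1):
        ack = channel(segment)
        if ack is not None:
            return attempt, ack
        # No acknowledgement within the timeout: loop around and retransmit.
    raise TimeoutError(f"no acknowledgement after {max_attempts} attempts")


print(send_reliably({"seq": 0, "data": b"abcd"}, lossy_channel(drops=2)))   # (3, 4)
```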

The receiver maintains a receive window, returns the sequence number it expects to receive next as the acknowledgement, and updates its receive window at the same time. The initial window size is negotiated by the two sides when the connection is established.
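
The receiver side can be sketched as follows. This is an illustration under assumptions: the `Receiver` class is hypothetical, sequence number wraparound is ignored, and segments are assumed to arrive on the same boundaries the sender used.

```python
class Receiver:
    """Hypothetical receiver with a sliding window and cumulative acknowledgements."""

    def __init__(self, initial_seq, window_size):
        self.next_expected = initial_seq
        self.window_size = window_size
        self.pending = {}              # seq -> data for segments held inside the window
        self.delivered = bytearray()

    def receive(self, seq, data):
        """Accept a segment and return the ACK: the next sequence number expected."""
        if self.next_expected <= seq < self.next_expected + self.window_size:
            self.pending.setdefault(seq, data)   # keep the first copy, ignore repeats
        # Segments outside the window (old duplicates or too far ahead) are discarded.
        while self.next_expected in self.pending:
            chunk = self.pending.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)     # the window slides forward
        return self.next_expected


receiver = Receiver(initial_seq=0, window_size=8)
print(receiver.receive(4, b"efgh"))   # 0: still waiting for the first segment
print(receiver.receive(0, b"abcd"))   # 8: both segments delivered in order
print(receiver.receive(0, b"abcd"))   # 8: duplicate detected and dropped
```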

Practical suggestions

The paper also makes simple practical suggestions, including buffering input and output and how user processes interact with TCP.

The paper makes simple suggestions on how TCP should handle input and output data. Received data can be placed in a buffer once the necessary validation is done. When the receive buffer is full, incoming data can be discarded without acknowledgement, relying on the sender to retransmit. On the sending side, TCP can keep a small send buffer, because the sending process’s own buffer holds the complete data.
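
The “discard without acknowledgement” rule for a full receive buffer might look like this. `ReceiveBuffer` and its methods are hypothetical, and the capacity is an arbitrary value chosen for the example.

```python
class ReceiveBuffer:
    """Hypothetical bounded buffer that drops data it has no room for."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()

    def offer(self, payload):
        """Return True if the payload was stored (and may be acknowledged)."""
        if len(self.data) + len(payload) > self.capacity:
            return False          # full: drop silently and let the sender retransmit
        self.data += payload
        return True

    def read(self, n):
        """Hand up to n bytes to the user process, freeing space in the buffer."""
        out = bytes(self.data[:n])
        del self.data[:n]
        return out


buffer = ReceiveBuffer(capacity=8)
print(buffer.offer(b"abcdef"))   # True: stored, can be acknowledged
print(buffer.offer(b"ghij"))     # False: dropped, no acknowledgement is sent
print(buffer.read(6))            # b'abcdef'
print(buffer.offer(b"ghij"))     # True: the retransmitted data now fits
```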

When a user process needs to send data, it can first attach a transmit control block (TCB) to the data to be sent and then hand the data to TCP through a pointer. Similarly, to receive data, it can create a receive buffer with a corresponding receive control block (RCB) and hand that to TCP.

The paper notes that when buffers run short, packets can simply be discarded and left for retransmission; it does not address congestion control. In 1986, congestion caused throughput from LBL to UC Berkeley to drop sharply from 32 Kbps to 40 bps. Van Jacobson’s 1988 paper “Congestion Avoidance and Control” then added congestion control to TCP.

The concept of connection

The paper suggests that two parties establish a connection once both are ready to exchange data. However, neither side can truly confirm that the connection exists until data is actually exchanged.

Establishing a connection requires several elements:

Addresses, so that at least one party can locate the other
TCP control information, including the initial sequence number and window size; without it, neither party can acknowledge received data

Handshake and connection close

To create a connection between two processes, the ports associated with the processes must be determined. The ports on a host are finite, so ports will be reused. To keep the connection state correct, the two parties therefore need to initialize and verify it before exchanging data, that is, perform a handshake.

To send or receive data, TCP must first initialize its control information (TCB and RCB, window, and so on). The first segment from the sender should therefore carry a special marker, together with the control information to be negotiated (such as the window size), so that it triggers initialization of the control information at the receiving end (that is, a SYN request). The receiver can verify the initialization request and decide whether to accept it. The receiver should therefore explicitly indicate in advance whether it is willing to accept requests on a port, which later became listening on a port.

When the sender decides not to send any more data, it sends a segment with a special flag to close the connection (that is, a FIN request). To ensure that both ends know the connection is being closed, the receiver also sends a special segment as confirmation. Once the receiver determines from the control information that all data has been received, the connection is closed.
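
The open and close steps described in this section can be sketched as a tiny state table on the receiving side. This is a deliberately simplified model, assuming string flags and one connection per port; it is neither the paper’s exact procedure nor the modern TCP state machine, and `Endpoint` is a hypothetical name.

```python
class Endpoint:
    """Hypothetical receiving TCP that tracks one connection per port."""

    def __init__(self):
        self.listening = set()
        self.connections = {}          # port -> negotiated control information

    def listen(self, port):
        """Declare that this host is willing to accept requests on the port."""
        self.listening.add(port)

    def on_segment(self, port, flag, seq=0, window=0):
        if flag == "SYN":
            if port not in self.listening or port in self.connections:
                return "REJECT"        # nobody is listening, or the port is still in use
            # Initialize control information from the values carried by the SYN.
            self.connections[port] = {"next_expected": seq, "window": window}
            return "ACK"
        if flag == "FIN":
            if self.connections.pop(port, None) is None:
                return "REJECT"        # no such connection
            return "ACK"               # confirm the close; the port can now be reused
        raise ValueError(f"unknown flag {flag!r}")


endpoint = Endpoint()
endpoint.listen(80)
print(endpoint.on_segment(80, "SYN", seq=100, window=8))   # ACK
print(endpoint.on_segment(81, "SYN"))                      # REJECT
print(endpoint.on_segment(80, "FIN"))                      # ACK
```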