Preface

As programmers we deal with network requests every day, and front-end programmers work with HTTP requests most of all. Yet the low-level details of the full round trip, in which a request is sent by the client, processed by the server, answered with a response, and finally received by the client, are something many of us have never studied in depth. This article is one of my reading notes summarizing exactly this process. I hope sharing it is useful to you.

If any point in the article is stated incorrectly, I would be grateful if you pointed it out.

Start with a classic interview question

The process from entering the URL to rendering the page

  1. After the URL is entered, domain name resolution happens first. The local hosts file is checked first for a matching IP address. If there is no entry, the local DNS server is queried. If that fails too, the local DNS server asks a root DNS server for the address of a domain server, the domain server returns the address of the DNS server responsible for the domain name, and the local DNS server queries that server to get the result.
  2. Once the browser has the server's IP address, it sends an HTTP request to it. The request goes through layers of processing and encapsulation, is sent out, and finally reaches the server over the network through an established TCP connection. The server receives the request and starts processing it.
  3. The server builds the response, which goes through the same layers of processing, encapsulation, and sending on its way to the client, where the browser processes the response.
  4. The browser begins to render the page: it parses the HTML, builds the render tree, and then lays out and paints the page according to the render tree nodes and their corresponding CSS.

These four steps cover the full life cycle of an HTTP request. This article focuses on steps 2 and 3, that is, on how a request travels between two physical endpoints. Sending and receiving data inevitably involves processing and parsing, which are carried out at different layers of the system.

Layering

An HTTP request goes through four layers on its way from the source to the destination, and each layer has its own protocols.

Let's first understand what a protocol is. A protocol is a set of rules that both parties agree on and must follow. Each of the layers above has its own protocols, and a protocol is implemented by the corresponding layer at both ends of the communication link. Each layer uses its protocol to understand and process the data.

Only the most common protocols are shown in the figure above; each layer actually includes several protocols:

  • Application layer: wraps the application's data according to the corresponding rules (protocol) and hands it to the transport layer
    • HTTP: Hypertext Transfer Protocol
    • FTP: File Transfer Protocol
    • SMTP: Simple Mail Transfer Protocol
    • SNMP: Simple Network Management Protocol
  • Transport layer: splits the data handed down by the application layer into segments. To guarantee the order and integrity of the data received at the destination, each segment is marked before it is handed to the network layer
    • TCP: Transmission Control Protocol
    • UDP: User Datagram Protocol
  • Network layer: delivers the segments from the transport layer to the destination
    • IP: Internet Protocol
      • ICMP: Internet Control Message Protocol
      • IGMP: Internet Group Management Protocol
  • Link layer: sends and receives data units for the network layer
    • ARP: Address Resolution Protocol
    • RARP: Reverse Address Resolution Protocol

Encapsulation and demultiplexing

Data is wrapped by the corresponding protocol as it passes down through each layer, and is unwrapped layer by layer when it arrives at the destination. These two processes are called encapsulation and demultiplexing.

When sending, user data is first encapsulated into a message by HTTP. Each lower layer then treats what it receives from the layer above as its own data block and adds its own header, which contains a protocol identifier, before passing the result on as its own packet.

When receiving, the data flows from bottom to top. At each layer the header is removed and the correct upper-layer protocol is determined from the protocol identifier in that header. Finally, the data is processed by the application layer.

Encapsulation

When the source sends an HTTP message, the message is transmitted as a data stream, in order, through an open TCP connection. TCP divides the stream it receives into small data blocks, and each block together with the TCP header added to it forms a TCP segment. When the IP layer receives a request to send a segment, it places the segment in an IP datagram, fills in the header, and sends the datagram out through the link layer.

Along the way, each layer adds header information (and sometimes trailer information). Each layer wraps the data in its own packet and writes a protocol identifier into the header. This process is called encapsulation.

Demultiplexing

When the destination receives an Ethernet frame, the data flows upward from the bottom layer. Each layer removes the header added by its counterpart and checks the protocol identifier in that header to determine the upper-layer protocol, so that the data is handed to the right place. This process is called demultiplexing.

After the link layer receives the data, the destination parses it at the network layer and passes it to the transport layer, which verifies the order and integrity of the segments. The HTTP message is then recovered from the data blocks and handed to the application layer for processing. The headers are stripped layer by layer until the original data is restored.
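The layer-by-layer wrapping and unwrapping described above can be illustrated with a toy model. This is only a sketch: the text header format (`TCP|next=HTTP;`) and the function names `encapsulate` and `demultiplex` are made up for the example and have nothing to do with the real binary headers.

```python
# Toy model only: the header text format below is invented for illustration
# and is not the real TCP, IP, or Ethernet header layout.
LOWER_LAYERS = ["TCP", "IP", "Ethernet"]  # order in which headers are added


def encapsulate(data: bytes, upper: str = "HTTP") -> bytes:
    """Prepend one header per layer; each header names the protocol it wraps."""
    for layer in LOWER_LAYERS:
        data = f"{layer}|next={upper};".encode("ascii") + data
        upper = layer
    return data


def demultiplex(frame: bytes) -> tuple[list[str], bytes]:
    """Strip headers from the bottom up, recording each upper-layer identifier."""
    handed_to = []
    for layer in reversed(LOWER_LAYERS):
        header, sep, frame = frame.partition(b";")
        name, _, upper = header.decode("ascii").partition("|next=")
        if not sep or name != layer:
            raise ValueError(f"expected a {layer} header")
        handed_to.append(upper)
    return handed_to, frame


frame = encapsulate(b"GET / HTTP/1.1")
path, payload = demultiplex(frame)
```

Here `path` lists the protocol each layer hands the data to on the way up, and `payload` is the original application data.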

Step by step analysis

Now that we know that data is processed in layers from source to end, let’s take a look at what each layer does.

HTTP

HTTP belongs to the application layer. It wraps the data produced by user interaction, and the server's answers to it, into HTTP messages and hands them to the lower-layer protocols. Messages are the carriers of communication between client and server, and both sides must process them according to one agreed set of rules, which is HTTP.

The interaction between client and server is often complex. To keep communication efficient, clear, and secure (conveying intent and status, carrying data, carrying authentication information, controlling connection behavior and caching), both sides rely on the structure of the messages. That structure is described first.

Message structure

HTTP messages come in two types: request messages and response messages. A request message wraps the action produced by a user operation and tells the server what to do. A response message tells the client the result of the request.

Format of request message:

<method> <request-URL> <version>    // start line
<headers>                           // headers
<body>                              // entity

Format of response message:

<version> <status> <reason-phrase>  // start line
<headers>                           // headers
<body>                              // entity

The start line

The start line marks the beginning of a message. Its format differs between requests and responses.

The start line of a request message describes what to do. Its structure is method + request URL + protocol version, separated by spaces:

GET /API/NHT/blog/example HTTP/1.1

The start line of a response message tells you what happened. Its structure is protocol version + status code + description text, separated by spaces:

HTTP/1.1 200 OK
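As a small illustration of the two start-line formats, they can be split on spaces like this. The function names are my own, and the sketch assumes well-formed input with single spaces; it is not a full HTTP parser.

```python
def parse_request_line(line: str) -> dict:
    """Split 'METHOD URL VERSION' into its three parts."""
    method, url, version = line.split(" ", 2)
    return {"method": method, "url": url, "version": version}


def parse_status_line(line: str) -> dict:
    """Split 'VERSION STATUS REASON'; the reason phrase may contain spaces."""
    version, status, reason = line.split(" ", 2)
    return {"version": version, "status": int(status), "reason": reason}


request = parse_request_line("GET /blog/example HTTP/1.1")
response = parse_status_line("HTTP/1.1 404 Not Found")
```

Limiting the split to two separators keeps a multi-word reason phrase such as "Not Found" in one piece.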

Methods and status codes

The method tells the server what the request message wants done, and the status code tells the client the approximate outcome after the server has carried out the request. Common HTTP methods are as follows:

Method | Meaning | Has a body
GET | Obtains a resource from the server | No
HEAD | Retrieves only the headers of a resource | No
POST | Sends data to the server | Yes
PUT | Stores the data sent by the client on the server, typically to modify a resource | Yes
OPTIONS | Pre-checks the server, for example to ask which methods it supports | No
DELETE | Deletes a resource from the server | No

When the request has been handled, the response message carries a status code that indicates the outcome: success, failure, or the need to redirect. Status codes range from 100 to 599, and only some of them are defined. Different ranges mean different things:

Overall range | Defined range | Meaning
100 ~ 199 | 100 ~ 101 | Informational
200 ~ 299 | 200 ~ 206 | Success
300 ~ 399 | 300 ~ 305 | Redirection
400 ~ 499 | 400 ~ 415 | Client error
500 ~ 599 | 500 ~ 505 | Server error
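Because the meaning depends only on the hundreds digit, classifying a status code takes one integer division. A minimal sketch (the function name and the category strings are my own):

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map a status code to its range's meaning using the hundreds digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"status code out of range: {code}")
    return STATUS_CLASSES[code // 100]
```

A client can use this to react sensibly to a code it does not know, since an unknown 4xx is still a client error.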

Headers

Headers are key-value pairs in request and response messages that describe the attributes of the message. Each pair ends with a CRLF line break. For example, Content-Type indicates the data type of the body, and Date indicates when the message was created. Client and server negotiate specific behavior through headers. Headers fall into five categories:

  • Request headers: headers in the request message that give the server information about the request.
  • Response headers: headers that give the client information it might need.
  • General headers: headers that can appear in both request and response messages, for example the Date header.
  • Entity headers: headers that describe the entity body, such as Content-Type, which indicates its data type.
  • Extension headers: header fields added by developers to meet custom requirements.
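To make the "key-value pairs ending in CRLF" format concrete, here is a rough sketch that turns a header block into a dictionary. The function name is hypothetical; names are lower-cased because HTTP header names are case-insensitive, and repeated headers are not handled.

```python
def parse_headers(raw: str) -> dict[str, str]:
    """Parse 'Name: value' lines separated by CRLF into a dict."""
    headers = {}
    for line in raw.split("\r\n"):
        if not line:
            continue  # skip the empty piece after the final CRLF
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return headers


block = "Content-Type: text/plain\r\nDate: Sun, 17 Sep 2019 02:01:16 GMT\r\n"
headers = parse_headers(block)
```

Splitting at the first colon only matters for values such as dates, which contain colons themselves.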

Entity

HTTP/1.0 200 OK
Server: XXXXXXX
Date: Sun, 17 Sep 2019 02:01:16 GMT
-------------------------------- entity headers
Content-Type: text/plain
Content-Length: 18
-------------------------------- entity body
Hi! I'm a message!
--------------------------------

The entity part is optional. It carries the data of the request or response. The entity consists of the entity headers and the entity body, and the entity headers describe the entity body. HTTP/1.1 defines the following basic entity header fields:

  • Content-Type: the data type of the entity body.
  • Content-Length: the length (size) of the entity body.
  • Content-Language: the language that best matches the transmitted data.
  • Content-Encoding: the encoding the server applied to the data.
  • Content-Location: the address of the data being returned.
  • Content-Range: for a partial entity, which part of the whole entity it is.
  • Content-MD5: a checksum of the entity body.
  • Last-Modified: the date and time when the transmitted content was created or last modified on the server.
  • Expires: the date and time at which the entity expires.
  • Allow: the request methods allowed on the requested resource.
  • ETag: an identifier for a specific version of a resource. It makes caching more efficient and saves bandwidth.
  • Cache-Control: directives that control the caching mechanism.

That is the main structure of an HTTP message. When a request message arrives, the server parses it, processes the request according to the method, resource path, headers, and body, builds a response from the result of accessing the requested resource, and sends it back to the client.
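As a sketch of how the three parts fit together, the following builds a response message by hand. The helper `build_response` is my own illustration rather than any library's API; real servers add more headers and handle encodings more carefully.

```python
def build_response(status: int, reason: str, body: str,
                   content_type: str = "text/plain") -> bytes:
    """Assemble start line, entity headers, a blank line, and the body."""
    payload = body.encode("utf-8")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(payload)}\r\n"  # length in bytes, not characters
        "\r\n"                                 # empty line ends the headers
    )
    return head.encode("ascii") + payload


message = build_response(200, "OK", "Hi! I'm a message!")
```

Content-Length is computed from the encoded bytes, which is what the receiver counts when it reads the body.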

The transport layer – TCP

HTTP runs on top of TCP connections, which provide reliable data transfer. When an HTTP message is sent, its data is transmitted in order through an open TCP connection. TCP divides the data it receives into small pieces, and each piece becomes a TCP segment.

Because data is sent in small pieces, complete and reliable transmission comes down to whether every segment arrives, whether the segments are in order, whether a segment is damaged, and whether a segment is duplicated. TCP controls these through checksums, sequence numbers, acknowledgements, retransmission, connection management, and the window mechanism.

TCP stands for Transmission Control Protocol. Transmission control relies on six flags in the header that control the transmission state and the actions taken by sender and receiver. A flag's function is active when its value is 1. For example, when URG is 1, the urgent pointer in the header is valid.

  • URG: the urgent pointer is valid
  • ACK: the acknowledgement number is valid
  • PSH: the receiver should deliver this segment to the application layer as soon as possible
  • RST: resets the connection
  • SYN: synchronizes sequence numbers, used to initiate a connection
  • FIN: the sender has finished sending data
Source port and destination port: identify the ports of the sender and the receiver. A TCP connection is identified by source IP address, source port, destination IP address, and destination port. The source and destination IP addresses are carried in the IP datagram.

Header length: the length of the TCP header, which tells the receiver at which byte the data begins.

Sequence number: the sequence number of the first byte of data in this segment. TCP numbers every byte of the data stream, so the number grows by 1 per byte, and once the 32-bit counter reaches its maximum value it wraps around to 0.

Acknowledgement number: valid only when the ACK flag in the header is 1. After receiving a TCP segment, the receiver sends the sender an acknowledgement number equal to the sequence number of the last byte received plus 1, that is, the number of the next byte it expects.

Checksum: calculated by the sender and verified by the receiver. If the receiver finds that the checksum is wrong, the segment may be damaged and is discarded. The receiver then sends a duplicate acknowledgement (repeating the acknowledgement number of the most recent correct segment) to signal that a faulty segment was received and which sequence number it expects next. The sender then has to retransmit the faulty segment.

Urgent pointer: valid only when the URG flag is 1, indicating that the sender has urgent data for the receiver. The urgent pointer is a positive offset that is added to the segment's sequence number to give the sequence number of the last byte of urgent data. For example, if the segment's data starts at byte number 1000 and the urgent pointer is 1000, the urgent data is the bytes numbered from 1000 to 2000. The receiver decides how to handle this data.

Window size: determines the throughput of TCP bulk data flow. Note that it is the amount of data the sender of this segment allows the other side to send. For example, if the window size in a segment's header is 1000, the host that sent that segment can accept at most 1000 bytes from the other side. The value depends on that host's receive buffer space and can affect TCP performance.
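The fixed part of the TCP header is 20 bytes, so the fields described so far can be pulled out with `struct`. This is a sketch under the assumption of a well-formed segment; options and checksum verification are ignored, and the dictionary keys are my own naming.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the 20-byte fixed TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4  # top 4 bits count 32-bit words
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": offset_flags & 0x3F,      # low 6 bits: URG ACK PSH RST SYN FIN
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
        "payload": segment[header_len:],   # data begins after the header
    }


# A hand-built SYN segment for demonstration (values are arbitrary).
sample = struct.pack("!HHIIHHHH", 50000, 80, 1000, 0,
                     (5 << 12) | 0x02, 65535, 0, 0) + b"data"
fields = parse_tcp_header(sample)
```

The header length is what lets the receiver find where the data starts, as described above.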

Push flag PSH: to tell the receiver to hand all data to the receiving process immediately, the sender sets PSH to 1. This applies to the data sent with the PSH flag and all data received before it. A receiver that sees PSH set to 1 submits the data to the receiving process right away instead of waiting for more data to arrive.

Reset flag RST: when RST is 1, the connection is in an abnormal state. The receiver terminates the connection and notifies the application layer, which has to re-establish the connection.

SYN: used to establish a connection through the TCP three-way handshake.

  1. To open a connection, the client sends the server a TCP segment with the SYN flag set to 1 and an initial sequence number, indicating a connection request.

  2. If the server accepts the connection, it sends the client a TCP segment with SYN and ACK both set to 1, its own initial sequence number, and an acknowledgement number equal to the client's initial sequence number + 1.

  3. After receiving that segment, the client sends the server an acknowledgement segment with ACK set to 1 and an acknowledgement number equal to the server's initial sequence number from step 2 + 1. When the server receives it, it enters the connected state.

The acknowledgement segment in step 3 may already carry data.
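The bookkeeping of the three steps can be written out as plain data. This only models the numbers exchanged; it opens no real connection, and the function name and dictionary layout are hypothetical.

```python
SEQ_MODULO = 2 ** 32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Return the three segments of a handshake for the given initial numbers."""
    syn = {"from": "client", "flags": ["SYN"], "seq": client_isn}
    syn_ack = {
        "from": "server",
        "flags": ["SYN", "ACK"],
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_MODULO,  # acknowledges the client's SYN
    }
    ack = {
        "from": "client",
        "flags": ["ACK"],
        "seq": (client_isn + 1) % SEQ_MODULO,
        "ack": (server_isn + 1) % SEQ_MODULO,  # acknowledges the server's SYN
    }
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```

Each side acknowledges the other's initial sequence number plus 1, because the SYN itself consumes one sequence number.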

When one end has finished sending data, it sends a FIN to close the connection. Because TCP carries data in both directions (client to server and server to client), each direction has its own FIN and acknowledgement. That makes four exchanges in total, also known as the four-way wave.

  1. Once the application layer has finished sending data, the client sends the server a TCP FIN segment to close data transmission in its direction.

  2. When the server receives the FIN, it sends back an ACK whose acknowledgement number is the received sequence number plus 1, and its TCP delivers an end-of-file to the application.

  3. The server then closes the connection in its own direction, which causes its TCP to send a FIN as well.

  4. The client sends back an ACK whose acknowledgement number is the received sequence number + 1, and the connection is completely closed.

Sequence numbers and acknowledgement numbers keep the data in order, the checksum protects data integrity, and the urgent pointer lets urgent data be handled promptly. In addition, TCP uses mechanisms such as timeout retransmission, congestion avoidance, and slow start so that data reaches the destination in order and complete.

The network layer – IP

If TCP segments are containers packed with goods, IP is the truck that delivers them. IP connects two nodes and delivers TCP data from the source to the destination as quickly as it can, but it does not guarantee reliable delivery.

The IP layer wraps the TCP segment handed down from above, adds its own header, chooses a route, and decides whether fragmentation and reassembly are needed until the data finally reaches the destination. The IP header plays an important role in all of this, so let's look at its structure.

The IP header

Version: the version of the IP protocol. The current version is 4, and the other version in use is 6, that is, IPv4 and IPv6.

Header length: Length of the entire header. The maximum length is 60 bytes.

Type of service (TOS): distinguishes service types, although in practice the IP layer rarely uses it. The TOS field in use consists of a 4-bit subfield and 1 unused bit, which must be 0. At most one of the 4 TOS bits can be set to 1 to indicate the current service type. The 4 bits correspond to four service types: minimum delay, maximum throughput, maximum reliability, and minimum cost.

Total length: the total length of the datagram in bytes. Together with the header length, it gives the start position and size of the data in the datagram.

The next three header fields concern the fragmentation and reassembly of IP datagrams. The link layer generally limits the maximum length of each data frame (the MTU). When the IP layer sends a datagram, it selects a route and also looks up the MTU of the outgoing interface. If the datagram is larger than that, it is fragmented and reassembled after it arrives, and the following three fields are the basis for reassembly. Note that the MTU can differ at every router the datagram passes through, so fragmentation can happen at any hop.

Identification: uniquely identifies a datagram. The IP layer increases it by 1 for each datagram it sends, and all fragments of one datagram carry the same value.

Flags: there are three flag bits, R, D, and M. R is currently unused; D and M describe the fragmentation behavior of the datagram. If D (don't fragment) is 1, the datagram must be transmitted in one piece without fragmentation. If M (more fragments) is 1, the datagram is a fragment and more fragments follow; if it is 0, the current datagram is the last fragment or the only one.

Fragment offset: the position of this fragment's data relative to the beginning of the original datagram. After fragmentation, the total length field of each fragment is changed to the length of that fragment, not of the entire datagram.
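To see how the flags and the offset work together, here is a rough calculation of the fragments produced for a given MTU. It assumes a 20-byte header without options; the offset is expressed in 8-byte units as in the real header field, and the function and key names are my own.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20) -> list[dict]:
    """Plan the fragments needed to carry payload_len bytes over a link."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        fragments.append({
            "offset_units": offset // 8,         # value stored in the header
            "total_length": header_len + size,   # length of this fragment only
            "MF": int(offset + size < payload_len),  # 1 while more follow
        })
        offset += size
    return fragments


plan = fragment(payload_len=4000, mtu=1500)
```

With 4000 bytes of data and an MTU of 1500, the plan contains three fragments, and only the last one has M set to 0.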

Time to live (TTL): determines when a datagram is discarded. IP sends data hop by hop, and a datagram may be forwarded between the IP layers of many routing devices, so the TTL states the maximum number of routers the datagram may pass through. Each router decreases the value by 1. When the value reaches 0, the datagram is discarded and a message carrying an error (ICMP, the part of the IP layer used to pass error messages) is sent to the source. The TTL prevents datagrams from being forwarded forever in a routing loop.
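The countdown is simple enough to simulate in a few lines. This is only a toy model: `routers_on_path` is a made-up parameter for the number of routers between source and destination, and the ICMP message is represented by a string.

```python
def forward(ttl: int, routers_on_path: int) -> tuple[str, int]:
    """Decrement TTL at each router; discard the datagram when it hits 0."""
    for hop in range(1, routers_on_path + 1):
        ttl -= 1
        if ttl <= 0:
            # A real router would send an ICMP error back to the source here.
            return ("discarded", hop)
    return ("delivered", routers_on_path)
```

In a routing loop the number of routers on the path is effectively unbounded, so the datagram is always discarded after at most its initial TTL hops.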

Header checksum: used to verify the integrity of the header. The sender computes a checksum over the header and stores the result in the checksum field. The receiver computes it again, and if the result agrees with the stored checksum, the transmission is OK.
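The checksum used here is the 16-bit one's complement sum. A minimal sketch of the calculation follows; the sample header bytes are an arbitrary example of mine, not taken from the references.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sender: compute over the header with the checksum field (bytes 10-11) zeroed.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header)
# Receiver: recompute over the header including the stored checksum.
received = header[:10] + checksum.to_bytes(2, "big") + header[12:]
verified = internet_checksum(received) == 0
```

A convenient property of this sum is that the receiver does not need to zero the field first: recomputing over an intact header, checksum included, yields 0.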

Upper-layer protocol: tells the receiving end which upper-layer protocol, such as TCP or UDP, the data should be handed to.

Source IP address: records the IP address of the sender and is used when sending error messages.

Destination IP address: indicates the destination IP address. Every route selection decision is made based on this IP address.
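Putting the fields together, the fixed 20-byte IPv4 header can be decoded with `struct`. Treat this as a sketch: options are ignored, nothing is validated, and the dictionary keys are my own naming.

```python
import ipaddress
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Unpack the fixed 20-byte IPv4 header (network byte order)."""
    (ver_ihl, tos, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # stored as a count of 32-bit words
        "tos": tos,
        "total_length": total_length,
        "identification": identification,
        "DF": bool(flags_frag & 0x4000),     # the D flag: don't fragment
        "MF": bool(flags_frag & 0x2000),     # the M flag: more fragments
        "fragment_offset": (flags_frag & 0x1FFF) * 8,  # converted to bytes
        "ttl": ttl,
        "protocol": protocol,                # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }


sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
info = parse_ipv4_header(sample)
```

The 4-bit header length counts 32-bit words, which is where the 60-byte maximum comes from (15 × 4).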

Routing

The IP header contains only the destination IP address, not the complete path. When sending data, the IP layer makes routing decisions based on the query results of the destination IP address in the local routing table. The datagram is sent to the destination hop by hop, and each hop is a route selection.

The IP layer can be configured to act as a router or as a host. When it acts as a router, datagrams not addressed to the machine itself are forwarded. When it acts as a host, datagrams whose destination IP address is not a local address are discarded.

How does an IP layer with routing enabled decide where to forward a datagram whose destination IP address is not a local address? To answer this, you need to understand the structure of the routing table. Here is a routing table maintained by the IP layer. (On Windows you can view the routing table by typing netstat -r in the console.)

Destination | Gateway | Flags | Refcnt | Use | Interface
140.252.13.65 | 140.252.13.35 | UGH | 0 | 0 | emd0
127.0.0.1 | 127.0.0.1 | UH | 1 | 0 | lo0
default | 140.252.13.33 | UG | 0 | 0 | emd0
140.252.13.32 | 140.252.13.34 | U | 4 | 25043 | emd0

(Routing table data is taken from TCP/IP Illustrated, Volume 1: The Protocols.)

  • Destination: the network address or host address that the IP datagram is to reach or pass through.
  • Gateway (next-hop address): the IP address of the neighboring router to which the datagram is sent next
  • Flags: the attributes of the route entry. Five flags are used:
    • U: the route is usable
    • G: with this flag, the next hop is a gateway. Without it, the next hop is on the same network segment as the current device, so the datagram can be sent directly
    • H: with this flag, the destination is a host address. Without it, the destination is a network address
    • D: the route was created by a redirect message
    • M: the route was modified by a redirect message
  • Interface: the physical interface used by the route

When it receives a datagram, the IP layer looks up the destination IP address in the routing table. The lookup has three possible outcomes, tried in order:

  1. If an entry matching the complete destination IP address is found, the datagram is sent to the next-hop Gateway or Interface of that entry.
  2. If an entry matching the network number of the destination IP address is found, the datagram is sent to the next-hop Gateway or Interface of that entry.
  3. If the routing table contains a default route, the datagram is sent to the Gateway specified by it.

If none of these applies, the datagram cannot be delivered. IP datagrams travel to the destination host hop by hop, but they have a fixed length, and once that length exceeds the MTU of a network on the way, they are fragmented.
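The three lookup outcomes amount to picking the most specific matching entry. Below is a rough sketch using the table above. Note the assumptions: the table in the text does not show netmasks, so the prefix lengths here (/32 for host routes, /27 for the 140.252.13.32 network) are my own, and `lookup` is a hypothetical helper, not how any particular operating system implements it.

```python
import ipaddress

# (destination, gateway, flags, interface); prefix lengths are assumptions.
ROUTES = [
    ("140.252.13.65/32", "140.252.13.35", "UGH", "emd0"),
    ("127.0.0.1/32", "127.0.0.1", "UH", "lo0"),
    ("0.0.0.0/0", "140.252.13.33", "UG", "emd0"),  # the default route
    ("140.252.13.32/27", "140.252.13.34", "U", "emd0"),
]


def lookup(destination: str):
    """Return the most specific route matching the destination, or None."""
    address = ipaddress.ip_address(destination)
    matches = [route for route in ROUTES
               if address in ipaddress.ip_network(route[0])]
    if not matches:
        return None
    # Host match beats network match, which beats the default route.
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)


host_route = lookup("140.252.13.65")
network_route = lookup("140.252.13.40")
default_route = lookup("8.8.8.8")
```

Choosing the longest prefix reproduces the order of the three outcomes without writing three separate searches.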

The concept of datagram fragmentation

During the handshake, TCP determines the maximum amount of data that can be sent in one segment (MSS) based on the maximum transmission unit (MTU) seen by the IP layer. TCP then splits the data into segments according to the MSS, and each segment is packed into an IP datagram. On its way, a datagram may pass a hop whose MTU is smaller than the datagram and be fragmented there. In that case the M flag of the 3-bit flags field in the IP header is set to 1 on every fragment except the last, indicating that more fragments follow. The headers of the fragments are largely the same, but the fragment offsets differ, and based on these offsets the fragments are reassembled into the complete IP datagram (one TCP segment) at the destination. IP delivery is unordered, so datagrams can arrive out of order, but as long as the data is complete, TCP puts it back in order using the fields in its header. If a fragment is lost, the IP layer cannot rebuild the datagram, the whole datagram is dropped, and TCP has to retransmit the segment.
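Reassembly at the destination can be sketched as sorting by offset and checking for gaps. The fragment representation (a dict with `offset` in bytes, `MF`, and `data`) is invented for this example, and real implementations also handle timeouts, duplicates, and overlapping fragments.

```python
from typing import Optional


def reassemble(fragments: list[dict]) -> Optional[bytes]:
    """Rebuild the original data, or return None if a fragment is missing."""
    ordered = sorted(fragments, key=lambda f: f["offset"])
    data = b""
    for frag in ordered:
        if frag["offset"] != len(data):
            return None  # a gap means an earlier fragment never arrived
        data += frag["data"]
    if not ordered or ordered[-1]["MF"] != 0:
        return None      # the last fragment (M = 0) is missing
    return data


arrived = [
    {"offset": 16, "MF": 0, "data": b"world!"},
    {"offset": 0, "MF": 1, "data": b"hello, "},
    {"offset": 7, "MF": 1, "data": b"fragment "},
]
rebuilt = reassemble(arrived)
```

Fragments may arrive in any order, which is why the offset rather than the arrival order decides where each piece goes.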

When the IP layer encapsulates the data, it only knows the IP address of the target host. An IP address alone is not enough to deliver a datagram, because each hardware device is addressed by its own MAC address, a 48-bit value. Knowing the destination IP address, we now need the MAC address that corresponds to it. That is done by consulting the routing table together with the ARP protocol at the link layer.

Address resolution protocol: ARP

IP only lets data flow between logical endpoints. Beneath IP there is the network interface layer, which has its own addresses (MAC addresses, which uniquely identify a network adapter on the network). ARP bridges the two by converting IP addresses to MAC addresses.

ARP implements the mapping from IP addresses to MAC addresses. At first the source knows only the destination's IP address, not its MAC address, and finding it out involves an ARP request and reply. ARP has its own packet, so let's look at the packet format.

ARP packet format

Ethernet destination address: the MAC address of the destination. If it is not in the ARP cache table yet, the broadcast address is used.

Ethernet source address: the MAC address of the sender.

Frame type: different frame types have different formats, MTU values, and identifiers. For ARP the frame type is 0x0806.

Hardware type: the type of link-layer network. 1 means Ethernet.

Protocol type: the type of protocol address being mapped. 0x0800 indicates an IP address, as when an IP address is mapped to an Ethernet address.

Op: indicates ARP request (1), ARP reply (2), RARP request (3), and RARP reply (4).

Source MAC Address: indicates the MAC address of the sender.

Source IP Address: indicates the IP address of the sender.

Destination Ethernet Address: MAC address of the target device.

Destination IP Address: Indicates the IP address of the target device.
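The ARP fields listed above (everything after the Ethernet addresses and frame type) pack into 28 bytes on Ethernet. A sketch of building a request: besides the fields described in the text, the real packet also carries two one-byte length fields (hardware address length 6, protocol address length 4). The function name and the sample addresses are my own.

```python
import ipaddress
import struct


def build_arp_request(src_mac: bytes, src_ip: str, target_ip: str) -> bytes:
    """Pack an ARP request for IPv4 over Ethernet."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                 # hardware type: Ethernet
        0x0800,            # protocol type: IP
        6,                 # hardware address length in bytes
        4,                 # protocol address length in bytes
        1,                 # op: ARP request
        src_mac,
        ipaddress.IPv4Address(src_ip).packed,
        b"\x00" * 6,       # target MAC is unknown, that is what we ask for
        ipaddress.IPv4Address(target_ip).packed,
    )


packet = build_arp_request(bytes.fromhex("aabbccddeeff"),
                           "192.168.1.10", "192.168.1.20")
```

The target MAC field is left as zeros in a request; the reply comes back with it filled in and op set to 2.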

Before one device sends a packet to another, the link layer of the source uses ARP to ask for the MAC address of the destination. ARP broadcasts the request, so every host on the Ethernet receives it. The broadcast essentially says: "Here are my IP and MAC addresses. Whoever owns the target IP address, please reply with your hardware address." A host that receives the broadcast and owns the requested IP address sends an ARP reply with its IP and MAC addresses back to the source. Hosts that do not own the target IP address discard the request. In other words, the request is broadcast and the reply is sent individually.

After the reply arrives, the mapping between the IP and MAC addresses is stored in the ARP cache table. An entry usually stays valid for 20 minutes, which speeds up encapsulation the next time. The complete process is therefore as follows:

After receiving a TCP segment and before encapsulating and sending it, the IP layer queries the routing table:

  1. If the destination IP address and its own IP address are on the same network segment, the device looks up the MAC address of the destination IP address in the ARP cache table. If it is there, the MAC address is handed to the link layer for encapsulation. If it is not, the device broadcasts an ARP request to obtain the MAC address and caches it. The IP layer then encapsulates the TCP segment and hands the packet to the link layer.
  2. If the destination IP address and its own IP address are on different network segments, the packet has to be sent to the default gateway. If the MAC address of the gateway's IP address is in the ARP cache table, it is handed to the link layer for encapsulation. If it is not, the device broadcasts an ARP request to obtain it and caches it. The IP layer then encapsulates the TCP segment and hands the packet to the link layer.
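The two cases differ only in whose MAC address is looked up. A rough sketch of that decision: the names are hypothetical, the ARP cache is a plain dict without the 20-minute expiry, and `arp_request` stands in for the real broadcast.

```python
import ipaddress
from typing import Callable


def next_hop_mac(dest_ip: str, own_interface: str, gateway_ip: str,
                 arp_cache: dict, arp_request: Callable[[str], str]) -> str:
    """Pick the next hop and resolve its MAC, using the cache when possible."""
    network = ipaddress.ip_interface(own_interface).network
    # Same segment: resolve the destination itself. Otherwise: the gateway.
    if ipaddress.ip_address(dest_ip) in network:
        next_hop = dest_ip
    else:
        next_hop = gateway_ip
    if next_hop not in arp_cache:
        arp_cache[next_hop] = arp_request(next_hop)  # broadcast, then cache
    return arp_cache[next_hop]


# Demonstration with a fake resolver that records what was asked.
asked = []

def fake_arp_request(ip: str) -> str:
    asked.append(ip)
    return "aa:bb:cc:dd:ee:ff" if ip.endswith(".20") else "11:22:33:44:55:66"

cache = {}
local = next_hop_mac("192.168.1.20", "192.168.1.10/24", "192.168.1.1",
                     cache, fake_arp_request)
remote = next_hop_mac("8.8.8.8", "192.168.1.10/24", "192.168.1.1",
                      cache, fake_arp_request)
again = next_hop_mac("1.1.1.1", "192.168.1.10/24", "192.168.1.1",
                     cache, fake_arp_request)
```

For destinations outside the local segment, the frame is addressed to the gateway's MAC address while the IP header still carries the final destination IP address.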

Ethernet data frame

With everything in place, the data is encapsulated into an Ethernet frame and sent. The Ethernet destination address, Ethernet source address, and frame type make up the frame header. A preamble and a start frame delimiter are inserted before the header so that the receiver can prepare. A frame check sequence (FCS) is appended at the end to detect whether the frame contains errors.

Structure

Preamble: synchronizes the clock of the receiving adapter with that of the sender.

Start frame delimiter: marks the start of a frame, signalling that a frame is arriving and should be received.

Destination address: the MAC address of the network adapter that is to receive the frame. When a frame arrives, the receiver checks whether the destination address matches its own address. If not, it discards the frame.

Source Address: MAC address of the sending device.

Type: Determines which protocol to submit data to once a frame is received.

Data: Data handed to the upper level. In this scenario, the IP datagram.

Frame check sequence: used to detect errors in the frame. The sender computes the cyclic redundancy check (CRC) value of the frame and writes it into the frame. The receiving computer recalculates the CRC and compares it with the value in the FCS field. If the two values differ, data was lost or changed during transmission. The damaged frame is discarded, and the data has to be sent again.
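The compute-and-compare idea can be tried out with `zlib.crc32`, which uses the same CRC-32 polynomial as Ethernet. This is a sketch of the principle only: real adapters do this in hardware, and the byte order used to append the value here is an assumption of the example.

```python
import zlib


def add_fcs(frame: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the frame contents."""
    return frame + zlib.crc32(frame).to_bytes(4, "little")


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over the contents and compare with the stored value."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == fcs


sent = add_fcs(b"destination+source+type+data")
damaged = bytes([sent[0] ^ 0x01]) + sent[1:]  # flip one bit in transit
```

Flipping even a single bit changes the CRC, so `fcs_ok(damaged)` fails while `fcs_ok(sent)` succeeds.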

Transmission and reception

  1. After receiving a packet from the upper layer, the device decides, based on the MTU and the packet size, whether the IP packet has to be split into smaller pieces, that is, fragmented.
  2. Each datagram (or fragment) is encapsulated into a frame and passed to the underlying hardware, which converts the frame into a bit stream and sends it out.
  3. A device on the Ethernet receives the frame and checks the destination address in it. If the address matches its own, the frame is processed and passed up layer by layer (the demultiplexing process).
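Step 3 can be sketched as an address check followed by a lookup on the frame type. The sketch assumes a frame that starts at the destination address (no preamble, no FCS); the type values for IP, ARP, and RARP are the standard ones, while the function name and return shape are mine.

```python
import struct
from typing import Optional

ETHERTYPES = {0x0800: "IP", 0x0806: "ARP", 0x8035: "RARP"}
BROADCAST = b"\xff" * 6


def receive(frame: bytes, local_mac: bytes) -> Optional[tuple[str, bytes]]:
    """Drop frames for other hosts; otherwise name the upper-layer protocol."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if dst not in (local_mac, BROADCAST):
        return None  # not addressed to this adapter
    return ETHERTYPES.get(ethertype, "unknown"), frame[14:]


me = bytes.fromhex("aabbccddeeff")
other = bytes.fromhex("112233445566")
frame = me + other + struct.pack("!H", 0x0800) + b"ip datagram bytes"
result = receive(frame, local_mac=me)
```

Broadcast frames are accepted as well, which is how the ARP requests described earlier reach every host.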

In closing

A network request is encapsulated layer by layer at the source and unwrapped layer by layer at the destination, and with that the whole process is roughly clear. This article only summarizes the general flow, using an HTTP message carried over TCP and IP as the example. Many concepts are not covered, such as link-layer trailer encapsulation, dynamic IP routing, the Reverse Address Resolution Protocol (RARP), and UDP. I suggest reading the references below; I believe you will get even more out of them.

References:

HTTP: The Definitive Guide

TCP/IP Illustrated, Volume 1: The Protocols

Diagram of Ethernet data frame format