
I had a bit too much fun over the eight-day National Day holiday, and now that things have finally quieted down, here is a summary of the HTTP protocol. This article is a personal summary of books and blog posts I have read; if you spot any problems, feel free to point them out. I study computer networking purely as a hobby, so topics such as configuration changes and parameter tuning are not covered in depth; I will learn them when I need them.

What is the HTTP protocol


HTTP stands for Hypertext Transfer Protocol. It is an application-layer protocol built on top of TCP. End of article 😄. Just kidding; that was only the literal definition.

  • Hypertext: a mixture of text, pictures, audio, video, etc.
  • Transfer: moving something from A to B (I don’t care about the process, I only care about the result. 🐶)
  • Protocol: an agreement between two or more parties

There is nothing more to break down. Put the three words together and you get the HTTP protocol; here is my own paraphrase:

  • HTTP protocol: a convention used to transmit a mixture of text, pictures, audio, and video between computers.

HTTP has gone through several versions: HTTP/0.9, HTTP/1.0, HTTP/1.1, HTTP/2 and HTTP/3. (Just for background.)

  • HTTP/0.9 is based on Tim Berners-Lee’s original proposal and can only transmit plain text.
  • HTTP/1.0 was published as an informational memo, not a formal standard.
  • HTTP/1.1 is the most widely used version and has a relatively complete set of features.
  • HTTP/2 is based on Google’s SPDY protocol; it keeps HTTP/1.1 semantics and improves speed. Before it was widely deployed, Google was already pushing what became HTTP/3.
  • HTTP/3 no longer runs over TCP; it uses the QUIC protocol, which is built on UDP. (Google loves to tinker.)

The main new features of HTTP/1.1 are:

  • Added new methods such as PUT and DELETE.
  • Added cache management and control.
  • Persistent (long-lived) connections are supported.
  • Response data can be sent in chunks, which makes transferring large files easier.
  • The Host request header is mandatory, so several domains can share one IP address.
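To make the chunked-transfer item above a little more concrete, here is a minimal Python sketch of how a body can be framed in chunks and reassembled. The helper names encode_chunked and decode_chunked are made up for this illustration, and trailers and chunk extensions are deliberately ignored.

```python
def encode_chunked(body: bytes, chunk_size: int = 5) -> bytes:
    """Frame body as chunks: <hex length>\\r\\n<data>\\r\\n, ended by a zero-length chunk."""
    out = b""
    for i in range(0, len(body), chunk_size):
        chunk = body[i:i + chunk_size]
        out += f"{len(chunk):x}".encode("ascii") + b"\r\n" + chunk + b"\r\n"
    return out + b"0\r\n\r\n"  # a zero-length chunk marks the end of the body


def decode_chunked(data: bytes) -> bytes:
    """Reassemble a chunked body (no trailers, no chunk extensions)."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end], 16)  # chunk sizes are hexadecimal
        if size == 0:
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


framed = encode_chunked(b"hello world")
print(framed)
print(decode_chunked(framed))
```

Because every chunk carries its own length, the sender does not need to know the total size of the body before it starts sending.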

URL and URI

According to Wikipedia, URLs can be regarded as a subset of URIs. See the Wikipedia entries for URI and URL.

A URL is made up of the following parts:

  • Scheme: the protocol type (http, https, ftp, file, etc.)
  • ://: the fixed separator that follows the scheme, as defined by the URL syntax
  • User information: credentials required to access the resource (usually omitted)
  • Host:port: the host name (or domain name) plus the port number. The port can be omitted when it is the default for the scheme
  • Path: the location of the resource on the host
  • ?query: the query conditions, starting with ?, in k=v format, with multiple conditions joined by &; characters such as Chinese and Japanese are percent-encoded from their UTF-8 bytes
  • #fragment: an anchor that points to a position inside the resource
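As a rough illustration of the components listed above, Python's standard urllib.parse module can split a URL into its parts. The URL below is invented for the example and does not come from the original article.

```python
from urllib.parse import parse_qs, quote, urlsplit

# A made-up URL that contains every component described above.
url = "http://user:pass@www.example.com:8080/docs/index.html?k1=v1&k2=v2#section1"
parts = urlsplit(url)

print(parts.scheme)                # protocol type
print(parts.username)              # user information
print(parts.hostname, parts.port)  # host and port
print(parts.path)                  # location of the resource
print(parse_qs(parts.query))       # k=v pairs separated by &
print(parts.fragment)              # anchor inside the resource

# Non-ASCII characters are percent-encoded from their UTF-8 bytes.
print(quote("中文"))
```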

Request process

1. URL parsing

  • Address resolution:

The first step is to decide whether what you typed is a valid URL or a keyword to search for, and then to apply auto-completion, character encoding and so on, based on the input.

  • HSTS

For security reasons, HSTS forces the client to access the page over HTTPS. See: What you Didn’t know about HSTS.

  • Other operations

The browser also performs some additional operations, such as security checks and access restrictions (for example, some domestic browsers used to block access to 996.icu).

  • Check the cache

2. Generate HTTP requests

  • The browser generates an HTTP request message that consists of three main parts:
    1. The start line
    2. The headers
    3. The message body

The overall structure is as follows
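As a rough sketch of that structure, the snippet below assembles a request message by hand (start line, headers, a blank line, then the body) and splits the first line of a response into its three fields. The helper names build_request and parse_status_line are made up for illustration; a real browser does far more than this.

```python
def build_request(method: str, path: str, headers: dict, body: bytes = b"") -> bytes:
    """Assemble start line + headers + blank line + body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # the blank line ends the headers
    return head.encode("ascii") + body


def parse_status_line(line: str):
    """Split a response's first line into HTTP version, status code and reason phrase."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


request = build_request("GET", "/", {"Host": "www.baidu.com", "Connection": "keep-alive"})
print(request.decode("ascii"))
print(parse_status_line("HTTP/1.1 200 OK"))
```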

  • The first line of a request is different from the first line of a response

    • Request line: method, URI, HTTP version
    • Status line: HTTP version, status code, reason phrase
  • Common request headers

Only the common ones are listed here. See Request Headers for the less common ones

Header | Description | Example
Accept | The media types the browser can accept in the response | Accept: text/html
Accept-Encoding | The content encodings (compression formats) the browser accepts | Accept-Encoding: gzip, deflate
Accept-Language | The languages the browser accepts | Accept-Language: zh-CN,zh;q=0.9
Connection | Whether persistent connections are supported | Connection: keep-alive
Host | The domain name of the server being requested | Host: www.baidu.com
Referer | Tells the server which page the request was linked from, so the server can use that information | Referer: https://www.baidu.com/?tn=62095104_8_oem_dg
User-Agent | The name and version of the operating system and browser used by the client | User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36
Cache-Control | Cache control directives | Cache-Control: max-age=10
Cookie | State information agreed between the server and the client | Cookie: _ga=GA1.2.1083213790.1598628058;

  • Common response headers

Only common Response Headers are collated, but there are many more. For details, see Response Headers

Header | Description | Example
Cache-Control | Cache control directives | Cache-Control: max-age=10
Content-Type | The type of the resource and its character encoding | Content-Type: text/html; charset=UTF-8
Content-Encoding | The encoding (compression format) of the body | Content-Encoding: gzip
Date | The server time at which the response was sent | Date: Tue, 03 Apr 2018 03:52:28 GMT
Server | The server software and its version | Server: BWS/1.1
Transfer-Encoding | The server sends the resource in chunks | Transfer-Encoding: chunked
Connection | Whether persistent connections are supported | Connection: keep-alive
Expires | The cache expiration time; Cache-Control takes precedence when both are present | Expires: Sun, 25 Oct 2020 06:13:15 GMT
Access-Control-Allow-Origin | Which sites may share the resource across domains; * means all sites | Access-Control-Allow-Origin: https://www.baidu.com

  • Status codes in the response

    • 1xx: informational, reports the progress of request processing
    • 2xx: success
      • 200 OK
      • 204 No Content: the request succeeded but there is no content to return
    • 3xx: redirection
      • 301 Moved Permanently: the resource has moved permanently; the new URI is in the Location response header
      • 302 Found: temporary redirection; the new URI is also in Location
      • 303 See Other: similar to 302, but the new URI must be requested with GET
    • 4xx: client error
      • 401 Unauthorized: authentication is missing or has failed
      • 403 Forbidden: the server refuses access to the requested resource
      • 404 Not Found: the server could not find the requested resource
      • 405 Method Not Allowed: the request method is not allowed for this resource
    • 5xx: server error
      • 500 Internal Server Error: an error occurred inside the server
      • 503 Service Unavailable: the server is overloaded or down for maintenance and cannot process requests
  • One URL fetches only one resource. For a page that contains many images, JS files and so on, the browser keeps sending further requests; you think you only pressed Enter once, but many requests were actually sent
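The grouping of status codes above follows the first digit of the code. As a small sketch, Python's standard http.HTTPStatus enum can supply the reason phrases; the CLASSES table and the describe helper are made up for this example.

```python
from http import HTTPStatus

# The first digit of a status code gives its class.
CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase and its class."""
    return f"{code} {HTTPStatus(code).phrase} ({CLASSES[code // 100]})"


for code in (200, 204, 301, 302, 303, 401, 403, 404, 405, 500, 503):
    print(describe(code))
```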

3. DNS query

With the request generated but the server's IP address still unknown, a DNS query translates the domain name into an IP address. It is like knowing the name of a restaurant but not its address.

  1. The client sends a query to the local DNS server asking for the IP address
  2. If the local DNS server finds the IP address in its cache, it returns it directly. If not, it queries a root DNS server
  3. The root DNS server reads the domain name www.163.com from right to left and refers the local server to the com domain server
  4. The local DNS server then asks the com domain server
  5. The com domain server refers the local server to the 163.com domain server
  6. The local DNS server sends the query to the 163.com domain server
  7. The 163.com domain server finds the IP address and returns it
  8. The local DNS server returns the IP address to the client

Domain names are hierarchical. In practice, one DNS server usually manages several domains, and running such a top-down query every time would put heavy pressure on the servers. To relieve that pressure, caches are used: if a cached record exists it is returned directly, or the query starts directly from the cached position in the hierarchy.

If you have added an entry to the hosts file, none of this happens: the request goes straight to the IP address specified there, and if that address is wrong you will most likely not reach the site.
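The referral chain in the steps above can be imitated with a toy lookup table. This is only a sketch of the idea: the server names, the ZONES table and the address 192.0.2.10 (a documentation address, not the real one for www.163.com) are all invented, and real DNS messages look nothing like Python dictionaries.

```python
# Toy data: each "server" either refers the query to a more specific
# zone or answers it. All names and the IP address are invented.
ZONES = {
    "root": {"refer": {"com": "com-server"}},
    "com-server": {"refer": {"163.com": "163-server"}},
    "163-server": {"answer": {"www.163.com": "192.0.2.10"}},
}

cache = {}    # what the local DNS server remembers
queries = []  # which servers were asked, in order


def resolve(name: str) -> str:
    """Iterative lookup as done by the local DNS server, with a cache."""
    if name in cache:
        return cache[name]
    server = "root"
    while True:
        queries.append(server)
        zone = ZONES[server]
        if name in zone.get("answer", {}):
            cache[name] = zone["answer"][name]
            return cache[name]
        # Follow the referral whose suffix matches the name, read right to left.
        server = next(target for suffix, target in zone["refer"].items()
                      if name.endswith("." + suffix))


print(resolve("www.163.com"), queries)
print(resolve("www.163.com"), queries)  # second call is answered from the cache
```

The second call adds nothing to queries, which is the effect the cache is meant to have.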
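From an application's point of view, the whole lookup is a single library call. A minimal sketch in Python, where the helper name lookup is made up and the demo uses a numeric loopback address so that it runs without network access:

```python
import socket


def lookup(host: str, port: int = 80) -> list:
    """Return the distinct IP addresses the operating system's resolver gives for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


# A numeric address is returned as it is, without any query.
# Passing a domain name instead goes through the hosts file, caches and DNS.
print(lookup("127.0.0.1"))
```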

4. Send messages

Now that you have the request message and the IP address of the peer, you can start sending. The general order of operations is:

  1. Create a socket
  2. Connect to the server socket
  3. Send and receive data
  4. Disconnect and remove the socket
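The four steps above map almost one to one onto calls in the Socket library. The sketch below uses Python's socket module and, so that it can run on its own, starts a tiny made-up server on the loopback interface that answers in upper case; the serve_once helper is not part of any real HTTP server.

```python
import socket
import threading


def serve_once(server_sock: socket.socket) -> None:
    """Accept one client, send back what it sent in upper case, then close."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())


# Server side: create a socket and listen on a free loopback port.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))
server_sock.listen(1)
port = server_sock.getsockname()[1]
worker = threading.Thread(target=serve_once, args=(server_sock,))
worker.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # 1. create a socket
client.connect(("127.0.0.1", port))                         # 2. connect to the server socket
client.sendall(b"hello")                                    # 3. send data ...
reply = client.recv(1024)                                   #    ... and read the response
client.close()                                              # 4. disconnect and remove the socket

worker.join()
server_sock.close()
print(reply)
```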

In the operating system the general relationship is that the upper layers delegate work to the layers below. As with Java APIs, the browser at the top does not need to care what the lower layers look like; it just calls the Socket library. The DNS query described above is also a function in the Socket library.

  • What is a Socket?
    • In computer science, a socket is the endpoint of a data flow between processes on a computer network (from Wikipedia). It is a general concept: a region of memory that stores the peer's IP address, the port number, the state of the communication and other control information. Once a socket has been created, the various APIs can be used on it. It is one end of a communication; normally the client and the server each have one. netstat displays socket information
  • When an application asks to create a socket, the protocol stack allocates a block of memory before any data is sent or received; at this point the memory holds only some initial state information.
  • After the socket is created, connect in the Socket library is called, and information such as the server’s IP address and port number is written into the socket. The server also creates a socket when it starts, just like the client, but at that point neither side knows who it will be talking to, so they cannot communicate yet. The client has to take the initiative and contact the server (which is why the client initiates HTTP requests and the server only responds). When the connection is complete, each socket knows about the other. This is the three-way handshake, which is covered later; the general idea is enough for now.
  • To send, the application calls write in the Socket library. The data is not sent immediately but is first placed in the send buffer. How much is sent at a time depends on the program; it may all go at once or in several batches. The MTU (maximum transmission unit) is typically 1500 bytes, headers included, and MSS (maximum segment size) = MTU minus the headers, where the TCP header plus the IP header is typically 40 bytes. The MSS is usually agreed by both parties when the connection is established. If the buffer holds more data than the MSS, the data is split into MSS-sized pieces and each piece is sent with its own headers. For each piece, the server acknowledges by sending the received sequence number plus the length of the received data back to the client. The client checks this and retransmits if there is a problem. If an IP fragment is lost, retransmission also occurs. At this point the server has not yet responded, and the application is suspended while it waits for the response. The server returns its data by repeating the same process, which is not detailed here.
  • The connection is closed and the socket deleted once all data has been sent and there is no need to keep the connection. Either the client or the server can initiate the disconnect. For example, when the server initiates it, it calls close in the Socket library, the protocol stack sends a TCP header with FIN set to 1, and the client replies with an ACK number. Once it has received all the data, the client also calls close in the Socket library; as above, the client sends a TCP packet with FIN set to 1 and the server replies with an ACK number. At this point the communication is over. Each side waits until the ACK number has been confirmed (the packet may be retransmitted if it has not) before removing the socket. This process is the four-way wave, which is covered later.
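The MTU and MSS arithmetic from the sending step can be worked through in a few lines. This is a sketch under the assumptions stated in the text (a 1500-byte MTU and 20-byte IP and TCP headers without options); the segment helper and the starting sequence number 1 are made up for illustration.

```python
MTU = 1500       # typical Ethernet MTU in bytes, headers included
IP_HEADER = 20   # bytes, without options
TCP_HEADER = 20  # bytes, without options
MSS = MTU - IP_HEADER - TCP_HEADER


def segment(data: bytes, first_seq: int):
    """Split data into MSS-sized pieces; yield (seq, payload, expected_ack)."""
    seq = first_seq
    for offset in range(0, len(data), MSS):
        payload = data[offset:offset + MSS]
        # The receiver acknowledges with sequence number + received length.
        yield seq, payload, seq + len(payload)
        seq += len(payload)


print(MSS)
segments = list(segment(b"x" * 3000, first_seq=1))
for seq, payload, ack in segments:
    print(seq, len(payload), ack)
```

With 3000 bytes in the buffer, this gives two full segments of 1460 bytes and a last one of 80 bytes, and each expected ACK number equals the sequence number of the next segment.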

Generate packets

Sending a message also means generating packets, although what is described here is only a small part of it. When the message is sent, the Socket library calls down into the TCP module, which adds a TCP header to the request. The TCP module then delegates to the IP module, which adds the IP and MAC headers. The packet is then handed to the network card, which converts the digital data into electrical or optical signals and sends them over the cable to hubs, routers and other devices, until they finally reach the network card on the other side. That card converts the signals back into digital data; the IP module removes the MAC and IP headers, the TCP module removes the TCP header, and the original request is finally recovered. That is the general process.
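The layering described above, where each module adds a header on the way down and the receiver removes them in reverse order, can be sketched with toy headers. The wrap and unwrap helpers and the text labels "TCP", "IP" and "MAC" are stand-ins invented for this example; real headers are binary structures with many fields.

```python
def wrap(header: bytes, payload: bytes) -> bytes:
    """Prepend a toy header to the payload."""
    return header + b"|" + payload


def unwrap(expected: bytes, packet: bytes) -> bytes:
    """Strip a toy header, checking that it is the one this layer expects."""
    header, payload = packet.split(b"|", 1)
    if header != expected:
        raise ValueError(f"expected {expected!r}, got {header!r}")
    return payload


request = b"GET / HTTP/1.1\r\n\r\n"

# Sender: each layer adds its header in front of what it got from above.
frame = wrap(b"MAC", wrap(b"IP", wrap(b"TCP", request)))

# Receiver: the headers come off in the opposite order.
received = unwrap(b"TCP", unwrap(b"IP", unwrap(b"MAC", frame)))

print(frame)
print(received == request)
```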

MAC address and transmission

The peer's IP address identifies the final destination, that is, who the packet is for. But Ethernet does not understand IP addresses; it only knows MAC addresses. The MAC address is written into the network card's ROM when the card is manufactured, while the IP address is a number assigned to the computer by the network, somewhat like an ID number versus a residential address. The problem now is that we do not know the other party's MAC address. The IP module looks up the destination IP address in the routing table and uses the Gateway column to decide which router to send to, which gives it that router's IP address. Knowing the IP address but not the MAC address, it broadcasts a query through ARP; whoever finds that the IP address is its own replies with its MAC address.

The MAC header is then filled in with this MAC address. The router forwards the packet to the next router, and after several hops the packet reaches its final destination. During transmission the source and destination IP addresses do not change, but the MAC addresses change at every hop.
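The point that the IP addresses stay fixed while the MAC addresses change at every hop can be shown with a small model. Everything here is invented for illustration: the Packet class, the forward helper and all the addresses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_mac: str
    dst_mac: str


def forward(packet: Packet, out_mac: str, next_hop_mac: str) -> Packet:
    """A router rewrites only the MAC header before passing the packet on."""
    return replace(packet, src_mac=out_mac, dst_mac=next_hop_mac)


# All addresses below are made up.
hop0 = Packet("192.0.2.1", "198.51.100.7", "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02")
hop1 = forward(hop0, "bb:bb:bb:bb:bb:01", "bb:bb:bb:bb:bb:02")
hop2 = forward(hop1, "cc:cc:cc:cc:cc:01", "cc:cc:cc:cc:cc:02")

for packet in (hop0, hop1, hop2):
    print(packet)
```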

Other

Persistent connections

As the single-request flow above shows, doing a three-way handshake and a four-way wave for every request wastes a lot of time on setting up and tearing down connections, hence persistent connections. If the request header contains Connection: keep-alive, the connection is persistent.
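One way to see a persistent connection at work is to send two requests over the same connection and check that the server sees the same client port both times. The sketch below uses only Python's standard http.server and http.client modules; the Handler class, the paths /a and /b and the client_ports list are made up for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

client_ports = []  # the client's TCP port as seen by the server, one entry per request


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
for path in ("/a", "/b"):
    conn.request("GET", path, headers={"Connection": "keep-alive"})
    conn.getresponse().read()  # read the body fully before reusing the connection
conn.close()

server.shutdown()
server.server_close()
print(client_ports[0] == client_ports[1])
```

The same client port on both requests means that only one TCP connection, and therefore only one handshake, was needed.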

Three-way handshake

The client initiates the three-way handshake first.

  • The client initializes its sequence number, writes it into the sequence number field of the TCP header, sets SYN to 1 and sends the packet. The client then enters the SYN_SENT state.
  • After receiving the SYN packet from the client, the server initializes its own sequence number, writes it into the sequence number field of the TCP header, and writes the sequence number sent by the client + 1 into the acknowledgement number. It then sets the SYN and ACK bits to 1, sends the packet back and enters the SYN_RCVD state.
  • After receiving the packet, the client writes the sequence number sent by the server + 1 into the acknowledgement number, sets the ACK bit to 1 and sends the packet. The client enters the ESTABLISHED state. This packet may carry data.
  • After receiving it, the server also enters the ESTABLISHED state.
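The sequence and acknowledgement numbers exchanged in the handshake can be traced with a small sketch: each side acknowledges the other's initial sequence number plus one. The three_way_handshake helper and the tuple layout are invented for this illustration, and Python's random module is only a stand-in for how a real stack picks its initial sequence numbers.

```python
import random

SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three packets as (sender, flags, seq, ack) tuples."""
    client_next = (client_isn + 1) % SEQ_SPACE
    server_next = (server_isn + 1) % SEQ_SPACE
    syn = ("client", "SYN", client_isn, None)               # client enters SYN_SENT
    syn_ack = ("server", "SYN+ACK", server_isn, client_next)  # server enters SYN_RCVD
    ack = ("client", "ACK", client_next, server_next)         # client enters ESTABLISHED
    return [syn, syn_ack, ack]


packets = three_way_handshake(random.randrange(SEQ_SPACE), random.randrange(SEQ_SPACE))
for packet in packets:
    print(packet)
```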

Why three handshakes

  • Preventing stale connections: if the server replies to an old handshake, the client sees that the handshake is out of date and sends a packet with the RST bit set to 1 to abort the connection.
  • Synchronizing both sides' sequence numbers: each side writes the other's sequence number + 1 into the acknowledgement number, which shows that it knows the other's sequence number. This ensures that the sequence numbers of both sides are synchronized.

Four-way wave

Either side can initiate the four-way wave.

  • When one side wants to close the connection, it sends a packet with the FIN flag in the TCP header set to 1 and enters the FIN_WAIT_1 state
  • After receiving the FIN packet, the peer sends an ACK packet and enters the CLOSE_WAIT state
  • After receiving the ACK packet, the initiating side enters the FIN_WAIT_2 state
  • After its processing is complete, the peer sends a FIN packet and enters the LAST_ACK state
  • After receiving the FIN packet, the initiating side sends an ACK packet and enters the TIME_WAIT state
  • After receiving the ACK packet, the peer enters the CLOSED state
  • The initiating side also enters the CLOSED state after waiting 2MSL
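The state changes in the four-way wave can be written down as two small transition tables, one for the side that closes first and one for its peer. This is only a sketch of the states named above; the event labels and the run helper are made up, and a real TCP state machine has more states and transitions.

```python
# (current state, event) -> next state.
# ACTIVE is the side that closes first, PASSIVE is its peer.
ACTIVE = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
}
PASSIVE = {
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(table: dict, events: list) -> list:
    """Walk a transition table and return every state visited."""
    states = ["ESTABLISHED"]
    for event in events:
        states.append(table[(states[-1], event)])
    return states


print(run(ACTIVE, ["send FIN", "recv ACK", "recv FIN, send ACK", "2MSL timeout"]))
print(run(PASSIVE, ["recv FIN, send ACK", "send FIN", "recv ACK"]))
```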

Why four waves

  • Sending the first FIN only means that the initiating side will no longer send data; it can still receive data
  • After receiving the FIN packet, the other side first replies with an ACK packet, because it may still have data to process and send. Only when it has finished sending does it send its own FIN packet to close the connection, so the ACK and the FIN are not merged into one packet.

References: the Xiaolin Coding blog (illustrated networking series); the books “Illustrated HTTP”, “Illustrated TCP/IP” and “How Is the Network Connected”.
