Lecture 1: TCP/IP

Section 1 Give me a name

It’s like when a baby is born: it needs a name. A host that joins a new LAN is in the same position: it needs an IP address, which is usually assigned by the router.

This assignment is accomplished through the Dynamic Host Configuration Protocol (DHCP). The interaction of this protocol can be divided into four steps.

  1. DHCP discover. The new host broadcasts a packet over the LAN, asking: who can assign me an IP address?

  2. DHCP offer. A machine with the right to assign IP addresses (typically the router) replies with the offered IP address, the subnet mask, the lease time, and its own IP address.

  3. DHCP request. Several DHCP servers may exist, and each may make an offer. The host therefore broadcasts the IP address it has selected and the DHCP server it has chosen, so that every server is notified. A server that was not selected can then reclaim the IP address it had just offered.

  4. DHCP acknowledgement (ACK). The selected DHCP server confirms the host’s request.
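The four steps above can be sketched as a toy simulation. This is only an in-memory model, not real DHCP traffic: the `DhcpServer` class, the `join_lan` helper, the second server 192.168.0.2 and the MAC address are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DhcpServer:
    """Toy DHCP server with a pool of free addresses (hypothetical model)."""
    server_ip: str
    pool: list
    lease_seconds: int = 7200
    offered: dict = field(default_factory=dict)  # client MAC -> offered IP
    leased: dict = field(default_factory=dict)   # client MAC -> leased IP

    def on_discover(self, mac):
        # Step 2 (offer): reserve an address and offer it to the client.
        ip = self.pool.pop(0)
        self.offered[mac] = ip
        return {"type": "OFFER", "ip": ip, "server": self.server_ip,
                "lease": self.lease_seconds}

    def on_request(self, mac, chosen_server):
        ip = self.offered.pop(mac)
        if chosen_server != self.server_ip:
            # Step 3 (request): this server was not selected, reclaim the address.
            self.pool.insert(0, ip)
            return None
        # Step 4 (ack): this server was selected, confirm the lease.
        self.leased[mac] = ip
        return {"type": "ACK", "ip": ip, "lease": self.lease_seconds}


def join_lan(mac, servers):
    offers = [s.on_discover(mac) for s in servers]   # steps 1 and 2
    chosen = offers[0]                               # accept the first offer
    replies = [s.on_request(mac, chosen["server"]) for s in servers]  # step 3
    return next(r for r in replies if r is not None)                  # step 4


router = DhcpServer("192.168.0.1", ["192.168.0.100", "192.168.0.101"])
other = DhcpServer("192.168.0.2", ["192.168.0.200"])
ack = join_lan("aa:bb:cc:dd:ee:ff", [router, other])
print(ack)
```

The server that was not chosen ends up with its pool unchanged, which is the point of broadcasting the request in step 3.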

This process can be observed with Wireshark. First, enter the command ipconfig /release in a Windows terminal to release the current address (this disconnects the network), then enter ipconfig /renew and watch the packet capture below. Remember to filter on bootp, not dhcp (recent Wireshark versions accept dhcp instead).

In the DHCP Offer packet, router 192.168.0.1 offers IP address 192.168.0.100 to the host with a lease of 2 hours. If you regularly use the same LAN, you will notice that your host gets basically the same IP address every time. This seems to contradict dynamic IP address assignment. The main reason is that the client asks to keep its previous IP address. When half of the lease has elapsed, the client sends a renewal request to the original DHCP server, and the server can then extend the lease on the same IP address. The request for the previous address is also visible in the DHCP Discover packet you just saw. The diagram below:

Section 2 Play with friends in LAN

We usually communicate with other hosts on the LAN directly by IP address, for example by pinging an IP address or opening a Telnet session to it. However, hosts on the same LAN actually exchange data at the data link layer, which is based on MAC addresses. In other words, delivering the data depends not on the IP address but on the MAC address. But we don’t use MAC addresses day to day, and nobody remembers long MAC addresses. This is where ARP (Address Resolution Protocol) comes in: it finds the MAC address that corresponds to an IP address.

ARP is a relatively simple protocol. The interaction is one question and one answer:

  1. Enquirer (broadcast): whoever has IP address XXXX, tell me your MAC address (the query includes the enquirer’s own IP and MAC addresses).

  2. Responder: the MAC address of this IP address is YYYY.

The diagram below:

PS: The ARP query result is cached locally. Therefore an ARP query packet appears only the first time you ping a device on the LAN; on the second ping no ARP query packet is seen.
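The caching behavior can be illustrated with a small model. This is a sketch only: the `ArpCache` class and the `lan` dictionary (which stands in for the broadcast on the wire) are hypothetical, and real caches also expire entries after a while.

```python
class ArpCache:
    """Toy ARP resolver: asks the LAN once, then answers from the local cache."""

    def __init__(self, lan):
        self.lan = lan            # IP -> MAC of hosts on the LAN (stands in for the wire)
        self.cache = {}           # locally cached answers
        self.queries_sent = 0     # number of ARP queries put on the wire

    def resolve(self, ip):
        if ip not in self.cache:
            # Cache miss: broadcast "who has this IP?" and store the reply.
            self.queries_sent += 1
            self.cache[ip] = self.lan[ip]
        return self.cache[ip]


arp = ArpCache({"192.168.0.7": "00:11:22:33:44:55"})
first_ping = arp.resolve("192.168.0.7")    # sends an ARP query
second_ping = arp.resolve("192.168.0.7")   # served from the cache
print(first_ping, arp.queries_sent)
```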

Section 3 The outside world is so big that I want to go out and have a look

Of course we won’t be content to play only on the intranet. It’s a big world out there, and I want to go out and see it. How? Assume we travel over TCP, the most common protocol, and suppose that in a dream last night we were given the IP address of some scenic spot in the outside world.

The first step to getting out is to know which buses in your neighborhood go to your destination, so the first step is to check the bus routes. Where? In the routing table. Windows: route print, Linux: netstat -nr

No, no, no. The first step is not catching a bus, because that IP may belong to a neighbor, and if it does you can simply walk there. Perhaps your neighbor kept repeating an IP address while you were asleep, and you woke up remembering it. How do you know whether it is a neighbor’s IP? Ask the village head (the router)? The village head doesn’t care what you dreamed last night. You’re on your own.

So how do you tell? Ask yourself which IP addresses count as neighbors: the ones in the same subnet. So AND your own subnet mask with your own IP address, AND the same subnet mask with the destination IP address, and compare the two results. If they are the same, the two hosts are neighbors. If they differ, they are not.
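The comparison can be written in a few lines. This is a minimal sketch assuming IPv4 and a dotted-decimal mask; the function name `same_subnet` and the sample addresses are made up for the example.

```python
import ipaddress


def same_subnet(my_ip, dest_ip, mask):
    """AND both addresses with the subnet mask and compare the results."""
    m = int(ipaddress.IPv4Address(mask))
    mine = int(ipaddress.IPv4Address(my_ip)) & m
    theirs = int(ipaddress.IPv4Address(dest_ip)) & m
    return mine == theirs


print(same_subnet("192.168.0.100", "192.168.0.7", "255.255.255.0"))
print(same_subnet("192.168.0.100", "8.8.8.8", "255.255.255.0"))
```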

Assuming the dream IP isn’t a neighbor’s, we’ll have to take the bus after all. Go back to the station and check the routes: on Windows run route print, on Linux run netstat -nr to query the routing table. The following table is an example.

A trip to a distant place may require several bus changes, and the local routing table only describes the first leg. To check the subsequent legs, run tracert host on Windows or traceroute host on Linux.

In the routing table above, focus on the first and second columns: the first column is the destination and the second is the gateway through which that destination is reached. Default means that the route is used when the destination matches no other routing record. Routing records can also be added manually. Be careful not to add a wrong one, otherwise traffic for some destinations will go to the wrong gateway and never arrive.
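A lookup against such a table can be sketched as follows. The routes in `ROUTES` are invented sample data, and `next_hop` uses longest-prefix matching, which is how the default route (0.0.0.0/0) ends up being used only when nothing more specific matches.

```python
import ipaddress

# (destination network, gateway): hypothetical sample routing table
ROUTES = [
    ("0.0.0.0/0", "192.168.0.1"),        # default route
    ("192.168.0.0/24", "on-link"),       # the local subnet, no gateway needed
    ("10.8.0.0/16", "192.168.0.254"),    # a manually added route
]


def next_hop(dest_ip, routes=ROUTES):
    """Return the gateway of the most specific route that contains dest_ip."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [(ipaddress.ip_network(net), gw) for net, gw in routes
               if dest in ipaddress.ip_network(net)]
    # The default route matches every address, so matches is never empty here.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(next_hop("10.8.3.4"), next_hop("8.8.8.8"))
```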

The principle behind tracert/traceroute is that every IP packet carries a maximum hop count (the TTL). By sending probes whose hop limit starts at 1 and increases by one each time, you learn which router is reached at each step.
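The idea can be simulated without touching the network. In this sketch the `path` list is a made-up sequence of routers ending at the destination, and `send_probe` stands in for sending a packet and waiting for the reply.

```python
def send_probe(path, ttl):
    """Return the address that answers a probe sent with the given hop limit."""
    for hop in path:
        ttl -= 1              # every hop on the way decrements the TTL
        if ttl == 0:
            return hop        # the hop where the TTL runs out reports back
    return path[-1]           # TTL was large enough: the destination answers


def trace(path, max_hops=30):
    """Raise the hop limit from 1 upward until the destination replies."""
    hops = []
    for ttl in range(1, max_hops + 1):
        hop = send_probe(path, ttl)
        hops.append(hop)
        if hop == path[-1]:
            break
    return hops


path = ["192.168.0.1", "10.0.0.1", "203.0.113.9"]  # hypothetical route
print(trace(path))
```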

After all that effort we have finally found a bus route that leads to the destination. But the buses use a real-name system, and with only an intranet IP address (that is, no real-name ID) you cannot board. How can you get a public IP address that satisfies the real-name system? Ask the village head (the router) to assign one? Unfortunately, most routers have only a single public IP address themselves, which cannot be handed to you alone. This problem is solved by NAT (Network Address Translation).

When an intranet host accesses the outside world, it sends a TCP request whose TCP header carries a randomly chosen local port number. As the packet passes through the router, the router rewrites the source port in the TCP header and the source IP address in the IP header: the port becomes one chosen by the router, and the source IP address becomes the router’s public IP address. In this way the packet obtains its ID card. The router also records the mapping between the new port and the original source IP address and port. When a packet arrives from the outside world, the router looks up the original address and port by the packet’s destination port, and rewrites the packet’s destination IP address and port accordingly. From the intranet host’s point of view, the IP address and port match, so the packet is its own and is accepted.
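The mapping table kept by the router can be modeled like this. It is a simplified sketch: the `Nat` class, the starting port 40000 and the addresses are assumptions, and a real NAT also tracks the protocol, the remote endpoint and timeouts.

```python
import itertools


class Nat:
    """Toy port-translating NAT: one public IP shared by many private hosts."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)   # next free public port
        self.out = {}     # (private_ip, private_port) -> public_port
        self.back = {}    # public_port -> (private_ip, private_port)

    def outbound(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet; remember the mapping."""
        key = (src_ip, src_port)
        if key not in self.out:
            port = next(self._ports)
            self.out[key] = port
            self.back[port] = key
        return self.public_ip, self.out[key]

    def inbound(self, dst_port):
        """Find the private destination of an incoming packet by its port."""
        return self.back[dst_port]


nat = Nat("203.0.113.5")
seen_outside = nat.outbound("192.168.0.100", 51515)
delivered_to = nat.inbound(seen_outside[1])
print(seen_outside, delivered_to)
```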

Lecture 2: TCP and Socket programming

TCP has a three-way handshake and a four-way wave (connection teardown); its state transitions are shown below.

Next, we look at the three-way handshake and the four-way wave through socket programming.

In socket programming, how do the client’s connect and the server’s accept correspond to the TCP three-way handshake? For a blocking connect, by the time connect and accept return, the three-way handshake has already finished (it may also have failed, of course).

Are there intermediate states that can be observed, or parameters that affect the three-way handshake? Yes. On the client, set the socket to non-blocking. On the server, the second argument of listen is int backlog, classically described as the sum of the lengths of two queues: 1. the queue of incomplete connections (handshake in progress); 2. the queue of completed connections (handshake done, waiting for accept). Note that on Linux since 2.2, backlog bounds only the completed-connection queue. The three-way handshake process then looks as follows:

PS: For a client socket set to non-blocking, after connect is called, completion of the three-way handshake is generally detected by waiting for the socket to become writable.
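A minimal sketch of that technique on the loopback interface, assuming both ends live in one process purely for demonstration. The backlog value 5 and the 5-second timeout are arbitrary choices.

```python
import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
server.listen(5)                  # second argument of listen(): the backlog

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.setblocking(False)
# connect_ex returns immediately with an error code instead of raising;
# for a non-blocking socket this is typically "in progress".
rc = client.connect_ex(server.getsockname())

# Wait until the socket is writable, then read SO_ERROR:
# 0 means the three-way handshake succeeded.
_, writable, _ = select.select([], [client], [], 5.0)
handshake_error = client.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)

conn, peer = server.accept()      # takes the finished connection off the queue
print(bool(writable), handshake_error)

for s in (conn, client, server):
    s.close()
```

Note that the handshake completes before accept is called: the kernel finishes it and parks the connection in the queue that backlog limits.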

For the four-way wave, here are some classic interview questions:

  1. What does it mean when read returns 0?

  2. Why is SO_REUSEADDR set on the server?

  3. Which side, client or server, usually enters the TIME_WAIT state? What is its purpose? What is the 2MSL timeout?

Generally speaking, the party that closes the connection first ends up in the TIME_WAIT state. Once the other party has read all remaining data from the socket, its read returns 0. So when read returns 0, the peer has closed the connection.
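The behavior is easy to observe over a loopback connection. In this sketch both ends run in one process for brevity; in Python, the equivalent of read returning 0 is recv returning an empty bytes object.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
server.listen(1)

client = socket.create_connection(server.getsockname())
conn, _ = server.accept()

client.sendall(b"bye")
client.close()                    # the client closes first and sends its FIN

first = conn.recv(1024)           # data sent before the close is delivered first
second = conn.recv(1024)          # then recv returns b"": the peer has closed

conn.close()
server.close()
print(first, second)
```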

There are two reasons for staying in the TIME_WAIT state for 2MSL (twice the maximum segment lifetime):

  1. Look at the transition from FIN_WAIT2 to TIME_WAIT in the state diagram. The party that closed first replies with an ACK when it receives the FIN from the other end, but this ACK can be lost on the way. If the peer does not receive the ACK within a certain time, it sends the FIN again. A socket still in TIME_WAIT can answer the retransmitted FIN with another ACK. A socket that no longer exists answers with an RST, and the end that receives the RST concludes that closing the socket failed. The 2MSL wait avoids this.

  2. Assume the duration of TIME_WAIT were 0 and consider this case: right after closing, the two ends establish a new connection with the same source and destination addresses and ports, while a FIN packet from the previous four-way wave is still delayed in the network. After the new connection is established, the stale FIN finally arrives and is mistaken for a request to close the new connection.

On the server, suppose the process exits because of a bug. The server’s connections enter TIME_WAIT and linger for 2MSL, yet the process should be restarted without a moment’s delay. Setting the SO_REUSEADDR option lets the restarted process bind to the same port right away.
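Setting the option takes one line, as long as it happens before bind. A minimal sketch, where `make_listener` is a hypothetical helper name and port 0 is used only so the example runs anywhere:

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a listening socket that can rebind while old connections linger."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); otherwise a restart may fail with
    # "address already in use" while connections sit in TIME_WAIT.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(5)
    return s


listener = make_listener()
reuse = listener.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
listener.close()
print(reuse != 0)
```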

Lecture 3: HTTP

Section 1 DNS

As mentioned earlier, we need a public IP address to play outside, but we rarely type IP addresses in daily use, mainly because an IP address is a meaningless, hard-to-remember string. Domain names were invented to make addresses on the Internet easier to remember, but underneath, IP addresses are still what is used. So something has to convert a domain name into an IP address: the Domain Name System (DNS). The service is usually provided by your broadband provider, but you can also specify a DNS server address yourself. The following is a query result:

This DNS query is performed automatically by the browser, and the result is cached. The principle and process of DNS queries are rather dry, so they are not expanded here; if you are interested, see blog.csdn.net/luotuo44/ar…

Classic question: why are there only 13 root name servers? In short, DNS queries use UDP by default, and a DNS message over UDP was traditionally limited to 512 bytes, which leaves room for only 13 root server addresses. For the detailed calculation, see www.zhihu.com/question/22…

The previous query essentially asks the DNS server for the A record of a domain name. Besides A records there are AAAA records (IPv6), MX records (mail) and CNAME records. CNAME records are mainly used for CDNs, that is, network acceleration: apps such as Kuaishou and Zhihu all rely on CDN acceleration, otherwise they would be slow. Setting up a CNAME is mostly manual configuration work, so it is not expanded here.
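To make the record types concrete, here is a sketch that builds the bytes of a DNS query by hand, following the standard message layout (a 12-byte header followed by one question). The function name `build_query`, the query ID and the domain example.com are arbitrary; the packet is only constructed, not sent.

```python
import struct

# Numeric codes of the record types mentioned above
QTYPE = {"A": 1, "CNAME": 5, "MX": 15, "AAAA": 28}


def build_query(name, record="A", query_id=0x1234):
    """Build a DNS query message asking for one record type of one name."""
    # Header: ID, flags (only "recursion desired" set), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: every label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split(".")) + b"\x00"
    # Question tail: record type and class (1 = IN, the Internet class).
    return header + qname + struct.pack("!HH", QTYPE[record], 1)


packet = build_query("example.com", "AAAA")
print(len(packet), packet.hex())
```

Sent over UDP to port 53 of a resolver, a message like this would ask for the AAAA record; a query is tiny compared with the 512-byte limit discussed above.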

Section 2 Message

HTTP messages (in HTTP/1.x) are plain text and are sent line by line.

A request consists of a request line, request headers, a blank line, and the message body. The diagram below.

A response likewise consists of a status line, response headers, a blank line, and the message body. The diagram below.
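The line-oriented structure can be shown by assembling a request and taking a response apart as plain strings. This is a simplified sketch for HTTP/1.1: the helper names are made up, and real parsers handle many more cases (repeated headers, chunked bodies, and so on).

```python
def build_request(method, path, host, body=""):
    """Request line, headers, blank line, then the optional body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append(f"Content-Length: {len(body.encode())}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response(raw):
    """Split a response into status code, reason, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")        # blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(code), reason, headers, body


request = build_request("GET", "/index.html", "example.com")
response = parse_response("HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello")
print(repr(request))
print(response)
```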

Section 3 HTTP problems and Solutions

1. Performance problems

HTTP has gone through versions 0.9, 1.0, 1.1, and 2.0, with performance as one of the main themes. What is the fastest way to request a file? Not requesting it at all, of course, which costs no time. Since that is not realistic, the next best thing is to minimize the number of interactions, that is, the number of round trips (RTTs).

In version 1.0, the default was to open a TCP connection for a single request and close it when the request finished. Version 1.1 improved on this: the default is to reuse the previous connection, known as a persistent connection. After one file has been transferred, the next file is transferred over the same TCP connection, which saves the time of another three-way handshake. In addition, because of TCP slow start and the small size of HTTP text, reusing a connection that has already warmed up makes subsequent transfers faster.

Why not pack all the files into one? That is possible, and tooling around frameworks like Vue does bundle multiple JS files into one. But you cannot sensibly pack JS, CSS, HTML, and PNG files all into a single file.

Then why request files serially instead of requesting them simultaneously over multiple TCP connections? Unfortunately, concurrency is limited: browsers typically allow at most six simultaneous connections per domain name. There are ways around this limit. For example, register more domain names, put static files such as JS, CSS and PNG under a dedicated domain name, and reference them directly from the other domains. Build tools for Vue let you choose which files to leave out of the bundle, and those JS, CSS and PNG files can then be placed on a CDN for faster access.

HTTP 1.1 is slow for another reason: head-of-line blocking. Suppose the client sends a request for a JS resource and then a request for a CSS resource on the same connection. On the server, the CSS file is ready first, but the server still has to finish sending the JS resource before it can transfer the CSS resource. That is, a later resource must wait for the earlier transfer to complete, which is what head-of-line blocking means.

The HTTP 2.0 solution is to use data frames. As shown in the figure below, each request is given an identifier that is carried in its frames, and the server includes the same identifier in the frames of its response, so the client can match each response to its request even when responses come back out of order.
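The effect of tagging frames with an identifier can be shown with a toy demultiplexer. This models only the idea, not the real HTTP/2 binary framing: frames here are simple `(stream id, chunk)` tuples and the resource names are invented.

```python
def demultiplex(requests, frames):
    """Reassemble responses by stream id, whatever order the frames arrive in."""
    bodies = {}
    for stream_id, chunk in frames:
        bodies[stream_id] = bodies.get(stream_id, "") + chunk
    return {requests[stream_id]: body for stream_id, body in bodies.items()}


# Two requests sent on the same connection, each with its own identifier.
requests = {1: "/app.js", 3: "/style.css"}
# The CSS is ready first, so its frame is sent before the JS frames.
frames = [(3, "body{}"), (1, "console."), (1, "log(1)")]
result = demultiplex(requests, frames)
print(result)
```

Because every frame names its request, the CSS response no longer has to wait behind the JS response.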

2. Security issues

Cross-domain, HTTPS, stateless

  • stateless

HTTP is a stateless protocol. That is, at the protocol level, two HTTP requests are independent of each other even when they come from the same client to the same server. Real applications need continuity, though. For example, if I add items to the cart, I expect to see them when I view the cart a moment later.

To keep state, the server has to identify the client, that is, to know who is making each request.

session

The most common solution is the server-side session. The server creates a session for the client and stores it in server memory, then generates an ID by which the session can be found and writes this ID to the client as a cookie. The browser automatically sends the cookie along with each subsequent request. The server looks up the session by the ID in the cookie, and so knows who the client is and what data it stored earlier.
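A stripped-down sketch of this flow, with the session store as a plain dictionary. The names `SESSIONS`, `login` and `handle` and the cookie name `sid` are assumptions for illustration; a web framework normally does all of this for you.

```python
import secrets

SESSIONS = {}   # session id -> session data, kept in server memory


def login(user):
    """Create a session and return the header that hands its id to the client."""
    sid = secrets.token_hex(16)                 # hard-to-guess session id
    SESSIONS[sid] = {"user": user, "cart": []}
    return {"Set-Cookie": f"sid={sid}; HttpOnly"}


def handle(cookie_header):
    """Find the session for a later request from its Cookie header."""
    cookies = dict(part.strip().split("=", 1) for part in cookie_header.split(";"))
    return SESSIONS.get(cookies.get("sid"))


response_headers = login("alice")
# The browser stores the cookie and sends back its name=value part.
cookie = response_headers["Set-Cookie"].split(";")[0]
session = handle(cookie)
session["cart"].append("book")
print(handle(cookie))
```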

The session scheme also has problems. When one machine can no longer carry the load and the service is scaled out horizontally, how are sessions shared across machines? Sharing is needed because two requests from the same client may land on different servers. One solution is to write all sessions to a common location, such as a Redis cluster. Another is to route the same client to the same server based on the user ID, so that nothing needs to be shared.

In addition, cookies saved in the browser are at risk of theft, see cross domain below.

jwt

The previous session scenario stores the user information on the server, whereas JWT stores the information on the client. The server is responsible for generating the JWT, which is passed to the client. The client carries this JWT with each request. The server verifies that the JWT is valid and reads who the user is from the contents of the JWT.

A JWT consists of three parts: header, payload and signature, separated by dots (.), that is, header.payload.signature.

The header consists of the token type (that is, JWT) and the signature algorithm used, as follows:

{    "typ": "JWT",    "alg": "HS256"}

This JSON is Base64URL-encoded to form the header part of the token.

The payload is the place to store useful information, such as the user ID and the expiration time. It is also JSON. Common fields are:

  • iss: the issuer of the JWT

  • sub: the subject of the JWT (the user it was issued for)

  • aud: the audience, that is, the side receiving the JWT

  • exp: the expiration time of the JWT

  • iat: the time at which the JWT was issued

Besides the common fields, you can add any fields you need. This JSON is Base64URL-encoded to form the payload part.

The signature is created by signing the string header.payload with the algorithm declared in the header (HS256 is HMAC with SHA-256). Note that the key is stored only on the server.

The payload can hold user information such as the user name and user ID. The signature guarantees that the payload cannot be tampered with, but the payload itself is only encoded, not encrypted, so anyone who obtains the token can read it. HTTPS is therefore recommended for applications that use JWT.
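The whole construction fits in a few lines of standard-library code. This is a sketch for understanding only, assuming the HS256 algorithm: the function names and the key are made up, `verify` does not check exp or the other fields, and production code should use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64URL without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(payload: dict, key: bytes) -> str:
    header = {"typ": "JWT", "alg": "HS256"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload))
    signature = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify(token: str, key: bytes):
    """Return the payload if the signature matches, otherwise None."""
    signing_input, _, signature = token.rpartition(".")
    expected = b64url(hmac.new(key, signing_input.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return json.loads(b64url_decode(signing_input.split(".")[1]))


SERVER_KEY = b"server-side-secret"   # hypothetical key, kept only on the server
token = sign({"sub": "42", "iss": "example"}, SERVER_KEY)
print(token)
print(verify(token, SERVER_KEY))
```

Anyone can decode the middle part of the token, but without the key nobody can produce a signature that `verify` accepts.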

  • Cross domain

Cross-domain refers to certain behavior that the browser itself forbids. For example, a page of website A cannot send an HTTP request to website B to fetch resources other than JS, CSS, or PNG files. This is mainly for security. After the user has visited website B, B’s cookies are stored in the browser, and with those cookies website A could forge a request from the user to website B. For example, the developer of website A learns that a payment URL of website B is B.com/payTo/user?… and puts a button in the content returned by website A. When the user clicks the button, the request goes to that URL, which amounts to a payment made without the user’s knowledge, because the request carries the user’s cookie for website B.

Note: static files such as JS, CSS, PNG/JPG do not cause cross-domain problems.

Browsers forbid this behavior, but when the front end and back end are developed and deployed separately, they are often served from different domain names. Front-end requests for back-end data then fail, and that is where the cross-domain problem comes from. It can be solved on either the server side or the client side.

  • Server-side method

As explained above, cross-domain requests are restricted because they threaten the data security of website B. So if website B itself allows requests from website A, the browser has no reason to forbid them. Generally speaking, when the browser meets a cross-domain request that is not a simple one, it first sends an OPTIONS request (a preflight) to website B to ask for permission, and website B responds that the cross-domain request is allowed. The diagram below:

The response is as follows:

How do you set it up?

If the source is localhost:9528 and the cross-domain target is 127.0.0.1:8000, configure the nginx that serves port 8000 to return the following response headers:

Access-Control-Allow-Credentials: true
Access-Control-Allow-Headers: Origin,Content-Length,Content-Type,X-Token
Access-Control-Allow-Methods: GET,PUT,POST,PATCH,DELETE,HEAD
Access-Control-Allow-Origin: http://localhost:9528
Access-Control-Max-Age: 43200

If access to 127.0.0.1:8000 does not involve cookies, you can leave Access-Control-Allow-Credentials unset and set Access-Control-Allow-Origin to *, meaning that any source is allowed.

If access to 127.0.0.1:8000 does use cookies, Access-Control-Allow-Credentials must be set to true and Access-Control-Allow-Origin must be set to the specific source, stating exactly which origin may access this site across domains.

If you don’t want to set it in nginx, the code for gin is as follows:

import (
    "time"
    "github.com/gin-contrib/cors"
    "github.com/gin-gonic/gin"
)

// settingCors registers the CORS middleware on the gin engine.
func settingCors(r *gin.Engine) {
    r.Use(cors.New(cors.Config{
        AllowOrigins:     []string{"http://localhost:9528"},
        AllowMethods:     []string{"GET", "PUT", "POST", "PATCH", "DELETE", "HEAD"},
        AllowHeaders:     []string{"Origin", "Content-Length", "Content-Type", "x-token"},
        AllowCredentials: true,
        MaxAge:           43200 * time.Second,
    }))
}

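AllowOrigins accepts several domain names. The Access-Control-Allow-Origin response header itself carries a single origin (or *), so the middleware answers each request with the allowed origin that matches it.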
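How several allowed origins map onto a header that holds just one value can be sketched as follows. This only illustrates the idea and is not the code of any particular middleware; the second origin in `ALLOWED_ORIGINS` and the function name `cors_headers` are invented for the example.

```python
ALLOWED_ORIGINS = {"http://localhost:9528", "http://localhost:9529"}


def cors_headers(request_origin, with_credentials=True):
    """Echo the request's origin back only if it is on the allow list."""
    if request_origin not in ALLOWED_ORIGINS:
        return {}                           # no CORS headers: the browser blocks it
    headers = {
        "Access-Control-Allow-Origin": request_origin,
        "Vary": "Origin",                   # the response differs per origin
    }
    if with_credentials:
        # Cookies are only allowed together with a specific origin, never "*".
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers


print(cors_headers("http://localhost:9528"))
print(cors_headers("http://evil.example"))
```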

For Spring Boot, the setup code is as follows:

import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
public class CorsConfig implements WebMvcConfigurer {
    @Override
    public void addCorsMappings(CorsRegistry registry) {
        registry.addMapping("/**")
                .allowedOrigins("http://localhost:9528")
                .allowCredentials(true)
                .allowedMethods("GET", "POST", "DELETE", "PUT", "PATCH")
                .allowedHeaders("Origin", "Content-Length", "Content-Type", "x-token")
                .maxAge(3600);
    }
}

  • Client-side methods

The client-side solution to the cross-domain problem is simply not to cross domains, rather than to fight the browser.

For example, give the URIs of all resources that would need cross-domain access a specific prefix (such as /api), then configure a proxy to intercept this prefix and forward the request to the real host, which sidesteps the browser’s cross-domain restriction.

For Vue, the code is as follows:

// entry file (for example main.js)
import axios from 'axios'
Vue.prototype.$axios = axios;
if (process.env.NODE_ENV === "development") {
    axios.defaults.baseURL = "/api";
}

// vue.config.js
devServer: {
    host: "localhost",
    port: 8081, // port number
    https: false, // https: {type: Boolean}
    open: true, // open the browser automatically on startup
    proxy: {
        "/api": {
            target: "http://127.0.0.1/v1", // back-end domain name and prefix to cross to
            // ws: true, // enable WebSockets
            changeOrigin: true,
            pathRewrite: {
                '^/api': '' // remove the /api prefix
            }
        }
    }
}

