1. Preface

Some time ago, I finished reading the book “Illustrated HTTP” and wanted to write something about it, but I could not find a suitable entry point. I had earlier read Wang Fupeng’s blog, where he said: “If I want to write something, I must first think of a distinctive, reasonable, and efficient line of thought; without one, I would rather not write.” So if you do not have a complete and reasonable line of thought in mind, it really is better not to write.

After a lot of thought, I decided to start from purpose. I read this book with two questions in mind:

  • 1. What does a complete HTTP request look like?
  • 2. What performance optimizations can be made throughout the HTTP request process?

2. A complete HTTP request

Front-end performance optimization

2.1. DNS Domain Name Resolution

First of all, how is a domain name resolved to an IP address? The lookup checks the following places in order:

  • 1. Browser cache
  • 2. Operating system cache
  • 3. Router cache
  • 4. ISP (Internet service provider) DNS cache. A company, a school, or a department may also run its own DNS server and play this role.
  • 5. If none of the caches above has the record, a full lookup through the DNS hierarchy is needed. For example, for www.baidu.com the root name server is queried first, then the com name server, then the baidu.com name server.
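The lookup order above can be sketched as a chain of caches with a fallback. This is only an illustration: the function names, the cache layout, and the example IP address (taken from a documentation range) are assumptions, and no real DNS query is made.

```python
from typing import Callable


def resolve(domain: str,
            caches: list[tuple[str, dict[str, str]]],
            full_lookup: Callable[[str], str]) -> tuple[str, str]:
    """Return (ip, source), checking each cache layer in order."""
    for name, cache in caches:
        ip = cache.get(domain)
        if ip is not None:
            return ip, name
    # No cache had the record: fall back to the full lookup.
    ip = full_lookup(domain)
    # Store the answer in every layer so the next lookup is a cache hit.
    for _, cache in caches:
        cache[domain] = ip
    return ip, "full lookup"


def lookup_order(domain: str) -> list[str]:
    """Zones visited by a full lookup: root, then TLD, then the domain."""
    labels = domain.rstrip(".").split(".")
    zones = ["."]
    for i in range(len(labels) - 1, 0, -1):
        zones.append(".".join(labels[i:]))
    return zones


caches = [("browser", {}), ("os", {}), ("router", {}), ("isp", {})]
fake_lookup = lambda domain: "203.0.113.10"  # hypothetical answer

print(resolve("www.baidu.com", caches, fake_lookup))
print(resolve("www.baidu.com", caches, fake_lookup))
print(lookup_order("www.baidu.com"))
```

The first call falls through to the full lookup, and the second one is answered by the browser cache. This is why repeated visits to the same domain resolve faster.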

2.2. A TCP connection

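The TCP three-way handshake establishes the connection and helps ensure reliable communication. The process is as follows: the client sends a SYN segment, the server answers with SYN+ACK, and the client confirms with an ACK.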
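As a rough illustration, the three segments of the handshake and their sequence and acknowledgment numbers can be modeled as below. This is a simplified sketch, not a network implementation: the `Segment` class and the initial sequence numbers are made up for the example, and sequence number wrap-around is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    flags: tuple
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments exchanged when a connection is opened."""
    # 1. Client -> server: SYN carrying the client's initial sequence number.
    syn = Segment(flags=("SYN",), seq=client_isn)
    # 2. Server -> client: SYN+ACK, acknowledging the client's SYN (seq + 1).
    syn_ack = Segment(flags=("SYN", "ACK"), seq=server_isn, ack=syn.seq + 1)
    # 3. Client -> server: ACK, acknowledging the server's SYN (seq + 1).
    ack = Segment(flags=("ACK",), seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```

Each side acknowledges the other's initial sequence number plus one, so both know the other end can send and receive before any data is transferred.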

2.3. HTTP Request and HTTP Response

For the HTTP request and HTTP response, I will focus on a few header fields related to caching:

  • Cache-Control (general header field)
  • Expires (response header field)
  • Last-Modified / If-Modified-Since
  • ETag / If-None-Match

Cache-Control (HTTP/1.1)

Before going into the directives, it helps to know where caching happens: mainly in the browser and in CDNs (cache servers). The local browser cache generally holds private resources, while a CDN mainly holds public resources. Cache-Control includes the following directives:

  • max-age: the number of seconds for which the resource stays fresh. Within that time it is taken directly from the cache without sending a request to the server.
  • s-maxage: similar to max-age, except that it applies to shared (public) cache servers such as a CDN, where it takes precedence over max-age.
  • no-cache: the cached copy may be stored, but the cache must ask the server whether it is still valid before using it.
  • no-store: the response is not cached at all.
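To make the directives concrete, here is a minimal sketch of how a cache might decide what to do with a stored response. The function names and return strings are invented for the example, and real caches handle many more directives (for instance `private` and `must-revalidate`) than this does.

```python
from typing import Optional


def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control value into {directive: value or None}."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def cache_decision(header: str, age_seconds: int,
                   shared_cache: bool = False) -> str:
    """Decide how a cache treats a response that is age_seconds old."""
    d = parse_cache_control(header)
    if "no-store" in d:
        return "do not cache"
    if "no-cache" in d:
        return "revalidate"
    lifetime: Optional[int] = None
    if shared_cache and d.get("s-maxage") is not None:
        # Shared caches (e.g. a CDN) prefer s-maxage over max-age.
        lifetime = int(d["s-maxage"])
    elif d.get("max-age") is not None:
        lifetime = int(d["max-age"])
    if lifetime is not None and age_seconds < lifetime:
        return "use cache"
    return "revalidate"


print(cache_decision("max-age=3600", age_seconds=120))
print(cache_decision("public, max-age=60, s-maxage=600", 300, shared_cache=True))
print(cache_decision("no-cache", age_seconds=0))
```

With `max-age=60, s-maxage=600`, a response that is 300 seconds old is still fresh for a CDN but already stale for the browser.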

Expires (HTTP/1.0) (strong browser caching policy)

When both Cache-Control and Expires are present, Cache-Control takes priority.

Expires specifies the time at which a resource expires. It tells the browser that, until that time, it can take the data directly from its own cache without requesting it again.

Last-Modified / If-Modified-Since (needs to be used together with Cache-Control)

The server returns the last modification time of the resource in the Last-Modified response header. On a later request, the browser sends that time back in If-Modified-Since. If the resource has not been modified since then, the server returns 304 and the browser takes the data directly from its cache. If it has been modified, the server returns 200 with the new content.

Disadvantage: the modification time is not precise enough. It only has one-second resolution, and a file's modification time can change even when its content does not.

Etag / If-None-Match

Compared with Last-Modified/If-Modified-Since, the value used by ETag/If-None-Match is typically a hash of the file content, so it changes whenever the content changes. This makes it a more accurate basis for the decision than the modification time above.
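A minimal sketch of the ETag check on the server side might look like this. It assumes the ETag is a truncated SHA-256 hash of the body, which is only one possible choice (servers are free to generate ETags differently), and the function names are hypothetical.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a hash of the content (an assumed scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, payload) for a GET with optional If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None and if_none_match == etag:
        # Content unchanged: the client can keep using its cached copy.
        return 304, headers, b""
    return 200, headers, body


status, headers, _ = respond(b"hello")               # first visit
print(status, headers["ETag"])
print(respond(b"hello", headers["ETag"])[0])         # unchanged content
print(respond(b"hello, v2", headers["ETag"])[0])     # content changed
```

The second request carries the ETag from the first response, so the server can answer with an empty 304 instead of sending the body again.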

2.4. Pipelining

Persistent connections make pipelining possible. Without pipelining, the client has to wait for and receive each response before sending the next request. With pipelining, the client can send multiple requests one after another on the same connection without waiting for the earlier responses.
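To show what this looks like on the wire, the sketch below builds the bytes a client would write to a single persistent connection when pipelining several GET requests. The host and paths are placeholders, and nothing is actually sent over the network.

```python
def build_pipelined_requests(host: str, paths: list) -> bytes:
    """Concatenate several HTTP/1.1 GET requests for one connection."""
    requests = []
    for path in paths:
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "\r\n"
        )
        requests.append(request.encode("ascii"))
    # All requests are written back to back, before any response is read.
    return b"".join(requests)


payload = build_pipelined_requests("example.com", ["/a.css", "/b.js", "/c.png"])
print(payload.decode("ascii"))
```

The server still has to return the responses in the same order as the requests, so one slow response delays the ones behind it.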

2.5. Nginx (Load Balancer + Reverse Proxy)

Load balancing

We deploy multiple application servers and distribute user requests to different servers through load balancing to improve the performance and reliability of websites, applications, databases, or other services.
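As an illustration of the idea, here is a minimal round-robin distributor. Round robin is only one common strategy and is used here as an assumption; the class name and server names are hypothetical, and this is not how nginx itself is configured or implemented.

```python
import itertools


class RoundRobinBalancer:
    """Hand out application servers in turn, one per incoming request."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["app1", "app2", "app3"])
for request_id in range(4):
    print(request_id, "->", balancer.pick())
```

Because each request goes to the next server in the list, no single server has to handle all of the traffic.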

Forward proxy + reverse proxy

A forward proxy acts on behalf of the client, and a reverse proxy acts on behalf of the server. Advantages of a forward proxy (example: a VPN service used to get around network restrictions):

  • Access previously inaccessible resources.
  • Do cache optimization.
  • Client access control management improves security.

Advantages of a reverse proxy (example: nginx load balancing):

  • Cache optimization.
  • Load balancing
  • Improve security.

3. What performance optimizations can be made throughout the HTTP request process?

As we have seen, a complete HTTP request goes through these stages: DNS resolution to an IP address > TCP connection > HTTP request > HTTP response

Therefore, the optimization points in the HTTP request process mainly include the following two parts:

3.1. DNS resolution

For example, Taobao makes heavy use of DNS pre-resolution through link tags (rel="dns-prefetch") in its pages.
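For reference, a DNS pre-resolution hint is a link tag with rel="dns-prefetch". The small helper below, which might be used when rendering a page template, generates such tags for a list of hosts. The helper name and host names are placeholders, not Taobao's actual domains.

```python
from html import escape


def dns_prefetch_tags(hosts) -> str:
    """Render one <link rel="dns-prefetch"> tag per host."""
    return "\n".join(
        f'<link rel="dns-prefetch" href="//{escape(host, quote=True)}">'
        for host in hosts
    )


print(dns_prefetch_tags(["img.example.com", "static.example.com"]))
```

Placing these tags in the page head lets the browser resolve the listed domains early, before it reaches the resources that use them.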

3.2. Nginx configuration implements caching of static resources

Since this topic is quite extensive, I will cover nginx and nginx static resource caching in detail in my next blog post.