How Nginx processes an HTTP request from start to finish

Now that I’ve shown you how Nginx handles HTTP request headers, it’s time to look at how it actually processes an HTTP request. The diagram below shows the flow.

  1. Read Request Headers: parse the request headers.
  2. Identify Configuration Block: match the URL to decide which location block handles the request.
  3. Apply Rate Limits: decide whether the request should be rate limited, for example because the client has too many concurrent connections or its QPS is too high.
  4. Perform Authentication: authenticate the request, for example with anti-hotlinking rules based on the Referer header, or by checking the user’s permissions.
  5. Generate Content: generate the response returned to the user. When Nginx acts as a reverse proxy, this may involve talking to upstream services, and the request may also go through internal redirects and subrequests.
  6. Response Filters: filter the response before it reaches the user, for example by compressing it or processing images.
  7. Log: record the request in the logs.

These seven steps give an overview; the actual process is described in detail below.

The 11 phases of Nginx HTTP request processing

Nginx splits request processing into 11 phases, and each phase may be handled by one or more HTTP modules. Walking through them gives a better understanding of how these modules work.

  1. POST_READ: runs right after the request headers have been read, before anything else has been done with them, so the raw values are still available. The realip module works here.
  2. SERVER_REWRITE: like the REWRITE phase, only the rewrite module works here (for rewrite directives at the server level); no third-party module handles this phase.
  3. FIND_CONFIG: matches the request against the location blocks; no module hooks into it.
  4. REWRITE: rewrites the URL (rewrite directives inside a location).
  5. POST_REWRITE: runs after REWRITE; no module hooks into it.

Next come three phases that control user access:

  1. PREACCESS: work done before ACCESS, such as limiting concurrent connections and QPS. Two modules are involved: limit_conn and limit_req.
  2. ACCESS: decides whether the user may access the resource. For example, auth_basic checks a user name and password, access checks the client IP, and auth_request decides based on the response of a third-party service.
  3. POST_ACCESS: work done after ACCESS; no module uses it for now.

The last three phases deal with responses and logging:

  1. PRECONTENT: work done before CONTENT, such as sending subrequests to third-party services. The try_files module also runs in this phase.

  2. CONTENT: many modules take effect in this phase, such as index, autoindex, and concat.

  3. LOG: records logs, via the access_log module.

The phases are processed in a strict order, and the order in which HTTP modules run within each phase matters too: if a module does not pass the request on, the modules after it never see it. Not every module in a phase has to run, either. The order of the modules within each phase is described next.

Processing order across the 11 phases

As shown in the figure below, the modules run in a fixed order. Where does that order come from? The ngx_modules.c file generated at build time contains an array, ngx_module_names, listing every module compiled into Nginx, including those enabled with --with-... options. The order of this array matters: within a phase, modules run in the reverse of their order in the array.

char *ngx_module_names[] = {
    ...
    "ngx_http_static_module",
    "ngx_http_autoindex_module",
    "ngx_http_index_module",
    "ngx_http_random_index_module",
    "ngx_http_mirror_module",
    "ngx_http_try_files_module",
    "ngx_http_auth_request_module",
    "ngx_http_auth_basic_module",
    "ngx_http_access_module",
    "ngx_http_limit_conn_module",
    "ngx_http_limit_req_module",
    "ngx_http_realip_module",
    "ngx_http_referer_module",
    "ngx_http_rewrite_module",
    "ngx_http_concat_module",
    ...
};

The modules shown in gray in the figure belong to the Nginx framework, which does the processing in those phases itself; third-party modules get no chance to run there.

During execution, this order is not always followed strictly. For example, the access phase has a satisfy directive that lets Nginx jump straight to the next phase once one check passes: with satisfy any, if the access module allows the request, processing moves straight on to the try_files module, and the auth_basic and auth_request modules are not executed.

In the content phase, once the index module has produced a response, the autoindex module is not executed and processing jumps straight to the log module.
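To make these ordering rules concrete, here is a small Python model of the phase engine. It is a sketch under several assumptions, not Nginx’s C implementation: it uses only a subset of ngx_module_names (plus ngx_http_log_module, which the excerpt above does not show), maps each module to the phase or phases it hooks into (realip is simplified to POST_READ only), and invents three outcome names, NEXT_MODULE, NEXT_PHASE and DONE, that loosely mirror how a handler can pass the request on, skip the rest of its phase, or finish the response.

```python
# Illustrative outcome names (assumptions, not nginx's C return codes).
NEXT_MODULE = "declined"  # let the next module in the same phase run
NEXT_PHASE = "ok"         # skip the rest of this phase (e.g. satisfy any passed)
DONE = "done"             # response produced: jump straight to the log phase

PHASES = ["POST_READ", "SERVER_REWRITE", "FIND_CONFIG", "REWRITE", "POST_REWRITE",
          "PREACCESS", "ACCESS", "POST_ACCESS", "PRECONTENT", "CONTENT", "LOG"]

# A subset of ngx_module_names, in array order.
MODULE_NAMES = [
    "ngx_http_log_module",
    "ngx_http_static_module", "ngx_http_autoindex_module", "ngx_http_index_module",
    "ngx_http_mirror_module", "ngx_http_try_files_module",
    "ngx_http_auth_request_module", "ngx_http_auth_basic_module", "ngx_http_access_module",
    "ngx_http_limit_conn_module", "ngx_http_limit_req_module",
    "ngx_http_realip_module", "ngx_http_rewrite_module",
]

# Simplified mapping of each module to the phase(s) it works in.
PHASE_OF = {
    "ngx_http_log_module": ["LOG"],
    "ngx_http_static_module": ["CONTENT"],
    "ngx_http_autoindex_module": ["CONTENT"],
    "ngx_http_index_module": ["CONTENT"],
    "ngx_http_mirror_module": ["PRECONTENT"],
    "ngx_http_try_files_module": ["PRECONTENT"],
    "ngx_http_auth_request_module": ["ACCESS"],
    "ngx_http_auth_basic_module": ["ACCESS"],
    "ngx_http_access_module": ["ACCESS"],
    "ngx_http_limit_conn_module": ["PREACCESS"],
    "ngx_http_limit_req_module": ["PREACCESS"],
    "ngx_http_realip_module": ["POST_READ"],
    "ngx_http_rewrite_module": ["SERVER_REWRITE", "REWRITE"],
}


def build_pipeline():
    """Group modules by phase; execution order is the reverse of array order."""
    pipeline = {phase: [] for phase in PHASES}
    for name in reversed(MODULE_NAMES):
        for phase in PHASE_OF[name]:
            pipeline[phase].append(name)
    return pipeline


def run(pipeline, outcomes):
    """Walk the phases and return the modules that actually ran, in order."""
    trace = []
    log_index = PHASES.index("LOG")
    i = 0
    while i < len(PHASES):
        next_i = i + 1
        for name in pipeline[PHASES[i]]:
            trace.append(name)
            outcome = outcomes.get(name, NEXT_MODULE)
            if outcome == NEXT_PHASE:
                break                              # skip the rest of this phase
            if outcome == DONE:
                next_i = max(i + 1, log_index)     # go straight to LOG
                break
        i = next_i
    return trace
```

Calling run(build_pipeline(), {"ngx_http_access_module": NEXT_PHASE, "ngx_http_index_module": DONE}) reproduces the two shortcuts described above: auth_basic and auth_request never run, and after index finishes the response, autoindex and static are skipped and only the log module runs.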

The modules involved in all 11 phases, and their order, are shown in the figure below:

Now let’s break the phases down in detail, starting with the first one, postread, which, as the name implies, comes into play before the request is formally processed.

Postread phase

Postread, the first of the 11 phases, runs when the request headers have just been read and nothing has been processed yet, so raw information is still available, such as the user’s real IP address.

Question: How to get the user’s real IP address?

A TCP connection is identified by a four-tuple that includes the source IP address. On the real Internet, though, there are many forward and reverse proxies along the way. For example, the end user has an address on an internal network, and the carrier assigns a public IP. When the user visits a website, the site may use a CDN to accelerate static files or images; on a CDN cache miss, the request goes back to the origin, possibly through a reverse proxy such as Alibaba Cloud SLB, before it finally reaches Nginx.

What we want is the public IP 115.204.33.1 that the carrier assigned to the user, so that we can limit concurrent connections or request rates per user, but Nginx only sees 2.2.2.2, the address of the hop connected directly to it. So how do we obtain the user’s real IP address?

HTTP has two headers that can carry the user’s IP:

  • X-Forwarded-For: carries IP addresses and records the IP of every node the request passes through
  • X-Real-IP: records only one address, the user’s real IP

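How is the real user IP used?

Nginx exposes it through variables.

Once the real IP has been obtained, variables such as $binary_remote_addr and $remote_addr hold the real client IP. This is what makes connection limiting with the limit_conn module meaningful, and it is also why limit_conn takes effect in the preaccess phase rather than in the postread phase.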

realip module

  • Not compiled into Nginx by default

    • Enable it with --with-http_realip_module
  • Variables: the address and port of the original TCP connection are preserved in these two variables

    • realip_remote_addr
    • realip_remote_port
  • Function

    • Modifies the client address
  • Directives

    • set_real_ip_from

      Specifies trusted addresses; the client address is replaced only when the connection comes from one of them

    • real_ip_header

      Specifies which header the real IP is taken from: X-Real-IP (the default), X-Forwarded-For, proxy_protocol, or another header field

    • real_ip_recursive

      Off by default. When on, trusted addresses (those matched by set_real_ip_from) at the end of X-Forwarded-For are skipped, and the last non-trusted address is used as the client address

Syntax: set_real_ip_from address | CIDR | unix:;
Default: —
Context: http, server, location

Syntax: real_ip_header field | X-Real-IP | X-Forwarded-For | proxy_protocol;
Default: real_ip_header X-Real-IP;
Context: http, server, location

Syntax: real_ip_recursive on | off;
Default: real_ip_recursive off;
Context: http, server, location

In practice

First, let’s look at the case where real_ip_recursive is left at its default value, off.

  • Recompile Nginx with the realip module

For how to compile Nginx, see: iziyang.github.io/2020/03/10/…

./configure --prefix=<install directory> --with-http_realip_module
make
make install
  • Then go to the Nginx installation directory you specified in the previous step
# In conf/nginx.conf, comment out the default server block and add this line:
include /Users/mtdp/myproject/nginx/test_nginx/conf/example/*.conf;
# realip.conf; set_real_ip_from can be set to your own local IP
server {
    listen 80;
    server_name ziyang.realip.com;
    error_log /Users/mtdp/myproject/nginx/nginx/logs/myerror.log debug;
    set_real_ip_from 192.168.0.108;
    #real_ip_header X-Real-IP;
    real_ip_recursive off;
    # real_ip_recursive on;
    real_ip_header X-Forwarded-For;

    location / {
        return 200 "Client real ip: $remote_addr\n";
    }
}

In the configuration file above, I set the trusted proxy address to the local address, left real_ip_recursive at its default (off), and set real_ip_header to take the address from X-Forwarded-For.

  • Reload the configuration file
./sbin/nginx -s reload
  • Test the response
➜ test_nginx curl -H 'X-Forwarded-For: 1.1.1.1,192.168.0.108' ziyang.realip.com
Client real ip: 192.168.0.108

Then test with real_ip_recursive turned on:

  • Enable real_ip_recursive in the configuration file
server {
    listen 80;
    server_name ziyang.realip.com;
    error_log /Users/mtdp/myproject/nginx/nginx/logs/myerror.log debug;
    set_real_ip_from 192.168.0.108;
    #real_ip_header X-Real-IP;
    #real_ip_recursive off;
    real_ip_recursive on;
    real_ip_header X-Forwarded-For;

    location / {
        return 200 "Client real ip: $remote_addr\n";
    }
}
  • Test the response
➜ test_nginx curl -H 'X-Forwarded-For: 1.1.1.1, 2.2.2.2, 192.168.0.108' ziyang.realip.com
Client real ip: 2.2.2.2

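So when X-Forwarded-For is used to obtain the real IP through a chain of trusted proxies, real_ip_recursive needs to be on; otherwise Nginx takes the last address in the header, which may itself be a trusted proxy. Either way, the header is honored only when the connection comes from an address trusted by set_real_ip_from.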
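The resolution logic can be sketched in Python. This is an approximation of how the realip module walks X-Forwarded-For, written for illustration only; the function name real_client_ip and its parameters are hypothetical, and it assumes every header entry is a well-formed IP address.

```python
import ipaddress


def real_client_ip(remote_addr, xff, trusted, recursive=False):
    """Approximate the realip module's X-Forwarded-For handling.

    remote_addr: address of the TCP peer; xff: X-Forwarded-For value;
    trusted: set_real_ip_from addresses/CIDRs; recursive: real_ip_recursive.
    """
    networks = [ipaddress.ip_network(t, strict=False) for t in trusted]

    def is_trusted(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in networks)

    # Entries from left (original client) to right (closest proxy).
    hops = [h.strip() for h in xff.split(",") if h.strip()]
    addr = remote_addr
    # Only step back through the header while the current address is trusted.
    while is_trusted(addr) and hops:
        addr = hops.pop()      # right-most remaining entry
        if not recursive:
            break              # off: take the last entry and stop
    return addr
```

With the inputs from the two curl tests above, it returns 192.168.0.108 when recursive is off and 2.2.2.2 when it is on; a connection from an untrusted address keeps its own address regardless of the header.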

One might ask: why not just use X-Real-IP to get the real IP address? That works too, but X-Real-IP is an Nginx convention rather than an RFC-defined header. If there are proxies between the client and the server that are not Nginx, the X-Real-IP header may not be set, so which one to use depends on the situation.

The rewrite module in the rewrite phase

Now let’s look at the rewrite module.

The rewrite module provides the return directive. Once return is executed, Nginx sends the response directly, and the directives after it are not executed.

The return directive

The return directive has three forms:

  • Return a status code, optionally followed by a response body
  • Return a status code followed by a URL
  • Return only a URL (a 302 redirect)
Syntax: return code [text];
        return code URL;
        return URL;
Default: —
Context: server, location, if

The returned status codes include:

  • Nginx-specific
    • 444: close the connection immediately; the user receives no response
  • HTTP 1.0 standard
    • 301: permanent redirect
    • 302: temporary redirect; should not be cached
  • HTTP 1.1 standard
    • 303: temporary redirect; the method may change and the response should not be cached
    • 307: temporary redirect; the method must not change and the response should not be cached
    • 308: permanent redirect; the method must not change

Return directive and error_page

You will see error_page used a lot. When a website hits a 404, it shows a friendly page instead of a bare 404 Not Found; that is what error_page does.

Syntax: error_page code ... [=[response]] uri;
Default: —
Context: http, server, location, if in location

Let’s look at a few examples:

1. error_page 404 /404.html; 
2. error_page 500 502 503 504 /50x.html;
3. error_page 404 =200 /empty.gif; 
4. error_page 404 = /404.php; 
5. location / { 
       error_page 404 = @fallback; 
   } 
   location @fallback { 
       proxy_pass http://backend; 
   } 
6. error_page 403 http://example.com/forbidden.html; 
7. error_page 404 =301 http://example.com/notfound.html;

This raises two questions. Look at the following configuration file:

server {
    server_name ziyang.return.com;
    listen 80;
    root html/;
    error_page 404 /403.html;
    #return 405;
    location / {
        #return 404 "find nothing!";
    }
}
  1. When the server block contains error_page and the location contains a return directive, which one is executed?
  2. When return directives appear both in the server block and in the location block, are they merged?

Let’s verify both questions in practice.

In practice

  • Add the configuration above to the return.conf configuration file
  • Map ziyang.return.com to the local IP address in the local hosts file
  • Request a page that does not exist
➜ test_nginx curl ziyang.return.com/text
<html>
<head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx/1.17.8</center>
</body>
</html>

As you can see, error_page is in effect and 403 is returned.

What happens if you uncomment the return directive in the location block?

  • Uncomment the return directive and reload the configuration file
  • Request the page again
➜ test_nginx curl ziyang.return.com/text
find nothing!

This time the return directive is executed. That answers the first question: when the server block contains error_page and the location contains a return directive, the return directive is executed.

Next, let’s see which return is executed: the one in the server block or the one in the location block.

  • Uncomment the return directive in the server block and reload the configuration file
  • Request the page again
➜ test_nginx curl ziyang.return.com/text
<html>
<head><title>405 Not Allowed</title></head>
<body>
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx/1.17.8</center>
</body>
</html>

Now both questions have answers:

  1. When the server block contains error_page and the location contains a return directive, which one is executed?

    The return directive in the location is executed.

  2. When return directives appear both in the server block and in the location block, are they merged?

    No, they are not merged. Whichever return directive is reached first is executed. The server-level return runs in the SERVER_REWRITE phase, before a location has even been selected, so it wins here.

The rewrite directive

The rewrite directive modifies the URL that the user sends to Nginx. Its syntax is:

Syntax: rewrite regex replacement [flag];
Default: —
Context: server, location, if

It works as follows:

  • Replaces the URL matched by regex with the new URL given by replacement
    • Regular expression captures and variables can be used
  • If replacement starts with http://, https://, or $scheme, a 302 redirect is returned directly
  • The rewritten URL is then handled according to flag
    • last: the replacement URL goes through location matching again
    • break: stops processing the current set of rewrite module directives, the same as a standalone break directive
    • redirect: returns a 302 redirect
    • permanent: returns a 301 redirect

Example

Suppose we have this directory structure:

html/first/
└── 1.txt
html/second/
└── 2.txt
html/third/
└── 3.txt

The configuration file is as follows:

server {
    listen 80;
    server_name rewrite.ziyang.com;
    rewrite_log on;
    error_log logs/rewrite_error.log notice;

    root html/;
    location /first {
        rewrite /first(.*) /second$1 last;
        return 200 'first!\n';
    }
    location /second {
        rewrite /second(.*) /third$1;
        return 200 'second!\n';
    }
    location /third {
        return 200 'third!\n';
    }
    location /redirect1 {
        rewrite /redirect1(.*) $1 permanent;
    }
    location /redirect2 {
        rewrite /redirect2(.*) $1 redirect;
    }
    location /redirect3 {
        rewrite /redirect3(.*) http://rewrite.ziyang.com$1;
    }
    location /redirect4 {
        rewrite /redirect4(.*) http://rewrite.ziyang.com$1 permanent;
    }
}

So our questions are:

  1. How does the return directive interact with the rewrite directive?
  2. What do /first/3.txt, /second/3.txt, and /third/3.txt return?
  3. What happens if no flag is given?

With these three questions in mind, let’s walk through a demonstration.

In practice

Preparation

  • Add the configuration above to the rewrite.conf configuration file
  • Map rewrite.ziyang.com to 127.0.0.1 in the local hosts file

last flag

First, request rewrite.ziyang.com/first/3.txt. The result is:

➜  ~ curl rewrite.ziyang.com/first/3.txt
second!

Why second!? One might expect third!. The actual matching steps are:

  • curl rewrite.ziyang.com/first/3.txt
  • Because of rewrite /first(.*) /second$1 last;, and since last means the new URL goes through location matching again, /second/3.txt is matched next
  • Once the /second block is matched, its directives are executed, and 200 with second! is returned
  • Note that the URL is also rewritten in the /second block, but since no flag is given, location matching does not start again; processing simply continues with the next directive, return

break flag

Change rewrite /second(.*) /third$1; to rewrite /second(.*) /third$1 break;.

Request rewrite.ziyang.com/first/3.txt again. The result is:

➜  ~ curl rewrite.ziyang.com/first/3.txt
test3

This is test3, the content of the 3.txt file. The actual matching steps are:

  • curl rewrite.ziyang.com/first/3.txt
  • Because of rewrite /first(.*) /second$1 last;, and since last means the new URL goes through location matching again, /second/3.txt is matched next
  • In the /second block, the URL is rewritten to /third/3.txt, and break stops the remaining rewrite module directives, so return 200 'second!\n' is skipped
  • No new location match takes place: the request is served inside the /second block using the rewritten URI, so the static file html/third/3.txt is returned

So the file actually served is the one at /third/3.txt, and the result is naturally test3. You can also try requesting rewrite.ziyang.com/third/2.txt and see what it returns.

redirect and permanent flags

The configuration file has four more locations. Requesting each of them gives the following results:

  • /redirect1: returns 301
  • /redirect2: returns 302
  • /redirect3: returns 302
  • /redirect4: returns 301
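The flag behavior can be modeled with a small Python sketch of the rewrite.ziyang.com server block. The names FILES, build_locations and handle are hypothetical, the contents of 1.txt and 2.txt are made up, and the model only supports plain prefix locations. It is meant to show the control flow of last, break, no flag, redirect and permanent as described above (in this model, break stops the rewrite directives and the rewritten URI is served as a static file), not to reproduce Nginx’s implementation.

```python
import re

# Hypothetical file contents under root html/; only 3.txt -> "test3" comes from the article.
FILES = {"/first/1.txt": "test1", "/second/2.txt": "test2", "/third/3.txt": "test3"}


def build_locations(second_flag):
    """Locations of rewrite.ziyang.com; second_flag is the flag on the /second rewrite."""
    return {
        "/first": [("rewrite", r"/first(.*)", "/second$1", "last"), ("return", 200, "first!\n")],
        "/second": [("rewrite", r"/second(.*)", "/third$1", second_flag), ("return", 200, "second!\n")],
        "/third": [("return", 200, "third!\n")],
        "/redirect1": [("rewrite", r"/redirect1(.*)", "$1", "permanent")],
        "/redirect2": [("rewrite", r"/redirect2(.*)", "$1", "redirect")],
        "/redirect3": [("rewrite", r"/redirect3(.*)", "http://rewrite.ziyang.com$1", None)],
        "/redirect4": [("rewrite", r"/redirect4(.*)", "http://rewrite.ziyang.com$1", "permanent")],
    }


def expand(template, match):
    """Substitute $1..$9 with the regex captures."""
    return re.sub(r"\$(\d)", lambda g: match.group(int(g.group(1))) or "", template)


def handle(uri, locations, max_cycles=10):
    """Return (status, body or redirect target) for a request URI."""
    for _ in range(max_cycles):
        # Only plain prefix locations here: the longest matching prefix wins.
        candidates = [p for p in locations if uri.startswith(p)]
        if not candidates:
            return 404, ""
        restart = False
        for action in locations[max(candidates, key=len)]:
            if action[0] == "return":
                return action[1], action[2]
            _, regex, replacement, flag = action
            match = re.search(regex, uri)
            if match is None:
                continue
            new_uri = expand(replacement, match)
            if flag == "permanent":
                return 301, new_uri
            if flag == "redirect" or new_uri.startswith(("http://", "https://")):
                return 302, new_uri
            uri = new_uri
            if flag == "last":       # search the locations again with the new URI
                restart = True
                break
            if flag == "break":      # stop rewrite directives, stay in this location
                break
            # no flag: continue with the next directive in this location
        if not restart:
            # No return fired: serve the (possibly rewritten) URI as a static file.
            return (200, FILES[uri]) if uri in FILES else (404, "")
    return 500, ""  # too many rewrite cycles
```

With build_locations(None), handle("/first/3.txt", ...) gives (200, "second!\n"); with build_locations("break") it gives (200, "test3"), matching the two curl results above. The four /redirect locations return 301, 302, 302 and 301 respectively.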

Logging rewrite behavior

The directive involved is rewrite_log:

Syntax: rewrite_log on | off;
Default: rewrite_log off;
Context: http, server, location, if

When it is on, the results of rewrite processing are written to error_log at the notice level, which in this example is the logs/rewrite_error.log file:

2020/05/06 06:24:05 [notice] 86959#0: *25 "/first(.*)" matches "/first/3.txt", client: 127.0.0.1, server: rewrite.ziyang.com, request: "GET /first/3.txt HTTP/1.1", host: "rewrite.ziyang.com"
2020/05/06 06:24:05 [notice] 86959#0: *25 rewritten data: "/second/3.txt", args: "", client: 127.0.0.1, server: rewrite.ziyang.com, request: "GET /first/3.txt HTTP/1.1", host: "rewrite.ziyang.com"
2020/05/06 06:24:05 [notice] 86959#0: *25 "/second(.*)" matches "/second/3.txt", client: 127.0.0.1, server: rewrite.ziyang.com, request: "GET /first/3.txt HTTP/1.1", host: "rewrite.ziyang.com"
2020/05/06 06:24:05 [notice] 86959#0: *25 rewritten data: "/third/3.txt", args: "", client: 127.0.0.1, server: rewrite.ziyang.com, request: "GET /first/3.txt HTTP/1.1", host: "rewrite.ziyang.com"

The if directive

The if directive also takes effect in the rewrite phase. Its syntax is:

Syntax: if (condition) { ... }
Default: —
Context: server, location

Its rules are:

  • If condition is true, the directives inside the braces are executed; they follow the inheritance rules for value directives (see my previous article on Nginx configuration directives).

So what can the condition of an if directive contain? The rules are:

  1. Check whether a variable is empty or 0
  2. Compare a variable with a string using = or !=
  3. Match a variable against a regular expression
    • Case sensitive: ~ or !~
    • Case insensitive: ~* or !~*
  4. Check whether a file exists with -f or !-f
  5. Check whether a directory exists with -d or !-d
  6. Check whether a file, directory, or symbolic link exists with -e or !-e
  7. Check whether a file is executable with -x or !-x
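These condition rules can be expressed as a small Python evaluator. The function if_condition and its (op, left, right) calling convention are hypothetical; the file checks use os.path helpers as stand-ins for what Nginx does, and -x is only approximated with os.access.

```python
import os
import re


def if_condition(op, left, right=None):
    """Evaluate one nginx-style if condition (sketch of the documented rules)."""
    if op is None:                         # bare variable: false if empty or "0"
        return left not in ("", "0")
    if op in ("=", "!="):                  # string comparison
        return (left == right) == (op == "=")
    if op in ("~", "!~", "~*", "!~*"):     # regex match, * means case-insensitive
        flags = re.IGNORECASE if "*" in op else 0
        found = re.search(right, left, flags) is not None
        return found != op.startswith("!")
    checks = {
        "-f": os.path.isfile,
        "-d": os.path.isdir,
        "-e": os.path.exists,                      # file, directory, or link target
        "-x": lambda p: os.access(p, os.X_OK),     # approximation of "executable"
    }
    negate = op.startswith("!")
    return checks[op.lstrip("!")](left) != negate
```

For example, if_condition("~*", "msie", "MSIE") is true while if_condition("~", "msie", "MSIE") is false, which is the difference between the case-insensitive and case-sensitive operators.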

Here are some examples:

if ($http_user_agent ~ MSIE) {
    rewrite ^(.*)$ /msie/$1 break;
}

if ($http_cookie ~* "id=([^;]+)(?:;|$)") {   # match against the variable $http_cookie
    set $id $1;
}

if ($request_method = POST) {                # compare the variable $request_method with POST
    return 405;
}

if ($slow) {                                 # $slow is defined with the map module
    limit_rate 10k;
}

if ($invalid_referer) {
    return 403;
}

find_config phase

Once the server-level rewrite directives have been processed, the request enters the find_config phase, where Nginx looks for the location configuration that matches the URL.

The location directive

Directive syntax

Let’s look at the syntax of the location directive:

Syntax: location [ = | ~ | ~* | ^~ ] uri { ... }
        location @name { ... }
Default: —
Context: server, location

Syntax: merge_slashes on | off;
Default: merge_slashes on;
Context: http, server

The merge_slashes directive merges repeated slashes in a URL into a single slash. It is on by default and only needs to be turned off when the URL carries base64 or other encoded data.

Matching rules

location matches only the URI and ignores the query arguments. There are three broad cases:

  • Prefix strings
    • Plain prefix match
    • = : exact match
    • ^~ : if this prefix matches, regular expressions are not checked
  • Regular expressions
    • ~ : case-sensitive regular expression match
    • ~* : case-insensitive regular expression match
  • Named locations used for internal redirects
    • @

These rules may seem confusing at first, but let’s look at a few examples.

In practice

Take a look at the Nginx configuration file:

server {
    listen 80;
    server_name location.ziyang.com;
    error_log logs/error.log debug;
    #root html/;
    default_type text/plain;
    merge_slashes off;

    location ~ /Test1/$ {
        return 200 'first regular expressions match!\n';
    }
    location ~* /Test1/(\w+)$ {
        return 200 'longest regular expressions match!\n';
    }
    location ^~ /Test1/ {
        return 200 'stop regular expressions match!\n';
    }
    location /Test1/Test2 {
        return 200 'longest prefix string match!\n';
    }
    location /Test1 {
        return 200 'prefix string match!\n';
    }
    location = /Test1 {
        return 200 'exact match!\n';
    }
}

The question is, what does each of the following urls return?

/Test1
/Test1/
/Test1/Test2
/Test1/Test2/
/test1/Test2

For example, when accessing /Test1, several parts will match:

  1. Regular prefix matches: location /Test1
  2. Exact match: location = /Test1

When accessing /Test1/, several parts will also match:

  1. location ~ /Test1/$
  2. location ^~ /Test1/

So which one wins? Nginx follows a set of rules:

All prefix strings are stored in a binary tree, and Nginx matches as follows:

  1. Find the longest matching prefix string. If it is an exact match (=) or a ^~ prefix, use it immediately
  2. If step 1 did not hit = or ^~, remember the longest matching prefix location
  3. Check the regular expressions in the order they appear in nginx.conf; the first one that matches is used
  4. If no regular expression matches, use the longest matching prefix remembered in step 2
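The four steps can be written as a short Python function over the location.ziyang.com configuration above. This is a sketch: match_location is a hypothetical helper that ignores nested and named locations, scans a plain list instead of Nginx’s tree, compares prefixes case-sensitively, and uses Python’s re module as a stand-in for PCRE.

```python
import re

# The location blocks from location.ziyang.com, in config order:
# (modifier, pattern, response body); "" marks a plain prefix location.
LOCATIONS = [
    ("~", r"/Test1/$", "first regular expressions match!"),
    ("~*", r"/Test1/(\w+)$", "longest regular expressions match!"),
    ("^~", "/Test1/", "stop regular expressions match!"),
    ("", "/Test1/Test2", "longest prefix string match!"),
    ("", "/Test1", "prefix string match!"),
    ("=", "/Test1", "exact match!"),
]


def match_location(uri, locations=LOCATIONS):
    """Pick a location for uri following the four steps above (sketch)."""
    # An exact (=) match is used immediately.
    for mod, pattern, body in locations:
        if mod == "=" and uri == pattern:
            return body
    # Step 1: find the longest matching prefix string (plain or ^~).
    prefixes = [(pattern, mod, body) for mod, pattern, body in locations
                if mod in ("", "^~") and uri.startswith(pattern)]
    longest = max(prefixes, key=lambda p: len(p[0])) if prefixes else None
    if longest is not None and longest[1] == "^~":
        return longest[2]                 # ^~ stops regular expression matching
    # Step 3: regular expressions in configuration order; first match wins.
    for mod, pattern, body in locations:
        if mod in ("~", "~*"):
            flags = re.IGNORECASE if mod == "~*" else 0
            if re.search(pattern, uri, flags):
                return body
    # Step 4: fall back to the longest prefix remembered in step 2.
    return longest[2] if longest is not None else None
```

For the five URLs tested with curl, this function picks the same locations as Nginx does in the results that follow.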

Let’s see what the actual responses look like:

➜ test_nginx curl location.ziyang.com/Test1
exact match!
➜ test_nginx curl location.ziyang.com/Test1/
stop regular expressions match!
➜ test_nginx curl location.ziyang.com/Test1/Test2
longest regular expressions match!
➜ test_nginx curl location.ziyang.com/Test1/Test2/
longest prefix string match!
➜ test_nginx curl location.ziyang.com/Test1/Test3
stop regular expressions match!
  • /Test1 matches location = /Test1
  • /Test1/ matches location ^~ /Test1/
  • /Test1/Test2 matches location ~* /Test1/(\w+)$
  • /Test1/Test2/ matches location /Test1/Test2
  • /Test1/Test3 matches location ^~ /Test1/

Take /Test1/Test3 as an example:

  1. Collect all prefix strings that match; there are two:
    • ^~ /Test1/
    • /Test1
  2. Select the longest prefix, /Test1/. Because ^~ stops regular expression matching, location ^~ /Test1/ is used
  3. Return stop regular expressions match!

Preaccess phase

This brings us to the preaccess phase. Two questions come up often: how do you limit the number of concurrent connections per client, and how do you limit the request rate? Both are handled in the preaccess phase, which, as the name implies, runs before the access phase. Let’s start with the limit_conn module.

limit_conn module

The ngx_http_limit_conn_module module has the following basic features:

  • Phase: NGX_HTTP_PREACCESS_PHASE
  • Module: http_limit_conn_module
  • Compiled into Nginx by default; disable it with --without-http_limit_conn_module
  • Scope
    • All worker processes (based on shared memory)
    • Does not take effect before the preaccess phase
    • How well the limit works depends on the key design: it relies on the realip module in the postread phase to obtain the real IP

Note the design of the limit_conn key: the key is the variable the limit is applied to, and it is usually the user’s real IP address.

With the limit_conn module introduced, let’s look at its directives.

Directive syntax

  • Define the shared memory, including its size, and the key
Syntax: limit_conn_zone key zone=name:size;
Default: —
Context: http
  • Limit the number of concurrent connections
Syntax: limit_conn zone number;
Default: —
Context: http, server, location
  • Set the log level used when the limit is hit
Syntax: limit_conn_log_level info | notice | warn | error;
Default: limit_conn_log_level error;
Context: http, server, location
  • Set the status code returned to the client when the limit is hit
Syntax: limit_conn_status code;
Default: limit_conn_status 503;
Context: http, server, location

In practice

Let’s see how these directives work with a practical example.

As usual, the configuration file first:

limit_conn_zone $binary_remote_addr zone=addr:10m;
#limit_req_zone $binary_remote_addr zone=one:10m rate=2r/m;

server {
    listen 80;
    server_name limit.ziyang.com;
    root html/;
    error_log logs/myerror.log info;

    location / {
        limit_conn_status 500;
        limit_conn_log_level warn;
        limit_rate 50;
        limit_conn addr 1;
        #limit_req zone=one burst=3 nodelay;
        #limit_req zone=one;
    }
}
  • Map limit.ziyang.com to the local IP address in the local hosts file

This configuration sets two limits: limit_rate caps the response rate at 50 bytes per second, and limit_conn caps concurrent connections at 1.

➜  test_nginx curl limit.ziyang.com

If you visit limit.ziyang.com now, you’ll find the response is very slow, only 50 bytes per second.

If you access the same site concurrently while that request is still running, 500 is returned.

I made a concurrent request from another terminal:

➜ ~ curl limit.ziyang.com
<html>
<head><title>500 Internal Server Error</title></head>
<body>
<center><h1>500 Internal Server Error</h1></center>
<hr><center>nginx/1.17.8</center>
</body>
</html>

As you can see, Nginx returns 500 directly.
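Conceptually, limit_conn keeps a per-key count of requests in progress and rejects new ones once the count reaches the limit. The Python sketch below models that bookkeeping; ConnLimiter is a hypothetical class, a threading.Lock stands in for Nginx’s shared-memory locking across workers, and the key would be the value of $binary_remote_addr.

```python
import threading


class ConnLimiter:
    """Per-key concurrent request counter, in the spirit of limit_conn (sketch)."""

    def __init__(self, limit, status=503):
        self.limit = limit            # like "limit_conn addr 1"
        self.status = status          # like limit_conn_status
        self.active = {}
        self.lock = threading.Lock()  # stands in for shared-memory locking

    def acquire(self, key):
        """Return None if admitted, otherwise the status code to send."""
        with self.lock:
            if self.active.get(key, 0) >= self.limit:
                return self.status
            self.active[key] = self.active.get(key, 0) + 1
            return None

    def release(self, key):
        """Called when the admitted request finishes."""
        with self.lock:
            self.active[key] -= 1
            if self.active[key] == 0:
                del self.active[key]
```

With ConnLimiter(limit=1, status=500), a second acquire for the same client while the first (slow, rate-limited) request is still open returns 500, just like the second terminal above; other clients are unaffected.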

limit_req module

We raised two questions at the beginning of this section:

  • How do you limit the number of concurrent connections per client?

  • How do you limit the request rate?

The first one, limiting concurrent connections, is solved; now let’s look at the second.

The ngx_http_limit_req_module module has the following basic features:

  • Phase: NGX_HTTP_PREACCESS_PHASE
  • Module: http_limit_req_module
  • Compiled into Nginx by default; disable it with --without-http_limit_req_module
  • Algorithm: leaky bucket
  • Scope
    • All worker processes (based on shared memory)
    • Does not take effect before the preaccess phase

Leaky bucket algorithm

limit_req uses the leaky bucket algorithm. Another common rate-limiting algorithm is the token bucket, which is not covered here.

The idea of the leaky bucket is to define a bucket of a given size: all requests entering the bucket are processed at a constant rate, and if so many requests arrive that the bucket’s capacity is exceeded, an error is returned immediately. Let me illustrate it with a picture.

In this picture, the faucet drips continuously, like the requests users send, and the drops leave the bucket at a constant rate, that is, they are processed at a constant rate. The leaky bucket works well at limiting burst traffic and smooths out request processing.
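The leaky bucket can also be sketched in Python. This is a simplified model written for illustration, loosely following the excess-and-drain bookkeeping that limit_req performs per key; LeakyBucket and its parameters are hypothetical names, and times are plain seconds.

```python
class LeakyBucket:
    """Simplified leaky-bucket accounting in the spirit of limit_req (sketch)."""

    def __init__(self, rate_per_sec, burst=0, nodelay=False):
        self.rate = rate_per_sec   # e.g. 2r/m -> 2 / 60
        self.burst = burst
        self.nodelay = nodelay
        self.excess = 0.0          # requests sitting in the bucket above the steady rate
        self.last = None           # time of the last accepted request

    def request(self, now):
        """Return (accepted, delay_seconds) for a request arriving at `now`."""
        if self.last is None:
            excess = 0.0
        else:
            drained = (now - self.last) * self.rate   # what leaked out since then
            excess = max(self.excess - drained + 1, 0.0)
        if excess > self.burst:
            return False, None                        # bucket overflows: reject
        self.excess, self.last = excess, now
        delay = 0.0 if self.nodelay else excess / self.rate
        return True, delay
```

With rate=2r/m (2/60 per second) and burst=3, five requests one second apart are accepted four times and rejected on the fifth. Without nodelay the second request would wait about 29 seconds for its turn; with nodelay it is answered at once, which is the difference explored in the hands-on section below.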

Directive syntax

  • Define the shared memory, including its size, the key, and the rate limit
Syntax: limit_req_zone key zone=name:size rate=rate;
Default: —
Context: http

rate: in r/s or r/m (requests processed per second or per minute)

  • Limit the request rate
Syntax: limit_req zone=name [burst=number] [nodelay];
Default: —
Context: http, server, location
  • burst defaults to 0
  • nodelay: if set, requests that fit within burst are processed immediately instead of being delayed
  • Set the log level used when the limit is hit
Syntax: limit_req_log_level info | notice | warn | error;
Default: limit_req_log_level error;
Context: http, server, location
  • Set the status code returned to the client when the limit is hit
Syntax: limit_req_status code;
Default: limit_req_status 503;
Context: http, server, location

In practice

Before verifying, keep two questions in mind:

  • If limit_req and limit_conn are configured at the same time, which one takes precedence?
  • What difference does adding nodelay make?

Here is the configuration file; it is the same as in the previous section, with different lines commented out:

limit_conn_zone $binary_remote_addr zone=addr:10m;
limit_req_zone $binary_remote_addr zone=one:10m rate=2r/m;

server {
    listen 80;
    server_name limit.ziyang.com;
    root html/;
    error_log logs/myerror.log info;

    location / {
        limit_conn_status 500;
        limit_conn_log_level warn;
        #limit_rate 50;
        #limit_conn addr 1;
        #limit_req zone=one burst=3 nodelay;
        limit_req zone=one;
    }
}

Result: with limit_req zone=one, 503 is returned immediately once the number of requests allowed per minute is exceeded.

➜ test_nginx curl limit.ziyang.com
<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx/1.17.8</center>
</body>
</html>

Switch which directive is commented out:

limit_req zone=one burst=3;
#limit_req zone=one;

Without the burst parameter, an error is returned immediately; with it, excess requests within the burst are not rejected but wait until the limit allows them to be processed.

Now look at the nodelay parameter:

limit_req zone=one burst=3 nodelay;

With nodelay, requests are processed and returned immediately as long as the burst limit is not exceeded; once it is exceeded, 503 is returned.

Now we can answer the two questions:

  • If limit_req and limit_conn are configured at the same time, which one takes precedence?
    • limit_req runs before limit_conn, so limit_req takes effect first
  • What difference does adding nodelay make?
    • Without nodelay, requests within the burst wait until they can be processed; with nodelay, they are processed and returned immediately as long as the burst limit is not exceeded, and 503 is returned once it is exceeded.

The access stage

After limiting traffic for users in the PreAccess phase, the Access phase is reached.

The access module

The module involved is ngx_http_access_module, with the following basics:

  • Phase: NGX_HTTP_ACCESS_PHASE
  • Module: http_access_module
  • Compiled into Nginx by default; disable it with --without-http_access_module
  • Scope
    • It has no effect if the request never reaches the access phase

Command syntax

Syntax: allow address | CIDR | unix: | all;
Default: -
Context: http, server, location, limit_except

Syntax: deny address | CIDR | unix: | all;
Default: -
Context: http, server, location, limit_except

The access module provides two directives, allow and deny. Here are some examples:

location / {
    deny  192.168.1.1;
    allow 192.168.1.0/24;
    allow 10.1.1.0/16;
    allow 2001:0db8::/32;
    deny  all;
}

These rules are checked in order for each request, and the first one that matches the client address decides; the remaining rules are not evaluated. This module is simple, so we'll skip the hands-on part.
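The first-match rule can be sketched in Python with the standard ipaddress module. This is an illustrative model of the behaviour described above, not nginx code; check_access and RULES are hypothetical names, and the rules are the ones from the example. Note that 10.1.1.0/16 has host bits set: nginx accepts it (treating it as 10.1.0.0/16 and logging a warning), so the sketch parses it with strict=False.

```python
import ipaddress
from typing import List, Tuple

# (directive, argument) pairs in configuration order, copied from the example above
RULES: List[Tuple[str, str]] = [
    ("deny", "192.168.1.1"),
    ("allow", "192.168.1.0/24"),
    ("allow", "10.1.1.0/16"),
    ("allow", "2001:0db8::/32"),
    ("deny", "all"),
]


def check_access(client_ip: str, rules: List[Tuple[str, str]] = RULES) -> str:
    """Return "allow" or "deny" using the first rule that matches client_ip."""
    ip = ipaddress.ip_address(client_ip)
    for action, target in rules:
        if target == "all":
            return action
        # strict=False accepts networks with host bits set, like 10.1.1.0/16
        net = ipaddress.ip_network(target, strict=False)
        if ip.version == net.version and ip in net:
            return action  # first match wins; later rules are never checked
    return "allow"  # no rule matched: the access module does not block the request
```

For example, check_access("192.168.1.1") returns "deny" even though the next rule allows 192.168.1.0/24, because the deny line comes first.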

Auth_basic module

The auth_basic module authenticates users. When it is enabled and you visit the site in a browser, the server replies with 401 Unauthorized. The user never sees the 401 page itself; instead, the browser shows a dialog asking for a user name and password. The module implements HTTP Basic authentication as defined in RFC 2617.

Command syntax

  • Authenticates the user name and password using HTTP Basic Authentication
  • Compiled into Nginx by default
    • Disable ngx_http_auth_basic_module with --without-http_auth_basic_module
Syntax: auth_basic string | off;
Default: auth_basic off; 
Context: http, server, location, limit_except

Syntax: auth_basic_user_file file;
Default: —
Context: http, server, location, limit_except

We'll generate the password file with a tool called htpasswd; auth_basic_user_file then points to that file.

htpasswd is provided by the httpd-tools package.

To generate a password file, run:

htpasswd -c -b file user pass

The generated password file is in the following format:

# comment 
name1:password1 
name2:password2:comment 
name3:password3

Hands-on

  • Generate the password file auth.pass in the example directory
htpasswd -bc auth.pass ziyang 123456
  • Add the configuration file
server {
    server_name access.ziyang.com;
    listen 80;
    error_log logs/error.log debug;
    default_type text/plain;

    location /auth_basic {
        satisfy any;
        auth_basic "test auth_basic";
        auth_basic_user_file example/auth.pass;
        deny all;
    }
}
  • Reload the Nginx configuration
  • Add access.ziyang.com to the /etc/hosts file

Visit access.ziyang.com/auth_basic in a browser, and a dialog box pops up asking for the user name and password.
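After the user fills in the dialog, the browser retries the request with an Authorization header that carries "user:password" Base64-encoded, as HTTP Basic authentication defines. The Python sketch below builds and parses that header for the ziyang / 123456 account created above; both function names are hypothetical. Base64 is only an encoding, not encryption.

```python
import base64
from typing import Tuple


def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header value a browser sends for HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def parse_basic_auth(header: str) -> Tuple[str, str]:
    """Inverse of basic_auth_header: decode the token and split at the first ':'."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```

From the command line, curl -u ziyang:123456 sends the same kind of header.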

Auth_request module

  • Function: forwards the request to an upstream service. If the upstream returns a 2xx response code, the request continues; if it returns 401 or 403, that response is returned to the client
  • Principle: after receiving a request, nginx generates a subrequest and forwards it to the upstream service via reverse proxying
  • Not compiled into Nginx by default; enable it with --with-http_auth_request_module

Command syntax

Syntax: auth_request uri | off;
Default: auth_request off; 
Context: http, server, location

Syntax: auth_request_set $variable value;
Default: —
Context: http, server, location

Hands-on

  • Add the following to the previous configuration file
server {
    server_name access.ziyang.com;
    listen 80;
    error_log logs/error.log debug;
    #root html/;
    default_type text/plain;

    location /auth_basic {
        satisfy any;
        auth_basic "test auth_basic";
        auth_basic_user_file example/auth.pass;
        deny all;
    }

    location / {
        auth_request /test_auth;
    }

    location = /test_auth {
        proxy_pass http://127.0.0.1:8090/auth_upstream;
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
        proxy_set_header X-Original-URI $request_uri;
    }
}
  • With this configuration, every request to / triggers a subrequest to /test_auth, which is proxied to another service; that service can itself be built with Nginx
  • If the service returns 2xx, authentication succeeds; if it returns 401 or 403, authentication fails
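The decision rule can be written as a tiny Python function. It is a sketch of the documented behaviour, not the module's source; the function name is hypothetical, and returning 0 for "continue" is just this example's convention. According to the nginx documentation, any status other than 2xx, 401 or 403 is treated as an error, which nginx reports to the client as 500.

```python
def auth_request_outcome(subrequest_status: int) -> int:
    """Map the auth subrequest's status to what happens to the main request.

    Returns 0 when the main request may continue, otherwise the status sent
    to the client (a simplified model of ngx_http_auth_request_module).
    """
    if 200 <= subrequest_status < 300:
        return 0                      # access allowed, the request proceeds
    if subrequest_status in (401, 403):
        return subrequest_status      # access denied with the same code
    return 500                        # any other status is treated as an error
```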

The satisfy directive: combining all access-phase modules

Command syntax

Syntax: satisfy all | any;
Default: satisfy all; 
Context: http, server, location

The satisfy directive takes one of two values, all or any, and it applies to every module in the access phase:

  • The access module
  • Auth_basic module
  • Auth_request module
  • Other modules

With satisfy all, the request is allowed only if every access-phase module that runs lets it through. With satisfy any, it is allowed as soon as any one of them does.

Here are a few questions to help you understand:

  1. Does the Access phase take effect if there is a return directive?

    No. The return directive runs in the rewrite phase, which comes before the access phase, so the response is sent before any access check runs.

  2. Does the order of multiple Access modules matter?

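    Yes, it does. The three access-phase modules appear in this order in nginx's module list:

    ngx_http_auth_request_module,
    ngx_http_auth_basic_module,
    ngx_http_access_module,

    Handlers within one phase run in the reverse of this order, so the access module runs first, then auth_basic, then auth_request.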

  3. If the correct password is entered, can the file be accessed with the following configuration?

    location /{
        satisfy any;
        auth_basic "test auth_basic";
        auth_basic_user_file examples/auth.pass;
        deny all;
    }

    Yes. satisfy is set to any, so the request is allowed as soon as any one module passes; here auth_basic passes even though deny all fails.

  4. What if I put deny all before auth_basic?

    The request is still allowed, because modules run in a fixed order that does not depend on the order of the directives in the configuration.

  5. If deny all is changed to allow all, do you still get a chance to enter the password?

    No. allow all belongs to the access module, which runs before auth_basic; with satisfy any it allows the request immediately, so auth_basic never asks for a password.
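The answers above follow from how the access phase combines module results. Below is a simplified Python model, assuming each module's verdict is already known and listed in execution order (access before auth_basic). A verdict is "ok", "declined" (nothing configured for that module) or an HTTP status such as 401 or 403. run_access_phase is a hypothetical name; keeping a 401 in preference to a 403 under satisfy any mirrors why the password dialog still appears in question 3.

```python
from typing import List, Optional, Union

Verdict = Union[str, int]  # "ok", "declined", or an HTTP status such as 401/403


def run_access_phase(verdicts: List[Verdict], satisfy: str = "all") -> Verdict:
    """Combine access-phase verdicts (in execution order) the way satisfy does."""
    access_code: Optional[int] = None
    for rc in verdicts:
        if rc == "declined":
            continue                      # module not configured here
        if satisfy == "all":
            if rc != "ok":
                return rc                 # the first failure ends the phase
        elif rc == "ok":
            return "ok"                   # any: one success is enough, the rest are skipped
        elif rc in (401, 403):
            if access_code != 401:        # a 401 (ask for a password) is kept over a 403
                access_code = rc
        else:
            return rc                     # other errors end the phase immediately
    return access_code if access_code is not None else "ok"
```

With satisfy any, [403, 401] (deny all, then no password yet) yields 401, so the browser prompts; [403, "ok"] (correct password) yields "ok"; and ["ok", 401] (allow all) yields "ok" without auth_basic ever mattering.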

Precontent phase

With that said, let's review once more the 11 phases in which Nginx processes HTTP requests.

We are now in the precontent phase. Its main directive is try_files; the mirror module, covered after it, also runs in this phase.

Try_files module

Command syntax

Syntax: try_files file ... uri;
        try_files file ... =code;
Default: —
Context: server, location
  • Module: ngx_http_try_files_module
  • Function: checks the listed files in order (paths are resolved with the root or alias directive) and returns the first one that exists; if none exists, the request is internally redirected to the last parameter (a URI or named location), or the given =code is returned
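The lookup order can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: the file system is replaced by a set of existing paths, only the $uri variable is substituted, and directory checks (entries ending in /) are not modelled. try_files_model is a hypothetical name.

```python
from typing import List, Set, Tuple


def try_files_model(uri: str, root: str, params: List[str],
                    existing: Set[str]) -> Tuple[str, str]:
    """Return ("file", path), ("status", code) or ("redirect", target) like try_files."""
    *candidates, fallback = params            # the last parameter is the fallback
    for candidate in candidates:
        path = root.rstrip("/") + candidate.replace("$uri", uri)
        if path in existing:
            return ("file", path)             # the first existing file is served
    if fallback.startswith("="):
        return ("status", fallback[1:])       # e.g. =404
    return ("redirect", fallback.replace("$uri", uri))  # URI or named location
```

With an empty file system, /first ends at @lasturl and /second at 404, which matches the results of the hands-on example that follows.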

Hands-on

Let’s look at an example in action:

server {
    server_name tryfiles.ziyang.com;
    listen 80;
    error_log  logs/myerror.log  info;
    root html/;
    default_type text/plain;
    location /first {
        try_files /system/maintenance.html
            $uri $uri/index.html $uri.html
            @lasturl;
    }
    location @lasturl {
        return 200 'lasturl!\n';
    }
    location /second {
        try_files $uri $uri/index.html $uri.html =404;
    }
}

The results are as follows:

  • Requesting /first falls through to the named location @lasturl, which returns 200
  • A call to /second returns 404

Both results are consistent with the configuration file.

➜ test_nginx curl tryfiles.ziyang.com/second
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.17.8</center>
</body>
</html>
➜ test_nginx curl tryfiles.ziyang.com/first
lasturl!

Mirror module

The mirror module copies traffic in real time, which is useful when the same requests should also reach another environment (for example, a test environment).

Command syntax

  • Module: ngx_http_mirror_module, compiled into Nginx by default
    • Remove it with --without-http_mirror_module
  • Function: while a request is processed, a subrequest is generated to send a copy to another service; the response to that subrequest is ignored
Syntax: mirror uri | off;
Default: mirror off; 
Context: http, server, location

Syntax: mirror_request_body on | off;
Default: mirror_request_body on; 
Context: http, server, location

Hands-on

  • The configuration is shown below; you need to start another Nginx listening on port 10020 to receive the mirrored requests
server {
    server_name mirror.ziyang.com;
    listen 8001;
    error_log logs/error_log debug;

    location / {
        mirror /mirror;
        mirror_request_body off;
    }

    location = /mirror {
        internal;
        proxy_pass http://127.0.0.1:10020$request_uri;
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
        proxy_set_header X-Original-URI $request_uri;
    }
}
  • The mirrored requests then show up in the access.log of the receiving Nginx
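Conceptually, mirroring is fire-and-forget: the original request is answered as usual, and whatever the mirror subrequest returns is discarded. A minimal Python sketch of that idea, with hypothetical handler functions standing in for the two services (nginx itself does this with subrequests in its event loop, not with plain function calls):

```python
from typing import Callable


def handle_with_mirror(request: str,
                       primary: Callable[[str], str],
                       mirror: Callable[[str], str]) -> str:
    """Send a copy of the request to mirror, ignore its outcome, return primary's answer."""
    try:
        mirror(request)           # the mirror's response is deliberately ignored
    except Exception:
        pass                      # a failing mirror must not affect the client
    return primary(request)
```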

The content stage

Now we reach the content phase, starting with its static module. Although static is the last module to process a request in the content phase, it is introduced first.

The static module

Root and alias directives

Start with the root and alias directives, both of which map file paths.

Syntax: alias path;
Default: -
Context: location

Syntax: root path;
Default: root html; 
Context: http, server, location, if in location
  • Function: maps URLs to file paths so that static file content can be returned
  • Difference: root appends the full URI to its path, whereas alias replaces the part of the URI matched by location with its path

Hands-on

Here’s a question:

Now there is a file path:

html/first/
└── 1.txt

The configuration file is as follows:

server {
    server_name static.ziyang.com;
    listen 80;
    error_log logs/myerror.log info;

    location /root {
        root html;
    }

    location /alias {
        alias html;
    }

    location ~ /root/(\w+\.txt) {
        root html/first/$1;
    }

    location ~ /alias/(\w+\.txt) {
        alias html/first/$1;
    }

    location /RealPath/ {
        alias html/realpath/;
        return 200 '$request_filename:$document_root:$realpath_root\n';
    }
}

What response does each of the following URLs get?

/root
/alias
/root/1.txt
/alias/1.txt
➜ test_nginx curl static.ziyang.com/alias/1.txt
test1
➜ test_nginx curl static.ziyang.com/alias/
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...
➜ test_nginx curl static.ziyang.com/root/
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.17.8</center>
</body>
</html>
➜ test_nginx curl static.ziyang.com/root/1.txt
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.17.8</center>
</body>
</html>

Accessing these four paths yields the following results:

  • /root: 404
  • /alias: 200
  • /root/1.txt: 404
  • /alias/1.txt: 200

Why? When mapping a URL, root appends the entire URI, including the part matched by location, to its path, whereas alias replaces the matched part.

  • static.ziyang.com/root/ actually accesses html/root/, which does not exist
  • static.ziyang.com/root/1.txt is handled by the regex location, so it actually accesses html/first/1.txt/root/1.txt
  • static.ziyang.com/alias/ maps to the html folder itself; because of the trailing /, the file actually served is html/index.html
  • static.ziyang.com/alias/1.txt actually accesses html/first/1.txt, which exists
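The four mappings can be reproduced with a short Python sketch. The helper names are hypothetical, and the sketch ignores URI normalisation and location selection; it assumes, as the results above imply, that /root/1.txt and /alias/1.txt are handled by the regex locations (nginx prefers a matching regex location over a plain prefix match). In a regex location, alias (with captures substituted) becomes the whole file path, while root still has the full URI appended.

```python
import re


def map_root(uri: str, root: str) -> str:
    """root: the full URI is appended to the root path."""
    return root.rstrip("/") + uri


def map_alias_prefix(uri: str, location: str, alias: str) -> str:
    """alias in a prefix location: the matched prefix is replaced by the alias path."""
    return alias + uri[len(location):]


def _subst(template: str, match: "re.Match") -> str:
    # replace $1, $2, ... with the regex captures
    return re.sub(r"\$(\d)", lambda g: match.group(int(g.group(1))), template)


def map_root_regex(uri: str, pattern: str, root: str) -> str:
    """root in a regex location: captures are substituted, then the full URI is appended."""
    return _subst(root, re.search(pattern, uri)).rstrip("/") + uri


def map_alias_regex(uri: str, pattern: str, alias: str) -> str:
    """alias in a regex location: the substituted alias is the whole path."""
    return _subst(alias, re.search(pattern, uri))
```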

Three related variables

The same configuration file as above:

location  /RealPath/ {
    alias html/realpath/;
    return 200 '$request_filename:$document_root:$realpath_root\n';
}

Here’s a question: what are the values of each of these variables when accessing /RealPath/1.txt?

To answer this question, let’s explain three variables:

  • $request_filename: the full path of the file being accessed
  • $document_root: the folder path (which may contain symbolic links) produced from the URI and the root/alias directive
  • $realpath_root: $document_root with symbolic links replaced by the real path

To verify these three variables, create a symbolic link named realpath in the html directory that points to the first folder:

ln -s first realpath
➜ html curl static.ziyang.com/realpath/1.txt
/Users/mtdp/myproject/nginx/test_nginx/html/realpath/1.txt:/Users/mtdp/myproject/nginx/test_nginx/html/realpath/:/Users/mtdp/myproject/nginx/test_nginx/html/first

As can be seen, the three paths are:

  • /Users/mtdp/myproject/nginx/test_nginx/html/realpath/1.txt
  • /Users/mtdp/myproject/nginx/test_nginx/html/realpath/
  • /Users/mtdp/myproject/nginx/test_nginx/html/first
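The relationship between $document_root and $realpath_root is the same as between a path and os.path.realpath of that path. The Python sketch below rebuilds the directory layout used above in a temporary directory and computes the three values; nginx_like_paths and build_demo_tree are hypothetical helpers, and the sketch assumes a platform where os.symlink is permitted.

```python
import os
import tempfile
from typing import Tuple


def nginx_like_paths(html_dir: str, alias_subdir: str, filename: str) -> Tuple[str, str, str]:
    """Return (request_filename, document_root, realpath_root) for an alias location."""
    document_root = os.path.join(html_dir, alias_subdir)   # may contain a symlink
    request_filename = os.path.join(document_root, filename)
    realpath_root = os.path.realpath(document_root)        # symlinks resolved
    return request_filename, document_root, realpath_root


def build_demo_tree() -> str:
    """Create html/first/1.txt and html/realpath -> first; return the html directory."""
    base = os.path.realpath(tempfile.mkdtemp())  # resolve e.g. /tmp -> /private/tmp on macOS
    html = os.path.join(base, "html")
    os.makedirs(os.path.join(html, "first"))
    with open(os.path.join(html, "first", "1.txt"), "w") as f:
        f.write("test1")
    os.symlink("first", os.path.join(html, "realpath"))   # like: ln -s first realpath
    return html


if __name__ == "__main__":
    print(":".join(nginx_like_paths(build_demo_tree(), "realpath/", "1.txt")))
```

The printed line has the same shape as the curl output above: two paths through realpath, then the resolved first directory.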

The static module has a few more directives, such as:

Content-type when a static file is returned

Syntax: types { ... }
Default: types { text/html html; image/gif gif; image/jpeg jpg; } 
Context: http, server, location

Syntax: default_type mime-type;
Default: default_type text/plain; 
Context: http, server, location

Syntax: types_hash_bucket_size size;
Default: types_hash_bucket_size 64; 
Context: http, server, location

Syntax: types_hash_max_size size;
Default: types_hash_max_size 1024; 
Context: http, server, location
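How types and default_type interact can be sketched as a lookup on the file extension, with default_type as the fallback. This is a simplified Python model (nginx actually builds a hash table sized by types_hash_max_size and types_hash_bucket_size); content_type_for is a hypothetical name, and TYPES is just the default types block shown above.

```python
import posixpath
from typing import Dict

# the default "types" block from above: extension -> MIME type
TYPES: Dict[str, str] = {"html": "text/html", "gif": "image/gif", "jpg": "image/jpeg"}


def content_type_for(path: str, types: Dict[str, str] = TYPES,
                     default_type: str = "text/plain") -> str:
    """Pick the Content-Type from the file extension, falling back to default_type."""
    ext = posixpath.splitext(path)[1].lstrip(".").lower()  # extensions compare case-insensitively
    return types.get(ext, default_type)
```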

Error log when a file was not found

Syntax: log_not_found on | off;
Default: log_not_found on; 
Context: http, server, location

In production it is common for requested files not to exist, and each miss is written to the error log, for example:

[error] 10156#0: *10723 open() "/html/first/2.txt/root/2.txt" failed (2: No such file or directory)

If you don't want these entries logged, turn them off with log_not_found off.

The domain name in redirects

The static module returns a 301 redirect when we request a directory without a trailing /.

# Decides which domain name the redirect uses; on returns the primary server_name
Syntax: server_name_in_redirect on | off;
Default: server_name_in_redirect off;
Context: http, server, location

# Decides whether the port is included in the redirect
Syntax: port_in_redirect on | off;
Default: port_in_redirect on;
Context: http, server, location

# Decides whether the domain name is filled in, i.e. whether an absolute URL is returned; on by default
Syntax: absolute_redirect on | off;
Default: absolute_redirect on;
Context: http, server, location

To see the three directives in action, here is the configuration file:

server {
    server_name return.ziyang.com dir.ziyang.com;
    server_name_in_redirect on;
    listen 8088;
    port_in_redirect on;
    absolute_redirect off;

    root html/;
}

absolute_redirect is on by default; here we turn it off and look at the response:

➜  test_nginx curl localhost:8088/first -I
HTTP/1.1 301 Moved Permanently
Server: nginx/1.17.8
Date: Tue, 12 May 2020 00:31:36 GMT
Content-Type: text/html
Content-Length: 169
Connection: keep-alive
Location: /first/

The Location response header does not contain the domain name.

Now turn absolute_redirect back on and see what is returned, with:

  • absolute_redirect on
  • server_name_in_redirect on
  • port_in_redirect on
➜  test_nginx curl localhost:8088/first -I
HTTP/1.1 301 Moved Permanently
Server: nginx/1.17.8
Date: Tue, 12 May 2020 00:35:49 GMT
Content-Type: text/html
Content-Length: 169
Location: http://return.ziyang.com:8088/first/
Connection: keep-alive

This time Location includes a domain name, namely the primary server name and the port we configured, because server_name_in_redirect and port_in_redirect are both on. Now turn those two directives off and check the response again:

  • absolute_redirect on

  • server_name_in_redirect off

  • port_in_redirect off

➜  test_nginx curl localhost:8088/first -I
HTTP/1.1 301 Moved Permanently
Server: nginx/1.17.8
Date: Tue, 12 May 2020 00:39:31 GMT
Content-Type: text/html
Content-Length: 169
Location: http://localhost/first/
Connection: keep-alive

With both directives off, Location no longer uses the primary server name and the configured port: it uses the host name from the request (localhost here, taken from the Host request header) and leaves the port out.
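The three directives can be summarised as a small Python function that rebuilds the Location values seen in the three experiments. It is a sketch: redirect_location is a hypothetical helper, it only handles plain http, and it assumes the request host is the Host header without its port, as nginx uses for $host.

```python
def redirect_location(path: str, request_host: str, primary_server_name: str, port: int,
                      absolute_redirect: bool = True,
                      server_name_in_redirect: bool = False,
                      port_in_redirect: bool = True) -> str:
    """Build the Location header of a directory redirect (simplified, plain http only)."""
    if not absolute_redirect:
        return path                                    # relative redirect: no scheme or host
    host = primary_server_name if server_name_in_redirect else request_host
    port_part = f":{port}" if port_in_redirect and port != 80 else ""  # :80 is left out
    return f"http://{host}{port_part}{path}"
```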

The index module

  • Module: ngx_http_index_module

  • Function: when a URL ending in / (a directory) is requested, returns the content of the specified index file

  • Syntax:

    Syntax: index file ...;
    Default: index index.html;
    Context: http, server, location
  • Executes before the autoindex module

When a URL ending in / is requested, this module looks for index.html in the directory given by root or alias and returns its content if the file exists; other file names can be configured instead.
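The hand-off between index and autoindex for a URI ending in / can be sketched as follows. It is a simplified Python model with a hypothetical function name, using a list of file names in place of the directory on disk. When neither an index file nor a listing is available, nginx answers 403 Forbidden.

```python
from typing import Iterable, List, Tuple, Union


def serve_directory(uri: str, files_in_dir: Iterable[str],
                    index: Tuple[str, ...] = ("index.html",),
                    autoindex: bool = False) -> Tuple[str, Union[str, List[str], int]]:
    """Decide what a request for a directory URI (ending in /) returns."""
    files = set(files_in_dir)
    for name in index:                     # index files are tried in the listed order
        if name in files:
            return ("index", uri + name)   # nginx internally redirects to this URI
    if autoindex:
        return ("listing", sorted(files))  # autoindex runs only when no index file exists
    return ("status", 403)                 # no index file and no listing: 403 Forbidden
```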

Autoindex module

  • Module: ngx_http_autoindex_module, compiled into Nginx by default; remove it with --without-http_autoindex_module

  • Function: when the URL ends in /, returns the directory structure of the root/alias directory in HTML, XML, JSON or JSONP format

  • Syntax:

    # Turn directory listing on or off
    Syntax: autoindex on | off;
    Default: autoindex off;
    Context: http, server, location

    # HTML format only: exact sizes in bytes (on) or rounded to KB/MB/GB (off)
    Syntax: autoindex_exact_size on | off;
    Default: autoindex_exact_size on;
    Context: http, server, location

    # Output format
    Syntax: autoindex_format html | xml | json | jsonp;
    Default: autoindex_format html;
    Context: http, server, location

    # Show times in local time (on) or UTC (off)
    Syntax: autoindex_localtime on | off;
    Default: autoindex_localtime off;
    Context: http, server, location

Hands-on

  • The configuration file is as follows:
server {
    server_name autoindex.ziyang.com;
    listen 8080;

    location / {
        alias html/;
        autoindex on;
        #index b.html;
        autoindex_exact_size on;
        autoindex_format html;
        autoindex_localtime on;
    }
}

Here the index b.html directive is commented out. The index module is compiled into Nginx by default and defaults to index index.html, so nginx first looks for an index.html file.

  • Open autoindex.ziyang.com:8080 in a browser. It maps to the html directory, which contains index.html, so that file's content is displayed.

  • Now uncomment the index b.html directive. Since there is no b.html file in the html folder, the request falls through to the autoindex module, which displays the directory listing.

The file sizes in the listing are shown as exact byte counts; this is controlled by the autoindex_exact_size on directive.

Concat module

The next module speeds up serving many small files. It was developed by Alibaba and is widely used at Taobao.

  • Module: ngx_http_concat_module

  • Module developer: Tengine (github.com/alibaba/ngi…); add it at build time with --add-module=../nginx-http-concat/

  • Function: Merge multiple small file requests to significantly improve the performance of HTTP requests

  • Directives:

    # Append ?? to the URI and separate the file names with ","; query parameters go at the end after ?
    concat on | off
    Default: concat off
    Context: http, server, location

    concat_types MIME types
    Default: concat_types text/css application/x-javascript
    Context: http, server, location

    concat_unique on | off
    Default: concat_unique on
    Context: http, server, location

    concat_max_files number
    Default: concat_max_files 10
    Context: http, server, location

    concat_delimiter string
    Default: none
    Context: http, server, location

    concat_ignore_file_error on | off
    Default: off
    Context: http, server, location
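The URI convention described in the comment above can be illustrated with two hypothetical Python helpers: one builds a concat URI, the other splits it back into the individual files the module would read. This only demonstrates the URL format; it is not Tengine's parser, and limits such as concat_max_files are not enforced.

```python
from typing import List, Optional, Tuple


def build_concat_uri(prefix: str, files: List[str], query: Optional[str] = None) -> str:
    """Join several file names into one concat request URI: /prefix/??a,b[?query]."""
    uri = prefix.rstrip("/") + "/??" + ",".join(files)
    return uri + ("?" + query if query else "")


def parse_concat_uri(uri: str) -> Tuple[List[str], str]:
    """Split a concat URI back into the individual file URIs and the query string."""
    prefix, _, rest = uri.partition("??")
    names, _, query = rest.partition("?")
    return [prefix + name for name in names.split(",") if name], query
```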

If you open the Taobao home page, you can see that its small files are fetched through this module.

If you are interested, compile the module in and experiment with it. Here is the configuration file:

server {
    server_name concat.ziyang.com;
    error_log logs/myerror.log debug;
    concat on;
    root html;

    location /concat {
        concat_max_files 20;
        concat_types text/plain;
        concat_unique on;
        concat_delimiter ':::';
        concat_ignore_file_error on;
    }
}

The log phase

Finally, we reach the last of the 11 phases, the log phase, where the log module records each request.

  • Function: logs information about HTTP requests
  • Module: ngx_http_log_module, which cannot be disabled

Access Log format

Syntax: log_format name [escape=default|json|none] string ...;
Default: log_format combined "...";
Context: http

Default combined log format:

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
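A log_format string is a template in which every $variable is replaced by its value for the current request. Here is a minimal Python sketch of that substitution applied to the combined format. render_log_line is a hypothetical name, the request values in the usage example are made up for illustration, and nginx's escaping (the escape= parameter) is not modelled. Like nginx, it writes "-" for variables that have no value.

```python
import re
from typing import Dict

COMBINED = ('$remote_addr - $remote_user [$time_local] '
            '"$request" $status $body_bytes_sent '
            '"$http_referer" "$http_user_agent"')


def render_log_line(fmt: str, variables: Dict[str, str]) -> str:
    """Substitute $name variables; variables without a value are logged as "-"."""
    def value(m: "re.Match") -> str:
        v = variables.get(m.group(1))
        return "-" if v is None else v
    return re.sub(r"\$(\w+)", value, fmt)


if __name__ == "__main__":
    print(render_log_line(COMBINED, {
        "remote_addr": "127.0.0.1", "time_local": "12/May/2020:08:31:36 +0800",
        "request": "GET /first HTTP/1.1", "status": "301",
        "body_bytes_sent": "169", "http_user_agent": "curl/7.64.1",
    }))
```

Since no user was authenticated and no Referer was sent in this made-up request, $remote_user and $http_referer both come out as "-".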

Configure the log file path

Syntax: access_log path [format [buffer=size] [gzip[=level]] [flush=time] [if=condition]];
        access_log off;
Default: access_log logs/access.log combined; 
Context: http, server, location, if in location, limit_except
  • path can contain variables; without open_log_file_cache, the log file is opened and closed for every log entry

  • if=condition: the request is not logged when the condition evaluates to 0 or an empty string

  • The log buffer

    • Function: keeps log entries in memory and writes them to disk in batches

    • The buffer is written to disk when:

      the next log line does not fit into the buffer;

      the buffered data is older than the time set by flush;

      the worker process reopens its log files or is shutting down.

  • Log compression

    • Function: compresses the buffered logs in memory before writing them to disk
    • The default buffer size is 64KB
    • The default compression level is 1 (1 is the fastest with the least compression, 9 is the slowest with the best compression)
    • Enabling log compression also enables log buffering
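The three flush conditions can be modelled with a small Python class. It is a sketch, not nginx's implementation: the class name is hypothetical, time is passed in explicitly instead of using a real timer, sizes are counted in characters, and compression is left out. In the model, a line that does not fit makes the buffer flush first, the flush timer starts when the first line enters an empty buffer, and close() stands for a reopen or shutdown.

```python
from typing import List, Optional


class BufferedAccessLog:
    """Simplified model of access_log ... buffer=size flush=time (hypothetical class)."""

    def __init__(self, buffer_size: int, flush_after: Optional[float] = None):
        self.buffer_size = buffer_size
        self.flush_after = flush_after
        self.buffer = ""
        self.first_write: Optional[float] = None
        self.disk: List[str] = []              # each element is one write to the file

    def log(self, line: str, now: float) -> None:
        data = line + "\n"
        if self.buffer and len(self.buffer) + len(data) > self.buffer_size:
            self._flush()                      # the next line does not fit: write the buffer out
        if len(data) > self.buffer_size:
            self.disk.append(data)             # an oversized line goes straight to disk
            return
        if not self.buffer:
            self.first_write = now             # flush timer starts with the first buffered line
        self.buffer += data

    def tick(self, now: float) -> None:
        """Called periodically; flushes data older than flush_after."""
        if self.buffer and self.flush_after is not None and now - self.first_write >= self.flush_after:
            self._flush()

    def close(self) -> None:
        """Worker reopening its logs or shutting down: flush whatever is buffered."""
        self._flush()

    def _flush(self) -> None:
        if self.buffer:
            self.disk.append(self.buffer)
            self.buffer = ""
            self.first_write = None
```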

Optimizations for log file names that contain variables

Syntax: open_log_file_cache max=N [inactive=time] [min_uses=N] [valid=time];
        open_log_file_cache off;
Default: open_log_file_cache off; 
Context: http, server, location
  • max: the maximum number of file descriptors kept in the cache; when it is full, the least recently used one is closed (LRU)
  • inactive: a cached descriptor is closed if the file has not been accessed during this time; default 10 seconds
  • min_uses: the minimum number of uses within the inactive time for a descriptor to stay open in the cache; default 1
  • valid: how often to check that the cached log file still exists under the same name; default 60 seconds
  • off: disables the cache
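The effect of max, inactive and min_uses can be approximated with an LRU dictionary. This Python sketch is a simplified model, not nginx's actual cache logic: the class name is hypothetical, the valid re-check is not modelled, and opens counts how often a log file would have to be opened.

```python
from collections import OrderedDict
from typing import Dict, List


class OpenLogFileCache:
    """Simplified model of open_log_file_cache max=N inactive=T min_uses=M (hypothetical)."""

    def __init__(self, max_entries: int, inactive: float = 10, min_uses: int = 1):
        self.max_entries, self.inactive, self.min_uses = max_entries, inactive, min_uses
        self.cache: "OrderedDict[str, float]" = OrderedDict()  # path -> last access time
        self.uses: Dict[str, List[float]] = {}                  # path -> recent open times
        self.opens = 0                                          # files opened so far

    def write(self, path: str, now: float) -> None:
        self._expire(now)
        if path in self.cache:
            self.cache.move_to_end(path)          # cache hit: mark as most recently used
            self.cache[path] = now
            return
        self.opens += 1                           # cache miss: the file has to be opened
        recent = [t for t in self.uses.get(path, []) if now - t < self.inactive] + [now]
        self.uses[path] = recent
        if len(recent) >= self.min_uses:          # used often enough: keep it open
            if len(self.cache) >= self.max_entries:
                self.cache.popitem(last=False)    # evict the least recently used descriptor
            self.cache[path] = now

    def _expire(self, now: float) -> None:
        for path, last in list(self.cache.items()):
            if now - last >= self.inactive:       # not accessed within "inactive": close it
                del self.cache[path]
```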

We'll skip the hands-on part for the log module.


At this point, we have walked through all 11 phases of Nginx HTTP request processing and the modules in each phase. With the whole flow analysed this way, you should be able to understand Nginx configurations and write your own to fit your needs, which is what it means to really master the 11 phases.

Finally, you are welcome to visit my personal blog: iziyang.github.io


This post was first published on my blog: iziyang.github.io