The original link

Nginx recently encountered some difficult to troubleshoot 502 and 504 errors, by the way, learn about nginx related configuration. I have found that many descriptions of nginx timeout configurations on the Web simply list the meanings and values of these configurations, without explaining what triggers which configuration. So take this opportunity to demonstrate how to make Nginx appear 502 and 504 correctly as expected.

502 and 504

In the definition of HTTP status:

  • 502 Bad Gateway: The server was acting as a gateway or proxy and received an invalid response from the upstream server.
  • 504: he server was acting as a gateway or proxy and did not receive a timely response from the upstream server.

Error 502 is caused by Bad Gateway, which is usually caused by the failure of upstream service. Nginx access to the upstream service timed out. But in some cases, an upstream service timeout (triggering TCP reset) can also cause 502, which we’ll discuss later.

The demo environment

You need 3 logical components: nginx server, PHP-FPM, client access client. 3 components can be in the same machine, I used Docker to configure PHP and Nginx environment, access from the host. If you are familiar with these three components, skip this section. It’s very convenient to do all kinds of tests and experiments with Docker, so I won’t go into that here. Docker-compose configuration is referenced in this article. My Docker Composer file is as follows:

version: '3'
services:
  web:
    image: nginx:alpine
    ports:
      - "8080:80"
    volumes:
      - ./code:/code
      - ./nginx/site.conf:/etc/nginx/conf.d/site.conf
    depends_on:
      - php
  php:
    image: PHP: 7.1 - FPM - alpine
    volumes:
      - ./code:/code
      - ./php/php-fpm.conf:/usr/local/etc/php-fpm.conf
Copy the code

The images used are based on Alpine and very small:

REPOSITORY TAG SIZE PHP 7.1- FPM -alpin 69.5MB nginx alpine 18.6MBCopy the code

Nginx configuration:

server {
  index index.php index.html;
  server_name php-docker.local;
  error_log  /var/log/nginx/error.log;
  access_log /var/log/nginx/access.log;
  root /code;

  location ~ \.php$ {
    try_files $uri =404;
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_index index.php;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param PATH_INFO $fastcgi_path_info;
    fastcgi_connect_timeout 5s;
    fastcgi_read_timeout 8s;
    fastcgi_send_timeout 10s; }}Copy the code

PHP – FPM configuration

[global]
include=etc/php-fpm.d/*.conf
request_terminate_timeout=3s
Copy the code

The code is on Github.

The key parameters

In this demo, there are two key arguments to PHP. One is the MAX_execution_time of the PHP script, which is configured in php.ini. The other is the request_terminate_timeout of php-fpm, in php-fpm.conf. When the phP-fpm service is provided, the request_TERMINate_TIMEOUT setting overrides the max_execution_time setting, so we only test request_terminate_timeout here.

Request_terminate_timeout specifies the timeout period for php-Fpm to accept the request. After this timeout, php-Fpm will kill the worker process executing the script.

The key parameters of nginx are fastCGI_connect_timeout, fastcgi_read_timeout, fastcgi_send_timeout.

Nginx fastcgi_connect_timeout specifies how long nginx has to connect to fastCGI. Fastcgi_read_timeout is the timeout for nginx to read fastCGI content. Fastcgi_send_timeout is the timeout for Nginx to send content to FastCGI.

The demo process

Start nginx and PHP first:

docker-compose up
Copy the code

Add an index. PHP file to the code folder:


      
sleep(70);
echo 'hello world';
Copy the code

The upstream service actively resets

Access PHP – docker. Local: 8080 / index. PHP, an error 502 bad gateway. Request_terminate_timeout was triggered, and phP-fpm closed the connection.

By observing the ps aux | grep PHP can be found that PHP – FPM is through kill timeout process to solve the problem of process timeout (pid every time there is a change, shows a process to kill, and start another process. This is related to the php-fpm process pool Settings, your Settings may not restart a new process).

/var/www/html # ps aux | grep php
    1 root       0:00 php-fpm: master process (/usr/local/etc/php-fpm.conf)
    6 www-data   0:00 php-fpm: pool www
    7 www-data   0:00 php-fpm: pool www
/var/www/html # ps aux | grep php
    1 root       0:00 php-fpm: master process (/usr/local/etc/php-fpm.conf)
    7 www-data   0:00 php-fpm: pool www
   17 www-data   0:00 php-fpm: pool www
/var/www/html # ps aux | grep php
    1 root       0:00 php-fpm: master process (/usr/local/etc/php-fpm.conf)
   17 www-data   0:00 php-fpm: pool www
   20 www-data   0:00 php-fpm: pool www
Copy the code

In this case, the error in the nginx log is:

recv() failed (104: Connection reset by peer) while reading response header from upstream
Copy the code

This makes sense when the connection is reset by the server (PHP).

Note that in this case the phP-fPM log will also record:

php_1  | [18-Jul-2018 16:33:42] WARNING: [pool www] child 5, script '/code/index.php' (request: "GET /index.php") execution timed out (3.040130 SEC), terminating php_1 | 16:33:42] [18 - Jul - 2018 WARNING: [pool WWW] child exited on 15 (SIGTERM signal after 30.035736 seconds from the start php_1 | 16:33:42] [18 - Jul - 2018 NOTICE: [pool www] child 8 startedCopy the code

This is also a place where problems can be found.

Nginx timed out reading the upstream service

Delete the request_terminate_timeout configuration and restart the application:

docker-compose down && docker-compose up
Copy the code

The PHP script will execute 70s and must exceed nginx timeout. 8s later, a 504 Gateway time-out error is raised.

upstream timed out (110: Operation timed out) while reading response header from upstream
Copy the code

The fastcGI_read_timeout setting is triggered.

Disabling the Upstream Service

Turn off the PHP service:

docker-composer stop php
Copy the code

Error 504 on first access after PHP service stopped

upstream timed out (110: Operation timed out) while connecting to upstream
Copy the code

The timeout is set to fastcGI_CONNECt_TIMEOUT. At this time, the TCP connection is still available, but the attempt fails.

Error 502 is returned with the following error:

connect() failed (113: Host is unreachable) while connecting to upstream
Copy the code

The reason of “502” is easy to understand. The upstream service was suspended. At the same time, the connection was disconnected because the previous access found that the connection could not be connected, and the host could not be found when the connection was connected again.

I had suspected that the first access to 504 was due to Keepalive. But it took me a long time after I stopped PHP to send my first request, and I still get this result.

If you set nginx fastcgi_pass to 127.0.0.1:9000 (there is no local port for this), error 502 will be immediately thrown:

connect() failed (111: Connection refused) while connecting to upstream
Copy the code

Log in to the nginx service and use tcpdump to monitor traffic on 9000:

tcpdump -i eth0 -nnA tcp port 9000
# if your PHP is local, change eth0 to lo
Copy the code

Nginx will try to make several TCP SYN requests to PHP for the first time after PHP is shut down, but PHP will not respond, so nginx will return 504. On the second visit, nginx does not make any request at all, but simply 502. Nginx: configuration file /etc/nginx/nginx.conf test failed.

Switch to a different configuration

This article mentioned that our previous nginx configuration was not reasonable [^1], we reset nginx:

server {
  index index.php index.html;
  server_name php-docker.local;
  error_log  /var/log/nginx/error.log;
  access_log /var/log/nginx/access.log;
  root /code;
  resolver 127.0.0.11;  # here
  location ~ \.php$ {
    set $upstream php:9000; # here
    try_files $uri =404;
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    fastcgi_pass $upstream;  # here
    fastcgi_index index.php;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param PATH_INFO $fastcgi_path_info;
    fastcgi_connect_timeout 5s;
    fastcgi_read_timeout 8s;
    fastcgi_send_timeout 10s; }}Copy the code

127.0.0.11 is the Intranet DNS resolver of the Docker. This configuration dynamically specifies the FastCGI pass, so nginx does not check whether the connection can be established.

To start with this configuration, visit index.php to establish a connection, and then close PHP as follows:

During keepalive, a 504 fastcGI_connect_TIMEOUT error is thrown:

upstream timed out (110: Operation timed out) while connecting to upstream
Copy the code

Error 502 is raised after keepalive is disconnected.

connect() failed (113: Host is unreachable) while connecting to upstream
Copy the code

According to this article, this configuration is not considered a problem by Nginx, and executing nginx-t is. Nginx sends the SYN to upstream for a period of time (504 status code) and then stops sending TCP packets (502 status code).

other

In addition, PHP scripts also have a timeout setting: max_execution_time. It limits the execution time of PHP scripts, but this time does not count system calls (such as sleep, IO, etc.). Because of this, PHP will raise a FATAL error when it kills a process, whereas PHP-FPM does not have a fatal error.

In this experiment, THE PHP fastCGI working mode is used. If nginx connects to the upstream service by proxy, fastcgi_connect_timeout, fastcgi_read_timeout, Replace fastcgi_SEND_timeout with proxy_connect_TIMEOUT, proxy_read_timeout, and proxy_send_timeout.

conclusion

The cause of 504 is relatively simple. Generally, the execution time of the upstream service exceeds the waiting time of nGINx. This situation is caused by the time-consuming service of the upstream service, or the connection to the upstream server times out. From the above experiment, the cause of the latter is more difficult to trace, because in this case the connection exists, but cannot be connected. Fortunately, the 504 usually changes to 502 after a period of time.

The cause of 502 is due to the failure of the upstream server, such as downtime, process killing, upstream service reset connection, process death, etc. Error 502 can be found in the nginx log as 104: Connection reset by peer, 113: Host is unreachable, 111: Connection refused.

There are some subtle differences related to how Nginx works that haven’t been thoroughly explored.

[^1]: This article shows that our previous setup, if PHP is not started first, then nginx will not start. This setup is not reasonable: one of nginx’s upstream services has problems, and as a result nginx will not be able to provide services. But this is not relevant to our demo, so it is not covered in the text.

[^2]: Nginx knows PHP is not reachable and does not send TCP requests. It was found in the experiment that 502 in this case had a delay of about 3s, for unknown reasons.