The opening

Last week, after a heated debate with the product team, Xiao Ming finally used the observer pattern to solve the problem of follow-up actions that kept changing whenever a specific behavior occurred, though the process did little for his colleagues' feelings.

So Xiao Ming temporarily stepped away from business development to do some technical support, and before long poor Xiao Ming ran into a new problem.

Technical background

The team runs an internal WebSocket service that pushes messages to online users. At peak it has around 2,000 concurrent users, which means more than 2,000 long-lived connections.

The WebSocket service is written in Go, with Nginx in front of it as a reverse proxy.

Overall architecture diagram:

A sudden incident

Ding ding ding…

Xiao Ming knew that sound all too well. He shouted on reflex, "We've got a situation!", startling everyone around him.

Troubleshooting

Nginx returned a 504 error: Gateway Time-out

Xiao Ming had only just started supporting the WebSocket service and did not yet have production access, so he immediately asked Ops to create an account for him. Ops warned him not to mess around on the machine, since it also hosted other services.

With the account in hand, Xiao Ming logged in, warmed up by hammering the clear command a few times, typed away furiously, and found the Nginx error log.

Nginx error log

The connection from Nginx to the WS service timed out. Since Nginx points at the WS service, the next stop is the WS service's own error log.

WS service error log

Check how many files are currently open (without arguments, lsof lists the open files of every process on the machine):

lsof | wc -l
output:
4435

Xiao Ming took one look, smiled at the colleague watching the screen with him, and declared the problem as good as fixed!

Check the configured open-files limit (ulimit -a shows all limits of the current shell):

ulimit -a

Er… open files has already been raised to 65535, and we are nowhere near that…

Xiao Ming's excitement turned into deep thought…

The alerts flood the screen again

Ding ding ding…

Check how many files the WS service process (PID 19246) has open:

lsof -p 19246 | wc -l

View the limits actually applied to this process:

cat  /proc/19246/limits

The cure-all: restart!

If nothing else works, just restart it.

Xiao Ming's vision went black and his brain short-circuited on the spot……

Start with the soft limit

ulimit -Sn   # display the soft limit
ulimit -Hn   # display the hard limit

Linux places restrictions on processes, called limits (rlimits). In practice the most commonly tuned one is Open Files, which you typically adjust when configuring web services such as Nginx. Each limit has a soft value and a hard value. The kernel enforces the soft limit, and a process can change its own soft limit at runtime, but only up to the hard limit; a process can lower its hard limit, but raising it requires privileges (for example, running as root).
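A program sees the same two numbers through getrlimit(2), and setrlimit(2) enforces the rule above: an unprivileged process may move its soft limit anywhere up to the hard limit, but only a privileged process can raise the hard limit. A Linux-oriented Go sketch with hypothetical helper names. Note that Go 1.19 and later already raise the soft open-files limit to the hard limit at startup in programs that import package os, so in a modern Go binary the raise step is often a no-op.

```go
package main

import (
	"fmt"
	"syscall"
)

// nofileLimits returns the soft (Cur) and hard (Max) RLIMIT_NOFILE values,
// the same numbers `ulimit -Sn` and `ulimit -Hn` print for a shell.
func nofileLimits() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return rl.Cur, rl.Max, nil
}

// raiseSoftToHard lifts the soft limit up to the hard limit, which any
// process may do; it never tries to raise the hard limit itself.
func raiseSoftToHard() (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	if rl.Cur < rl.Max {
		rl.Cur = rl.Max
		if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
			return 0, err
		}
	}
	return rl.Cur, nil
}

func main() {
	soft, hard, err := nofileLimits()
	if err != nil {
		fmt.Println("getrlimit:", err)
		return
	}
	fmt.Printf("before: soft=%d hard=%d\n", soft, hard)
	if soft, err = raiseSoftToHard(); err != nil {
		fmt.Println("setrlimit:", err)
		return
	}
	fmt.Printf("after:  soft=%d hard=%d\n", soft, hard)
}
```

If the hard limit itself is too low, as the 1024/4096 numbers later in this story suggest, no amount of raising the soft limit from inside the service helps; the hard limit has to come from whoever starts the process.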

Xiao Ming thought: with so many limits around, surely I can just raise this one a bit.

Check the configured limits

cat /etc/security/limits.conf

The * entry applies to all users, and its soft nofile value is already set high…

Xiao Ming wondered: then why is push-WS capped at 1024?

Think again

Could the application itself be setting a maximum or default number of open files? Nothing in its documentation or code sets such a limit.

Nothing in the operating system configuration sets a 1024 or 4096 limit either, so who is limiting this program…

Xiao Ming searched Baidu, then Bing, and even got past the Great Firewall to search Google.

One candidate emerged: Supervisor, the process manager running our service, might be what is affecting it.

Let's look at the Supervisor settings that affect max open files (the values shown are the defaults):

minfds=1024    ; minimum number of free file descriptors; supervisord will not start if fewer are available
minprocs=200   ; minimum number of available process descriptors; supervisord will not start if fewer are available

The official documentation explains these two parameters as follows:

The minimum number of file descriptors that must be available before supervisord will start successfully. A call to setrlimit will be made to attempt to raise the soft and hard limits of the supervisord process to satisfy minfds. The hard limit may only be raised if supervisord is run as root. supervisord uses file descriptors liberally, and will enter a failure mode when one cannot be obtained from the OS, so it’s useful to be able to specify a minimum value to ensure it doesn’t run out of them during execution. These limits will be inherited by the managed subprocesses. This option is particularly useful on Solaris, which has a low per-process fd limit by default.

The minimum number of process descriptors that must be available before supervisord will start successfully. A call to setrlimit will be made to attempt to raise the soft and hard limits of the supervisord process to satisfy minprocs. The hard limit may only be raised if supervisord is run as root. supervisord will enter a failure mode when the OS runs out of process descriptors, so it’s useful to ensure that enough process descriptors are available upon container startup.

Because the programs managed by Supervisor are forked from the supervisord process, they inherit its resource limits, so Supervisor's configuration decides theirs. Running supervisord as root lets it raise the hard limit as well, but running things as root in a production environment is not a good idea.

So these two parameters determine the limits our service process ends up with.

In short, minfds and minprocs govern the Max open files and Max processes limits of supervisord and every program it starts, and those limits do not come from the system ulimit settings. The values in /etc/security/limits.conf are applied by PAM (pam_limits) when a login session is opened, and a supervisord started at boot by the init system does not pass through such a session.

For our service, the fix is simply to set both parameters high:

[supervisord]
minfds=65535
minprocs=65535

Restart the Supervisor service…
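Restarting supervisord restarts every program it manages, so it is worth confirming afterwards that the new limit actually reached the service. A Go sketch, assuming you can add a check to the WS service itself: read the process's own soft open-files limit at startup and log a warning if it is below what the peak load needs. `requireNofile` and the 65535 threshold are illustrative, not part of the original service.

```go
package main

import (
	"fmt"
	"log"
	"syscall"
)

// requireNofile returns this process's soft RLIMIT_NOFILE and an error if it
// is below want. Meant to be called once at startup and logged.
func requireNofile(want uint64) (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	if rl.Cur < want {
		return rl.Cur, fmt.Errorf("soft open-files limit %d is below the required %d", rl.Cur, want)
	}
	return rl.Cur, nil
}

func main() {
	// Illustrative threshold, matching the minfds value configured above.
	cur, err := requireNofile(65535)
	if err != nil {
		log.Printf("WARNING: %v", err)
		return
	}
	log.Printf("open-files limit OK: %d", cur)
}
```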

The ending

Wait…

What did you say?

I want to restart it.

You can't.

Then how do I verify the fix? How do I show what I contributed?

Try again next time…

All right…

For more content, follow the public account Dumb Bear Technology Road.