Preface:

We know that Golang hides OS threads behind goroutines: a goroutine can only run when a system thread carries its context, since the thread is the kernel's smallest unit of scheduling. Golang's scheduler manages the G-P-M model for you, where M is an OS thread and G is the goroutine we spawn with the go keyword. As a user you cannot see or create native threads directly. Native thread support has come up on Golang Nuts and in Golang issues more than once, but the community's position is that it isn't necessary; if you really need it, you can go through CGO or pin a goroutine to its thread with runtime.LockOSThread.
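For reference, a minimal sketch (not from the original article) of what that pinning looks like; runtime.LockOSThread binds the calling goroutine to its current thread, which is what C libraries or syscalls with thread-local state typically need:

```go
// Minimal sketch: pin a goroutine to its OS thread with runtime.LockOSThread.
package main

import (
	"fmt"
	"runtime"
)

func main() {
	done := make(chan struct{})

	go func() {
		// From here on, this goroutine always runs on the same M (thread),
		// and no other goroutine is scheduled onto that thread.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()

		fmt.Println("doing thread-affine work on a locked OS thread")
		close(done)
	}()

	<-done
}
```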


That's a bit off topic; back to the point. Because Golang abstracts everything behind G-P-M, we normally don't pay attention to the number of threads, but last time we watched a Golang process's thread count explode, peaking at around 1300 threads. So what's wrong with having that many threads? Each pthread reserves an 8 MB stack (in VIRT, of course), the threads add to the kernel's scheduling load, and the context switching alone is far from free. Of course Golang is not so naive as to let that many threads spin-poll the P run queues; presumably condition variables are used to park and wake them.


The reason for the explosion of Golang threads?

So what caused so many threads? In my case the answer was disk I/O: a pile of miscellaneous programs share this host, IOPS was far too high, the Aliyun cloud disk has poor performance, and memory was nearly exhausted…

First of all, writing files in Golang is blocking; we'll explain why later. When we write logs, the data normally goes to the kernel's page cache first, and the kernel's flusher writes it back to disk when thresholds are hit or on a periodic schedule, rather than calling fsync on every write the way a database does.
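As a rough illustration (the file path is made up), this is essentially all a typical logger does per write; the bytes sit in the page cache until the kernel writes them back, whereas a database would follow up with an fsync:

```go
// Sketch of a typical log write: the data lands in the page cache and the
// call returns; the kernel's flusher writes it to disk later.
package main

import (
	"log"
	"os"
)

func main() {
	f, err := os.OpenFile("/tmp/app.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Returns as soon as the bytes are in the page cache -- no fsync here.
	// A database would additionally call f.Sync() to force them to disk.
	if _, err := f.Write([]byte("2018-04-23 12:00:00 INFO request handled\n")); err != nil {
		log.Fatal(err)
	}
}
```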

Writing to the page cache should be quick. But this server runs a lot of related services, and under load testing they generate a huge volume of logs, all of which gets thrown into the page cache. As mentioned, these services all live on one Aliyun host; the host is generously specced, but the services gobble memory and never give it back, so the page cache keeps shrinking. With memory low, the kernel keeps triggering writeback to persist dirty pages, yet the Aliyun host's disk I/O throughput is pathetic and cannot flush fast enough. The result is that Golang piles up more and more threads while logging.





The Golang runtime has a sysmon routine that polls all the P contexts. Whenever an M running a G has been blocked for too long, sysmon detaches the P and a new (or idle) thread takes it over to keep pulling tasks from its run queue. The blocked thread drops out of scheduling; when its blocking operation finally completes, the goroutine is put back on a run queue and the thread is parked in the idle thread pool. In short, unless you configure a limit with debug.SetMaxThreads, new threads will keep being created whenever the existing ones are tied up. Golang's default maximum is 10000 threads, which is hard-coded; if you want to control the number of pthreads, call runtime/debug.SetMaxThreads().
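A small experiment of my own (not from the article) makes the mechanism visible: goroutines stuck in blocking syscalls each hold on to an M, so the runtime keeps spinning up more threads. Here nanosleep stands in for a write(2) stuck on a saturated disk:

```go
// Linux sketch: threads pile up when many goroutines block in syscalls.
package main

import (
	"fmt"
	"runtime/pprof"
	"syscall"
	"time"
)

func main() {
	threads := pprof.Lookup("threadcreate")
	fmt.Println("threads created so far:", threads.Count())

	for i := 0; i < 100; i++ {
		go func() {
			ts := syscall.Timespec{Sec: 10}
			// Blocking syscall: this goroutine's M is stuck here, so the
			// runtime hands its P over to a new or idle thread.
			syscall.Nanosleep(&ts, nil)
		}()
	}

	time.Sleep(2 * time.Second)
	fmt.Println("threads created now:", threads.Count())
}
```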





Golang Disk IO problem?

For network I/O, Golang uses epoll to achieve asynchronous, non-blocking behavior, but disk I/O does not go through this mechanism because a regular file is always reported as ready. Epoll is a readiness-notification model: it only distinguishes ready from not ready. Most of the well-known high-performance frameworks don't really solve asynchronous, non-blocking disk I/O either. In nginx, for example, reading and writing files used to block the worker process; only with AIO plus a thread pool enabled can workers sidestep the blocking. It's fair to say AIO on Linux is not great overall, at least somewhat worse than on BSD and Windows. On Windows, IOCP does solve asynchronous disk I/O, but under the hood the key ingredient is again a thread pool.
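The claim that "the disk is always ready" can be seen directly: the kernel refuses to register a regular file with epoll at all. A rough Linux-only sketch (the file path is arbitrary):

```go
// Linux-only sketch: epoll_ctl rejects regular files (EPERM), which is why
// Go's netpoller covers sockets while plain file I/O stays blocking.
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	f, err := os.Open("/etc/hostname") // any regular file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		panic(err)
	}
	defer syscall.Close(epfd)

	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(f.Fd())}
	err = syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, int(f.Fd()), &ev)
	fmt.Println("epoll_ctl on a regular file:", err) // typically "operation not permitted"
}
```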


How to analyze the problem?

The analysis went like this: first check the number of Golang threads with ps and pprof, then look at the current call logic from a Golang stack dump, then trace the system calls with strace. The stack dump doesn't need much explanation: log writes were blocking, so the stacks were full of logging calls. So let's focus on the relationship between logging and thread creation, using two very simple tools: strace and lsof.
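The article doesn't show how pprof was wired up; for reference, a minimal way to expose the pprof endpoints (assuming an HTTP debug port, here localhost:6060) looks like this:

```go
// Minimal sketch: expose net/http/pprof so thread counts and stacks can be
// inspected at /debug/pprof/threadcreate and /debug/pprof/goroutine.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
	// In a real service this runs alongside the business listeners.
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```

With that in place, /debug/pprof/threadcreate?debug=1 reports a thread count that can be cross-checked against ps.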

strace shows the process blocked in write(2) on the log file's fd (8), followed by repeated mmap / mprotect / clone calls, i.e. the runtime creating new threads. Prometheus monitoring shows the thread count climbing whenever iostat is too high; for unrelated reasons system memory also keeps shrinking, so the page cache is constantly being flushed.

A screenshot of Golang's built-in pprof shows 1009 threads.



The thread count reported by the ps command matches what pprof shows.


Current disk I/O usage: iostat's %util stays at 100% for long stretches.


Checking the iowait trend in Prometheus, the chart below shows iowait staying quite high.



How to solve it?

Golang's thread inflation here was caused by disk I/O, which in turn has many causes: too little memory, too much log output, poor disk throughput, and so on.

Our fix was, first, to cap the runtime at 200 threads with debug.SetMaxThreads, which is more than enough under normal conditions; on a 24-core box, a high-traffic proxy service normally runs with only about 80 threads. The point of the cap is mainly to guard against all kinds of unknown, weird situations.
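A sketch of that cap; note that SetMaxThreads lives in runtime/debug and, per its documentation, the program crashes if it ever needs more threads than the limit, so the value needs headroom:

```go
// Capping OS threads as described above: exceeding the limit crashes the
// program, so leave generous headroom over the normal thread count.
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Default limit is 10000; cap it at 200 so a disk stall surfaces as a
	// crash/alert instead of thousands of stuck threads.
	prev := debug.SetMaxThreads(200)
	fmt.Println("previous thread limit:", prev)

	// ... start the service as usual ...
}
```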

In addition, raise the service's log level to WARN in production; leaving everything at DEBUG produces an insane amount of I/O.

In my tests, keeping enough free memory also reduces the abnormal thread growth; after all, appending to a log with O_APPEND is logically sequential I/O, which is fast.

fsync & sync

Also, I used Golang's BoltDB for NoSQL storage a while back, and it too can cause too many threads when the machine's disk I/O throughput is insufficient. It wasn't a thousand in that case, but it was over 400 threads.

BoltDB does not modify the B+Tree data file in real time; like most databases it goes through a WAL-style log. The WAL is sequential I/O and is persisted with fsync, but the data file's dirty pages are not flushed immediately: they are written back when the kernel flusher runs or a threshold is triggered.

sync merely puts the write requests into the device's I/O queue; if the machine crashes or loses power at that point, you're out of luck. fsync, on the other hand, ensures that the file's changes are safely synced to the disk, and the call blocks until the device reports that the I/O is complete. Note the words "synchronous" and "blocking wait"; I think you get the idea.
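A rough contrast of the two calls described above (Linux; syscall.Sync wraps sync(2), File.Sync wraps fsync(2); the file path is made up):

```go
// Sketch contrasting sync(2) and fsync(2).
package main

import (
	"log"
	"os"
	"syscall"
)

func main() {
	f, err := os.OpenFile("/tmp/data.db", os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := f.Write([]byte("commit record\n")); err != nil {
		log.Fatal(err)
	}

	// sync(2): asks the kernel to write back dirty pages; it gives no
	// per-file durability guarantee, so a crash can still lose the data.
	syscall.Sync()

	// fsync(2): blocks until the device reports this file's data is on
	// stable storage -- the call databases rely on for commits.
	if err := f.Sync(); err != nil {
		log.Fatal(err)
	}
}
```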

BoltDB seems to have addressed this in a later version; I haven't looked at the specific change, so if you're interested, take a look and let me know.

Conclusion:

That's about it for the Golang thread explosion. Writing this article took quite a while, although finding and fixing the problem itself took only about ten minutes. When you hit performance problems like this, Golang's pprof plus the usual system tools make them easy to locate. Prometheus is recommended for system monitoring, with the Golang metrics module exporting runtime information and business QPS; when problems come up, Grafana lets you analyze them from multiple dimensions.

END.














Original address:

xiaorui.cc/2018/04/23/…

When reproducing this article, the original source must be credited with a link, along with this statement.