• All you need to know to really understand the Node.js Event Loop and its Metrics
  • Original post by Daniel Khan
  • The Nuggets Translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: MuYunyun
  • Proofreader: Sigoden, Zyziyun

All you need to know to really understand the Node.js Event Loop and its Metrics

Node.js is an event-based platform. This means that everything that happens in Node is a reaction to an event: a transaction passing through Node traverses a cascade of callbacks.

All of these event callbacks are handled by a library called libuv, which provides a mechanism called the event loop.

This event loop is probably the most misunderstood concept of the platform. When we approached the topic of event loop monitoring, we spent a lot of effort on understanding what we are actually monitoring.

In this article, I'll walk you through how the event loop really works and how it can be monitored properly.

Common misconceptions

libuv is the library that provides the event loop to Node.js. In his excellent Node Interactive keynote, Bert Belder, one of the key people behind libuv, started by showing images from a Google image search for the event loop and pointed out that most of them depict it incorrectly.

Let’s take a look at the most popular misconceptions.

Myth #1: The event loop runs in a separate thread from the user code

Misconception

The user’s JavaScript code runs on the main thread, while another thread runs the event loop. Each time an asynchronous operation occurs, the main thread hands off the work to the event loop thread, which notifies the main thread to perform a callback once it completes.

Reality

There is only one thread that executes JavaScript code, and the event loop runs on that thread. The execution of callbacks (and essentially all userland code in a running Node.js application is a callback) is done by the event loop. We'll talk more about that later.
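To make this concrete, here is a small sketch (not from the original article): if the event loop lived on a separate thread, the timer below could fire while the synchronous loop spins. It cannot, because user code and the event loop share a single thread.

```js
const start = Date.now();

// Schedule a callback for 100 ms from now.
setTimeout(() => {
  console.log(`timer fired after ${Date.now() - start} ms`);
}, 100);

// Block the one and only thread for roughly one second.
while (Date.now() - start < 1000) { /* busy wait */ }

// The timer callback can only run once this synchronous code returns,
// so it fires after roughly 1000 ms instead of 100 ms.
console.log('synchronous work done');
```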

Myth #2: Everything that is asynchronous is handled by a thread pool

Misconception

Asynchronous operations, such as working with the file system, making outbound HTTP requests, or talking to a database, are always handled by a thread pool provided by libuv.

Reality

libuv by default creates a thread pool with four threads to offload asynchronous work to. Today's operating systems already provide asynchronous interfaces for many I/O tasks (e.g. AIO on Linux).

Whenever possible, libuv will use those asynchronous interfaces and avoid the thread pool.

The same applies to third-party subsystems like databases. Here the driver authors will also rather use an asynchronous interface than the thread pool.

In short: the thread pool is used for asynchronous I/O only when there is no other way around it.
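As a rough illustration of that split (a sketch of the behavior on a current Node.js build, not an API you call explicitly):

```js
const fs = require('fs');
const https = require('https');

// File system work is handed off to libuv's thread pool.
fs.readFile(__filename, () => console.log('file read via the thread pool'));

// Network sockets use the operating system's asynchronous interfaces
// (epoll, kqueue, IOCP) instead of the thread pool. One exception: the DNS
// lookup that resolves the hostname below still runs on the thread pool.
https.get('https://example.com', (res) => {
  res.resume();
  console.log(`HTTP ${res.statusCode} received via OS async I/O`);
});
```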

Myth #3: The event loop is something like a stack or a queue

Misconception

The event loop executes asynchronous tasks in a first-in, first-out fashion, similar to a queue, and calls the corresponding callback function when a task is finished.

Reality

While queue-like structures are involved, the event loop does not run through and process tasks like a stack. The event loop is a set of phases, each handling specific tasks, that are processed in a round-robin manner.

Understanding the phases of the event loop

To really understand the event loop, we have to understand which tasks are handled in each phase. Hoping Bert Belder doesn't mind, I simply reused his slide to illustrate how the event loop actually works:

The execution of the event loop can be divided into five phases. Let's discuss each of them here; see the Node.js website for a more in-depth explanation.

Timers

Callbacks registered with setTimeout() and setInterval() are handled here.

I/O callbacks

This is where most of the callbacks are processed. Since almost all userland code in Node.js runs in callbacks (e.g. a callback to an incoming HTTP request triggers a cascade of further callbacks), this phase runs most of the user code.

I/O polling

Polls for new events to be processed on the next turn of the loop.

setImmediate

This is where all callbacks registered by setImmediate() are handled.

Close

Here all 'close' event callbacks are processed.
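A classic way to see the phase ordering in action (a minimal sketch, using a file read to get into an I/O callback): inside that callback the loop has already passed the timers phase, so a setImmediate callback runs before a 0 ms timer, which has to wait for the next turn.

```js
const fs = require('fs');

fs.readFile(__filename, () => {
  // We are inside an I/O callback, past the timers phase of this turn.
  setTimeout(() => console.log('timeout'), 0);   // timers phase, next turn
  setImmediate(() => console.log('immediate'));  // setImmediate phase, this turn
});
// Prints "immediate" first, then "timeout".
```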

Monitoring the event loop

As you can see, virtually everything that happens in a Node application runs through the event loop. This means that if we can get metrics out of it, they should give us valuable information about the overall health and performance of the application.

There is no readily available API to retrieve runtime metrics from the event loop, so every monitoring tool comes up with its own metrics. Let's see what we came up with.

Tick frequency

The number of event loop ticks per time interval.

Tick duration

The time one tick takes.

Since our agent is running as a native module, it is relatively easy to add probes to provide us with this information.
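For illustration only, here is a rough, pure-JavaScript approximation of tick frequency. It is a sketch, not how the native agent works, and the constantly pending setImmediate itself keeps the loop from blocking in the polling phase, so it distorts the idle behavior discussed below.

```js
// Count how many times the event loop completes a turn per second.
let ticks = 0;

(function tick() {
  ticks++;
  setImmediate(tick);
})();

setInterval(() => {
  console.log(`~${ticks} event loop iterations in the last second`);
  ticks = 0;
}, 1000);
```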

Tick frequency and tick duration metrics

When we ran our first tests under different load scenarios, the results were surprising. Let me give you an example:

In the following case, I am hitting an Express.js application that makes an outbound call to another HTTP server.

There are four scenarios:

  1. Idle

No incoming requests.

  2. ab -c 5

Five concurrent requests at a time, generated with the Apache Bench tool.

  3. ab -c 10

Ten concurrent requests at a time.

  4. ab -c 10 (slow backend)

To simulate a slow backend, the called HTTP server returns its data only after one second. This causes requests to pile up inside the Node application while it waits for the backend, creating back pressure; a minimal sketch of such a backend is shown below.
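A minimal sketch of such a slow backend (the port and the exact delay are assumptions, not taken from the original test setup):

```js
const http = require('http');

// Backend that answers every request only after one second.
http.createServer((req, res) => {
  setTimeout(() => res.end('late response'), 1000);
}).listen(4000);
```

The load itself was generated with Apache Bench, e.g. ab -c 10 -n 1000 http://localhost:3000/ for the ten-concurrent-requests cases (the request count here is an arbitrary choice).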

Event loop metrics for the four scenarios

If we look at the resulting graph, we can make an interesting observation:

The event loop dynamically adapts its tick frequency and duration

If the application is idle, meaning there are no pending tasks (timers, callbacks, etc.), it makes no sense to run through the phases at full speed, so the event loop adapts and blocks for a while in the polling phase, waiting for new external events to come in.

This also means that the metrics for an idle application (low tick frequency, high tick duration) look similar to the metrics for a slow backend under high load.

We also saw that the demo application performed "best" in the scenario with five concurrent requests.

Therefore, tick frequency and tick duration have to be seen in relation to the load, such as the number of concurrent requests per second.

While this data already provided us with some valuable insights, we still didn't know in which phase the time was actually spent, so we dug deeper and came up with two additional metrics.

Work processing delay

This metric measures how long it takes for an asynchronous task to be processed by libuv's thread pool.

High work processing latency indicates a busy or exhausted thread pool.

To test this metric, I created an Express route that processes an image using Sharp. Because image processing is expensive, Sharp offloads it to the thread pool.
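A hedged sketch of such a pair of routes (file name, sizes and paths are made up for illustration):

```js
const express = require('express');
const sharp = require('sharp');

const app = express();

// Route with image processing: sharp runs the resize on libuv's thread pool.
app.get('/resize', async (req, res) => {
  const output = await sharp('input.jpg').resize(800, 600).toBuffer();
  res.type('jpeg').send(output);
});

// Route without image processing, for comparison.
app.get('/plain', (req, res) => {
  res.send('no image processing here');
});

app.listen(3000);
```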

Sending five concurrent requests with Apache Bench to the route with image processing and to the route without it makes the difference immediately visible in the chart.

Event loop delay

The event loop delay measures how long it takes before a task scheduled by setTimeout(X) is actually processed.

High event loop latency indicates that the event loop is busy processing callbacks.
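A minimal sketch of how this can be measured in plain JavaScript (the sampling interval is an arbitrary choice):

```js
const INTERVAL = 100; // sampling interval in milliseconds

function sample() {
  const scheduledFor = Date.now() + INTERVAL;
  setTimeout(() => {
    // How much later than scheduled did this callback actually run?
    const lag = Math.max(0, Date.now() - scheduledFor);
    console.log(`event loop delay: ~${lag} ms`);
    sample();
  }, INTERVAL);
}

sample();
```

More recent Node.js versions also expose perf_hooks.monitorEventLoopDelay(), which reports the same idea as a histogram.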

To test this metric, I created an Express route that computes Fibonacci numbers with a very inefficient algorithm.
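A sketch of the kind of route used (the exact algorithm from the original test is not shown, so a naive recursion stands in for it):

```js
const express = require('express');
const app = express();

// Deliberately inefficient, fully synchronous Fibonacci:
// the event loop is blocked for the entire computation.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

app.get('/fib/:n', (req, res) => {
  const n = Number(req.params.n);
  res.json({ n, result: fib(n) });
});

app.listen(3000);
```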

Running Apache Bench with five concurrent connections against the Fibonacci route clearly shows how busy the callback queue becomes.

It became clear to us that these four metrics provide valuable insights and help to better understand the inner workings of Node.js.

Still, these metrics need to be seen in a larger context to be meaningful. Therefore, we are working on incorporating this data into our anomaly detection.

Back to the event loop

Of course, metrics alone are not very helpful without knowing which actions to take. Here are a few hints for situations where the event loop is exhausted.

Event loop exhaustion

Use all CPUs

A Node.js application runs on a single thread. On multi-core machines this means that the load is not distributed over all cores. Using the cluster module that ships with Node, it is easy to spawn one child process per CPU. Each child process maintains its own event loop, and the master process transparently distributes the load among all children.
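A minimal sketch of this pattern (port and response body are placeholders):

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core; the master distributes incoming
  // connections among them.
  os.cpus().forEach(() => cluster.fork());
} else {
  // Every worker runs its own event loop and its own server instance.
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(3000);
}
```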

Adjust the thread pool

As mentioned above, libuv will create a thread pool of size 4. The default size of the thread pool can be overridden by setting the environment variable UV_THREADPOOL_SIZE.
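The variable has to be set before libuv creates the pool, so it is typically passed at launch time, e.g. UV_THREADPOOL_SIZE=8 node app.js. A quick, assumption-laden way to see the pool size in action is to queue more thread pool jobs (here pbkdf2 hashing) than there are threads:

```js
const crypto = require('crypto');

const start = Date.now();

// With the default pool size of 4, the first four jobs finish at roughly
// the same time and the remaining four roughly one "batch" later.
for (let i = 0; i < 8; i++) {
  crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', () => {
    console.log(`job ${i} finished after ${Date.now() - start} ms`);
  });
}
```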

While this can mitigate load problems for I/O-bound applications, I recommend thorough load testing, as a larger thread pool may still exhaust memory or CPU.

Offload tasks to service processes

If Node.js spends too much time on CPU-heavy operations, offloading that work to dedicated service processes, or writing those services in a language better suited to the task, is also a viable option.

Conclusion

Let’s summarize what we learned in this article:

  • The event loop is what keeps a Node.js application running
  • Its functionality is often misunderstood: it consists of multiple phases, each handling specific tasks, which are scheduled in a round-robin manner
  • The event loop does not expose ready-made metrics, so the metrics collected differ from one APM vendor to another
  • These metrics provide valuable insights into bottlenecks, but a solid understanding of the event loop and of the running code is key
  • In the future, Dynatrace will make event loop metrics part of its anomaly detection, so that event loop anomalies can be associated with detected problems

There is no doubt in my mind that we have just built the most comprehensive event loop monitoring solution on the market today, and I am very excited that this new feature will be rolled out to all customers in the coming weeks.

Finally

Our first-class Node.js agent team did a great job on event loop monitoring. Most of the findings presented in this blog post are based on their in-depth understanding of the inner workings of Node.js. I would like to thank Bernhard Liedl, Dominik Gruber, Gerhard Stobich and Gernot Reisinger for all their work and support.

I hope this article has shed some new light on the event loop. Follow me on Twitter @dkhan; I'd be happy to answer any questions you have there or in the comments section below.

Finally, as always: Download the free trial version to monitor your entire stack, including Node.js.

