In this article, we cover some engineering practices around the new Node.js async_hooks API at Tubi. First, we introduce what drove us to try it and our specific requirements. Then we cover some of the problems we encountered, explaining and demonstrating several of them with code examples. Finally, we show the current solution in Tubi's production environment. As with all projects, the tradeoffs we found acceptable may not apply to you, so treat this as a reference rather than a prescription.

Motivation

In addition to Elixir and Scala, some of our services at Tubi use Node. Node is a great platform for getting up and running quickly, and it makes sense when developing Backends For Frontends. However, it is not without its problems, some inherent to JavaScript and others due to Node itself. To address these issues, we wrote our own microframework on top of Restify; it is really more a collection of best practices. Code-named UAPI, short for Unified API, this framework gives Tubi engineers a standard way to build APIs. A typical API follows a pattern similar to the following:
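The code sample appears to have been lost in formatting. The pattern described (a per-request context threaded explicitly through function arguments) might look roughly like this; all names here are illustrative, not Tubi's actual code:

```javascript
// Every layer takes a per-request `ctx` (trace headers, logger, etc.)
// as an explicit argument and passes it along to whatever it calls.

async function fetchFromUpstream(ctx, userId) {
  // any outgoing network call would attach ctx.traceHeaders here
  return { id: userId, plan: 'free' };
}

async function getUser(ctx, userId) {
  ctx.log(`loading user ${userId}`);
  return fetchFromUpstream(ctx, userId);
}

async function handleGetUser(req, res) {
  // the context is built once per request at the route boundary...
  const ctx = {
    traceHeaders: { 'x-request-id': req.headers['x-request-id'] },
    log: (msg) => console.log(`[${req.headers['x-request-id']}] ${msg}`),
  };
  // ...and threaded down through every call
  const user = await getUser(ctx, req.params.userId);
  res.send(user);
}
```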

As you can see, we propagate the information we need through function arguments. For the example above, the call stack would look like this:
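The original illustration did not survive formatting; conceptually (with illustrative function names), it is a chain in which the context rides along every frame down to the network call:

```
handleGetUser(req, res)              // ctx created here
  └─ getUser(ctx, userId)
       └─ fetchFromUpstream(ctx, userId)
            └─ outgoing HTTP/RPC call    // ctx.traceHeaders attached here
```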

Through careful planning, design, and code review, this pattern worked for us for a long time. For the past year and a half, however, we have been enthralled by Envoy, the sidecar proxy from Lyft, and we have been standardizing our infrastructure around Envoy and gRPC. To keep development as DRY as possible, we implemented RPC libraries in Node, Scala, and Elixir.

Our RPC libraries use Envoy to make network calls easy, adding features such as retries, circuit breaking, fallbacks, caching, and tracing. However, one of our biggest pain points with Envoy in Node has been propagating trace headers (we use Jaeger) down the call stack: only if the headers are carried all the way down will outgoing requests be tagged correctly and traced properly, regardless of where in the code the network call happens. Other languages have mechanisms that solve exactly this problem, namely thread-local storage; the JVM's gRPC implementation even comes with a built-in Context object to simplify such requirements. Before version 8, Node had the continuation-local-storage library, which would have been very useful, but it didn't work well with async/await, a feature we had been longing to use since Node v4!
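To make the pain point concrete, here is a hedged sketch (not our library's actual API): without any ambient storage, the tracing headers have to be pulled off the incoming request and handed explicitly to every outgoing call. The exact header list below is illustrative of what Envoy/Jaeger commonly propagate:

```javascript
// Header names commonly used for trace propagation; treat the list as
// illustrative rather than exhaustive.
const TRACE_HEADERS = ['x-request-id', 'uber-trace-id', 'x-b3-traceid', 'x-b3-spanid'];

// Copy only the tracing headers off an incoming request.
function extractTraceHeaders(incomingHeaders) {
  const out = {};
  for (const name of TRACE_HEADERS) {
    if (incomingHeaders[name] !== undefined) out[name] = incomingHeaders[name];
  }
  return out;
}

// Every outgoing call must be handed the headers explicitly; forget one
// hop anywhere in the call chain and the trace breaks at that point.
function buildUpstreamRequest(traceHeaders, path) {
  return { path, headers: { ...traceHeaders } };
}
```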

async_hooks API

Fortunately, Node v8 shipped a new experimental API called async_hooks. This API is very useful and versatile because it exposes quite a bit of information. In short, it provides the following hooks (callback functions):

Since we can now see the dependencies between executions via asyncId and triggerAsyncId, we can build a tree representing how the application executes and associate every function that runs with the request that triggered it. For example, the call stack for handling a typical API call might look like this:

The key requirement is the ability to uniquely isolate the execution stack created by each (req, res) pair. If you are familiar with the framework you are working with, an easy way to do this is monkey patching. Here is an example of patching a simple HTTP server:

In our application code, the implementation looks like this (simplified for brevity):

This approach is easy to implement and has the added benefit that we don't have to worry about cleaning up the context: it is destroyed when the request/response cycle completes. On the downside, it is tightly coupled to the implementation of whatever framework you use; for example, we couldn't find a way to patch the node-grpc library to cover gRPC calls.

In a future article, we'll look at a more detailed approach that solves this problem without monkey patching and generalizes to any server implementation, from Express to node-grpc. We'll also discuss how to handle async-hook life cycles that don't play well with keep-alive connections, and how to manage memory and garbage collection in those cases.

Note: when a new HTTP request arrives at Node's HTTP server, Node creates a new asynchronous execution resource, so we don't have to implement logic to handle different requests sharing the same async ID.

About the author

Yingyu Cheng is a senior back-end engineer on Tubi's China team. After graduation, he worked at Microsoft for three years, where he was responsible for the Bing Ad Insights API and middle-tier platform development. In 2016, he joined the newly established Tubi China team, where he has led several key projects such as Adbreak Finder and Clip Pipeline. He now works mainly on microservice infrastructure and the ad server.

Follow the WeChat official account "Bitu Technology" to keep up with our future explorations, or join us on the adventure into the new world!