Baidu has open-sourced its RPC framework BRPC: what is the thinking behind it?
Foreword

On September 14, Baidu officially open-sourced its RPC framework BRPC on GitHub under the Apache 2.0 license. BRPC is an RPC framework built on protobuf interfaces and is known inside Baidu as “baidu-rpc”. It covers all the RPC protocols used within Baidu and also supports a variety of third-party protocols. According to current performance test data, BRPC leads comparable RPC products.

BRPC, under development since 2014 and written mainly in C++ and Java, is the most widely used RPC framework at Baidu. It has been proven in high-concurrency, high-load production environments and supports about 750,000 simultaneously online instances within Baidu. InfoQ learned that Baidu had several in-house RPC frameworks and had even open-sourced another one, sofa-pbrpc, in 2014. In what context was BRPC born? What are its advantages? Why open-source it? InfoQ interviewed Ge Jun, the head of BRPC, about these questions.

Project address: https://github.com/brpc/brpc

InfoQ: Could you give us a basic introduction to BRPC? When did development start? What iterations and upgrades has it gone through? How widely is it used internally at present?

Ge Jun: BRPC was started in 2014 and is known inside Baidu as “baidu-rpc”. So far it has gone through about 3,000 changes and is still being optimized; a description of each change can be found on Baidu's internal wiki. BRPC's main languages are C++ and Java. Other languages are supported mainly by wrapping the C++ version; the Python version of BRPC, for example, includes most of the functionality of the C++ version.

BRPC currently supports about 750,000 simultaneously online instances (not counting clients) and more than 500 kinds of services within Baidu (these are last year's figures; this data is no longer collected). Hadoop, Table, Mola (another widely used storage system), high-performance computing, model training, and a large number of online retrieval services all use BRPC. For the first time, BRPC unified the communication framework of the distributed systems and the business lines within Baidu.

InfoQ: Why did Baidu develop BRPC?

Ge Jun: We realized in practice that in RPC, the most basic communication component, Baidu was no longer leading at that time. My manager then was Liu Yang, a former Google engineer, who cared a great deal about infrastructure and was willing to invest resources in this direction.

We discussed these questions a great deal internally. “Easy to use” can seem very subjective, but there are in fact criteria to go by. The key point is whether the tool really improves user efficiency across development, debugging, and maintenance. If it does, users will miss you; things promoted through bragging or politics do not win people over. BRPC was originally created to solve the practical challenges facing Baidu's business. At the same time, we hoped it would become the favorite tool of our colleagues at Baidu, one they would miss even after leaving the company. Along with a useful framework, we also wanted to demonstrate a way of working: how to write comments, how to write logs, how to write the ChangeLog, how to release versions, and how to organize documentation, so that it would help colleagues even in future jobs outside Baidu. From that point of view, BRPC embraced open source from the very beginning. In terms of word of mouth we are in fact doing quite well: the BRPC wiki is probably among the most-liked content inside Baidu.

InfoQ: What are the advantages of BRPC over other open source RPC frameworks?

Ge Jun: BRPC focuses on depth and ease of use. On the one hand, we don't have the resources to cover everything the way gRPC does. On the other hand, we noticed that gRPC (and the older Thrift) is neither deep enough nor easy enough to use. On the technical side, the sample programs look fine and the documentation sounds great, but real-world use is probably a different story. One unspoken reason companies keep reinventing their own wheels is that existing frameworks are unbearably superficial in certain details.

What are the real pain points of RPC? Reliability, ease of use, and ease of locating problems. A service should not show long tails that cannot be explained, a program should have as few variables as possible, and there should be tools to support fast investigation of all kinds of weird problems. Current open source RPC frameworks do not do these well; most of them look great but cannot be rolled out even within their own organizations. Coming back to those three points, how does BRPC do?

  • Reliability. Part of this is code quality: by setting a very high recruitment bar for the BRPC team and holding deep technical discussions within the team, we ensured a solid code base. The other part is the long-tail problem, which is a design issue. BRPC actually contains many modules; among them, bthread is an M:N thread library intended to improve concurrency and avoid blocking. Both reads and writes in BRPC are wait-free, which is the highest level of concurrency. Technical details are in the following documents:

    https://github.com/brpc/brpc/blob/master/docs/cn/bthread.md

    https://github.com/brpc/brpc/blob/master/docs/cn/io.md

  • Ease of use. Some designs expose every choice to the user as an option and claim to support every feature, but when something goes wrong, it is because the user “configured it wrong”. Users also become heavily dependent on the development team and can hardly use the framework without its support, which in turn gives the development team ample reason to expand. This is irresponsible, and users are overwhelmed by the sheer number of options. BRPC is very careful about adding options: whatever the framework can decide for itself is never pushed onto the user, and every user option has the most reasonable default value so that it works without being set. We think this really matters for the user experience.

  • Ease of locating problems. So far none of the other open source frameworks does this well: everything is fine in normal use, but once something goes wrong you are stuck. This problem was in fact very serious inside Baidu too. Before BRPC, users always had to pull in the RPC team to investigate problems together. The RPC framework was a black box to users, who had no idea what happened inside. In our experience, almost every day several users in the support group asked about server stalls, client timeouts, and similar problems. Troubleshooting became the norm and there were never enough people; over time, users came to feel that the framework was full of problems and that their messages rarely got a reply. BRPC's solution is to add a variety of built-in services with HTTP interfaces to the server. Through these services users can quickly inspect server latency, errors, and connections, trace an RPC, and analyze CPU hot spots, memory allocation, lock contention, and so on. Users can also use bvar to define custom statistics and aggregate them on NOAH, Baidu's operations platform. In this way users can solve most problems themselves. These are in fact the same things we look at when we investigate, just with more expertise. The built-in services are described here: https://github.com/brpc/brpc/blob/master/docs/cn/builtin_service.md
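
To make the idea of cheap, user-defined statistics more concrete, here is a minimal sketch in Python. It is not bvar's code or API. It assumes one common technique for low-cost counters: each thread adds to its own private cell, and the cells are only combined when someone reads the value. The names `PerThreadCounter`, `add`, `get_value`, and `count_requests` are hypothetical.

```python
import threading


class PerThreadCounter:
    """Hypothetical counter: each thread adds to its own cell, reads sum all cells."""

    def __init__(self):
        self._local = threading.local()   # holds the calling thread's cell
        self._cells = []                  # every cell created so far, one per writer thread
        self._cells_lock = threading.Lock()

    def add(self, n=1):
        cell = getattr(self._local, "cell", None)
        if cell is None:
            # First write from this thread: register a private cell (rare, locked).
            cell = [0]
            self._local.cell = cell
            with self._cells_lock:
                self._cells.append(cell)
        # Hot path: only the owning thread writes this cell, so no lock is taken.
        cell[0] += n

    def get_value(self):
        # Cold path: combine the per-thread cells at read time.
        with self._cells_lock:
            return sum(cell[0] for cell in self._cells)


def count_requests(counter, n_threads=4, requests_per_thread=1000):
    """Simulate n_threads workers that each record requests_per_thread requests."""
    def worker():
        for _ in range(requests_per_thread):
            counter.add()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.get_value()
```

The design choice illustrated here is that writes, which happen on every request, stay contention-free, while the cost of combining is paid by the much rarer reads, such as a monitoring page being refreshed.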

InfoQ: As an internal RPC framework, what are the service governance considerations?

Ge Jun: RPC is used very widely inside Baidu: almost everything is an RPC call, and some product lines also use local RPC to isolate the engineering framework from the strategy code. Over the years, the systems surrounding services have become more complete: compilation is handled by BCLOUD, release by Agile, service registration and discovery by BNS, authentication by Giano, and monitoring and operations by NOAH. Inside Baidu, BRPC is tightly integrated with these systems, giving users a one-stop experience. Although most of these integrations are removed from the open source version, users can customize it for their own organization's infrastructure: the interaction protocol, the naming service, and the load balancing algorithm can all be customized. For the more generic ones, we hope users will contribute them back to the open source version to make things easier for everyone.
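
As a rough illustration of what a customizable load balancing algorithm can look like, the following Python sketch keeps a registry of policies that are selected by name. It is an assumption-laden toy, not BRPC's extension interface: `register_load_balancer`, `make_load_balancer`, and the policy names `"rr"` and `"random"` are all hypothetical, and the two policies simply stand in for whatever an organization would plug in.

```python
import itertools
import random

# Hypothetical registry mapping a policy name to the class that implements it.
LOAD_BALANCERS = {}


def register_load_balancer(name):
    """Class decorator that makes a policy selectable by name."""
    def decorator(cls):
        LOAD_BALANCERS[name] = cls
        return cls
    return decorator


@register_load_balancer("rr")
class RoundRobin:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def select(self):
        return next(self._cycle)


@register_load_balancer("random")
class RandomChoice:
    """Pick a server uniformly at random."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._rng = random.Random()

    def select(self):
        return self._rng.choice(self._servers)


def make_load_balancer(name, servers):
    """Look up a policy by name and bind it to a server list."""
    return LOAD_BALANCERS[name](servers)
```

A user-defined policy would be added the same way, by decorating a class that offers a `select()` method, without touching the code that calls `make_load_balancer`.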

InfoQ: What is the difference between BRPC and Baidu's sofa-pbrpc?

Ge Jun: sofa-pbrpc is also one of the relatively early RPC frameworks developed at Baidu. It is part of the sofa programming framework and is used in search. Compared with sofa-pbrpc, BRPC has the following advantages:

  • The protocol abstraction is more general and has unified Baidu's entire communication architecture. BRPC can accommodate a large number of protocols: protobuf-based, HTTP-based, Baidu's nshead/mcpack, the open source Redis/Memcached protocols, and even the RTMP/FLV/HLS live-streaming protocols, so BRPC can be introduced into existing systems gradually without a complete rewrite. sofa-pbrpc has no ability to extend its protocols. Similarly, sofa-pbrpc cannot customize the load balancing algorithm. By default, BRPC provides round-robin, random, consistent hashing, and locality-aware algorithms, and users can also define their own.

  • Better multithreading quality. Multithreaded programming is very hard, and a seemingly simple RPC is riddled with multithreading pitfalls: the code that handles a timeout may run before the RPC has even been sent; the callback that handles the reply may run before the sending function returns; another reply may arrive while one is still being processed; and so on. Also, what happens when a synchronous RPC is issued inside the callback of an asynchronous RPC? What happens when a synchronous RPC is issued while holding a lock? We could not find satisfactory answers to these questions in sofa-pbrpc.

  • Complete debugging and operations support. The essence of solving this problem is extensibility: how do you let users take part and define the metrics they care about? We designed bvar so that users can freely define all kinds of metrics at a cost lower than that of an atomic variable, and view the metric curves in a browser or the aggregated monitoring data on the operations platform NOAH. BRPC also includes a number of built-in services that let users debug programs, view connections, modify gflags online, trace RPCs, and analyze CPU hot spots, memory allocation, lock contention, and more.
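
Of the load balancing algorithms named in the first point, consistent hashing is the least obvious, so a small Python sketch may help. It shows the textbook technique with virtual nodes on a hash ring, not BRPC's implementation; the class name `ConsistentHashRing`, the `replicas` parameter, and the use of MD5 as the hash are assumptions made for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Textbook consistent hashing: servers own points on a ring of hash values."""

    def __init__(self, servers, replicas=100):
        # Each server is placed on the ring `replicas` times (virtual nodes)
        # so that load spreads more evenly.
        ring = []
        for server in servers:
            for i in range(replicas):
                ring.append((self._hash(f"{server}#{i}"), server))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._owners = [owner for _, owner in ring]

    @staticmethod
    def _hash(key):
        # First 8 bytes of MD5 as an integer; any well-mixed hash would do.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def select(self, request_key):
        # The request goes to the first point clockwise from its own hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._points, self._hash(request_key))
        return self._owners[idx % len(self._owners)]
```

The property that matters for caches and sharded stores is that removing one server only remaps the keys that server owned; every other key keeps its previous destination.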

Needless to say, there was competition between BRPC and sofa-pbrpc inside Baidu when BRPC was born, but as elsewhere, competition brought vitality. Likewise, BRPC and the other RPC frameworks that are already open source are in healthy competition over who can really improve user efficiency. Every user can cast a vote by comparing code and documentation quality, interface design, ease of use, extensibility, and so on.

InfoQ: What is the overall architecture of BRPC?

Ge Jun: The technology stack is simply everything from the transport layer to the application layer, so I will skip it; you can read the open source documentation. Architecturally, BRPC emphasizes “improving extensibility without sacrificing ease of use”. For example, BRPC supports many protocols: a single port of a BRPC server inside Baidu can support more than 20 protocols, which is very helpful for migrating services smoothly.

There are also many protocols on the client side. Users are very comfortable with BRPC and bthread, so we hope to unify all kinds of clients, such as Redis and Memcached, in this environment. These two clients are much more convenient than the official ones, and interested readers can try them out. Choosing a protocol is a matter of setting a string: for example, HTTP is selected with “http” and Redis with “redis”. The server does not even need to be configured; it automatically determines the protocol of each client. How this is done is also covered in the open source documentation, see this link:

https://github.com/brpc/brpc/blob/master/docs/cn/new_protocol.md
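
The automatic protocol detection described above can be pictured roughly as follows. This Python sketch is a simplification, not BRPC's implementation. It assumes that each protocol can be recognized from the first bytes sent on a connection: an HTTP method, the `*` that begins a Redis command array, and a `PRPC` magic string for the protobuf-based protocol (the latter treated here as an assumption). The matcher functions, the `MATCHERS` table, and the `Connection` class are hypothetical.

```python
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ", b"OPTIONS ", b"PATCH ")


def looks_like_http(data):
    return data.startswith(HTTP_METHODS)


def looks_like_redis(data):
    # Redis clients send commands as arrays, which begin with "*".
    return data.startswith(b"*")


def looks_like_baidu_std(data):
    # Assumed magic string at the start of the protobuf-based protocol's header.
    return data.startswith(b"PRPC")


# Matchers are tried in order; the first one that accepts the bytes wins.
MATCHERS = [
    ("baidu_std", looks_like_baidu_std),
    ("http", looks_like_http),
    ("redis", looks_like_redis),
]


class Connection:
    """Remembers which protocol a client speaks after the first successful match."""

    def __init__(self):
        self.protocol = None

    def on_data(self, data):
        if self.protocol is None:
            for name, matches in MATCHERS:
                if matches(data):
                    self.protocol = name
                    break
        # Later messages on this connection skip detection entirely.
        return self.protocol
```

Because the decision is cached per connection, the cost of trying several matchers is paid only once per client, which is one way a single port could serve many protocols without per-request overhead.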

Naming services and load balancers can also be customized. However, to be responsible to our users, we do not encourage customization that is “too free”, such as creating a new one for every small change in requirements. In such cases we need to work out what the essential difference is. This is something we do every day in our Baidu support group. As a service provider we are open, but we are also strict.

InfoQ: What is the performance of BRPC? How can such high performance be achieved?

Ge Jun: Performance is something we take very seriously, and it is closely related to user experience. If a framework is easy to use but slow, or fast but hard to use, users are torn, and we do not want users to have to agonize. From another angle, at the start of the rollout, how could we convince product lines to adopt BRPC? The most direct argument was performance improvement. And the performance must show not only in benchmark charts but in real applications. Each of the published case studies below reports some degree of performance improvement:

Baidu Map API entry:

https://github.com/brpc/brpc/blob/master/docs/cn/case_apicontrol.md

Union DSP:

https://github.com/brpc/brpc/blob/master/docs/cn/case_baidu_dsp.md

ELF Learning Framework:

https://github.com/brpc/brpc/blob/master/docs/cn/case_elf.md

Cloud platform proxy service:

https://github.com/brpc/brpc/blob/master/docs/cn/case_ubrpc.md

The performance comparison with other RPC frameworks (including Thrift and gRPC) is shown below:

https://github.com/brpc/brpc/blob/master/docs/cn/benchmark.md

The open source documentation outlines the performance design, and the documents under the “RPC in Depth” section cover more detail. I won't go into it here; please read the documentation directly:

https://github.com/brpc/brpc#better-latency-and-throughput

InfoQ: Why open-source BRPC? What iterations are planned next for the open source project?

Ge Jun: Because many more Baidu systems that depend on RPC are about to be open-sourced. As the most basic component, RPC is open-sourced not only for its own sake but also to pave the way for other open source projects. For example, we will soon open-source a Raft library built on BRPC, which makes it very convenient to build highly available distributed systems, and Bigflow, which uses BRPC to make stream computing easy. Baidu's understanding of open source has deepened in recent years. Open source may seem to expose Baidu's core technology, but it brings ecosystem influence, which matters more. Since Apollo and PaddlePaddle, Baidu has truly embraced open source. The open source version of BRPC is very close to the internal version; only the support for some infrastructure unique to Baidu has been removed. The in-depth documents on RPC technical details that we wrote on the intranet have been open-sourced as well, and we will push changes promptly in the future, so please rest assured. This is a living project, not an open source branch.

About the interviewee

Ge Jun, chief architect at Baidu and lead author of BRPC, has extensive experience in systems programming, data structures, and the design and implementation of large-scale distributed systems.

