Agile R&D at Internet companies depends on an efficient code management system. As a foundational link in the R&D process, code management connects upstream and downstream stages such as requirements management, continuous integration, and continuous delivery, and it also underpins an engineering culture that values code quality and encourages code reuse. Tencent has nearly 30,000 R&D staff, long product lines, and a wide variety of businesses. Teams of different sizes, technology stacks, and R&D models have different collaboration needs, so code base sizes and R&D processes vary widely. At the same time, build and release systems need to check out complete code bases, and the more automated the pipeline, the heavier the load on the code repositories. Providing secure and stable code services, managing repositories of very different sizes, and supporting many kinds of R&D processes are the three major challenges of code management. Weighing the state of the industry against its own needs, Tencent chose Git as the foundation and built its own Git-based system: Worker Bee.

The first problem to solve was storage growth on the server side. Because a single storage node cannot keep up with capacity growth measured in terabytes, two approaches were considered: custom data sharding and a general-purpose distributed file system. Distributed storage hides the underlying storage layout from the application layer, which keeps the architecture relatively simple. However, code hosting is IO-intensive, so it would depend heavily on the IO performance of the distributed file system, and the solution would be less portable. Custom data sharding, by contrast, makes it possible to control the sharding policy and balance resource load flexibly. In addition, the storage underneath each shard can itself be backed by distributed storage to further strengthen data backup. Worker Bee chose data sharding: it uses the repository path as the routing key and implements cross-shard operations in the application layer. Hundreds of thousands of repositories are spread across different clusters, which allows clusters to be expanded dynamically and repositories to be migrated between clusters seamlessly.
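The article states only that the repository path is the routing key; it does not say how a shard is chosen for a new repository. The following Python sketch assumes a persisted route table keyed by normalized path, with new repositories placed on the shard that currently holds the fewest repositories. `ShardRouter`, `create_repo`, `route`, and `migrate` are hypothetical names, and the in-memory dict stands in for whatever metadata store a real system would use.

```python
class ShardRouter:
    """Routes a repository path to the storage shard (cluster) that holds it.

    Hypothetical sketch: not Worker Bee's actual routing service.
    """

    def __init__(self, shards):
        self.shards = list(shards)
        # Normalized repository path -> shard name; a real system would persist this.
        self.routes = {}

    @staticmethod
    def normalize(path):
        # Assumption: paths are case-insensitive and the ".git" suffix is optional.
        key = path.strip("/").lower()
        return key[:-4] if key.endswith(".git") else key

    def add_shard(self, name):
        # New capacity only receives newly created repositories.
        self.shards.append(name)

    def create_repo(self, path):
        key = self.normalize(path)
        if key in self.routes:
            raise ValueError(f"repository already exists: {key}")
        counts = {shard: 0 for shard in self.shards}
        for shard in self.routes.values():
            counts[shard] += 1
        # Place on the shard holding the fewest repositories (ties -> first listed).
        target = min(self.shards, key=lambda shard: counts[shard])
        self.routes[key] = target
        return target

    def route(self, path):
        # Raises KeyError for unknown repositories.
        return self.routes[self.normalize(path)]

    def migrate(self, path, target):
        # Call after the repository data has been copied to the target shard.
        key = self.normalize(path)
        if key not in self.routes:
            raise KeyError(key)
        if target not in self.shards:
            raise ValueError(f"unknown shard: {target}")
        self.routes[key] = target
```

Because existing entries are never recomputed, adding a shard only affects where new repositories land, and moving a repository is a single route update once its data has been copied. Hash-modulo placement would be simpler, but it would remap existing repositories whenever the shard count changes.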

Once storage capacity was no longer a constraint, growing traffic gradually exposed the performance bottleneck of a single host: all reads and writes for a repository were concentrated on one machine, straining its CPU and memory. Analysis of where the traffic came from showed that a large share of read requests originated from build and release systems. Because the workload is read-heavy and write-light, Worker Bee implemented read/write separation at the repository level with one master and multiple replicas. Write requests go to the master, and read requests are spread across the replicas according to their current load. Data is synchronized from the master to the replicas using native Git operations, which preserves atomicity and keeps the data as consistent as possible. Worker Bee also built a complete disaster recovery system for repository data: the replicas serve as real-time hot backups, complemented by cold backups stored off-site. Figure 1 shows the complete back-end storage architecture for code repositories.
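A minimal sketch of the read/write split, assuming requests are classified by Git smart-protocol service name (`git-upload-pack` serves clone and fetch, `git-receive-pack` serves push) and that "current load" is approximated by a count of in-flight requests. `Node`, `pick_node`, and the `in_sync` flag are hypothetical; the article does not describe Worker Bee's actual load metric or how replica freshness is tracked.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    active_requests: int = 0  # stand-in for "current load" (assumption)
    in_sync: bool = True      # replica has applied the master's latest refs (assumption)


# git-upload-pack / git-upload-archive only read; git-receive-pack writes (push).
READ_SERVICES = {"git-upload-pack", "git-upload-archive"}
WRITE_SERVICES = {"git-receive-pack"}


def pick_node(service, master, replicas):
    """Route a Git request: writes to the master, reads to the least-loaded replica."""
    if service in WRITE_SERVICES:
        return master
    if service not in READ_SERVICES:
        raise ValueError(f"unsupported service: {service}")
    # Skip replicas that have not caught up, so readers never see stale refs.
    candidates = [r for r in replicas if r.in_sync]
    if not candidates:
        return master  # no up-to-date replica: fall back to the master
    return min(candidates, key=lambda r: r.active_requests)
```

Falling back to the master when no replica is current trades some master load for read-after-write consistency, which matters when a CI job clones immediately after a push.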

Managing very large repositories has long been a weak point of code management tools. Git was originally designed for text source files, but projects inevitably include dependency libraries and resource files. Tencent's game business, whose projects contain large numbers of images, audio, and video files, makes the problem especially acute at Tencent. Worker Bee adopted Git LFS, an open-source Git extension dedicated to hosting large binary files. As shown in Figure 2, LFS stores these files outside the Git repository and keeps only small text pointers to them in the repository itself, which greatly reduces the repository size and speeds up cloning. The largest single game repository hosted on Worker Bee now exceeds 2.5 TB, so the single-repository size ceiling is no longer a limitation.
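For reference, a Git LFS pointer file is a short text file listing the spec version, the SHA-256 object ID of the real content, and its size in bytes. The sketch below builds and parses such a pointer for illustration only; the helper names `lfs_pointer` and `parse_pointer` are hypothetical, and real clients generate pointers themselves.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(data: bytes) -> str:
    """Build the pointer text that is committed in place of a large file."""
    oid = hashlib.sha256(data).hexdigest()
    # One "key value" pair per line: version first, then keys in alphabetical order.
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(data)}\n"


def parse_pointer(text: str) -> dict:
    """Extract the object ID and size from a pointer file."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    if fields.get("version") != LFS_SPEC:
        raise ValueError("not a Git LFS pointer")
    algo, oid = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    return {"oid": oid, "size": int(fields["size"])}
```

In practice the `git-lfs` client creates these pointers automatically through its `lfs` clean and smudge filters (registered by `git lfs install` and applied to the paths selected with `git lfs track`), and uploads the actual content to the LFS server when pushing. A multi-gigabyte asset therefore costs the Git history only about a hundred bytes.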

Overall, Worker Bee adopts the microservice architecture that is widely used in the industry. As shown in Figure 3, the protocol proxy service provides an independent access path for each of the three protocols (HTTP, SSH, and LFS); the data service encapsulates database access; the routing service locates the back-end repository data node for each request; and the business services are split by platform function, so that code browsing, statistics, code review, code search, and so on each run as independent services. In addition, a unified registry and configuration center provides global capabilities such as service discovery, service routing, circuit breaking on failures, and service configuration. All microservices are stateless, so they can be scaled horizontally with ease. Combined with containerized deployment, the number of instances can be adjusted to handle high-concurrency scenarios.
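The article lists circuit breaking among the global capabilities but does not describe how it works. The sketch below shows the common closed/open/half-open circuit-breaker pattern as an illustration, not as Worker Bee's implementation; the class name, thresholds, and injectable clock are assumptions, and the code is not thread-safe.

```python
import time


class CircuitBreaker:
    """Closed -> open after N consecutive failures; half-open after a cool-down;
    closed again on the first successful trial call."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock      # injectable so tests can control time
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            # Fail fast instead of piling requests onto an unhealthy downstream service.
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast keeps a slow data node from tying up every worker in a stateless upstream service, which is what lets the remaining instances keep serving other repositories.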

If code tools are not integrated with the upstream and downstream R&D processes, they do little to improve R&D efficiency. One of Worker Bee's strengths is its rich set of open capabilities for integrating third-party systems. The Webhook push mechanism lets third parties subscribe to repository events such as commits; it is widely used to trigger continuous integration builds automatically after code is pushed. The Commit Check interception mechanism runs automated inspection pipelines before code is merged, covering coding standards, defect detection, unit testing, and so on, and strictly enforces code quality through configurable quality red lines. Worker Bee also provides a rich RESTful API with private-token and OAuth authorization, giving third parties a secure, standardized way to integrate and broadening the scenarios in which Worker Bee can be used.
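The idea of a quality red line can be sketched as a merge gate: each configured metric reported by the Commit Check pipeline must meet its threshold, or the merge is blocked. The metric names, thresholds, and the `RedLine` and `evaluate_merge` names below are hypothetical; the article does not specify which metrics Worker Bee checks or how results are reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedLine:
    metric: str               # e.g. "lint_errors", "unit_test_coverage" (hypothetical)
    threshold: float
    higher_is_better: bool = False


def evaluate_merge(results, red_lines):
    """Return (allowed, violations) for a merge request given check results."""
    violations = []
    for line in red_lines:
        value = results.get(line.metric)
        if value is None:
            # Fail closed: a check that never reported blocks the merge.
            violations.append(f"{line.metric}: no result reported")
            continue
        ok = value >= line.threshold if line.higher_is_better else value <= line.threshold
        if not ok:
            violations.append(f"{line.metric}={value} violates threshold {line.threshold}")
    return (not violations, violations)
```

Treating a missing result as a violation makes the gate fail closed, so a check that crashed or never ran cannot silently let code through.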

Within Tencent, Worker Bee is now used across six business groups, serving thousands of business lines including WeChat and QQ and hosting nearly 200,000 repositories. It receives tens of millions of visits per day, and its API is called millions of times a day, noticeably improving the company's overall R&D efficiency. In line with the strategic goal of internal open-source collaboration, Worker Bee is also gradually changing how teams across the company work together. More than half of the projects on Worker Bee are now fully open source within the company, and discussion through Issues is becoming an effective channel for cross-team collaboration.

At the end of September this year, the project “Tencent Worker Bee: A Git-based R&D Engineering Platform” stood out in the China Computer Federation (CCF) selection and won the 2019 “CCF Science and Technology Award”. The CCF Science and Technology Award recognizes important discoveries, inventions, or original innovations in computer science, technology, or engineering that have achieved a degree of international influence in their field. The award is a strong recognition of Worker Bee. Going forward, Worker Bee will work on improving code reuse, an integrated R&D experience, and measurement of R&D process data, continue to deepen its work in code management, and deliver greater value to the company and the industry.