On December 15, 2019, He Zhengyu, researcher and head of the systems department at Ant Financial, shared Ant's practical experience with financial-grade system software, as well as its ideas and practice of open source collaboration, at OS2ATC 2019. The following is a summary of the speech:

Today I would like to share some of my work at Ant, as well as our exploration and practice of open source collaboration in financial-grade system software.

In fact, the financial industry attaches great importance to technology, because the value of technology shows up directly in the business. It also pursues polish and technical advancement, since a technical lead can quickly be turned into a business lead.

Ant Financial, as a leading financial enterprise in China, pursues technology relentlessly. Ant's dream is to serve 2 billion consumers and 100 million small and micro business operators around the world. This is a very big vision, and we believe that only the continuous development of technology can make it possible. Our "310" lending capability, for example, is built on first-class, financial-scale data intelligence technology.

So what challenges and pressures do we face in building system software? Summed up in one sentence, it is guaranteeing service continuity and monitoring capital-loss risk under the pressure of massive data. The first requirement is very high availability, and this differs from ordinary high-availability systems such as carrier-grade systems. In addition to five nines of availability, financial institutions have some very strict requirements of their own, such as a 100% guarantee of fund security, which Ant Financial has always pursued.

Ant Financial and system software

Ant Financial is indeed pursuing excellence in every area of system software. First, databases: OceanBase broke Oracle's years-long hold on the TPC-C benchmark. That result reflects the OceanBase team's innovative implementation of a distributed relational database, and it was recognized by professional reviewers. Second, secure computing: we participate in Occlum, an open source project for trusted execution environments, and cooperate academically with Tsinghua University. The related paper has been accepted at ASPLOS, and we also took part in formulating the first standard for secure computing in China. Third, cloud native: we developed SOFAMesh and took the lead in validating it at large scale during this year's Singles' Day. Finally, secure container technology: our Kata Containers is a top-level open infrastructure project of the OpenStack Foundation.

I would like to talk about some of our views. I have always thought that system software is a means, not an end, so we must be clear about what our system software is for. The picture on the right is very interesting: it is a staircase, but one that cannot be used. If we build system software just for the sake of building it, we may end up with something like this staircase, where the goal is achieved but there is no value.

Any piece of basic software or system software, such as a new operating system, is generally expensive to build and will become obsolete one day. So what kind of system software should we build? I believe it must be built to solve real problems, and this is the most important thing for us system engineers to consider.

Next, I would like to share some of our own experience of how we think about system software and use it to solve problems.

The first example is a problem with the containerization that everyone is doing. Under the cloud native trend, people are migrating IT systems to containers, for example from OpenStack to Kubernetes. There is a big problem here: when migrating from virtual machines to containers, the isolation of the system is reduced, both in terms of security and in terms of performance.

Ant Financial is building secure containers to solve the container isolation problem, and the principle is easy to understand. Traditional container isolation depends on Linux itself, through technologies such as cgroups and namespaces, but applications still reach the host kernel directly through system calls. A secure container adds an intermediate layer, using technologies such as a separate kernel and a hypervisor, so that application system calls no longer depend on the underlying host Linux. The interface the secure container itself uses on the host Linux is fully known, fixed, and small enough to be audited in great detail, which greatly reduces the risk of the host being breached.

Secure containers can effectively protect hosts, but financial services themselves still need stronger isolation.
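As a rough illustration of the difference in host exposure described here, the following toy model compares the two designs. It is only a sketch: the system call names and the two sets below are illustrative assumptions, not measurements of any real container runtime, and the function names are hypothetical.

```python
# Toy model only. The syscall sets are illustrative assumptions,
# not data taken from Linux or from any real secure container runtime.
HOST_KERNEL_SYSCALLS = {
    "read", "write", "open", "mmap", "ioctl", "ptrace", "mount", "bpf",
}

# Assumed small, fixed set of host interfaces used by the sandbox layer itself.
SANDBOX_HOST_INTERFACE = {"read", "write", "mmap", "ioctl"}


def host_surface_traditional(app_syscalls):
    """Shared-kernel container: each app syscall reaches the host kernel."""
    return set(app_syscalls) & HOST_KERNEL_SYSCALLS


def host_surface_secure(app_syscalls):
    """Secure container: app syscalls are served by the intermediate layer.

    The host only sees the sandbox's fixed interface, whatever the app calls,
    so app_syscalls is deliberately ignored here.
    """
    return set(SANDBOX_HOST_INTERFACE)


if __name__ == "__main__":
    app = ["open", "read", "ptrace", "mount", "bpf"]
    print("traditional:", sorted(host_surface_traditional(app)))
    print("secure:     ", sorted(host_surface_secure(app)))
```

In this model the host-facing surface of the secure container does not change with the application, which is what makes it practical to audit.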

Fortunately, the recent rise of confidential computing offers a very effective way to protect applications. It is essentially the trusted execution environment (TEE) technology that is already widely used in mobile phones, and with technologies such as Intel SGX it becomes possible to support TEE on every server.

A TEE, now commonly called an Enclave, provides bidirectional protection at runtime. Simply put, an application can use it without having to trust the underlying software, such as the OS. However, Enclave technology has some problems that hinder its use in real production environments:

First, the application has to be rewritten. There is no kernel or base library inside the trusted execution environment, so an existing application cannot be run directly in the Enclave.

Second, the application has to be split. The business program must be divided into an Enclave part and a non-Enclave part.

Third, it is not cluster-ready. Unlike client scenarios, data centers need failover and disaster recovery, and the lack of these for Enclave applications is another reason it has not been adopted at large scale in data centers.
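To make the second point concrete, here is a minimal sketch of what splitting a program into an Enclave part and a non-Enclave part means. It uses no real SGX or Occlum API: the class names `TrustedPart` and `UntrustedPart` and the transfer format are hypothetical, and ordinary Python objects merely stand in for the two sides.

```python
import hashlib
import hmac
import os


class TrustedPart:
    """Stands in for code inside the Enclave (hypothetical, no real SGX calls)."""

    def __init__(self):
        # The secret key is created inside the trusted part and never returned.
        self._key = os.urandom(32)

    def sign_transfer(self, payer, payee, amount):
        """Build a transfer message and authenticate it with the secret key."""
        msg = f"{payer}->{payee}:{amount}".encode()
        tag = hmac.new(self._key, msg, hashlib.sha256).hexdigest()
        return msg, tag

    def verify(self, msg, tag):
        """Check that a message was signed by this trusted part."""
        expected = hmac.new(self._key, msg, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, tag)


class UntrustedPart:
    """Stands in for code outside the Enclave: handles I/O, never sees the key."""

    def __init__(self, trusted):
        self._trusted = trusted
        self.log = []  # plays the role of network or disk I/O

    def handle_request(self, payer, payee, amount):
        # Cross the boundary once, then do the I/O on this side.
        msg, tag = self._trusted.sign_transfer(payer, payee, amount)
        self.log.append((msg, tag))
        return msg, tag


if __name__ == "__main__":
    trusted = TrustedPart()
    host = UntrustedPart(trusted)
    msg, tag = host.handle_request("alice", "bob", 10)
    print(trusted.verify(msg, tag))          # untampered message
    print(trusted.verify(msg + b"0", tag))   # tampered message
```

Even in this toy form, the developer has to decide which data and logic live on which side and design every call across the boundary, which is the porting cost described above.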

So it is really hard to build TEE-based applications right now, and what exists today is basically pure computation, because I/O is not handled well. That brings us to our second case, which is why we are building Occlum.

Occlum is the Enclave LibOS that we are focusing on this year, and it is probably the most advanced one in the world right now. You can use it to port TensorFlow Lite to an Enclave in one minute. What I want to make clear is that we are not building systems for the sake of building systems. We are building them so that Ant's businesses, such as shared intelligence and blockchain, can reap the benefits of confidential computing better and faster.

Ant Financial and open source

Like the system software mentioned above, open source is a means, not an end. Here are some of our thoughts.

First, a quick introduction to the Galápagos syndrome, which maps well onto system software. If we build system software behind closed doors from start to finish, it will keep compromising to fit the status quo of its own environment. These compromises pile up, and in the end the software cannot compete with open source software.

So I think an open ecosystem is the key to the long-term viability of system software. In the picture above, the killer whales kept in aquariums, on the left, have collapsed dorsal fins, while the whales in open water, on the right, have straight dorsal fins. So the ecosystem around a system is important. What I do not want to see, whether because of national policy or anything else, is that we just fight each other in a small pond, and in the end a big shark comes along and wipes everyone out.

From Ant Financial's perspective, we must remain open, and we hope to see a lot of healthy competition. Chinese martial arts need both Shaolin and Wudang; it would not work if everyone belonged to the same school. The best state is one where a hundred flowers bloom and a hundred schools of thought contend.

Finally, let me summarize the thinking behind Ant Financial's system software development. First of all, it must meet the needs of business competition. Beyond that, we cooperate with top academic institutions on innovation and actively participate in the open source community to shoulder our due social responsibility.

It is worth mentioning that Ant Financial also has extensive academic cooperation on its system software. We have worked with experts and scholars at home and abroad, including at Tsinghua University, Shanghai Jiao Tong University, Zhejiang University and UC Berkeley, with good results. For example, the Occlum project mentioned above was developed in cooperation with Chen Yu of Tsinghua University. This brings me to the most important purpose of my talk: I hope to have more exchanges and more cooperation with all of you in academia and the open source community. Thank you.