On June 25, 2018, the 11th lecture in the series marking the 60th anniversary of the Department of Computer Science and Technology of Tsinghua University was held in the East Main Building of Tsinghua University. Yang Zhenkun, senior researcher at Ant Financial and head of OceanBase, delivered a keynote speech entitled “The key technologies behind OceanBase processing 256,000 payments per second”.


Taking industry trends as his starting point, Yang Zhenkun gave an in-depth analysis of the development history and technical breakthroughs of OceanBase, a financial-grade relational database developed by Ant Financial.

This article is compiled from the speech:

In the Internet world there is a strange phenomenon: local companies sell local products to local people, yet they hand their advertising money to foreign companies every day. A handful of giant US companies, such as Google, Amazon and Facebook, have gradually cornered the market for Internet applications in Europe and Japan.

China, on the other hand, has its own e-commerce sites (Taobao, Tmall, etc.), its own search engines and its own social applications (WeChat, Weibo, etc.). But can we confidently say that we have really reached a world-class technical level? As far as Internet applications are concerned, applications made in China keep evolving and growing richer in form, and we can say that we have indeed done well. On core foundational components, however, we still have a long way to go.

Today, in terms of widespread use, China cannot really claim to have its own processor, its own operating system, or its own relational database. Some people may say that today’s open source ecosystem is booming, so we can simply use an open source operating system and an open source database.

Yang Zhenkun, head of OceanBase

Here is one data point: in the mobile phone market, every Android phone sold in China not only pays a few dollars to a foreign operating system software company, but also pays a few percent of the phone’s price to a foreign communications giant in communication and chip patent fees.

In the financial industry, the key software and hardware infrastructure of almost all banks comes from the United States. The servers are from IBM, the shared storage is from EMC, and the database software is mostly Oracle, with IBM DB2 making up a small remainder. Suppose that one day a military conflict broke out and the banking system was left without technical support and spare parts for three years: what would happen to our financial sector?

After all these years of hard work, we are still giving our lifeblood and wealth to others when it comes to critical infrastructure software and hardware.

Why does China need its own relational database?

This is not only the objective demand driven by the Internet, but also the inevitable result of the digital transformation of the financial industry.

New challenges in the Internet age

Before the Internet era, whether in shopping malls or banks, the concurrency a relational database system had to handle was very limited: dozens or hundreds of concurrent accesses were common, while thousands or tens of thousands were relatively rare. In the Internet era, concurrency has grown by orders of magnitude. On Singles’ Day 2010, concurrent visits to Alibaba’s Tmall reached hundreds of thousands at peak. On Singles’ Day 2017, they exceeded 10 million. This is a shift from quantitative change to qualitative change.

It is fair to say that the phenomenal concurrent traffic brought by the Tmall Double 11 event was what pushed the whole team to take on this work. Another important factor is that under the traditional model, whether for a mall or a bank, traffic is very stable, and the mall or bank has enough time to expand its database system. Today, a site may receive 100,000 visits in one week and nearly ten times that a few days later, while the hardware for a traditional relational database system cannot be purchased and deployed in less than half a year or so.

These are the two defining characteristics of database workloads in the Internet era. First, concurrency is very high, which multiplies cost hundreds or thousands of times. Second, load changes dramatically, which places new demands on scalability.

As we all know, Tmall Double 11 in 2017 set a new payment record of 256,000 transactions per second. What does that figure mean?

Put simply, to reach 256,000 payments per second, the database needs to execute more than 42 million SQL statements per second. For a more intuitive comparison, take China’s five largest banks (Bank of China, Agricultural Bank of China, Industrial and Commercial Bank of China, China Construction Bank and Bank of Communications): their payment capacity is probably a little over 10,000 transactions per second, or even less.

In other words, if Alipay adopted the same solution, it would have to pay dozens of times the IT system cost of a big bank.
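As a rough back-of-the-envelope check, the figures quoted above can be combined as follows. This is only a sketch: it assumes the 42 million SQL statements are a per-second rate measured at the same peak, and it takes 10,000 payments per second as a round number for a large bank.

```python
# Figures quoted in the talk. Treating the SQL count as a per-second rate
# and using 10,000 as the bank figure are assumptions made for illustration.
PEAK_PAYMENTS_PER_SEC = 256_000
PEAK_SQL_PER_SEC = 42_000_000
BIG_BANK_PAYMENTS_PER_SEC = 10_000

# Average number of SQL statements behind a single payment.
sql_per_payment = PEAK_SQL_PER_SEC / PEAK_PAYMENTS_PER_SEC

# How many times larger the Double 11 peak is than the bank figure.
capacity_ratio = PEAK_PAYMENTS_PER_SEC / BIG_BANK_PAYMENTS_PER_SEC

print(f"SQL statements per payment: {sql_per_payment:.0f}")
print(f"Peak versus large-bank capacity: {capacity_ratio:.1f}x")
```

Under these assumptions each payment corresponds to roughly 164 SQL statements, and the peak is about 25 times the bank figure, which is consistent with the "dozens of times" cost estimate.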

Returning to the essence of relational databases: transactions

Databases are valuable because of transactions, and they are difficult because of transactions. In the words of Professor Zhou Aoying, vice president of East China Normal University, the essence of a database is to do three things: transfer money, keep accounts and book tickets.

In today’s world, relational databases are everywhere. When you make a phone call, a relational database bills you. Buying train or plane tickets, depositing and withdrawing money at a bank, and all kinds of online transactions are supported by relational databases. It is therefore no exaggeration to say that the relational database is the most critical and irreplaceable infrastructure in today’s information society.

Databases, however, are hard to build. If transactions are the most important thing in a database, then the most important thing about transactions is their ACID properties.

  • Atomicity (A): either all of a transaction is executed or none of it is. If you withdraw money from an ATM, for example, the transaction consists of two steps: updating the account balance and dispensing the cash. The balance must not go down without the cash coming out. Both steps must happen together, or not at all.
  • Consistency (C): in the financial system, a typical scenario is a primary credit card with a supplementary card. If you and a family member share such cards, both credit limits must decrease when either of you spends and increase when either of you makes a repayment. This is not hard to do on one machine, but it can be very difficult across two different machines.
  • Isolation (I): while a transaction is running, it behaves as if it were the only transaction in the system, and it never sees inconsistent data caused by another transaction executing concurrently.
  • Durability (D): once a transaction commits, its result must not be lost. The only durable medium in common use today is the hard disk, yet data center hard drives fail at a rate of about 2% to 4% a year. If your hard drive fails, will your data still exist?
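Atomicity is the easiest of the four properties to show in a few lines. The sketch below uses Python’s built-in sqlite3 module purely as a stand-in for a relational database; the accounts table and the transfer function are hypothetical names chosen for illustration, not anything from OceanBase.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if any exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        # The debit above has already been rolled back at this point.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

A transfer that would overdraw the source account leaves both balances exactly as they were, because the debit that had already been applied is undone together with everything else in the transaction.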

To sum it up in one sentence:

Databases are a natural fit for computers, but computers are not a natural fit for databases.

Data cannot be wrong, service cannot stop for a second

Relational databases sit at the lowest level of a business system. They are also difficult because of a very simple principle: the data cannot go wrong, and the service cannot stop. In any business system, a data error in the database is a huge disaster. In the financial business, if your system is out of service for more than 30 minutes, you may need to explain the situation to the CBRC (China Banking Regulatory Commission).

Because of these two factors, the risk of switching databases is very high, while the benefits often do not match it. That is why later entrants such as IBM and Microsoft have not been able to displace Oracle. As a result, the database has become a field with a high barrier to entry where the strong stay strong, and it is difficult for latecomers to gain a foothold.

OceanBase: Rebuilder of relational databases

Limitations of traditional databases


Above is a very simple schematic diagram of a traditional database architecture. Today the IOE stack (IBM servers, Oracle databases, EMC storage) is deeply entrenched in banking. Although IBM servers and EMC storage can be partly replaced by domestic vendors, no one has shaken Oracle’s dominant position.

Even with the most expensive and best equipment, a database can still suffer a single point of failure, such as a power outage. The database system must therefore have a standby database, which raises another problem: synchronization between primary and standby.

In theory, however, a traditional relational database cannot achieve strict synchronization between primary and standby without sacrificing availability. Strict synchronization means that every transaction has to be sent from the primary to the standby, and the standby has to confirm it before the primary replies to the client. If there is a problem with the network in between, or with the standby database itself, all synchronization is blocked, which means that no write operation can take place.
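The blocking behavior described above can be sketched in a few lines. The Primary and Standby classes are hypothetical and model only the acknowledgement rule, not a real replication protocol.

```python
class StandbyUnavailable(Exception):
    """Raised when the standby cannot acknowledge a record."""

class Standby:
    def __init__(self):
        self.log = []
        self.reachable = True  # set to False to simulate a network or node fault

    def replicate(self, record):
        if not self.reachable:
            raise StandbyUnavailable("no acknowledgement from standby")
        self.log.append(record)

class Primary:
    def __init__(self, standby):
        self.log = []
        self.standby = standby

    def commit(self, record):
        # Strict synchronization: the standby must confirm the record
        # before the primary records it and answers the client.
        self.standby.replicate(record)
        self.log.append(record)
        return "committed"

standby = Standby()
primary = Primary(standby)
primary.commit("tx-1")        # succeeds: both logs now contain tx-1

standby.reachable = False
try:
    primary.commit("tx-2")    # every write is refused until the standby returns
except StandbyUnavailable:
    pass
```

With a single standby, one unreachable machine is enough to stop all writes, which is exactly the trade-off between consistency and availability that the talk describes.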

For banks and businesses this is a matter of life and death: guaranteeing synchronization carries the risk that the business becomes unavailable. So banks buy more reliable hardware, such as high-end storage and servers. The best hardware is reliable, but also very expensive.

Another limitation of traditional relational databases is the lack of distributed capability. The two-phase commit model for distributed transactions looks good on paper, but in fact, once a node fails in phase two, the transaction can neither commit nor roll back and can only be left suspended. In a real production system this can quickly exhaust the database connections and cause a service interruption.

Lacking distributed capability, a traditional relational database cannot scale horizontally, only vertically, which both increases cost further and limits business growth. Whether the problem is inconsistency between primary and standby or the lack of distributed capability, the root cause is that the traditional relational database itself lacks high availability. In other words, today’s traditional relational databases rely on external hardware to guarantee availability instead of solving the problem within the database system.
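The phase-two problem can be illustrated with a toy model of two-phase commit. Everything here is a simplified assumption made for illustration: the Participant class, the crash_after_prepare switch and the "in-doubt" result are hypothetical names, and a real coordinator would also log its decision and retry.

```python
from enum import Enum

class State(Enum):
    INIT = "init"
    PREPARED = "prepared"    # voted yes; locks are held until the decision arrives
    COMMITTED = "committed"

class Participant:
    def __init__(self, name, crash_after_prepare=False):
        self.name = name
        self.state = State.INIT
        self.alive = True
        self.crash_after_prepare = crash_after_prepare

    def prepare(self):
        if not self.alive:
            return False
        self.state = State.PREPARED
        if self.crash_after_prepare:
            self.alive = False   # the yes vote was sent, then the node went down
        return True

    def commit(self):
        if not self.alive:
            return False         # the decision cannot be delivered
        self.state = State.COMMITTED
        return True

    def abort(self):
        if self.alive:
            self.state = State.INIT

def two_phase_commit(participants):
    # Phase 1: collect votes; a single "no" aborts the transaction.
    votes = [p.prepare() for p in participants]
    if not all(votes):
        for p in participants:
            p.abort()
        return "aborted"
    # Phase 2: deliver the commit decision to every participant.
    acks = [p.commit() for p in participants]
    # If any node is unreachable, the transaction is stuck: some nodes have
    # committed, so it cannot roll back, and it cannot finish committing either.
    return "committed" if all(acks) else "in-doubt"
```

In the failing case, one participant has committed while the crashed one is still in the prepared state holding its locks, so the transaction as a whole stays suspended until that node comes back.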

OceanBase’s goal: 10 times the cost-effectiveness, and doing what no one else can do

The market for relational databases is so special that OceanBase must excel at certain points in order to survive and thrive. OceanBase has set itself the goal of being 10 times more cost-effective than traditional databases and of doing things that no one else can. Since its inception eight years ago, OceanBase has steadily been doing three things.

1) The first thing is to ensure high availability while also guaranteeing data consistency. OceanBase solves this problem by building availability into the database system itself.

Previously, we analyzed the contradiction between high availability and the consistency of primary and standby data, which is an objective law that cannot be changed. So how does OceanBase do it? The key to OceanBase’s high availability is this: instead of one primary and one standby, it uses one primary and two standbys, or one primary and multiple standbys. If two out of three replicas succeed in persisting a transaction, the transaction succeeds. If any one machine fails, both the availability of the system and the consistency of the data are still guaranteed. With three replicas, if one machine breaks down, every transaction exists on at least one of the two remaining machines, so the system can recover quickly and continue to provide service. This ensures not only data consistency but also the availability of the whole system.
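The "two out of three" rule can be sketched as a majority acknowledgement check. This is a minimal illustration, not OceanBase’s actual protocol: the Replica class and majority_commit function are hypothetical, and a real system would also have to clean up records that reached only a minority of replicas.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []
        self.alive = True

    def append(self, record):
        # A failed replica never acknowledges.
        if not self.alive:
            return False
        self.log.append(record)
        return True

def majority_commit(replicas, record):
    """Return True once a strict majority of replicas has persisted the record."""
    acks = sum(1 for r in replicas if r.append(record))
    return acks > len(replicas) // 2

replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
replicas[2].alive = False
one_down = majority_commit(replicas, "tx-1")   # 2 of 3 acknowledge

replicas[1].alive = False
two_down = majority_commit(replicas, "tx-2")   # only 1 of 3 acknowledges
```

With one replica down the transaction still commits, so writes are not blocked by a single failure. With two down, the commit is refused instead of risking data that exists on only one machine.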

What if two of them break down? Although the probability of two machines failing at the same time is extremely low, it can still happen in actual production. So in important production systems OceanBase uses five replicas instead of three. This means that even if one machine breaks down and another is taken down by human error, the system will still work.

OceanBase ensures both high availability and data consistency. That is what we mean by doing what no one else can do. OceanBase can assure a bank that no data will be lost and no service will stop because of the failure of a small number of servers, or even of an entire data center, and that manual account reconciliation is no longer required. That is one of the reasons we are so welcome in the banking industry today.
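The gain from moving from three replicas to five can be estimated with a small calculation. The sketch assumes that each replica is unavailable independently with the same probability p; both the independence assumption and the value of p used below are illustrative, not figures from the talk.

```python
from math import comb

def failures_tolerated(n):
    """Largest number of failed replicas that still leaves a strict majority."""
    return (n - 1) // 2

def p_majority_lost(n, p):
    """Probability that more than failures_tolerated(n) of n replicas are down,
    assuming each is down independently with probability p."""
    f = failures_tolerated(n)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(f + 1, n + 1))

for n in (3, 5):
    print(n, failures_tolerated(n), p_majority_lost(n, 0.01))
```

Three replicas tolerate one failure and five tolerate two, and under these assumptions the chance of losing the majority drops sharply with five replicas.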

2) The second thing is to improve performance. Through native read/write separation, OceanBase has greatly improved overall system performance, especially write performance, and greatly reduced cost.

OceanBase keeps newly written data in memory, so that an entire write transaction (apart from its log) requires no random disk writes. This is a qualitative improvement in performance.


But native read/write separation comes at a cost. One cost is that new changes have to be kept in memory, which is limited and cannot absorb writes indefinitely. So the in-memory changes must be merged to disk every so often.

Although the data ultimately still has to be written to disk, the benefits are significant. Native read/write separation is an ideal way to shift work away from business peaks. During a business peak, writes are handled in memory; afterwards, in off-peak periods, the data is written to the hard disk. In effect, CPU and disk I/O usage are staggered over time.
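The idea of absorbing writes in memory and merging them to disk later can be sketched as follows. The MemFirstStore class is a hypothetical toy: plain dictionaries and a list stand in for the in-memory changes, the on-disk baseline data and the sequential log, and it does not reflect OceanBase’s actual storage engine.

```python
class MemFirstStore:
    def __init__(self, memory_limit):
        self.memory_limit = memory_limit
        self.memtable = {}   # recent changes, kept in memory
        self.baseline = {}   # stands in for the data already on disk
        self.redo_log = []   # stands in for the sequential log on disk

    def write(self, key, value):
        # The only disk work on the write path is a sequential log append.
        self.redo_log.append((key, value))
        self.memtable[key] = value

    def read(self, key):
        # Newer in-memory changes take precedence over the baseline.
        if key in self.memtable:
            return self.memtable[key]
        return self.baseline.get(key)

    def needs_merge(self):
        return len(self.memtable) >= self.memory_limit

    def merge(self):
        """Fold in-memory changes into the baseline; meant to run off-peak."""
        self.baseline.update(self.memtable)
        self.memtable.clear()
        self.redo_log.clear()

store = MemFirstStore(memory_limit=2)
store.write("order:1", "paid")
store.write("order:2", "paid")
if store.needs_merge():
    store.merge()            # in practice scheduled outside the business peak
```

Reads see the same values before and after the merge; the merge only changes where the data lives, which is why it can be deferred to quiet periods.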

3) The third thing is that we really made OceanBase a distributed database.

One might say this sounds simple: just take what runs on one machine and run it on several. In fact it took a team of dozens of people at OceanBase five years and three major versions to bring distributed transactions to where they are today.

What does being distributed mean in practice? For banks and enterprises with stable business volumes, some people think it is not a necessity. But mobile payment has become part of everyone’s life. A very common business peak now occurs at noon every working day, when office workers pay for meals with their phones. As mobile payment becomes even more widespread, this everyday payment peak will keep rising.

The future has come, forge ahead

OceanBase milestones: the project was officially approved in June 2010. In 2011 it went live for Taobao Favorites; in 2014, for the Alipay trading system; in 2016, for the Alipay accounting system. In 2017 OceanBase began to be promoted to commercial banks and has been put into operation in many of them.

So far, OceanBase has been successfully applied to all of Alipay’s core businesses (transaction, payment, membership and accounting systems), as well as e-bank, India’s Paytm, Alibaba’s Taobao Favorites, P4P advertising reports and other businesses. Since 2017, OceanBase has also served external customers, including Bank of Nanjing, Zheshang Bank and the PICC Health Insurance platform.

Next Direction

OceanBase 2.0, which will be released this summer, is the version in which OceanBase’s distributed transactions truly mature. Version 2.0 will also bring further improvements to the transaction optimizer and the SQL optimizer. At the same time, we hope to use artificial intelligence and machine learning to help users make better use of the database, including secondary indexing, SQL performance optimization and troubleshooting. We are also adopting new technologies, including RDMA, to further improve system performance and reduce overall costs.
