On November 19, 2019, Ant Financial held the “Peak Insight · Focus on New Financial Technologies” conference in Beijing, introducing the technology that supported Alipay on November 11, 2019, and releasing the new OceanBase version 2.2 and the SOFAStack dual-mode microservice platform. We have compiled the series of talks and posted them on the Ant Financial Technology official account.

Han Hongyuan, a researcher at Ant Financial, gave the talk “Continuity and Innovation of the Enterprise Database Platform” at the press conference. The following is the transcript:

In today’s enterprise market, the database is one of the fundamental pieces of infrastructure. Traditional database technology faces a variety of new challenges, and everyone is looking for new solutions to them. Where do these new solutions come from? When I want to replace a very mature and widely used platform, how can I reassure customers about doing so? And what might the future direction of the enterprise database platform be? That’s what I want to talk to you about today.

Relational databases have been developed over many years. Today, most of the core business systems in enterprises run on relational databases. In a traditional enterprise, the number of terminals accessing a business system is very limited, whether it is an internal business system or a front-counter system serving customers. Basically, scaling the database server vertically is enough to meet the requirements.

With the development of the mobile Internet, today’s business systems, whether in banks or in government, are exposed to a large number of customers accessing them from mobile devices. Once this happens, things change greatly, and traditional vertical scaling can no longer solve the problem. So how do we solve it? This has led to many different solutions. Besides persisting data in traditional relational databases, many new approaches to data have evolved, such as so-called polymorphic data persistence.

When all of these new things come along, you find that when you go back to the real business logic, relational databases have great advantages. They help customers build business systems in a mathematically sound way, and those capabilities are very hard to provide once you relax the rigor of the relational model. There is also the question of the customer’s existing business systems: how do they continue to operate? How do we ensure that the rigorous business logic customers already have keeps working? So this brings a lot of new challenges.

Traditional enterprise customers are not only commercial enterprises. The term generally covers enterprises, public institutions, and party, government and military organizations, that is, entities of a certain scale and importance. The requirements for enterprise-level systems and platforms are correspondingly many: 1) high security; 2) high performance; 3) high reliability; 4) high availability; 5) high development efficiency and low maintenance cost; 6) high scalability. These are all core concerns for an enterprise.

The horizontal axis at the bottom of the chart runs from 2013 to 2019, and the activity of the databases at the top of the list is very stable over that period. Today, the top three most active databases are still relational databases. Relational databases are not only still the mainstream of enterprise data management platforms, they also support all the major business systems.

Mainstream relational databases are relatively stable, even though they are under a lot of pressure from open source. For most enterprises, it’s not about development and software, it’s about the capabilities of the platform. The platform needs to be highly reliable, highly available, and able to support users’ continued growth. As long as the product is reliable enough and the customer can operate and maintain it themselves, there is no need to see the source code. When we talk about the enterprise today, we are looking more at how to support customers in running their own businesses effectively.

The chart above shows the history of databases. The earliest were hierarchical databases, network databases and so on. When relational databases emerged, their outstanding characteristics meant that basically all major business systems migrated to the relational development model. Personally, in my 20-plus years of experience, relational databases have been the most frequently criticized of all database technologies, and they are still alive and well.

What is the real value of an RDBMS? First of all, I think the database’s emphasis on ACID helps simplify application development. Second, SQL is key. It is written in a way that is close to natural language, and its biggest advantage is that business people can understand the code that developers write. The benefit is that communication is easy and the code is very readable and maintainable, which is why it is so hard to move away from these technologies. But the relational database is not something fixed. Moving from centralized to distributed is a major direction of development, and in many cases the aim is to break through the original limitations and technical bottlenecks.

The changes brought by the Internet and Internet companies are changes in the volume of user access, as well as the exploration of and innovation in many new technologies. Many of the earliest Internet companies were engaged in businesses very different from traditional enterprise business. Some businesses have a standard answer and some don’t. When you do a search on the Internet, you don’t know what the results will be until you run the search, right? When you make a transfer, you must know what the result should be before you make it. These two things place completely different requirements on the database platform. Because Internet companies are under great load, they have very strong requirements for system maintainability. Many things are handled by automatic maintenance and automatic high availability management, and this is the starting point of a great technological change. High availability in traditional enterprises, for example, is basically done in a semi-manual, semi-automated way. They basically do not trust machines, even though their own high availability practices are very mature.

For example, many financial institutions have disaster recovery systems, and switching over to the disaster recovery system is a decision that no one would dare let a machine make. But when you operate at very large scale, managing this by hand becomes very difficult. The question is how to implement automatic high availability, so that the system really makes the judgment itself.

When the Internet reached the field of finance, the problem I just described became very troublesome. There are many users and a lot of concurrent requests. With a query you may not know the exact result in advance, but in finance you must know the exact result, and if anything goes wrong users will come straight to you to settle accounts. A lot of infrastructure needs to be in place to support a financial business like this.

Next, I would like to discuss with you what development trends enterprise database technology will see.

First, distribution has become an inevitable trend, for several reasons. The capacity of a single large server plus storage is limited and cannot support the sustained growth of an enterprise. In today’s cloud environment, look at any of the major cloud providers: none of them will let you attach a server to high-end, high-performance storage to run your database. If you want to run a database in the cloud, you have to choose a different implementation. So our testing, including the system architecture design, is actually in line with the general trend toward cloud. There are also many big changes in hardware today. When people questioned our TPC-C test results, one point was that there is a big difference between the hardware of 9 years ago and the hardware of today.

If I gave you 200 servers today and asked you to deploy a database across them, you wouldn’t even be able to deploy it, let alone run it. It’s not that easy. Using this new hardware effectively is a big test for software. Most of the major relational databases still in use have one thing in common: they were all designed 30 years ago. They were designed under two assumptions, that memory was small and that storage access was slow. Basically all the major databases we use today grew out of that era, and these two assumptions constrain them heavily.

You can try it: if I have an Oracle database with 256 GB of memory today and give it 512 GB tomorrow, how much better will the performance be? Will it double? In fact, a 1% increase would be nice. Why? Its software architecture makes it difficult to use new hardware capabilities effectively. As new hardware technologies keep emerging, new approaches are needed to put those capabilities to work and give users better systems, better returns and easier management.

In terms of cloud trends, the database is among the services best suited to the cloud. How to effectively support customers in using databases in a cloud manner is also a great challenge.

Finally, the scale being managed is very large and the systems are very complex. How do we bring more of the advantages of artificial intelligence into the system, to help it run automatically and tune itself better? That’s easy to say today, but the challenge is that you need a large enough and wide enough range of environments and scenarios to accumulate the data required to work out the model you want.

So what capabilities do new enterprise-level databases need?

First, it is best for the database not to depend on specific hardware, since such a dependency gets in the way of moving to the cloud and optimizing there. Second, all enterprises today face the same direction of development: how to move from a traditional enterprise architecture to a cloud native architecture, and how the database can support a smooth transition for users before and after that transformation. Third, in today’s enterprise environment many load changes are sudden, so you need to be able to migrate flexibly between different operating environments while protecting your data. Finally, the database is a service very well suited to cloud computing. How can we really bring the underlying hardware capabilities into play? For example, multi-tenancy should be considered from the start of the design: how to use resources effectively in a multi-tenant environment and support all kinds of mixed workloads. Together, these are some of the factors that must be considered when building the next generation of database platforms.

OceanBase’s exploration of new technology

Coming back to OceanBase, we have done a lot of exploration in these areas.

The first is high availability. Today, level 6 is a very high level of availability that most organizations cannot reach, but we can go well beyond it, with automatic recovery within 30 seconds. This depends on how well technologies such as Paxos are put to use. No one uses Paxos in traditional databases, but in the new distributed setting it really can help you achieve fully automatic high availability. The advantage of Paxos’s automatic voting mechanism is that the system can keep running through a failure without external intervention to replace what failed, which is indispensable in today’s large-scale operating environments.
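
To make the voting idea concrete, here is a minimal Python sketch of majority acknowledgement. It is not Paxos itself and not OceanBase’s implementation; the `Replica` class, the zone names and the `replicate` function are hypothetical, introduced only to show why a system with three replicas can keep accepting writes after one of them fails.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical stand-in for one copy of the data."""
    name: str
    alive: bool = True

    def accept(self, entry: str) -> bool:
        # A failed replica cannot acknowledge anything.
        return self.alive


def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def replicate(replicas, entry: str) -> bool:
    """Return True if a majority of replicas acknowledged the entry."""
    acks = sum(1 for r in replicas if r.accept(entry))
    return acks >= majority(len(replicas))


replicas = [Replica("zone-a"), Replica("zone-b"), Replica("zone-c")]

ok_all_up = replicate(replicas, "txn-1")    # 3 of 3 acknowledge

replicas[0].alive = False
ok_one_down = replicate(replicas, "txn-2")  # 2 of 3 is still a majority

replicas[1].alive = False
ok_two_down = replicate(replicas, "txn-3")  # 1 of 3 is not a majority
```

In this sketch the write still succeeds with one replica down and is refused with two down. A real consensus protocol also has to handle leader election, log ordering and recovery, none of which is modeled here.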

As I said just now, the load brought by today’s Internet has greatly increased the pressure on business systems, and it is impossible to support all of the business with one database and one system. Having been through so many real production environments, our system can now give users a very flexible choice of deployment granularity: they can choose sharded databases and tables, or a single database. The choice is left to users, who can move between and combine the different approaches according to their needs, which helps them keep developing.

Why move to distributed databases? The traditional centralized deployment mode, shown on the far left, has the advantage of ACID, but the disadvantage is that it cannot scale out when you need it to. To solve the scaling problem, people split databases and tables (sharding). In most cases sharding is implemented by middleware. This breaks the boundary of the database and introduces many new problems. The biggest is that it is not transparent to the application. If you have a complex business application and want to adapt it to sharded databases and tables, modifying the application is very difficult.
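
A small Python sketch can illustrate why sharding is not transparent to the application. Everything in it is an assumption made for illustration: the shard count, the use of a CRC32 hash on a user ID as the sharding key, and the dictionaries standing in for physical databases. It does not describe any particular middleware product.

```python
import zlib

SHARD_COUNT = 4  # hypothetical number of physical databases

# Each shard is modeled as a dict standing in for one physical database.
shards = [dict() for _ in range(SHARD_COUNT)]


def shard_for(user_id: str) -> int:
    """Map a sharding key to a shard index with a stable hash."""
    return zlib.crc32(user_id.encode("utf-8")) % SHARD_COUNT


def put_balance(user_id: str, balance: int) -> None:
    shards[shard_for(user_id)][user_id] = balance


def get_balance(user_id: str):
    # A lookup by sharding key touches exactly one shard.
    return shards[shard_for(user_id)].get(user_id)


def total_balance() -> int:
    # A query without the sharding key must fan out to every shard,
    # and the caller has to merge the partial results itself.
    return sum(sum(shard.values()) for shard in shards)


put_balance("alice", 100)
put_balance("bob", 250)
put_balance("carol", 50)
```

Even in this toy version, the application has to know the sharding key for every access, and anything that spans keys, such as `total_balance`, becomes a fan-out with merging logic outside the database. A transaction that updates rows on two different shards is harder still, because no single database enforces ACID across them.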

To solve this problem, we built a native distributed database. Put simply, you can treat a deployed, running distributed database as if it were a centralized database, and it makes no difference to you. As Mr. Yang just said, TPC-C measures the business processing capacity of a system. In the TPC-C test, the external behavior is checked against exactly the same standards as for a single-node database. This is one of the hardest things to implement in a distributed database today: how do you take something distributed and present it to the user as centralized, with more capacity but no added complexity in use?

When we look at a traditional database, we tend to emphasize ACID. If all you want is to satisfy ACID, that is very easy, because the database can simply stop. To guarantee ACID we can shut the system down when an exception occurs, but then users cannot use it. Today, high availability can in turn help guarantee ACID while keeping the system available.

In a traditional system that uses two-phase commit, the biggest problem is not the overhead of the two phases but a transaction caught between them. Once a participant becomes unavailable, the whole system cannot proceed, and while its state is unknown there is no way to guarantee consistency. This is where automatic recovery with an RTO of 30 seconds is very useful. When every transaction participant in the system can recover in a very short time, the business is not suspended and keeps running, which eliminates a major weakness of traditional distributed systems.
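
The weakness described here can be sketched in a few lines of Python. This is a simplified, hypothetical model of two-phase commit, not OceanBase’s protocol: the `Participant` class, the `fail_after_prepare` switch and the `"blocked"` outcome are invented for illustration, and a real coordinator would also log its decision and retry.

```python
class Participant:
    """One database node taking part in a distributed transaction."""

    def __init__(self, name, fail_after_prepare=False):
        self.name = name
        self.available = True
        self.state = "idle"
        # Hypothetical switch used to simulate a crash between the phases.
        self.fail_after_prepare = fail_after_prepare

    def prepare(self):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        self.state = "prepared"  # locks are held until commit or abort
        if self.fail_after_prepare:
            self.available = False  # crash right after voting yes
        return True

    def commit(self):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        self.state = "committed"

    def abort(self):
        if self.available:
            self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: every participant must vote yes.
    try:
        votes = [p.prepare() for p in participants]
    except ConnectionError:
        votes = [False]
    if not all(votes):
        for p in participants:
            p.abort()
        return "aborted"
    # Phase 2: the decision is commit; unreachable nodes stay prepared.
    stuck = []
    for p in participants:
        try:
            p.commit()
        except ConnectionError:
            stuck.append(p.name)
    return "blocked" if stuck else "committed"


a = Participant("node-a")
b = Participant("node-b", fail_after_prepare=True)
outcome = two_phase_commit([a, b])  # node-b crashes between the phases
state_while_down = b.state          # still prepared, still holding locks

# Once node-b recovers, the pending commit can finally be delivered.
b.available = True
b.commit()
```

In the sketch, `node-b` stays in the prepared state for as long as it is down, so the transaction cannot finish until it comes back. The shorter the recovery time of each participant, the shorter that window, which is the point being made about automatic recovery within 30 seconds.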

In terms of the overall market, relational databases are a global market that has already been carved up. The reason new products are appearing today is that many factors have come together to create change. Many businesses inside Ant use OceanBase and have grown with it over the past seven or eight years. During this process, OceanBase strengthened its capabilities and eliminated many problems. After years of tempering in a large-scale, complex environment, we began serving external users in 2017.

We hope that more users will give us the opportunity to try OceanBase in the future. We also hope that this product can help people solve many problems in reality. We also welcome more enterprises and organizations to join us.

With regard to TPC-C, I would like to say, first of all, that TPC-C is currently the only publicly credible testing standard in the world that covers both database functionality and performance. Because the major players in every market have published results against this standard, the results are all meaningful even though the test model is 20 years old. And if you look into the model definition of TPC-C, there is a lot of science in it.

Second, we often see a lot of misleading information about TPC-C tests. Someone runs a test at home and claims TPC-C results of 1.5 million, 2 million or 3 million on a single machine. That is not meaningful, because besides the performance number a TPC-C test has prerequisites. The ACID checks in TPC-C are a very big challenge for a distributed database. Today, most distributed databases work around this problem rather than solving it directly, so many of the measured results are not valid results. We went through the TPC-C audit to demonstrate that our distributed system can process 60 million new orders per minute while behaving like a single-node database.

Third, among the TPC-C results certified by TPC, the 60 million tpmC achieved by OceanBase ranks first.

Fourth, the test ran on general-purpose public cloud instance types, using the same basic environment as the production system. The biggest change as traditional enterprise databases move to the cloud is that servers as powerful as those in the original environment are no longer available. How do you solve this problem without servers and storage of that size? We showed that you can achieve the same performance metrics through software capabilities.

Finally, I don’t think TPC-C is a sufficient condition for proving the completeness of a database, but it is a necessary one. If your database lacks this capability, then when you face the core business systems of these large organizations, you will not be able to migrate their applications easily.

Today we sometimes tend to focus on a single point, the database, but an enterprise’s road to distributed transformation cannot stop at the database alone. What I want to say is that distributed transformation of the whole enterprise has to combine, from top to bottom, middleware, development process management and system assurance management. Only then can the transformation be effective and truly support the sustainable development of your business.

OceanBase is a completely self-developed distributed relational database, and we control all of the source code and system design. The system was designed with no preset restrictions: no dependence on specific software, no dependence on specific hardware, and no dependence on a particular architecture. This lets us adapt broadly to new hardware and new operating environments, and gives us more room to explore different system combinations. I think one of our biggest strengths is that we are not locked in to any specific external hardware, software or system architecture. Thank you!