Introduction | The relational database has come a long way over 50 years, and during that time the database has become a core component of the digital era. In recent years, domestic Chinese databases have flourished like a hundred flowers in bloom and taken their place on the stage of history. What were the key evolutions in database technology at each stage of its development? Where will database technology go in the future? This article is based on a talk by Gai Guoqiang, CEO of Yunhe Enmo and Tencent Cloud TVP, at the TechO TVP Developer Summit “The Song of Data Ice and Fire: From Online Database Technology to Massive Data Analysis Technology”. The talk, titled “Vianxia Update: Development and Future of Database Technology”, describes in detail the key stages in the evolution of database technology and the leading directions of technical exploration, and explains the latest trends in the development of domestic databases.
I. Three trends in the database industry
Today, the four-character Chinese idiom “a hundred flowers bloom” aptly describes China’s database field. A large number of domestic databases keep emerging in the market, and different domestic databases have found broader room for application in different scenarios. This is China’s moment in the database era.
To illustrate these points, let me first share some of my judgments about the industry.
1. Enter the new database era
First of all, I think the development of database technology has gone through three eras: from the commercial database era to the open source era, and then to today — what I call the new database era. Note that many people refer to today’s era as the “cloud database era,” but I prefer to use the word “new” because the foreign and domestic database markets are showing different patterns of development.
The era of commercial database is mainly represented by Oracle, which spawned a series of commercial database software companies. In this era, many domestic database companies started to explore at a very early stage.
The second era is the era of open source databases, represented by MySQL. Open source made the Internet possible: the Internet was built on the spirit of free and open source software, and many of the Tencent services and products people use every day rest on that foundation.
Third, we have now entered a new era of databases, in which open source database technologies generate value in the cloud. In the past it was hard to assess how much revenue and business value open source technologies generated, but now that they are sold in the cloud, their value can be measured. The Gartner report released yesterday stated that Microsoft has overtaken Oracle to become number one in the global database market. In the Chinese market, Tencent Cloud, Alibaba Cloud, and Huawei Cloud are the top three database providers. What does this show? It shows that in the new era, the cloud has become one of the most important homes for databases.
However, China’s database field also has a distinctive characteristic: a database ecosystem in which a hundred flowers bloom. Because independent innovation and iteration of domestic databases started late, we still have to retrace the technical path already traveled abroad, and this is what makes the Chinese database landscape different. TDSQL has been around since 2012, and today a number of homegrown databases are emerging in the market.
2. Databases become the ultimate battleground of the cloud
What is the second big trend? Databases will end up being the ultimate battleground of the cloud.
The foundation of the cloud is IaaS. Once the IaaS layer is built, the PaaS layer comes next, and the PaaS layer must solve the database problem. Whoever breaks through at the PaaS layer first will win the competitive advantage.
Take AWS as an example. In 2019, Amazon announced that it had completely replaced all of its roughly 7,500 Oracle databases. Why was this such an important event? Because every year at Oracle OpenWorld, Oracle mocked Amazon: you sell open source databases and your own cloud databases on your cloud, yet you buy lots of Oracle licenses from me. So Amazon promised to replace them all, and it did so in 2019. What fundamental change emerged from this replacement? After the migration, an Amazon executive explained that traditional DBAs had spent most of their time on database expansion, storage capacity, and license negotiations, and that all of this work can now be done through cloud self-service. That is the fundamental change: Internet and cloud technologies allow traditional jobs to be automated, and I think this is the essence of the shift.
3. The ToB market is the key to database success or failure
The Chinese market is slightly different. I think public and hybrid clouds will coexist in China for a long time, and private clouds will still hold a large share of the market. What kind of database landscape will emerge under these conditions?
My summary is that the cloud experience must ultimately be carried over to the data. Since not all databases and data can move to the public cloud, we need to bring the best user experience of the public cloud, such as automation and autonomy, into privately deployed data environments. Analyst reports call this TPC (true private cloud): having only IaaS and infrastructure does not make a real private cloud; it also needs an Internet-style user experience, genuine cloud elasticity, and other core capabilities. So I think that in the database field, especially in the Chinese market, the next step is to deliver the cloud experience off the cloud, and eventually on-cloud and off-cloud will converge. The cloud will become the only form of infrastructure provision in the future. There will still be two labels, public and private, but their essence and core will converge.
II. The academic view of database technology
What we have just discussed are several major trends we see in industrial practice: from commercial to open source to the cloud era, and from the cloud to off-cloud environments that deliver the cloud experience. Now let’s look at what academia is discussing and thinking about.
Over the past 50 years, the Turing Award has gone to five giants associated with databases. The first was Charles Bachman, the inventor of the network database. The second was Edgar Codd, the architect of the relational database we are talking about today; his 1970 paper gave birth to the entire field of relational databases, and he won the Turing Award in 1981. The third was Jim Gray, who explored transaction theory in depth and was the soul behind the industrial implementation of database products, including at Microsoft. The fourth was Professor Michael Stonebraker, who won the Turing Award in 2014 and has founded a number of database start-ups. The fifth, whom I’d like to share with you, is Jeffrey Ullman, whose Turing Award was announced in 2021. His major achievements are in teaching: he is a co-author of the famous Dragon Book and was the first mentor of many data scientists.
So think for a moment about the trajectory of this evolution. It runs from the founder of the theory, to the explorer of transactions, to Stonebraker’s attempts at productization in industry, where he explored many fields, from relational row stores to column stores to big data. And today, with Ullman, we are back in academia and education, rethinking whether there is still room to explore and innovate in database theory. I was just discussing this with Mr. Li Haixiang, who said he has recently been thinking about the same question and trying to find new ideas and innovations in the transaction model. So my view is that while relational database theory may seem to have run its course, people are starting to look back and ask whether there is still something to be found in these basic theories.
Professor Ullman made this point in his recent paper “The Battle for Data Science”. So you have now heard about two battles. The first, that databases are the ultimate battleground of the cloud, is generally accepted; the second is the battle for data science. For a long time there has been a pessimistic voice in the database field saying that database management systems are becoming irrelevant. Database people often ask, “Are we missing another boat?”, and the boat is data science. The booming growth and popularity of data science make database people wonder whether we have missed the best ship. However, Professor Ullman pointed out that databases, and the technologies that grew out of database research, remain the core of data science. That has not changed. The core of database systems has always been processing as much data as possible, and they should study all kinds of data. I think he is reclaiming the essence of databases. What do we who work in databases and data technology want to do? We want to store all the data, examine all the data, and generate insights from all of it. The world would be a better place if artificial intelligence played a truly innovative role in data science, but that role would still rest on the underlying data.
III. Core evolution and innovation of Oracle database
Since academia still holds that storing and studying all data is the essence of data management systems, I would like to review how Oracle, still the king of the database world today, has traveled this path.
I’ve outlined some of the key steps Oracle has taken from Oracle8 to today. After Oracle8 came the Internet release, Oracle8i. Was Oracle late to the Internet? No. It labeled its product with an “i” for the Internet as early as 1998. Think about when 1998 was: many of you may have been in kindergarten, yet databases were already being discussed in Internet terms. Oracle invested in parallel processing and supported native XML in the database; XML happened to be my field at the time, so I was among the earliest to study how this technology was implemented in the database. What did Oracle9i do? It did clustering, and clusters truly came of age as distributed clusters on shared storage; Oracle also developed its own Linux distribution and started working on database automation. In 2004, when Oracle10g was launched, Oracle introduced Automatic Storage Management (ASM). It was a very successful product that directly led to some storage software companies disappearing from the market, so you can imagine its influence. It was a very important technical event.
Then in 2008, in the Oracle11g era, Oracle began developing all-in-one machines and working on read-write separation and other technologies on the database side. But notice one date I have highlighted: in 2006, AWS launched S3. I think there is only one thing Oracle Database has missed in its evolution, and that is the cloud. When AWS launched S3 in 2006, Oracle’s founder responded in 2008 by saying, in effect, “The cloud is old wine in a new bottle. There is no new technological innovation; it is a hyped concept.” Looking back now, his judgment was wrong. The consequence of that misjudgment is plain: the leaders in the database field today are all those who succeeded in the cloud. Microsoft’s cloud succeeded, so Microsoft has become number one in databases and Oracle has dropped to second place. So insight into the future is very difficult but very important. It is a matter of life and death, which is why we are here today to talk about the future of databases.
Let’s move forward quickly. Oracle started building distributed capabilities in 12c Release 2, released in 2017, contrary to the common belief that Oracle is not distributed. In 2018, 18c turned its cluster into a sharded architecture, which also supports IoT. In 2019, 19c added intelligent (automatic) indexing; I think intelligent indexing is an innovation that has truly made its way into production practice. Then comes 20c. Oracle now releases a version every year, named after the year, but note that 20c was not released because of the pandemic and has been merged into this year’s release, 21c, which applies persistent memory to the database. What does that mean? Oracle Database is still an innovative product, constantly incorporating leading technologies into the database core to deliver new productivity. These include applying artificial intelligence to indexing, which represents today’s cutting edge, as well as multi-model support and the integration of hardware and software. So apart from the cloud, Oracle seems to be taking the right path on most fronts in the database world today.
What I have just described are macro-level observations. At the micro level, I have picked out a few small points to analyze with you.
What matters most today for pushing database performance forward? Breaking serial points into parallel ones. This delivers huge performance gains, and whether in Oracle, TDSQL, or other domestic databases, it is where today’s most important performance innovations come from. Let me give a few simple examples. Starting with Oracle9i, the shared pool could be split into up to seven subpools. ASM distributes storage across disks. Processes were split into master and worker roles. For example, from 12c onward, the log writer became a master-worker structure. For decades, Oracle had only a single process writing the log, so turning it into a master with workers in 12c was a difficult change; we know that the vast majority of database performance bottlenecks arise when the log is synchronously flushed to disk, and everyone has been optimizing there. 19c added real-time statistics and intelligent indexing. The idea behind intelligent indexing is simple: it simulates human reasoning, using an expert system to create candidate indexes and then verifying whether each one improves or degrades performance. But actually implementing this for industrial use is not easy. In 20c, Oracle introduced automatic parameter tuning, autonomous switching between primary and standby, and so on, integrating all these capabilities into the database. This is how Oracle has kept evolving its technology for more than 40 years, and it is still moving forward.
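The intelligent-index idea described above (propose a candidate index the way an expert would, measure it, and keep it only if performance improves) can be illustrated with a small Python sketch. It is not Oracle’s automatic indexing implementation: the `AutoIndexer` class, the `toy_cost` model, the 5% improvement threshold, and the sample workload are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical workload: how often each column appears in a WHERE filter,
# plus a count of write statements that every index must be maintained for.
READS: Dict[str, int] = {"customer_id": 50, "status": 4, "note": 3}
WRITES = 40


def toy_cost(indexes: Set[str]) -> float:
    """Toy cost model (an assumption): a full scan costs 10, an index probe 1,
    and each index adds 2 units of maintenance cost per write."""
    cost = sum(n * (1.0 if col in indexes else 10.0) for col, n in READS.items())
    return cost + WRITES * 2.0 * len(indexes)


@dataclass
class AutoIndexer:
    measure: Callable[[Set[str]], float]  # returns workload cost for an index set
    threshold: float = 0.05               # keep an index only if cost drops by >= 5%
    indexes: Set[str] = field(default_factory=set)
    log: List[str] = field(default_factory=list)

    def propose(self, predicates: Dict[str, int], min_hits: int = 3) -> List[str]:
        # "Expert rule": frequently filtered columns become candidates.
        return [c for c, hits in sorted(predicates.items())
                if hits >= min_hits and c not in self.indexes]

    def try_candidate(self, col: str) -> bool:
        before = self.measure(self.indexes)
        trial = self.indexes | {col}
        after = self.measure(trial)
        if after <= before * (1 - self.threshold):
            self.indexes = trial  # verified improvement: keep the index
            self.log.append(f"keep {col}: {before:.1f} -> {after:.1f}")
            return True
        # No sufficient gain, or a regression: drop the candidate.
        self.log.append(f"drop {col}: {before:.1f} -> {after:.1f}")
        return False

    def run(self, predicates: Dict[str, int]) -> Set[str]:
        for col in self.propose(predicates):
            self.try_candidate(col)
        return self.indexes


if __name__ == "__main__":
    indexer = AutoIndexer(measure=toy_cost)
    print(indexer.run(READS))
    print("\n".join(indexer.log))
```

With this toy workload only `customer_id` is kept; `note` and `status` are dropped because their write-maintenance cost outweighs the read savings, which mirrors the "verify, then keep or discard" step rather than blindly creating every suggested index.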
IV. The evolution of domestic databases: Tencent TDSQL as an example
Let’s look at how domestic database technology has evolved. As I mentioned earlier, relational database technology has come 50 years since Dr. Codd’s 1970 paper, and many of us in the database industry think relational databases are approaching the end of their road. Where will they go next? I think the future of relational databases lies in China. Why China? Because China has the largest data infrastructure and the most centralized data application systems. That is my judgment.
In China, many business systems are centralized at least at the provincial level, and a large province may have a population of around a hundred million. Such centralized data infrastructure is hard to imagine in other countries. Where can a breakthrough in relational database theory come from? I think it must be found in applications. So I have been following the evolution of Tencent TDSQL. Its first iteration was completed as TDSQL was gradually adopted and matured internally. In the second iteration, it had to be used externally by the broadest possible user groups and scenarios, and the most noteworthy case is WeBank. Today WeBank handles a single-day peak of 600 million transactions, with peak TPS reaching 100,000. What does that mean? Imagine 600 million transactions: only a huge user base generates that many, and such frequent, high-volume transactions push the database to keep improving. If you have used Oracle, you know the peak transaction rates and concurrency an Oracle database can sustain in a production system; an Oracle database handling more than 10,000 transactions per second already faces a huge challenge. But today, in the Internet model and on a distributed architecture, huge application volumes and very high concurrency can be supported. This is the catalyst I think Chinese scenarios can provide, and it will push databases to find new room for growth.
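To put the WeBank figures quoted above in perspective, here is a back-of-the-envelope calculation. It assumes the 600 million transactions were spread evenly over 24 hours, which real traffic never is; the point is only the ratio between that average rate and the quoted peak of 100,000 TPS.

```latex
\bar{T} = \frac{6\times10^{8}\ \text{transactions}}{86\,400\ \text{s}} \approx 6\,944\ \text{TPS},
\qquad
\frac{T_{\text{peak}}}{\bar{T}} = \frac{100\,000\ \text{TPS}}{6\,944\ \text{TPS}} \approx 14.4
```

In other words, the peak rate is roughly 14 times the daily average, which is why peak concurrency, rather than daily volume, is what such a database has to be sized for.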
To expand on this: what core issues must be addressed once an open source database is used for core financial systems? The first is data security: how to make data consistency absolutely reliable. TDSQL was reworked for strong data consistency, with strong synchronous replication between primary and standby nodes by default, which is a technical innovation. Second, a distributed architecture means the number of data nodes to manage grows rapidly. Where there used to be a single database, there may now be hundreds or even thousands of data nodes, and they can no longer be maintained by hand because people simply cannot keep up. WeBank, for example, runs some 10,000 service nodes. What can be done? We have to rely on highly automated monitoring and troubleshooting, ideally without human intervention, and I think this is an important part of the future of data technology. With that in mind, as I mentioned earlier, TDSQL as a whole is built on a distributed foundation with read-write separation as its basic principle, and such a system has gradually found its way forward by supporting these scenarios.
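The strong synchronous replication mentioned above can be reduced to one rule: the primary reports a commit as successful only after enough standbys have durably received the change. The Python sketch below illustrates that rule; it is not TDSQL’s implementation, and the `Primary` and `Standby` classes and the `required_acks` parameter are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Standby:
    name: str
    online: bool = True
    log: List[str] = field(default_factory=list)

    def replicate(self, record: str) -> bool:
        """Persist the record and acknowledge; an offline standby never acks."""
        if not self.online:
            return False
        self.log.append(record)
        return True


@dataclass
class Primary:
    standbys: List[Standby]
    required_acks: int = 1  # hypothetical: standby acks needed before a commit succeeds
    committed: List[str] = field(default_factory=list)

    def commit(self, record: str) -> bool:
        # Ship the change to every standby and count acknowledgements.
        acks = sum(1 for s in self.standbys if s.replicate(record))
        if acks >= self.required_acks:
            self.committed.append(record)  # only now is success returned to the client
            return True
        # Not enough replicas hold the change: report failure instead of
        # acknowledging a write that could be lost if the primary crashed.
        # (A real system would also roll back or retry; omitted here.)
        return False


if __name__ == "__main__":
    s1, s2 = Standby("s1"), Standby("s2", online=False)
    primary = Primary([s1, s2])
    print(primary.commit("tx1"))  # True: s1 acknowledged
    s1.online = False
    print(primary.commit("tx2"))  # False: no standby is online
    print(primary.committed)      # ['tx1']
```

The design choice is to trade latency and availability for durability: a commit waits for a replica, and if none answers, the client sees an error rather than a success that a primary crash could silently undo.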
The next case is from two years ago: at Zhangjiagang Rural Commercial Bank, TDSQL completely replaced the bank’s core financial transaction system. WeBank is an Internet bank whose advantage is having no legacy burden, but a traditional bank’s data structures and data applications are very complex. It handles not only transactions, deposits, and withdrawals, but also accounts and reconciliation, and landing such complex business scenarios on a new database architecture is very difficult. In this case, it took Tencent and the customer more than two years to go live. The result now looks very good: one primary and three standbys, with failover in seconds. I have been following it closely.
The other very important takeaway for me in this case is that TDSQL provides the Red Rabbit (Chitu) and Bian Que systems as a foundation for automated operation and maintenance. Why do I think this matters? The database of the future will not be a standalone database kernel but a database ecosystem, including automated operation and maintenance, autonomous primary/standby switching, autonomous high availability, and other features. These are also what Oracle is working on today, and I think they will be highly competitive core features in the future.
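The “one primary, three standbys, failover in seconds” setup described above can be built on heartbeat monitoring plus a promotion rule. The Python sketch below shows one plausible rule under stated assumptions: a 3-second heartbeat timeout and promotion of the healthy standby with the most advanced log position. The `Node` and `FailoverMonitor` names are hypothetical, and this is not a description of how TDSQL or its Red Rabbit and Bian Que tooling actually works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    log_pos: int           # highest log sequence number this node has applied
    last_heartbeat: float  # time of the last heartbeat, in seconds


class FailoverMonitor:
    def __init__(self, primary: Node, standbys: List[Node], timeout: float = 3.0):
        self.primary = primary
        self.standbys = list(standbys)
        self.timeout = timeout  # assumed heartbeat timeout in seconds

    def _alive(self, node: Node, now: float) -> bool:
        return now - node.last_heartbeat <= self.timeout

    def check(self, now: float) -> Optional[str]:
        """Return the name of a newly promoted primary, or None if nothing changed."""
        if self._alive(self.primary, now):
            return None
        candidates = [s for s in self.standbys if self._alive(s, now)]
        if not candidates:
            return None  # no safe candidate: escalate to a human instead of guessing
        # Promote the healthy standby with the most complete log to minimize data loss.
        new_primary = max(candidates, key=lambda s: s.log_pos)
        self.standbys.remove(new_primary)
        self.primary = new_primary  # the failed node is dropped and rebuilt later
        return new_primary.name


if __name__ == "__main__":
    monitor = FailoverMonitor(
        Node("p", 100, 100.0),
        [Node("s1", 90, 102.0), Node("s2", 95, 102.0), Node("s3", 95, 101.0)],
    )
    print(monitor.check(now=102.0))  # None: the primary heartbeat is only 2 s old
    print(monitor.check(now=104.5))  # s2: s3 is stale, and s2 is ahead of s1
```

Requiring the candidate to be both healthy and furthest ahead in the log is what lets a switch happen automatically in seconds without a DBA deciding which replica is safe to promote.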
V. The future direction of database technology development
To sum up, we have looked at changes in industry practice, changes in database technology from theory to practice, changes in Oracle, and the evolution of TDSQL. So what should the future evolution of databases look like? First of all, replacing databases must not mean walking the long march all over again. We cannot replace commercial databases with domestic ones that perform worse, because users will not accept that. Replacement must not degrade functionality or experience; it must upgrade them. But how? I think there are five things we should be thinking about:
The first is distribution. A distributed architecture provides both elastic scaling and fault self-healing, and the two are equally important. Even more important, when a failure occurs, the system can heal itself without emergency intervention.
The second is intelligence: applying artificial intelligence to the database to solve the core challenges DBAs have faced. DBAs used to spend much of their time optimizing indexes, so Oracle prioritized intelligent indexing, which automated this work. Once that groundwork was done, indexes became intelligent and execution plans became controllable, so the database no longer needs much human intervention.
The third is multi-model support, which is controversial. I mentioned earlier that Amazon replaced 7,500 Oracle databases with its own cloud databases. How many cloud databases do you think it took to replace those 7,500 Oracle databases? It could be 75,000 or even 750,000. Why? Because Oracle is a multi-model database that can store many types of data, such as large objects and IoT data. Although this hybrid storage is not always optimal, it presents the simplest interface to the user. I think database development follows the old pattern that what has long been divided must eventually unite: it splits apart and will eventually come back together. A single interface is best for the user; if things are scattered, a high degree of automation is required.
The fourth is the integration of hardware and software. Database technology cannot develop in isolation; it must rely on advances in hardware to achieve significant performance gains. That is why many databases today are building all-in-one machines: overall performance can be improved through high-speed networks and high-speed NVMe storage, so coordinated hardware and software development is indispensable. In the future, core database functions should be pushed down to the CPU level for optimization. I think this can be done, and it will certainly be done in China.
The fifth is cloud convergence: the cloud will become the only form in the future. Although public and private clouds are still two worlds, they will converge in technology stack and user experience, and a unified experience on and off the cloud is the most important trend in the development of database technology.
Taken together, these five ideas represent my judgment on the future development of databases. To repeat: if we want domestic databases to give users a better experience rather than a worse one, there is only one way to close the gap, and that is automation and the ecosystem, solving the problem with automation tools. The kernel may not yet match the best international standards, but automation can shield users from the complexity of the underlying infrastructure.
One last point: the DBA’s Song of Ice and Fire. Many DBAs, especially in the Oracle camp, have felt lost, asking whether Oracle DBAs can survive the wave of domestic databases. Many people ask me, “Will I be eliminated by history?” I say no. First, data will become a core asset of the digital enterprise of the future, and it will only grow more important. Second, running a database is fairly complex, involving hosts, networks, and storage. It demands full-stack skills and has real technical barriers, so DBAs need not fear unemployment. Third, all the experience you accumulated in database administration can open up broader career opportunities today. If you fully understand Oracle or MySQL technology, you can even become a product designer, product manager, or kernel developer for a domestic database. If you can turn the advanced technology of foreign databases into product design and implementation, you will become a core driving force of our domestic databases. So our road in the domestic era is not narrowing but widening, and our career paths are broader and brighter. DBAs are people who love to explore, share, and summarize. With those abilities, carrying what we have learned over to a new technical path is not difficult.
Finally, one more thought. In the new era, those of us who work on databases should remember this: one primary and one standby, a dual engine; commercial and open source, both in balance. Learning only one database is not enough. We need exposure to both commercial and open source databases, learning from each and calibrating one against the other: one major and one minor, two subjects of your choice. In this domestic era, we tend to overestimate the difficulty in the short term and always underestimate the opportunity in the long term. If you recognize this change in the industry early, transform as soon as possible, and join the wave of domestic databases as soon as possible, you will be among the first to seize the opportunity. Only by stepping into the game can you get a step ahead.
About the speaker
CEO of Yunhe Enmo, Tencent Cloud TVP
Founder of Yunhe Enmo and Chairman of ACDU. One of the best-known promoters of Oracle technology in China, he has written books such as “Deep Analysis of Oracle” and “Progressive Oracle” that are widely praised by Oracle enthusiasts. In 2009, Mr. Gai Guoqiang founded Yunhe Enmo, which is committed to providing professional data services, products, and solutions for Chinese users. Yunhe Enmo’s DBPaaS products and expert services have served more than 500 enterprise customers in China and abroad. In 2019, he founded the MotianWheel technology community and ACDU (All China DBA Union), which are committed to the ongoing dissemination and promotion of data knowledge and applications.