Cloud computing has brought great advantages in business flexibility. In this article, a senior Alibaba Cloud database product expert introduces POLARDB in detail from three angles: changes in application architecture, customer practice cases, and business scenario analysis. The article also explains how to use POLARDB to design database architectures for innovative Internet applications.

Application Architecture Changes – Why Do We Need a Super MySQL?

POLARDB is 100% compatible with MySQL, delivers several times the performance of MySQL, and supports up to 100 TB of storage in a single instance. It can be thought of as a super MySQL developed in-house by Alibaba. So why did we build it? In our view, it is an inevitable consequence of the distributed evolution of Internet application architectures.

First, let us review the history of application architecture. From the earliest client/server (C/S) architecture to browser/server (B/S), from J2EE to Spring/Struts/Hibernate, and on to today's microservices, architectures have gone through many generations. In the move from traditional business applications to distributed Internet applications, every layer has changed: the resource layer, the data layer, middleware, application packaging and publishing, the application framework, and development and operations.

  • Resource layer: Traditional applications run on x86 servers, minicomputers, and dedicated storage devices. Distributed Internet applications run on public, private, and hybrid clouds.
  • Data layer: Traditional applications use centralized commercial databases such as Oracle and DB2. Internet applications use databases such as MySQL, Redis, and HBase, deployed in a distributed fashion without centralized storage devices.
  • Middleware: Traditional applications use WebLogic, WebSphere, and similar products. Internet applications moving to a microservices architecture often use Swarm, Kubernetes (K8S), and Mesos.
  • Application packaging and publishing: Traditional applications are developed in Java, packaged as WAR or EAR files, and deployed to middleware. Microservices architectures typically publish applications as container images.
  • Application framework: Traditional applications are usually built with Spring, Struts, and Hibernate. Today's distributed Internet applications are more likely to use microservice frameworks such as Spring Cloud, Dubbo, and EDAS.
  • Development and operations: Traditional applications use controlled releases and conservative operations, so new features take weeks or even months to launch. Distributed Internet architectures favor DevOps, continuous integration, and agile, rapid iteration.

In our view, the goal of these changes is to make the business more agile, more elastic, and able to withstand the high concurrency of the Internet. Under the new architecture, microservices let business applications scale out horizontally at any time. The load does not disappear, however: it is passed straight through to the data layer. This solves the problem of application elasticity but poses a greater challenge to the database. A distributed Internet architecture therefore requires databases that are more agile, more elastic, and less costly. (A traditional application may need only one database, whereas after microservice transformation a single business system may need dozens or even hundreds of databases, so cost becomes a requirement as well.)

In Practice – How Alibaba Cloud Databases Prepare for Business Architecture Changes

Alibaba Cloud's database portfolio currently covers 99% of Internet business scenarios. The relational databases include MySQL, SQL Server, PostgreSQL (PG), and POLARDB. The NoSQL family includes Redis, MongoDB, and HBase. The portfolio also includes a hybrid analytical data warehouse, the distributed database DRDS, and database service tools (DTS, DBS, CloudDBA, DMS, etc.).

Evolution route

With so many database products on Alibaba Cloud, how should you choose among them in practice? We have prepared a path that follows the rapid growth and renewal of a business. This is how we suggest the application architecture should evolve.

In the early days of a business, we recommend MySQL for building applications quickly. When a standalone MySQL instance can no longer bear the growing load, you can add read/write separation on top of MySQL without any modification. When the business enters its rapid growth phase and MySQL read/write separation no longer meets requirements, you can migrate seamlessly to POLARDB without changing the business system. POLARDB's read/write separation eliminates replication delay through shared storage, which makes it better suited to scenarios with stricter data consistency requirements.

As the business grows further, you can split POLARDB vertically. A vertical split moves business modules into separate database instances, such as a user database, an order database, and a warehouse database, so that several independent databases together absorb the business load. When the business reaches the scale and volume of Taobao, it becomes necessary to adopt DRDS for distributed transformation, deploy multi-active across data centers, and reorganize the system into units along business lines. This is exactly the evolution path that Alibaba's Taobao application has followed, and it has proven effective.
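
As a rough illustration of the vertical split described above, the sketch below maps each business module to its own database instance. The module names follow the examples in the text; the host names and the instance_for helper are hypothetical and not part of any Alibaba Cloud API.

```python
# Hypothetical mapping from business module to its own database instance.
# Module names follow the article's examples; host names are made up.
INSTANCE_BY_MODULE = {
    "user": "user-db.example.internal",
    "order": "order-db.example.internal",
    "warehouse": "warehouse-db.example.internal",
}


def instance_for(module: str) -> str:
    """Return the database host that owns the given business module."""
    try:
        return INSTANCE_BY_MODULE[module]
    except KeyError:
        raise ValueError(f"no database instance configured for module {module!r}") from None
```

Each service then connects only to the instance returned for its own module, so load from one module does not land on another module's database.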

Application link optimization – automatic read/write separation and short-connection optimization

We use a database proxy to optimize the access layer. In standard mode, applications connect directly to the primary instance and the read-only instances, so read/write separation has to be implemented in the application logic. Proxy mode completely decouples the business layer from the database layer. Instead of connecting to database instances directly, the application connects to a Proxy that is fully transparent to the business. The Proxy receives SQL requests and separates reads from writes automatically: all writes are routed to the primary instance, and reads are load-balanced across the read-only instances.

Proxy mode also makes failover transparent. In both standard mode and proxy mode, when the primary instance fails, the service switches over to the backup instance automatically to keep the database available. In standard mode, however, applications must reconnect to the database after the switchover. Behind the Proxy, applications do not need to reconnect and do not notice the HA switchover.

Proxy mode also optimizes short-lived connections. For example, a business application written in PHP typically uses short connections: every database access opens a new connection, and the database becomes overloaded by connection handling. The Proxy converts short connections into long-lived ones and maintains the connection pool itself.

Finally, proxy mode protects against brute-force password cracking. For example, the Proxy can detect repeated password attempts from one IP address and proactively block that address.
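
The read/write routing rule described above can be sketched in a few lines. This is a minimal illustration, not the Proxy's actual implementation: the ReadWriteRouter class, the node names, and the rule that only SELECT and SHOW count as reads are all assumptions made for the example.

```python
import itertools
import re

# Statements treated as reads in this sketch; everything else goes to the primary.
_READ_PREFIX = re.compile(r"^\s*(select|show)\b", re.IGNORECASE)


class ReadWriteRouter:
    """Toy router: writes go to the primary, reads rotate over read-only nodes."""

    def __init__(self, primary, read_only):
        self.primary = primary
        # Fall back to the primary when no read-only node is configured.
        self._readers = itertools.cycle(read_only or [primary])

    def route(self, sql):
        # SELECT ... FOR UPDATE takes locks, so it is sent to the primary.
        is_locking_read = re.search(r"\bfor\s+update\b", sql, re.IGNORECASE)
        if _READ_PREFIX.match(sql) and not is_locking_read:
            return next(self._readers)
        return self.primary
```

With a primary and two read-only nodes, successive SELECT statements alternate between the read-only nodes while INSERT, UPDATE, and DELETE always reach the primary. A real proxy would also have to keep every statement of an open transaction on a single node, which this sketch ignores.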

Real-time analytics data warehouse – POLARMPP, POLARDB’s best partner

Data processing can be divided into a database ecosystem and a big data ecosystem. The database ecosystem suits transactions, orders, and other scenarios that require data consistency, but its processing power and the data volume it can handle are limited. For example, a single database still works when order data is at the 1 TB or 2 TB level, but once the volume grows to 3 TB to 5 TB its performance hits a severe bottleneck, and complex analytical queries overwhelm it.

The usual approach is to bring in the big data ecosystem and copy the data produced by online transaction processing into Hadoop for analysis, through ETL or data replication. The standard way to analyze data in the Hadoop ecosystem is MapReduce or Spark, but many developers are not used to MapReduce, Spark, or Scala, and are used to SQL. SQL components such as Hive and Impala are therefore often provided so that developers can keep processing data with SQL. The problem with this approach is the delay between online transaction processing and the offline data warehouse, which ranges from seconds to minutes to hours. The data is also stored twice, which is not economical.

To address this, we provide POLARMPP and HybridDB. This solution handles data writes well and provides millions of TPS, which makes it ideal for storing user behavior, tags, logs, and similar data. It can respond in milliseconds on tables at the ten-billion-row scale, perform complex aggregations over multi-table joins, and support multi-value columns and full-text retrieval. Most importantly, it shares a single copy of data with POLARDB, which greatly alleviates both the duplicate storage between the database and big data ecosystems and the delay between writing data and reading it for analysis.

Business scenario analysis – innovative Internet application scenarios in practice

With a cloud-native database in hand, how should innovative Internet business scenarios be designed? Before turning to innovative businesses, let us look at how a traditional MySQL architecture with one master and N slaves builds a data warehouse to drive BI reports and deliver business intelligence. The problem with this architecture is that N copies of the data must be stored and kept in sync through replication: data is replicated between the MySQL master and its slaves, and again from the business database to the analysis database.

So how do you design a system architecture with cloud-native POLARDB? In this design, POLARDB and the read-only analysis database form a cloud-native data cluster whose storage is shared through POLAR Store. Business applications write online transactions to POLARDB. When one master and one slave are no longer enough, the cluster can be expanded quickly to one master and two slaves, or even N slaves. Unlike MySQL, this expansion is agile and gives the business elasticity. With a large data volume, a MySQL read-only instance may take hours to create. In the POLARDB ecosystem, creating a read-only instance takes only minutes, regardless of the amount of data. And only one copy of the data is needed to drive business reports via POLARMPP.

The cloud-native architecture brings the following business benefits:

  1. Compatibility without changing applications: As long as the business system is developed for MySQL, it can migrate seamlessly to POLARDB.
  2. Read/write separation: With POLARDB, a single copy of data supports read/write separation across multiple nodes and scales out in minutes. With MySQL, read/write separation requires creating multiple read-only instances through data replication, which wastes time and space.
  3. Real-time analysis and data sharing: Data warehouse and BI analysis services need only one copy of the data, with no data replication.
  4. Read-only instances share one copy of data: Because only one copy of storage is required, POLARDB costs 44% less than MySQL with one master and five slaves. It offers better cost performance as well as more powerful performance.
  5. Millisecond delay: Because the master and slave nodes share one copy of data, the delay between them is only milliseconds. When the primary node fails, no data is lost during the switchover.
  6. Session-level consistency for read/write separation: Financial scenarios have very strict read consistency requirements and can hardly tolerate a data delay of seconds or even milliseconds. POLARDB can provide consistent reads within a session.
  7. Pay on demand and backup in seconds: With MySQL, if we expect to need 500 GB of storage, we have to buy 500 GB, even though we may actually use less than 100 GB, and we still pay for the full 500 GB reserved. POLARDB requires no space reservation, and storage is billed on demand. POLARDB also uses data snapshots to back up and restore in seconds, which makes database operations safer and delivers more value.
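
One common way to provide the session-level read consistency mentioned above is to remember the log position of each session's latest write and send its reads only to nodes that have already applied that position. The sketch below illustrates that general idea. It is an assumption for illustration, not a description of POLARDB's internals, and the Node, Session, and pick_reader names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    applied_lsn: int  # highest log position this node has applied


class Session:
    """Tracks the log position of the session's most recent write."""

    def __init__(self):
        self.last_write_lsn = 0

    def record_write(self, lsn):
        self.last_write_lsn = max(self.last_write_lsn, lsn)


def pick_reader(session, primary, read_only):
    """Return a read-only node that has caught up with the session's own
    writes, or the primary if none has."""
    for node in read_only:
        if node.applied_lsn >= session.last_write_lsn:
            return node
    return primary
```

A session that has not written anything can read from any read-only node. After a write, its reads go only to nodes that have applied at least that log position, and fall back to the primary when no read-only node has caught up.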