OceanBase 2.0 was announced on September 21 at the Computing Conference, and we are continuing the OceanBase 2.0 technical analysis series. The series examines the new features of OceanBase 2.0 and the technical principles behind them from the perspectives of operations and maintenance, distributed architecture, data availability, cost effectiveness, and compatibility.


After three articles on O&M, today we turn to distributed architecture and share the experience relational database architectures have accumulated around load balancing. Follow the OceanBase official account to keep up with the rest of this series!



Qing Tao

Qing Tao is a technical support expert for the Ant Financial OceanBase database, responsible for external technical promotion and support of OceanBase. He has supported B2B, Tmall, and Alibaba Cloud services, and is familiar with operations and database architecture design for Oracle, SQL Server, MySQL, and OceanBase.

Introduction

As Internet services have grown, storing and accessing massive volumes of data has become a common problem in application architecture design. How do you keep a database stable and scalable when millions or tens of millions of application requests hit it at business peaks? Splitting the data and pairing it with a load-balancing design is the usual answer. This article reviews how relational database architectures have handled load balancing, and then introduces what makes load balancing in the OceanBase distributed database distinctive.

Load balancing in the traditional sense

Load balancing is a design in which all requests converge on a single central entry point and are then distributed to multiple back-end nodes for processing according to some algorithm. The result can be returned directly to the client, or returned to the load-balancing center and from there to the client.

Based on this principle, load balancing can work at layer 2 (data link layer, MAC), layer 3 (network layer, IP), layer 4 (transport layer, TCP), or layer 7 (application layer) of the OSI seven-layer model. The higher the layer, the more complex the principles and the more intelligent and flexible the designs.

Load-balancing products on the market fall into two categories. One is hardware: stand-alone devices such as the F5, or capabilities built into network equipment. The other is software installed on a central server, such as LVS, HAProxy, Keepalived, and DNS load balancing. Software solutions are simple to configure, flexible to operate, and cheap; their drawbacks are dependence on the host system, extra resource overhead, and weaker security and stability. Hardware solutions are independent of the host system, perform well, and offer richer policies; the drawback is the price.

There are various request-forwarding algorithms, such as round robin (RR), weighted round robin (WRR), random, fastest response time, least connections, and DNS-based balancing. Choose among them according to the actual scenario.
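As a concrete illustration, below is a minimal sketch of the weighted round-robin idea in Python; the back-end names and weights are made up for the example, and real load balancers such as LVS or HAProxy implement this and the other algorithms internally.

```python
import itertools

# Hypothetical back-end nodes and their weights (illustrative only).
BACKENDS = {"node-a": 5, "node-b": 3, "node-c": 1}

def weighted_round_robin(backends):
    """Yield back-end names in proportion to their weights, forever."""
    expanded = [name for name, weight in backends.items() for _ in range(weight)]
    return itertools.cycle(expanded)

picker = weighted_round_robin(BACKENDS)
# Over any 9 consecutive picks, node-a is chosen 5 times, node-b 3 times, node-c once.
print([next(picker) for _ in range(9)])
```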

The fact that a request can be forwarded to any back-end node means the back-end nodes are stateless and form a distributed cluster. Load balancing is therefore always paired with a distributed cluster architecture.

Load-balancing design in centralized relational databases

1. Clustered databases

Early commercial relational databases were centralized, relying only on an active/standby architecture for high availability and disaster recovery. Later, to keep up with growing performance demands, clustered databases were developed. A clustered database separates instances from data files: the data files are placed on shared storage, while the instance nodes scale out horizontally and all share the same data files.

The instance nodes are distributed. Each instance node runs a database listener configured to listen on multiple VIPs (local and remote). The listeners also form a small cluster among themselves and decide where to forward each request based on the load of every instance node. When a new instance node is added, requests are rebalanced across the instance nodes to even out the pressure.

[Figure 1] Oracle RAC architecture

The problem with clustered databases is just as obvious: the data storage is not distributed. If the shared storage fails, the entire database cluster becomes unavailable, so this distributed architecture is not yet complete.

2. Distributed database middleware

As middleware technology matured, a kind of distributed MySQL cluster appeared. The idea is to split the data horizontally across multiple MySQL instances and deploy a middleware cluster in front of MySQL to answer client SQL requests. The middleware is distributed and stateless and contains the logic for SQL parsing, routing, and data aggregation. Client requests arrive at the middleware through a load-balancing product (such as SLB or LVS), which distributes them across the middleware nodes.


[Figure 2] Architecture of distributed database middleware
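The core routing idea of such middleware is sketched below in Python; the shard key, hash rule, and instance addresses are hypothetical, and real middleware adds SQL parsing, cross-shard aggregation, transaction handling, and more on top of this.

```python
import hashlib

# Hypothetical back-end MySQL instances, each holding one horizontal shard (illustrative only).
MYSQL_SHARDS = ["mysql-0:3306", "mysql-1:3306", "mysql-2:3306", "mysql-3:3306"]

def shard_for(user_id: int) -> str:
    """Map a sharding key to one back-end MySQL instance with a stable hash."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return MYSQL_SHARDS[int(digest, 16) % len(MYSQL_SHARDS)]

def route(sql: str, user_id: int) -> tuple[str, str]:
    """Return the target instance and the SQL to forward (no parsing or aggregation here)."""
    return shard_for(user_id), sql

print(route("SELECT * FROM orders WHERE user_id = 42", user_id=42))
```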

Distributed database middleware makes up for the distributed shortcomings of the traditional centralized database. The compute nodes are distributed and can scale horizontally, and the data is split horizontally across multiple storage instances, so storage is distributed too. Compared with the clustered database architecture, distributed database middleware offers higher performance, better scalability, and lower cost.

In this distributed architecture the compute nodes scale easily because they are stateless. The data nodes are not so easy: scaling them means adding new instances and then redistributing the data through migration and re-splitting, which must be done with tools outside the database. So this form of distribution is still imperfect, because the data placement is static.

Load-balancing design in distributed databases

Google published papers on the architecture and operation of Spanner and F1, which spurred the development of NewSQL, that is, distributed database technology. This is a truly distributed database architecture: data is sharded across multiple nodes and can migrate freely between nodes without external tools.

[Figure 3] Google F1 architecture

Google’s F1 nodes are stateless, and client requests are load balanced across them. F1 does not store data itself but fetches it from Spanner. When machines are added, the data shards are rebalanced automatically. The load-balancing behavior Google describes for this distributed database should be the design goal of any distributed database architecture.

In 2010, Alibaba and Ant Financial began independently developing OceanBase, a distributed relational database. In its load-balancing design, OceanBase not only balances load across nodes automatically and redistributes stored data after nodes are added or removed, but also supports load balancing for a variety of business scenarios.

Load balancing design for OceanBase 2.0

1. OceanBase architecture

Before looking at OceanBase’s load-balancing design, let’s review the architecture of OceanBase 1.x. In version 1.x, a single observer process contains both the computing and the storage capabilities. All nodes have equal status (the three nodes hosting the master control service, RootService, are slightly special). The data held by each node in a zone is a portion of the total data, and each piece of data also exists in several other zones, normally as at least three replicas. As the architecture diagram shows, both computing and storage in OceanBase are distributed. The difference from Google’s design is that in version 1.x computing and storage are not yet separated, but this does not prevent OceanBase from functioning as a distributed database.


[Figure 4] OceanBase 1.x architecture

2. Partitions, replicas, and OBProxy

The smallest granularity of data distribution in OceanBase is the partition. A partition is a partition or subpartition of a partitioned table, or the single implicit partition of a non-partitioned table. OceanBase’s load balancing operates at the partition level. Each partition has at least three copies of its data in the cluster, commonly called three replicas, in three roles: one leader replica and two follower replicas. By default, reads and writes go to the leader replica. The default location (zone) of the leader replica is governed by the table’s default locality and primary zone attributes; in a multi-site active-active architecture, these locality attributes come fully into play.
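To make the partition concept concrete, here is a minimal sketch, assuming OceanBase’s MySQL-compatible mode with hypothetical connection settings and table names, of creating a hash-partitioned table; treat the DDL as illustrative, since the exact syntax depends on the OceanBase version.

```python
import pymysql  # OceanBase's MySQL-compatible mode is reached through OBProxy with a MySQL driver

# Hypothetical connection through OBProxy into a business tenant (illustrative only).
conn = pymysql.connect(host="obproxy.example.com", port=2883,
                       user="app_user@my_tenant", password="***", database="trade")

with conn.cursor() as cur:
    # Each of the 16 hash partitions is an independent unit of distribution and balancing:
    # it gets one leader replica and, by default, two follower replicas in other zones.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            order_id BIGINT NOT NULL,
            user_id  BIGINT NOT NULL,
            amount   DECIMAL(16, 2),
            PRIMARY KEY (order_id, user_id)
        ) PARTITION BY HASH(user_id) PARTITIONS 16
    """)

conn.close()
```

The leader replicas of these 16 partitions may sit on different OBServer nodes, which is exactly why client requests need the routing layer described next.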

Leader replicas are spread across the OceanBase cluster, and their placement is transparent to the client. Client requests therefore go through a reverse proxy, OBProxy. OBProxy does routing only: it sends each SQL request to a node holding the leader replica of the table being accessed. Unlike HAProxy, it has no load-balancing capability, nor does it need one. OBProxy itself is stateless, however, so you can deploy several instances and put a load-balancing product in front of them for OBProxy’s own load balancing and high availability.

[Figure 5] OceanBase partitions and OBProxy

3. Measurement criteria for load balancing

All nodes (OBServers) in OceanBase have equal status (the node hosting the master control service, RootService, plays a slightly special role). In the current version, each OBServer combines computing and storage capabilities in a single process (versions 2.x and later will deliver the separation of computing and storage). OceanBase balances resource utilization and load across nodes as much as possible. The intended effect is that when many tenants of different resource specifications and sizes run in a large OceanBase cluster, OceanBase avoids the situation where some nodes run at high resource utilization while others receive little traffic or sit at low utilization. The measure of balance is fairly complex and is still being explored and refined, so there is no simple, definitive formula for it, but the following factors are known to be involved (an illustrative way of combining such factors is sketched after this list):

  • Space: the total space used by the partitions on each node and the proportion of the node’s quota it occupies; the total space used by each tenant’s units on the node and the proportion of each unit’s quota
  • Memory: the memory allocated to OceanBase on each node and the proportion of its quota
  • CPU: the total number of CPUs allocated to the units on each node and the percentage of the total quota
  • Others: maximum number of connections, IOPS, and so on; these specifications are defined today but not yet used in balancing policies
  • PartitionGroup: the number of partition groups on each node. Partitions with the same partition number across tables in the same table group form a partition group; for example, if tables A and B are both partitioned and belong to one table group, their number 0 partitions form one partition group, their number 1 partitions form another, and so on.
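Purely as an illustration of how such factors could be folded into a per-node score, here is a minimal Python sketch; it is not OceanBase’s actual algorithm, and the field names and weights are invented.

```python
from dataclasses import dataclass

@dataclass
class NodeStats:
    """Hypothetical per-node statistics of the kind listed above (illustrative only)."""
    disk_used_ratio: float      # partition space used / node space quota
    memory_used_ratio: float    # memory allocated to OceanBase / memory quota
    cpu_alloc_ratio: float      # CPUs allocated to units / CPU quota
    partition_groups: int       # number of partition groups hosted on the node

def load_score(node: NodeStats, max_pg_per_node: int) -> float:
    """Combine the factors into one number; OceanBase's real measure is more elaborate."""
    pg_ratio = node.partition_groups / max(max_pg_per_node, 1)
    # Equal weights, chosen arbitrarily for the illustration.
    return 0.25 * (node.disk_used_ratio + node.memory_used_ratio
                   + node.cpu_alloc_ratio + pg_ratio)

nodes = {
    "observer-1": NodeStats(0.72, 0.60, 0.80, 120),
    "observer-2": NodeStats(0.35, 0.40, 0.50, 60),
}
scores = {name: load_score(stats, max_pg_per_node=120) for name, stats in nodes.items()}
print(scores)  # a large gap between node scores suggests partitions should be rebalanced
```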

4. Logical evolution of load balancing

The principle of OceanBase load balancing is to adjust, internally, the number of leader replicas on each observer node; this indirectly changes the request volume each node receives and, with it, the values of the load-balancing measures on each node. The accompanying data migration is performed automatically inside the database and depends neither on external tools nor on the DBA, so the impact on the business can be kept under control.

In a distributed setup, cross-node requests carry a latency cost, all the more so when the cluster spans multiple machine rooms, so uncontrolled load balancing is bad for the business. At the business level some tables are related to one another, and it is better to read their corresponding partitions on a single node. In addition, OceanBase is multi-tenant: the data of each tenant is placed in one or more units, so partitions cannot simply be scattered at random. For the business, it makes sense to constrain the load-balancing policy. Doing so helps the overall application architecture keep the upper-layer traffic-routing rules consistent with the lower-layer data-splitting rules, which is the precondition for a multi-site active-active deployment, and it also defines the boundary of what load balancing is allowed to do.

Partition placement is therefore constrained by several rules. First, all partitions of a tenant live in one or more groups of units, and a partition cannot migrate outside its unit. An OceanBase cluster hosts many tenants and therefore many units, and OceanBase first adjusts how units are distributed across nodes, which is called “unit balancing”; the actual migration of a unit, however, proceeds one partition at a time. Second, partitions belonging to the same partition group must stay on the same node: the end state of any partition migration is that all partitions of a PartitionGroup reside on the same OBServer node.

[Figure 6] Tenants, units, PartitionGroups, and partitions
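The two constraints just described can be sketched in a few lines of Python; the tenant, unit, node, and table-group names below are made up, and real placement decisions in OceanBase also weigh the load factors discussed earlier.

```python
# Hypothetical placement data (illustrative only).
# A tenant's unit is pinned to a set of candidate nodes; its partitions may only move inside that set.
UNIT_NODES = {
    ("tenant_trade", "unit_1"): {"observer-1", "observer-2", "observer-3"},
}

# Current node of each partition group; all partitions of a group always move together.
PG_LOCATION = {
    ("tg_order", 0): "observer-1",   # partition 0 of every table in table group tg_order
    ("tg_order", 1): "observer-1",
}

def move_partition_group(pg, unit, target_node):
    """Move a whole partition group, respecting the unit boundary and co-location rules."""
    if target_node not in UNIT_NODES[unit]:
        raise ValueError(f"{pg} cannot leave the nodes of {unit}")
    PG_LOCATION[pg] = target_node  # every partition in the group lands on the same node

move_partition_group(("tg_order", 1), ("tenant_trade", "unit_1"), "observer-2")
print(PG_LOCATION)
```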

A load-balancing policy, then, is a set of rules and boundaries placed on partition behavior. As a business grows, it will split its overall data into multiple tenants by module, so some tenants remain related at the business level. This is the idea behind a TenantGroup: during load balancing, partitions in the units of tenants belonging to the same TenantGroup are preferentially placed on the same node. The goal is to keep one piece of business logic from making multiple database calls across machine rooms.

In short, OceanBase’s various load-balancing policies aim to balance resource utilization while minimizing the impact on performance. These policies and rules will continue to improve as business scale and usage grow.

5. Load balancing risks and practices

The effect of load balancing differs from one business scenario to another, so it has to be assessed case by case.

If the cluster topology changes frequently, for example machines are replaced or added often, or tenants are created and dropped, load balancing may also run frequently. The internal data migration consumes CPU and I/O on the nodes, which can affect services that are sensitive to database performance. Operations staff should therefore apply the hard rules and restrictions of load balancing judiciously. For example, set the locality attribute of a tenant or table to restrict leader partitions to a few acceptable zones, or put closely related business tables into the same table group. If balancing actions still cause noticeable impact after these measures, automatic load balancing can be turned off, and the placement of all units and partitions can then be specified by command, that is, manual balancing.
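As a rough illustration, assuming OceanBase’s MySQL-compatible mode and hypothetical cluster, tenant, zone, and table names, the kinds of statements involved might look like the sketch below; the exact syntax and parameter names depend on the OceanBase version, so check the documentation for your release.

```python
import pymysql  # OceanBase clusters are administered through OBProxy with a MySQL driver

# --- Restrictions set from the sys tenant (hypothetical names; syntax varies by version) ---
sys_conn = pymysql.connect(host="obproxy.example.com", port=2883,
                           user="root@sys", password="***")
with sys_conn.cursor() as cur:
    # Keep full replicas in three zones and prefer leader replicas in zone1.
    cur.execute("ALTER TENANT my_tenant LOCALITY = 'F@zone1, F@zone2, F@zone3'")
    cur.execute("ALTER TENANT my_tenant PRIMARY_ZONE = 'zone1'")
    # If balancing still disturbs the business, switch off automatic rebalancing.
    cur.execute("ALTER SYSTEM SET enable_rebalance = FALSE")
sys_conn.close()

# --- Grouping closely related tables, done inside the business tenant ---
app_conn = pymysql.connect(host="obproxy.example.com", port=2883,
                           user="app_user@my_tenant", password="***", database="trade")
with app_conn.cursor() as cur:
    cur.execute("CREATE TABLEGROUP IF NOT EXISTS tg_order")
    # Same-numbered partitions of tables in one table group stay on the same node.
    cur.execute("ALTER TABLEGROUP tg_order ADD TABLE orders, order_items")
app_conn.close()
```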

6. Separation of computing and storage

In version 2.x, OceanBase will implement a distributed storage architecture that separates computing from storage. There, elastic scaling of the compute nodes achieves load balancing by migrating partitions, and elastic scaling of the storage nodes achieves it by migrating blocks of data. As before, all of this happens asynchronously in the background, with no external synchronization tools and no intervention from operations staff; the existing high-availability mechanisms are unaffected and the performance impact remains controllable.

Conclusion

Load balancing in OceanBase works by adjusting the distribution of internal data partitions, which indirectly changes the request volume, and hence the load, of each OBServer node. The data migration involved is carried out internally and asynchronously, relies on no external synchronization tools, and needs no DBA intervention. The high-availability mechanism remains in effect throughout, and the impact on business reads and writes is controllable.

OceanBase’s load balancing is constrained by policies so that it does not significantly hurt application performance; unconstrained load balancing may cost the business more than it gains.

OceanBase’s load balancing is transparent to both the business and O&M staff. The front-end reverse proxy, OBProxy, automatically refreshes the partition location information after an adjustment. OBProxy is stateless and can be deployed in multiple instances, with products such as F5 or LVS providing its high availability and load balancing; note that this is not OceanBase’s own load balancing.

Next up

This article is the fourth in the OceanBase 2.0 technical analysis series. The next article will give a comprehensive overview of OceanBase 2.0’s global consistent snapshot feature. Stay tuned!

Registration for OceanBase TechTalk · Beijing is open now!

The second OceanBase TechTalk offline technical meetup will take place in Beijing on October 27.

Three heavyweight OceanBase guests, Chen Mengmeng (Wine Man), Han Fusheng (Yan Ran), and Qiao Guozhi (Kenteng), will talk with you about the new product features and major technical innovations in OceanBase 2.0. Zhongguancun, Beijing: be there or be square!



Link: www.huodongxing.com/event/74630…