On 21 July 2021, Juan Pan, co-founder of SphereEx and a member of the Apache ShardingSphere PMC, was invited to deliver the keynote “Apache ShardingSphere: Building an Open Source Distributed Database Middleware Ecosystem” at the AWS Cloud Computing Summit 2021 in Shanghai.

She talked about the expansion of open source projects, community building, and how ShardingSphere practices “the Apache Way.” This article summarizes Juan Pan’s views.

New ecosystem layers positioned above databases and below business applications

Different industries, different users, different positioning, different requirements. Today’s databases face more complex application scenarios than in the past, along with increasingly personalized and customized data processing requirements. This demanding environment pushes every database to keep improving read/write speed, latency, throughput, and other performance indicators.

As data application scenarios become more clearly divided, the database market gradually fragments, and it is difficult to build a single database that perfectly fits every scenario. It is therefore very common for enterprises to choose different databases for different business scenarios.

Different databases present different challenges. At the macro level, however, these challenges share commonalities on which a de facto standard can be formed. If you build a platform layer on top of these databases that applies and manages data uniformly, you can develop systems against a fixed set of standards, even though the differences between the underlying databases remain. Such a standardized solution greatly reduces the pressure and learning cost users face in managing their basic data infrastructure.

Apache ShardingSphere is such a platform layer. Because it reuses existing databases rather than replacing them, it helps technical teams add incremental capabilities such as sharding and data encryption/decryption. It is not tied to the configuration of the underlying databases and keeps those details transparent to users. As a result, it can connect directly to business-facing databases quickly and make large data clusters easy to manage.

How to practice the Apache Way

As enterprises grow, a single database can no longer hold all of their business data, so the database has to be scaled horizontally. This raises the problem of distributed management. ShardingSphere builds a hot-pluggable functional layer on top of the databases that keeps traditional database operations while making changes in the underlying databases transparent to users, so developers can manage large-scale database clusters as if they were using a single database. ShardingSphere mainly covers the following four application scenarios.

  • Sharding strategy

As traffic grows, so does the pressure on data sharding, and sharding strategies become more and more complex. ShardingSphere lets users apply a wider range of sharding strategies beyond basic horizontal scaling, flexibly, extensibly, and at minimal cost. It also supports custom extensions.
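To make the idea of a pluggable, extensible sharding strategy concrete, here is a minimal Java sketch. The `ShardingAlgorithm` interface, the `ModuloSharding` and `RangeSharding` classes, and the `t_order` table name are hypothetical illustrations, not ShardingSphere's actual SPI or configuration. The point is only that routing logic sits behind one small interface, so a custom strategy can be swapped in without touching the caller.

```java
// Hypothetical sketch of a pluggable sharding strategy; NOT ShardingSphere's real SPI.
class ShardingDemo {

    // Extension point: decide which shard (0..shardCount-1) holds a given key.
    interface ShardingAlgorithm {
        int shardFor(long key, int shardCount);
    }

    // Spread keys evenly: key mod shardCount (floorMod keeps negative keys in range).
    static final class ModuloSharding implements ShardingAlgorithm {
        @Override
        public int shardFor(long key, int shardCount) {
            return (int) Math.floorMod(key, (long) shardCount);
        }
    }

    // A "custom" strategy: consecutive blocks of rangeSize keys share a shard,
    // wrapping around once every shard has received a block.
    static final class RangeSharding implements ShardingAlgorithm {
        private final long rangeSize;

        RangeSharding(long rangeSize) {
            this.rangeSize = rangeSize;
        }

        @Override
        public int shardFor(long key, int shardCount) {
            return (int) Math.floorMod(Math.floorDiv(key, rangeSize), (long) shardCount);
        }
    }

    // Maps a logical table plus a key to a physical table name such as t_order_2.
    static final class ShardRouter {
        private final String logicTable;
        private final int shardCount;
        private final ShardingAlgorithm algorithm;

        ShardRouter(String logicTable, int shardCount, ShardingAlgorithm algorithm) {
            this.logicTable = logicTable;
            this.shardCount = shardCount;
            this.algorithm = algorithm;
        }

        String route(long key) {
            return logicTable + "_" + algorithm.shardFor(key, shardCount);
        }
    }

    public static void main(String[] args) {
        ShardRouter byModulo = new ShardRouter("t_order", 4, new ModuloSharding());
        ShardRouter byRange = new ShardRouter("t_order", 4, new RangeSharding(1_000));
        System.out.println(byModulo.route(10));    // t_order_2
        System.out.println(byRange.route(2_500));  // t_order_2
        System.out.println(byRange.route(999));    // t_order_0
    }
}
```

A modulo strategy balances load well but sends range queries to every shard, while a range strategy keeps neighbouring keys together; being able to plug in either one is the kind of flexibility the paragraph above describes.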

  • Read and write separation

In general, a master-slave deployment effectively relieves database load, but if a machine or table in the cluster fails, it can no longer be read from or written to, which can have a significant impact on the business. To avoid this, developers often have to write their own high-availability policies that switch the master and slave roles for reads and writes. ShardingSphere automatically discovers the state of the whole cluster, so it can immediately detect problems such as unreliable requests and master/slave switchovers in the databases. It can also automatically restore the original master/slave state, all without the user noticing.
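The routing and failover behaviour described above can be sketched in a few lines of Java. This is a simplified illustration under stated assumptions: the `ReadWriteRouter` class and the `ds_*` data source names are hypothetical, health checks are simulated by calling `markDown`/`markUp` by hand, and ShardingSphere itself discovers cluster state automatically rather than through methods like these. Writes go to the current master (primary), reads rotate over healthy slaves (replicas), a failed master is replaced by a healthy node, and the original master takes its role back when it recovers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical read/write routing with failover; NOT ShardingSphere's real API.
class ReadWriteDemo {

    static final class ReadWriteRouter {
        private final String originalPrimary;
        private final List<String> nodes = new ArrayList<>(); // primary first, then replicas
        private final Set<String> down = new HashSet<>();     // nodes reported unhealthy
        private String primary;
        private int cursor = 0; // round-robin position for reads

        ReadWriteRouter(String primary, List<String> replicas) {
            this.originalPrimary = primary;
            this.primary = primary;
            nodes.add(primary);
            nodes.addAll(replicas);
        }

        // Writes always go to the current primary (master).
        String routeWrite() {
            return primary;
        }

        // Reads rotate over healthy replicas; fall back to the primary if none are left.
        String routeRead() {
            List<String> healthy = new ArrayList<>();
            for (String node : nodes) {
                if (!node.equals(primary) && !down.contains(node)) {
                    healthy.add(node);
                }
            }
            if (healthy.isEmpty()) {
                return primary;
            }
            return healthy.get(Math.floorMod(cursor++, healthy.size()));
        }

        // Simulated health-check result: a node stopped responding.
        void markDown(String node) {
            down.add(node);
            if (node.equals(primary)) {
                // Promote the first healthy node; if none is healthy, keep the old primary.
                for (String candidate : nodes) {
                    if (!down.contains(candidate)) {
                        primary = candidate;
                        return;
                    }
                }
            }
        }

        // Simulated recovery: when the original primary comes back, it takes its role back.
        void markUp(String node) {
            down.remove(node);
            if (node.equals(originalPrimary)) {
                primary = originalPrimary;
            }
        }
    }

    public static void main(String[] args) {
        ReadWriteRouter router = new ReadWriteRouter("ds_primary", List.of("ds_replica_0", "ds_replica_1"));
        System.out.println(router.routeWrite()); // ds_primary
        System.out.println(router.routeRead());  // ds_replica_0
        System.out.println(router.routeRead());  // ds_replica_1
        router.markDown("ds_primary");
        System.out.println(router.routeWrite()); // ds_replica_0 (promoted)
        router.markUp("ds_primary");
        System.out.println(router.routeWrite()); // ds_primary (restored)
    }
}
```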

  • Sharding scale-out

As the business evolves, the data cluster eventually needs to be split again. ShardingSphere’s Scaling component lets a user start a task with a single SQL command and shows the task’s status in real time in the background. This “pipe-like” extension connects the old database ecosystem to the new one.
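Re-splitting a cluster means existing rows have to move. The sketch below is a hypothetical illustration, not ShardingSphere's Scaling implementation: assuming a simple `key mod n` sharding rule, the made-up `ReshardPlanDemo.plan` method lists which rows change shard when a cluster grows from `oldCount` to `newCount` shards, i.e. the existing data a migration task would need to copy from the old layout to the new one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-sharding planner; assumes a plain "key mod shardCount" rule.
class ReshardPlanDemo {

    // One row that has to be copied from its old shard to a new one.
    static final class Move {
        final long key;
        final int fromShard;
        final int toShard;

        Move(long key, int fromShard, int toShard) {
            this.key = key;
            this.fromShard = fromShard;
            this.toShard = toShard;
        }

        @Override
        public String toString() {
            return "key " + key + ": shard " + fromShard + " -> " + toShard;
        }
    }

    // Lists every key whose shard changes when the shard count changes.
    static List<Move> plan(List<Long> keys, int oldCount, int newCount) {
        List<Move> moves = new ArrayList<>();
        for (long key : keys) {
            int from = (int) Math.floorMod(key, (long) oldCount);
            int to = (int) Math.floorMod(key, (long) newCount);
            if (from != to) {
                moves.add(new Move(key, from, to));
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        List<Long> keys = new ArrayList<>();
        for (long k = 0; k < 8; k++) {
            keys.add(k);
        }
        // Doubling from 2 to 4 shards: keys 2, 3, 6 and 7 have to move.
        plan(keys, 2, 4).forEach(System.out::println);
    }
}
```

In this example, doubling the shard count relocates half of the rows (4 of 8), which is why an online, pipeline-style migration matters for a cluster that is already serving traffic.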

  • Data encryption and decryption

In database applications, encrypting and decrypting key data is very important. If a system does not handle data in a standardized way, some sensitive data may be stored in plain text, and the user then has to encrypt it later. Many teams run into this problem.

ShardingSphere standardizes this capability and integrates it into the middleware ecosystem, so it can automatically mask (desensitize) and encrypt/decrypt both new and existing data for users, with no manual steps. It also includes a variety of built-in encryption/decryption and masking algorithms, and users can customize and extend their own algorithms as required.
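As a rough illustration of transparent column encryption and masking, the Java sketch below encrypts a value before it would be written to a cipher column, decrypts it on read, and masks it for display. It uses the JDK's standard `javax.crypto` AES-GCM cipher; the `EncryptDemo` class, the Base64 storage format (IV prepended to the ciphertext) and the keep-first-3/last-4 masking rule are assumptions for this example, not ShardingSphere's built-in algorithms or configuration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical column encryption/masking helpers; NOT ShardingSphere's built-in algorithms.
class EncryptDemo {
    private static final int IV_BYTES = 12;   // common IV length for AES-GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypt before writing: the stored value is Base64(iv || ciphertext-with-tag).
    static String encrypt(SecretKey key, String plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh IV for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[IV_BYTES + sealed.length];
            System.arraycopy(iv, 0, out, 0, IV_BYTES);
            System.arraycopy(sealed, 0, out, IV_BYTES, sealed.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decrypt after reading: split off the IV, then verify the tag and decrypt the rest.
    static String decrypt(SecretKey key, String stored) {
        try {
            byte[] in = Base64.getDecoder().decode(stored);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Masking for display: keep the first 3 and last 4 characters, hide the rest.
    static String mask(String value) {
        if (value.length() <= 7) {
            return "*".repeat(value.length());
        }
        return value.substring(0, 3) + "*".repeat(value.length() - 7) + value.substring(value.length() - 4);
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        String stored = encrypt(key, "13812345678");
        System.out.println(stored);                // Base64 text, different on every run
        System.out.println(decrypt(key, stored));  // 13812345678
        System.out.println(mask("13812345678"));   // 138****5678
    }
}
```

Because each value gets a random IV, equal plaintexts produce different ciphertexts. That protects the data but rules out equality lookups on the cipher column, so systems that need such queries typically keep a separate deterministic "assisted query" value alongside it.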

A pluggable database plus platform

ShardingSphere offers three access modes for developers in different fields with different needs and usage scenarios: JDBC for Java, Proxy for heterogeneous databases, and Sidecar for the cloud. Users can perform operations such as sharding, read/write separation, and data migration on their existing clusters according to their own requirements.

  • **JDBC access:** An enhanced JDBC driver that is fully compatible with JDBC and various ORM frameworks, so users can keep working in plain JDBC. As a result, users get distributed management, horizontal scaling, desensitization, and more without any additional deployment or dependencies.
  • **Proxy access:** A transparent database service that uses a proxy to manage the underlying database cluster, so users do not need to change their existing architecture.
  • **Cloud-based Mesh access:** A ShardingSphere deployment form designed for public clouds. SphereEx has joined the Amazon Web Services (AWS) startup initiative to partner with AWS in China and beyond, and to provide more capable image-based Proxy deployments for AWS users. AWS and SphereEx will work together to create a more mature cloud environment for enterprise applications.

Open source connects individual work to the world

ShardingSphere is influential in its industry. Now, when users need to find tools for horizontal scaling in China, ShardingSphere is usually on their shortlist. Of course, ShardingSphere has grown thanks not only to the valuable contributions of the project maintenance team over the years, but also to the increasingly active open source community in China.

In the past, users in China’s open source community mostly downloaded programs and looked for code references, but they rarely took part in community building. In recent years, open source has become more and more popular in China, and more and more technically strong people have joined the community. Their participation has made the ShardingSphere community increasingly active. But how do you evaluate a good open source project? The criteria go beyond its concepts and technology to include its technical influence, open source influence, ecosystem expansion, and developer community.

To this end, ShardingSphere, as an Apache Top-Level Project, continues to actively invite more people to join its open source community. Open source communities are a great way to broaden one’s horizons, become more open and collaborative, and rediscover one’s own value.

Project links.

ShardingSphere GitHub: github.com/apache/shar…

ShardingSphere Twitter: twitter.com/ShardingSph…

ShardingSphere Slack: https://bit.ly/3qB2GGc


Distributed Database Middleware Ecosystem Powered by Open Source was originally published in Nerd For Tech.