Author: Wang Jun

Operations work is hard, as hard as climbing to heaven.

As PingCAP employees, we not only use TiDB ourselves but also want the people around us to use it. Along the way, we ran into the following problems:

  • Hard to get on board: TiDB Ansible, the officially recommended deployment method, has many restrictions and a real learning cost. Operations engineers pick it up easily, but it is not friendly to developers. The user manual is detailed but complex and full of caveats, which hurts most when errors occur.

  • Hard to set sail: once TiDB is in use, later operations on the cluster (such as scaling and upgrades), and especially scaling PD out and in, are very safe but still somewhat complicated to carry out.

  • Easy to capsize: it is common to see developers try to deploy TiDB with TiDB Ansible on their own machines without knowing how deep the water is, and accidentally change their system beyond recognition.

To address these pain points, we formed a team for the TiDB Hackathon 2019 competition, built the TiExciting project, and won third prize.

How high is the barrier to getting on board today?

Before the competition, we first checked whether this perceived difficulty was a real problem, so we ran a test on the new UCloud cluster provided for the Hackathon:

  • Teammate A: a developer at Tencent WeChat who had never touched TiDB, deploying for the first time:

    • Followed TiDB Ansible’s lengthy deployment tutorial, failed after 3 hours, and gave up.
  • Teammate B: a senior PingCAP customer support engineer for East China who has deployed TiDB for many commercial customers:

    • Proficient, but still hit various errors caused by slips of the hand or the environment; knew from experience how to fix them and finished the deployment in 20 minutes.

In short, the test showed that both newcomers and veterans have to go through a very complicated process before they can start using TiDB. In addition, TiDB Ansible places high demands on the deployment environment. Public cloud machines, for example, often simply fail its environment check, which can discourage new users.

How can we get on board and set sail quickly?

To solve these problems, we started with requirements design:

  • To get users on board quickly, the deployment process itself should be fast, running operations in parallel wherever possible.

  • Intuitive, easy to use, and clear, ideally needing no documentation: provide a graphical interface.

  • Any feature that conflicts with ease of use has to give way.

    • TiDB Ansible enforces environment checks → TiExciting does not block deployment (it only gives a warning).

    • TiDB Ansible requires deployment from a control machine → TiExciting can even deploy from Windows and does not require configuring SSH mutual trust.

  • Clean and principled, with no bundleware.

    • TiExciting allows users to optionally configure the system to run TiDB better, and users are aware of the changes to be made.

    • TiExciting allows users to select the components they want to install, following the principle of minimization.

At present, every installation path is “green” in this sense. Because the environment check only produces warnings, TiExciting is not recommended for production environments for now. Later we will add a strict mode for managing production clusters. Always treat the production environment with respect.
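To make the "warn instead of block" idea concrete, here is a minimal sketch. It is not TiExciting's actual code: the names `CheckResult`, `run_preflight`, `PreflightError`, the two sample checks, and the `strict` flag are all hypothetical, chosen only to show how the same checks could warn by default and block in a future strict mode.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    name: str
    ok: bool
    message: str


class PreflightError(Exception):
    """Raised only in the (hypothetical) strict mode when a check fails."""


def run_preflight(checks: List[Callable[[], CheckResult]],
                  strict: bool = False) -> List[str]:
    """Run every check and return one warning string per failed check.

    By default a failure never blocks deployment; with strict=True any
    failure raises PreflightError instead.
    """
    results = [check() for check in checks]
    warnings = [f"WARNING: {r.name}: {r.message}" for r in results if not r.ok]
    if strict and warnings:
        raise PreflightError("; ".join(warnings))
    return warnings


# Two made-up checks with hard-coded inputs; real checks would inspect the host.
def check_cpu_cores(cores: int = 2, recommended: int = 8) -> CheckResult:
    return CheckResult("cpu", cores >= recommended,
                       f"{cores} cores found, {recommended} recommended")


def check_swap_disabled(swap_on: bool = False) -> CheckResult:
    return CheckResult("swap", not swap_on,
                       "swap is enabled" if swap_on else "swap is off")


# Default mode: the deployment would continue, and the user sees the warnings.
warnings = run_preflight([check_cpu_cores, check_swap_disabled])
```

Keeping the checks identical in both modes and switching only the reaction to a failure means a strict mode could be added later without rewriting the checks themselves.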

Hackathon results

Deployment

Once the target machines have been added (how to add machines is covered later), you are free to choose which machines to deploy to. On the interface you can select the components you want or deselect those you do not. For example, if you do not want the monitoring bundle, simply uncheck “Monitor”. TiDB itself can also be unchecked, which suits users who only want TiKV.

After the components are selected, the system automatically generates a deployment plan based on the number of nodes and the chosen components. Of course, this being a Hackathon project, the automatic plan is not necessarily the most reasonable one, and in real-world scenarios users will likely want to customize the topology further, so they can drag and drop components on the interface to rearrange the topology or add new instances. Generally speaking, if you are not after a strictly distributed TiDB experience, even with only one node, you can use the default topology as is, which is very friendly.
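As an illustration of generating a plan from the node count and the selected components, the sketch below uses a simple rule of thumb. The source does not describe TiExciting's real planning rules, so the `plan_topology` function, the `DEFAULT_LIMITS` table, and the "least-loaded machine first" placement are all assumptions made for this example.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical caps on instances per component; None means "one per machine".
DEFAULT_LIMITS: Dict[str, Optional[int]] = {
    "pd": 3, "tikv": None, "tidb": 2, "monitor": 1,
}


def plan_topology(hosts: List[str],
                  components: Iterable[str]) -> Dict[str, List[str]]:
    """Return a mapping of host -> components placed on that host."""
    if not hosts:
        raise ValueError("at least one machine is required")
    plan: Dict[str, List[str]] = {host: [] for host in hosts}
    for component in components:
        limit = DEFAULT_LIMITS.get(component, 1)
        count = len(hosts) if limit is None else min(limit, len(hosts))
        # Prefer the least-loaded machines; ties keep the user's host order.
        ranked = sorted(hosts, key=lambda h: (len(plan[h]), hosts.index(h)))
        for host in ranked[:count]:
            plan[host].append(component)
    return plan


# A single node simply receives every selected component.
single = plan_topology(["10.0.1.1"], ["pd", "tikv", "tidb", "monitor"])

# TiKV-only usage: TiDB and monitoring are left unchecked.
no_tidb = plan_topology(
    ["10.0.1.1", "10.0.1.2", "10.0.1.3", "10.0.1.4"], ["pd", "tikv"]
)
```

The returned mapping is only a starting point: a drag-and-drop interface would let the user move entries between hosts before deployment.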

The final installation looks like this:

Machine management

After TiExciting starts, the interface first guides the user to add machines, including filling in how to connect to them. You can log in with either a password or a key. There is no need to create a dedicated user or to use root; just enter the connection details that operations staff normally use. The settings for a new machine can also be copied from an existing machine.

Advanced configuration lets you specify location labels, such as which rack and which server room the machine belongs to. When location labels are specified, the interface displays the machines by server room and rack after configuration, which is very intuitive.
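The machine settings described above could be modeled roughly as follows. This is a sketch under assumptions, not TiExciting's data model: the `Machine` class, its field names, and `group_by_location` are hypothetical, and requiring at least one of password or key is a choice made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Machine:
    host: str
    user: str                        # any ordinary login user, not necessarily root
    port: int = 22
    password: Optional[str] = None   # log in with a password ...
    key_path: Optional[str] = None   # ... or with a private key file
    room: Optional[str] = None       # optional location labels
    rack: Optional[str] = None

    def __post_init__(self) -> None:
        if self.password is None and self.key_path is None:
            raise ValueError("provide a password or a key_path")


def group_by_location(machines: List[Machine]) -> Dict[str, Dict[str, List[str]]]:
    """Group hosts as room -> rack -> [hosts], the shape a UI could render."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for m in machines:
        grouped[m.room or "unlabeled"][m.rack or "unlabeled"].append(m.host)
    return {room: dict(racks) for room, racks in grouped.items()}


first = Machine(host="10.0.1.1", user="ops", key_path="~/.ssh/id_rsa",
                room="room-a", rack="rack-1")
# "Copy from an existing machine": keep every setting, change only what differs.
second = replace(first, host="10.0.1.2", rack="rack-2")
layout = group_by_location([first, second])
```

`dataclasses.replace` builds a new object from an existing one, which maps naturally onto copying the settings of a machine that is already configured.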

The full demo video can be found here: TiExciting Demo video

Technical implementation

Interface

To display the interface across platforms, TiExciting serves it as a web page built with the popular React + MobX stack. This makes the interface cross-platform and also lets users reach it remotely from a browser, even when TiExciting runs on a server with no GUI.

Cross-platform

TiExciting’s deployment logic and its handling of user actions are written in Python for cross-platform support. We had hoped to use Python packaging tools so that users could download and run it without installing anything, but during the Hackathon this turned out to be wishful thinking with many pitfalls. The Python runtime itself is also large. If we had the chance again, we would probably switch to Golang, where producing a single binary is really easy.

Fast

To be as fast as possible, TiExciting reuses files based on their hashes wherever it can: a TiDB binary package that has already been downloaded and verified is not downloaded again, and the same goes for files that have already been deployed successfully. TiExciting also implements an asynchronous task scheduler based on a directed acyclic graph (DAG). A task runs once all of its prerequisite tasks have completed, and tasks that do not depend on each other run in parallel, as shown in the figure below:
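Both ideas in this section, hash-based reuse and DAG scheduling, can be sketched in a few lines of standard-library Python. This is an illustration only, not TiExciting's implementation: `sha256_of`, `can_reuse`, `run_dag`, and the task names are hypothetical, SHA-256 is an assumed choice of hash, and the scheduler assumes the dependency graph really is acyclic and does no failure handling.

```python
import asyncio
import hashlib
from pathlib import Path
from typing import Awaitable, Callable, Dict, List, Set


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large packages are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def can_reuse(path: Path, expected_sha256: str) -> bool:
    """True if the file exists and matches the expected hash (skip the download)."""
    return path.is_file() and sha256_of(path) == expected_sha256


async def run_dag(tasks: Dict[str, Callable[[], Awaitable[None]]],
                  deps: Dict[str, Set[str]]) -> List[str]:
    """Run each task after all of its prerequisites; return the completion order."""
    done = {name: asyncio.Event() for name in tasks}
    finished: List[str] = []

    async def runner(name: str) -> None:
        for dep in deps.get(name, set()):
            await done[dep].wait()        # wait for every prerequisite
        await tasks[name]()
        finished.append(name)
        done[name].set()                  # unblock tasks that depend on this one

    # All runners start at once, so independent tasks overlap in time.
    await asyncio.gather(*(runner(name) for name in tasks))
    return finished


async def step(seconds: float) -> None:
    await asyncio.sleep(seconds)          # stand-in for real download/start work


order = asyncio.run(run_dag(
    tasks={
        "download_pd": lambda: step(0.02),
        "download_tikv": lambda: step(0.01),
        "start_pd": lambda: step(0.01),
        "start_tikv": lambda: step(0.01),
    },
    deps={"start_pd": {"download_pd"},
          "start_tikv": {"download_tikv", "start_pd"}},
))
```

In this example the two downloads run concurrently, while `start_tikv` waits for both its own download and `start_pd`. Using one `asyncio.Event` per task keeps the scheduler short, at the cost of hanging forever if the graph contains a cycle.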

Looking ahead

Because Hackathon time was limited, there are many features we wanted but did not have time to build:

  • Better deployment planning.

  • Scaling out and in entirely through the interface.

  • Upgrading the cluster entirely through the interface.

  • Managing the cluster entirely through the interface (start, stop, upgrade, rolling restart, etc.).

  • Bringing existing TiDB Ansible clusters under TiExciting management.

Although the Hackathon has come to an end, we hope to keep improving TiExciting and make it a general-purpose tool that everyone enjoys using, helping more people adopt TiDB and reducing the complexity of operations.

pingcap.com/blog-cn/tie…