Why is my cluster data unevenly distributed? This article sets out to answer that question, but it is worth reading even if your TDengine cluster shows no such imbalance: you are probably using clusters in fairly ordinary scenarios right now, and when the special cases arrive, a deeper understanding of the cluster parameters and the database architecture will give you a real head start. How to set up a cluster is not the topic of this article; please follow the official documentation closely.

The official documentation address: www.taosdata.com/cn/document…

To follow this article, it is important to understand the concept of a vnode: each vnode is a relatively independent unit of work and the basic unit for storing time-series data (tables), with its own running threads, memory space, and persistent storage path.

If that does not feel clear enough yet, read on; things may suddenly click once you connect the dots. Now, let's analyze the problem scenario by scenario.

Generally speaking, uneven data distribution comes in two forms.

Scenario 1: Tables are not evenly distributed

This can happen when you test database performance with a small number of tables: you create 1,200 tables, then find that 1,000 of them sit in one vnode and only 200 in another. The downside is that most tables crowd into the same vnode, so data is not evenly distributed. It also means only two threads are doing TDengine's work, so you cannot exploit your machine's multiple cores (assuming the server has two or more), wasting TDengine's horizontal scalability.
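To reproduce this in the taos shell, a sketch along these lines works (the database name, super table, column, and tag values here are all made up for illustration):

```sql
-- Illustrative schema: a super table plus 1200 sub-tables
CREATE DATABASE IF NOT EXISTS test;
USE test;
CREATE TABLE meters (ts TIMESTAMP, current FLOAT) TAGS (location BINARY(64));
CREATE TABLE d0 USING meters TAGS ('beijing');
-- ... repeat up to d1199, then watch the distribution:
SHOW VGROUPS;
```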

Let me start with a brief explanation of why this happens. TDengine has three relevant parameters:

  • maxVgroupsPerDb: the maximum number of vgroups (each holding one vnode per replica) a database can use. Default: 0 (automatic);
  • minTablesPerVnode: the number of tables a vnode must reach before the next vnode is opened. Default: 1000;
  • tableIncStepPerVnode: the step by which each vnode's table quota grows once the minimum is exceeded. Default: 1000.
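In taos.cfg these would look like the following sketch (values shown are the defaults described above; note that minTablesPerVnode and tableIncStepPerVnode are hidden parameters that must be added by hand, as discussed later):

```
# taos.cfg (defaults, for illustration)
maxVgroupsPerDb       0      # 0 = automatic: one vgroup per CPU core
minTablesPerVnode     1000   # fill a vnode to this count before opening the next
tableIncStepPerVnode  1000   # quota growth step per vnode once all vgroups exist
```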

TDengine relies on these three parameters to control table placement while tables are being created in bulk. Open another taos shell and run the show vgroups command repeatedly to watch the process unfold: in the first vnode, the table count grows from 0; when it reaches minTablesPerVnode, the next vnode is created and table creation continues there. This repeats until the number of vnodes reaches maxVgroupsPerDb, at which point TDengine goes back to the first vnode and keeps creating tables, growing each vnode by tableIncStepPerVnode at a time, until all tables are created. (The description sounds tedious, but it is very intuitive once you watch it happen yourself.)
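The allocation logic described above can be sketched as a small simulation. This is an illustration of the described behavior, not TDengine's actual source code:

```python
def distribute_tables(n_tables, max_vgroups=4, min_tables=1000, inc_step=1000):
    """Simulate the table-allocation behavior described above."""
    counts = []  # counts[i] = number of tables currently in vnode i
    quota = []   # quota[i] = current table quota of vnode i
    for _ in range(n_tables):
        # Place the table in the first vnode that still has free quota.
        target = next((i for i, (c, q) in enumerate(zip(counts, quota)) if c < q), None)
        if target is None:
            if len(counts) < max_vgroups:
                # Open a new vnode with the minimum quota.
                counts.append(0)
                quota.append(min_tables)
                target = len(counts) - 1
            else:
                # All vgroups exist: grow the smallest quota by inc_step.
                target = quota.index(min(quota))
                quota[target] += inc_step
        counts[target] += 1
    return counts

# 1200 tables with the defaults: the first vnode fills to 1000,
# the remaining 200 land in the second vnode.
print(distribute_tables(1200))                    # [1000, 200]
# 40000 tables on a 4-core machine end up evenly distributed.
print(distribute_tables(40000))                   # [10000, 10000, 10000, 10000]
# The scenario 1 fix: lower minTablesPerVnode to 300.
print(distribute_tables(1200, min_tables=300))    # [300, 300, 300, 300]
```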

With this logic understood, we can go back and see why scenario 1 happens: minTablesPerVnode defaults to 1000, so the first 1,000 tables all land in the first vnode, which gives the impression that data is not evenly distributed.

So how do we adjust things to get the distribution we expect?

Not so fast: first we need to know a little more about one of these parameters, maxVgroupsPerDb.

You may have noticed in taos.cfg that maxVgroupsPerDb defaults to 0, which means automatic configuration. The official website explains what that automatic configuration does: "Each database can create a fixed number of vgroups, which defaults to the number of CPU cores and can be configured with maxVgroupsPerDb."

If you are not familiar with the concept of Vgroup, you are advised to visit the official website:

www.taosdata.com/cn/document…

Combine this with the vnode concept from the beginning of this article: each vnode has its own running threads, so the reason for the default becomes obvious. Making the number of vnodes equal the number of CPU cores maximizes CPU utilization and server performance.

Ok, now we can finally return to the scenario and present a solution.

Assume a 4-core server, so the default maximum number of vgroups is 4. In scenario 1 you want only 300 tables per vnode, so you simply change minTablesPerVnode to 300 and the 1,200 tables spread evenly across the four vnodes.

Scenario 1 is a fairly special case; the default configuration targets the most common one. For most users, once a test scenario reaches the 10,000-table level, tables will already be spread evenly across vnodes. Say you have a 4-core machine and plan to create 40,000 tables: with the defaults, four vnodes are generated and you end up with 10,000 tables per vnode, with no imbalance at all.

We can push this further: if the default maxVgroupsPerDb already makes full use of the CPU, what is the point of exposing the parameter at all? Why make it public if changing it has no value?

Bear with me, and let's move on to the next scenario.

Suppose you have 4 CPU cores but set maxVgroupsPerDb to 8, and the data in the first four vnodes is hot while the data in the other four is accessed far less often. This may look like needless extra vnodes, but it offloads the first four vnodes and improves performance compared with the default configuration. So while the default parameters are sensible in general, a well-judged change can sometimes make the database run better, and it is up to you to evaluate that against your actual business scenario. Today this parameter is still a global variable, but in the future it may become a database-level setting so that different databases can serve different business needs.
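As a sketch, that override is a one-line change in taos.cfg (the hot/cold split itself comes from your data and query pattern, not from this setting):

```
# taos.cfg -- allow 8 vgroups per database instead of one per core
maxVgroupsPerDb  8
```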

The other two variables, minTablesPerVnode and tableIncStepPerVnode, do not appear in taos.cfg at all because they are relatively tricky to use; they are the "mysterious parameters" of this article's title. Their defaults (1000) suit most scenarios, but when your business scenario does not fit, it is up to you to modify them by hand. The method: open taos.cfg, add the corresponding lines in the blank area, and restart the database service process. In a cluster, apply the same change to taos.cfg on every node and restart the service on each.
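For the scenario 1 fix (300 tables per vnode), the added lines might look like this; the 300 comes from the earlier example, and tableIncStepPerVnode is included only to show that it is configured the same way:

```
# Hidden parameters: add by hand, then restart taosd on every node
minTablesPerVnode     300
tableIncStepPerVnode  1000
```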

Scenario 2: VNodes are not evenly distributed

This tends to show up after you batch-create a large number of tables. At some point you ask: "Huh? Why are there 6 vnodes on this data node but only 2 on the other one?"

Before explaining this, we need one more important TDengine concept: the mnode (management node).

The official documentation describes the mnode clearly:

"Management node (mnode): a virtual logical unit responsible for monitoring and maintaining the running status of all data nodes and for load balancing between nodes. Management nodes also store and manage metadata (users, databases, tables, static tags, and so on), so they are also called meta nodes. A cluster can run multiple mnode replicas to keep the mnode highly available; the number of replicas is set by the system configuration parameter numOfMnodes, with valid values from 1 to 3."

As you can see, mnodes run in a master-slave arrangement just as vnodes do, so mnodes are also counted in data-node load balancing. One mnode is weighted as n vnodes, where n is controlled by the following parameter.
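In TDengine 2.x this is mnodeEqualVnodeNum, which defaults to 4. With that default, a data node holding one mnode plus two vnodes is weighted the same as a node holding six vnodes, which is exactly the situation in scenario 2:

```
# taos.cfg -- one mnode counts as this many vnodes in load balancing
mnodeEqualVnodeNum  4
```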

You can check where the mnodes live with the following command.
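In the taos shell; the end_point column of the output tells you which data node each mnode runs on, and the role column whether it is master or slave:

```sql
SHOW MNODES;
```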

Now we can see why scenario 2 happens: the data node with only two vnodes must have an mnode running on it, and that mnode has taken the vnode slots that would otherwise be filled here.

Wrapping up

That covers the technical content, but as the author I would like to chat with you for a moment.

TDengine is easy to use, but getting the most out of it takes some study. The official documentation is TDengine's encyclopedia, and as such it is not always well suited to stringing together real-world application scenarios. To fill that gap, the articles we publish will focus on connecting the product with actual scenarios, with the ultimate goal of turning these pieces into a practical guide covering many real use cases, and making the TDengine user experience noticeably better.

Talk is cheap, so how do we actually string these practical scenarios together? Quite simply: by the end of this article you have probably learned how to configure data distribution for a cluster properly. Combined with "How can TDengine users Optimize data write speed?", you already have some understanding of tuning write performance; combined with "In Docker environment, TDengine client can not connect to the cluster", you also understand how to build a TDengine cluster in a Docker environment.

Building your TDengine knowledge, like building a TDengine cluster, is something you accumulate step by step.

Open source culture is about helping each other, and the community has given us a lot, so we want to give back to you in the same way.