Author: Chen Yu

Recently, a flood of discussion broke out in one of the TDengine community groups. Several members chatted almost without rest, practically forgetting to eat and sleep. So what was it that kept them talking at four in the morning?

The topic was how to improve TDengine clustering in a Docker environment. "What? Ordinary users, not your own staff, were working overtime to discuss improving clustering in Docker environments? That sounds bogus."

OK, let's admit it: a user named Oliver was having a problem with his Dockerized TDengine cluster, which his client failed to connect to. This triggered a discussion with two enthusiastic experts in the group that continued until they arrived at a final solution.

This is how it happened:

The user's database cluster is installed on a Linux server (IP: 10.0.31.2), and the containers sit on the virtual network 172.19.0.0/16 that Docker created on the host machine. The hostnames and IPs of the three containers are: taosnode1 (172.19.0.41), taosnode2 (172.19.0.42), and taosnode3 (172.19.0.43).

Each node is configured as follows:

taosnode1: firstEp=taosnode1:6030, secondEp=taosnode2:6030, fqdn=taosnode1; port mapping: 16030-16042:6030-6042 (TCP/UDP)
taosnode2: firstEp=taosnode1:6030, secondEp=taosnode2:6030, fqdn=taosnode2; port mapping: 26030-26042:6030-6042 (TCP/UDP)
taosnode3: firstEp=taosnode1:6030, secondEp=taosnode2:6030, fqdn=taosnode3; port mapping: 36030-36042:6030-6042 (TCP/UDP)
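Concretely, the original per-node configuration can be sketched as the following taos.cfg fragments. The file paths and the commented docker flags are illustrative assumptions; only the firstEp/secondEp/fqdn values and the port mappings come from the setup described above.

```shell
# Sketch of the original per-node taos.cfg: all three nodes keep the default
# serverPort 6030; only the host-side mapping differs
# (16030-16042, 26030-26042, 36030-36042).
mkdir -p /tmp/taos-demo
for i in 1 2 3; do
  cat > "/tmp/taos-demo/taosnode$i.cfg" <<EOF
firstEp taosnode1:6030
secondEp taosnode2:6030
fqdn taosnode$i
serverPort 6030
EOF
done
# Hypothetical container start for taosnode1 (flags are illustrative):
#   docker run -d --name taosnode1 --ip 172.19.0.41 \
#     -p 16030-16042:6030-6042 -p 16030-16042:6030-6042/udp tdengine/tdengine
```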

After a lot of fiddling with the official documentation, Oliver finally got the cluster up. After adding the nodes, he nervously typed "show dnodes", and when three READYs came into view, he relaxed.

The server was fine; next came the client. He opened one of his Windows hosts, IP 10.0.31.5 (the same network segment as the cluster host), quickly installed a TDengine client on it (a 2.8 MB, foolproof setup), added the hosts entries, set up the route, and connected to the cluster in one go. "show dnodes" again, three READYs again, comfortable again.

Oliver was satisfied. Soon, however, he discovered that things were not as simple as they seemed.

Due to business needs, he also had to connect a client on a different network segment (10.0.2.61) to the server cluster (the Docker-hosted cluster behind IP 10.0.31.2). Pinging the host, telnetting to the ports mapped by the cluster, and opening a taos connection to the cluster all went as smoothly as before. Then he typed "show dnodes" again, and to his surprise, the "DB error: Unable to establish connection" message dreaded by all TDengine users appeared. So he threw his question out to the group.

The two enthusiastic members mentioned above appeared at this point. One is Freemine, an external contributor to TDengine. The other is Pigwing, a warm-hearted member always ready to step in when a stranger runs into trouble.

Since the cluster itself worked fine, the only difference was that the client connected to the server across network segments. So, at first, the idea was to connect directly to the Docker containers' IPs instead of going through the host's mapped ports. Unfortunately, connecting to Docker-internal IPs across network segments could not be made to work.

The next hypothesis: TDengine relies on the Endpoint (EP) to identify a data node, where EP = FQDN + port. Since the client connection succeeded but the data could not be operated on, and the FQDNs were correct, the guess was that something was wrong with the ports inside the cluster, so the client never obtained the cluster's topology information. From an initial survey of the environment to step-by-step troubleshooting, the three persevering engineers discussed in the group from April 22 to April 25, staying online as late as 4:00 am.

Finally, after a lot of trial and error, at 1am on April 24, Freemine came up with a final solution that worked.

Done, after testing, all is well!

So, what’s the difference between Freemine’s clustering solution and the original clustering solution?

The process was tortuous, but a close comparison of the two solutions reveals that the only difference between them is the port configuration. Freemine's solution changes the value of serverPort on each node: taosnode1 uses serverPort 6030, mapped to host port 6030; taosnode2 uses serverPort 7030, mapped to host port 7030; taosnode3 uses serverPort 8030, mapped to host port 8030.
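Sketched the same way, Freemine's scheme looks like this. The file paths are again illustrative; the serverPort values and the one-to-one host mappings are the ones described above.

```shell
# Sketch of Freemine's fix: a distinct serverPort per node, with the host
# mapping the very same port number through (6030:6030, 7030:7030, 8030:8030).
mkdir -p /tmp/taos-fix
port=6030
for i in 1 2 3; do
  cat > "/tmp/taos-fix/taosnode$i.cfg" <<EOF
firstEp taosnode1:6030
fqdn taosnode$i
serverPort $port
EOF
  # hypothetical matching mapping:
  #   docker run ... -p $port:$port -p $port:$port/udp tdengine/tdengine
  port=$((port + 1000))
done
```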

In Oliver's original setup, serverPort on every node was left at the default 6030, mapped to 16030, 26030, and 36030 on the host machine. This configuration causes no problems when the client sits on the same segment as the cluster host; it only breaks when connecting across segments.

How could such a seemingly small change make such a big difference? What is the reason?

In fact, when the client and the server are on the same network segment, the client can reach the Docker-internal network directly once a route is added. That way, each FQDN resolves to the correct container IP: taosnode1 (172.19.0.41), taosnode2 (172.19.0.42), taosnode3 (172.19.0.43). With distinct IP addresses, TDengine can still tell the nodes apart even though they all use the same port 6030.
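For reference, the same-segment client setup described here amounts to a hosts file plus one route entry on the Windows client. The exact commands below are a hedged sketch of that idea, not taken from Oliver's notes.

```shell
# On the Windows client (run as administrator) -- a sketch:
# 1) C:\Windows\System32\drivers\etc\hosts maps each FQDN to its container IP:
#      172.19.0.41 taosnode1
#      172.19.0.42 taosnode2
#      172.19.0.43 taosnode3
# 2) route the Docker subnet through the cluster host:
route ADD 172.19.0.0 MASK 255.255.0.0 10.0.31.2
```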

However, things are different across network segments. A client on a different segment must reach the server through real routing, but the Docker-internal network we set up is not registered with any real router, so the client naturally cannot reach it. When taosc then needs information about the individual nodes provided by the cluster, the FQDNs can no longer resolve to reachable IP addresses. At that point, the only way left to distinguish the nodes is by port.

This is why port 6030 can no longer be used simultaneously on nodes in a Docker environment.

Therefore, if you keep the port mapping identical inside and outside the Docker host and set a different serverPort on each node, the cluster can distinguish nodes by port. The client can then retrieve the topology information and operate on the cluster smoothly.
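Under the fixed scheme, the cross-segment client's hosts file then points every FQDN at the Docker host itself, and the distinct ports do the rest. Again a hedged sketch; the hosts entries and the taos invocation are illustrative.

```shell
# hosts entries on the cross-segment client (10.0.2.61) -- sketch:
#   10.0.31.2 taosnode1    # reached as taosnode1:6030
#   10.0.31.2 taosnode2    # reached as taosnode2:7030
#   10.0.31.2 taosnode3    # reached as taosnode3:8030
# then connect through the first endpoint, e.g.:
taos -h taosnode1 -P 6030
```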

This is the final answer to the whole “case”.

To sum up: for users, building a TDengine cluster in a Docker environment involves quite a few pitfalls. Because of the complexity of the environment, it is not a highly recommended way to run a cluster, so exercise caution when using TDengine in Docker environments.

Finally, we would like to say that, as an open-source product, an active and professional community is what we at TAOS Data care about most. There is currently no official documentation on setting up a TDengine cluster in a Docker environment, but the active thinking of these community users clearly fills that gap.

Our sincere thanks to Oliver, Freemine, and Pigwing. We sincerely hope you will stay active at the forefront of IoT big data technology, and we also hope more friends will join in.

Click "here" to view Oliver's notes on setting up a TDengine cluster in a Docker environment.