
This question was raised by a senior engineer at Tencent in the Elasticsearch technical exchange WeChat group in April 2020. A preliminary answer was discussed at the time, but it was neither detailed nor thorough enough, so I never wrote it up.

This time I have verified things in practice, so let me try to make it clear.

1. The problem

I still don’t quite understand the difference between seed_hosts and cluster.initial_master_nodes.

  • 1. Can master-eligible nodes be configured in seed_hosts?
  • 2. Can master-eligible data nodes be configured there?
  • 3. How are potential machines found?
  • 4. Are the nodes in initial_master_nodes master-eligible nodes?
  • 5. Are these nodes always present when the cluster is initially started?
  • 6. Once the cluster is up, can just one of these nodes be configured? Once the cluster is up, can master-eligible nodes be added?
  • 7. After adding a few more, is it OK to remove several of the nodes from initial_master_nodes?
  • 8. If a cluster currently has 7 master-eligible nodes, its quorum is 4. Does ES support removing nodes gradually so that the quorum gradually drops too?
  • 9. If 3 nodes are removed gradually and the original cluster keeps working normally, and those three nodes are then restarted together inside their own network partition, will they form a cluster by themselves?

I hope you guys have some… .

2. Break it down

The core of the question is: what is the difference between discovery.seed_hosts and cluster.initial_master_nodes?

2.0 Terminology first

To avoid misunderstandings, the official English terms are used as the basic definitions:

  • Discovery: the process by which nodes find each other before forming a cluster. It runs when an Elasticsearch node starts, or when a node believes the master node has failed, and it continues until the master node is found or a new master node is elected.
  • Master-eligible nodes: candidate master nodes, that is, nodes that can be elected master
  • Master-ineligible nodes: nodes that cannot be elected master
  • Coordinating-only nodes: nodes that only coordinate requests
  • Data-only nodes: nodes that only hold data
  • Seed hosts providers: the sources of the list of seed hosts
  • Voting configuration: the set of master-eligible nodes whose votes count (see 2.4)
  • Split brain: one cluster dividing into two or more isolated clusters (see 2.2)
  • Initial quorum: needed only when the entire cluster is started for the first time

2.1 Responsibilities of the master node

The master node is responsible for lightweight cluster-wide operations such as:

  • Creating or deleting indices
  • Tracking which nodes are part of the cluster
  • Deciding which shards are allocated to which nodes

A stable master node is important for cluster health.

A master-eligible node becomes the master node through the master election process.

In a cluster, there is only one master node after the election.

2.2 Split brain

Here is my plain-language explanation of split brain:

Suppose that, during the master election described in 2.1, two or more master nodes appear in one cluster. In effect the cluster has divided into two or more isolated clusters. This is called split brain.

2.3 Two basic tasks of master-eligible nodes

There are two basic tasks that the master-eligible nodes must carry out together:

  • Electing a master node
  • Changing the cluster state

Both activities should keep working even if some nodes fail.

2.4 Voting configuration

Every Elasticsearch cluster has a voting configuration, which is a set of master-eligible nodes.

When is it used?

  • First: when electing the master node;
  • Second: when committing a new cluster state.

When is a decision made? Only after more than half of the nodes in the voting configuration have responded.

Usually the voting configuration is the same as the set of all master-eligible nodes in the cluster. In some cases, however, the two can differ.

Here is the point that comes up most often:

  • To keep the cluster available, do not stop half or more of the nodes in the voting configuration at the same time.
  • The cluster keeps working as long as more than half of the voting nodes are available.

For example:

  • If there are three or four master-eligible nodes, the cluster can tolerate one of them becoming unavailable.
  • If there are two or fewer master-eligible nodes, they must all remain available.
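The "more than half" rule can be checked with a few lines of arithmetic. This is only a sketch of the counting, not Elasticsearch code; the function names quorum and tolerated_failures are my own.

```python
def quorum(voting_nodes: int) -> int:
    """Smallest number of votes that is more than half of the voting nodes."""
    return voting_nodes // 2 + 1


def tolerated_failures(voting_nodes: int) -> int:
    """How many voting nodes may be unavailable while a quorum still exists."""
    return voting_nodes - quorum(voting_nodes)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 7):
        print(f"{n} voting nodes: quorum={quorum(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

With three or four voting nodes the tolerated failure count is one, and with one or two it is zero, which matches the two bullets above.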

You will see this conclusion confirmed in the practical examples below.

2.5 Plan for an odd number of master-eligible nodes

A cluster should usually have an odd number of master-eligible nodes.

If the number is even, Elasticsearch leaves one of them out of the voting configuration so that it has an odd size.

To put it more plainly:

Four master-eligible nodes are essentially the same as three: either way, only one master-eligible node is allowed to fail.
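To see why leaving one node out does no harm, here is a small sketch. It is a simplified model that I am assuming only for clusters with three or more master-eligible nodes; voting_configuration_size and has_majority are hypothetical helper names, not Elasticsearch APIs. It also shows the case of a network partition that cuts four nodes into two equal halves.

```python
def voting_configuration_size(master_eligible: int) -> int:
    """Simplified model: with an even count, one node is left out of the vote."""
    if master_eligible < 3:
        raise ValueError("this sketch only models three or more master-eligible nodes")
    return master_eligible if master_eligible % 2 == 1 else master_eligible - 1


def has_majority(votes: int, voting_size: int) -> bool:
    """True when the votes are more than half of the voting configuration."""
    return 2 * votes > voting_size


if __name__ == "__main__":
    size = voting_configuration_size(4)
    print("voting configuration size for 4 master-eligible nodes:", size)
    # A 2/2 network split always leaves two voters in one half and one in the other.
    print("half with 2 voters has a majority:", has_majority(2, size))
    print("half with 1 voter has a majority:", has_majority(1, size))
    # Hypothetical comparison: if all four nodes voted, neither half could win.
    print("2 of 4 voters would be a majority:", has_majority(2, 4))
```

In this model the odd-sized configuration lets one half of an evenly split cluster keep working, while a four-voter configuration would leave both halves stuck.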

2.6 What discovery.seed_hosts and cluster.initial_master_nodes do

 

7.x                              Before 7.x
discovery.seed_hosts             discovery.zen.ping.unicast.hosts
cluster.initial_master_nodes     discovery.zen.minimum_master_nodes (minimum number of master-eligible nodes)

  • The ins and outs of discovery.seed_hosts

In 6.x and 5.x the corresponding setting is discovery.zen.ping.unicast.hosts.

Compare the two in the official documentation: apart from the name, the explanation is exactly the same.

In a multi-node cluster, discovery.seed_hosts should be set to the addresses of the master-eligible nodes.

  • cluster.initial_master_nodes

This is also new in 7.x, and it is different from the earlier setting discovery.zen.minimum_master_nodes, which gave the minimum number of master-eligible nodes.

In plain terms: it sets the list of names of the master-eligible nodes.

On 7.x nodes, the discovery.zen.minimum_master_nodes setting is allowed but ignored.

When the cluster is started for the very first time, cluster.initial_master_nodes must be set in order to bootstrap the cluster.

During cluster initialization, cluster.initial_master_nodes should contain the names of the master-eligible nodes and should be defined on every master-eligible node in the cluster.

The essential difference:

  • cluster.initial_master_nodes: used only when the cluster is started for the first time.
  • discovery.seed_hosts: needed on every startup.
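The difference can be made concrete by generating the settings for a node in both situations. This is a rough sketch: the cluster name, node names and addresses are made up for illustration, and node_settings and to_yaml are my own helpers. Only the setting names come from Elasticsearch.

```python
from typing import Dict, List, Union

Setting = Union[str, List[str]]


def node_settings(node_name: str, seed_hosts: List[str],
                  initial_masters: List[str],
                  first_cluster_start: bool) -> Dict[str, Setting]:
    """Build elasticsearch.yml settings for one master-eligible node."""
    settings: Dict[str, Setting] = {
        "cluster.name": "demo-cluster",            # hypothetical cluster name
        "node.name": node_name,
        "discovery.seed_hosts": list(seed_hosts),  # needed on every startup
    }
    if first_cluster_start:
        # Only needed to bootstrap a brand-new cluster.
        settings["cluster.initial_master_nodes"] = list(initial_masters)
    return settings


def to_yaml(settings: Dict[str, Setting]) -> str:
    """Render flat settings as YAML lines, using flow style for lists."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, list):
            rendered = "[" + ", ".join(f'"{item}"' for item in value) + "]"
        else:
            rendered = str(value)
        lines.append(f"{key}: {rendered}")
    return "\n".join(lines)


if __name__ == "__main__":
    hosts = ["10.0.0.1:9300", "10.0.0.2:9300", "10.0.0.3:9300"]  # made-up addresses
    names = ["node-1", "node-2", "node-3"]
    print(to_yaml(node_settings("node-1", hosts, names, first_cluster_start=True)))
    print("---")
    print(to_yaml(node_settings("node-1", hosts, names, first_cluster_start=False)))
```

The second rendering has no cluster.initial_master_nodes line, while discovery.seed_hosts appears in both.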

2.7 The discovery process explained

The discovery process starts from the list of seed addresses supplied by one or more seed hosts providers, together with the addresses of any master-eligible nodes already known to be in the cluster.

The process takes place in two stages:

  • First, probe the seed addresses.

Each node probes the seed addresses by connecting to each one and trying to identify whether the node it has reached is master-eligible.

  • Second, if the probe succeeds, the node shares its list of all known master-eligible nodes with the remote node, and the remote node responds with its own list in turn.

The node then probes all the new nodes it has just discovered, asks them for their peers, and so on.

If the node is not master-eligible, it continues the discovery process until it finds the elected master node. If no elected master is found, the node retries after 1 second by default.

If the node is master-eligible, it continues the discovery process until it either finds the elected master node or finds enough master-eligible nodes to complete an election. Again, if neither happens quickly enough, the node retries after 1 second.

This is a bit convoluted, so it is worth reading the English documentation several times to deepen your understanding of discovery.
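The probing described in this section can be pictured as a walk over addresses. The sketch below is a toy model under my own assumptions: Node, discover and the addresses are hypothetical, a single round is simulated, and the real retry every second and the election itself are left out.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass
class Node:
    address: str
    master_eligible: bool
    is_master: bool = False
    known_peers: List[str] = field(default_factory=list)


def discover(seed_addresses: Iterable[str],
             network: Dict[str, Node]) -> Tuple[Optional[str], Set[str]]:
    """Probe seeds, then peers of peers.

    Returns the address of the elected master (or None) and the set of
    master-eligible addresses found during this round.
    """
    to_probe = deque(seed_addresses)
    probed: Set[str] = set()
    discovered: Set[str] = set()
    while to_probe:
        address = to_probe.popleft()
        if address in probed:
            continue
        probed.add(address)
        node = network.get(address)
        if node is None or not node.master_eligible:
            continue  # unreachable or master-ineligible: ignored
        discovered.add(address)
        if node.is_master:
            return address, discovered
        to_probe.extend(node.known_peers)  # the remote node shares its peers
    return None, discovered


if __name__ == "__main__":
    network = {
        "10.0.0.1:9300": Node("10.0.0.1:9300", True, known_peers=["10.0.0.2:9300"]),
        "10.0.0.2:9300": Node("10.0.0.2:9300", True, known_peers=["10.0.0.3:9300"]),
        "10.0.0.3:9300": Node("10.0.0.3:9300", True, is_master=True),
    }
    print(discover(["10.0.0.1:9300"], network))
```

Starting from a single seed address, the walk reaches the elected master through the peers that each probed node reports. A real node would repeat the round after the retry interval if nothing was found.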

2.8 elasticsearch.yml configuration caution

A production deployment of Elasticsearch 7.x requires at least one of the following settings to be specified in the elasticsearch.yml configuration file:

  • discovery.seed_hosts
  • discovery.seed_providers
  • cluster.initial_master_nodes
  • discovery.zen.ping.unicast.hosts
  • discovery.zen.hosts_provider
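A quick way to sanity-check a configuration against this list before deploying is sketched below. It assumes the settings have already been loaded into a flat dictionary keyed by the dotted setting names. The helper has_discovery_setting is my own and is not how Elasticsearch performs the check.

```python
from typing import Any, Mapping

# The five settings listed above; at least one must be present.
REQUIRED_DISCOVERY_SETTINGS = (
    "discovery.seed_hosts",
    "discovery.seed_providers",
    "cluster.initial_master_nodes",
    "discovery.zen.ping.unicast.hosts",
    "discovery.zen.hosts_provider",
)


def has_discovery_setting(settings: Mapping[str, Any]) -> bool:
    """True when at least one of the required discovery settings is configured."""
    return any(key in settings for key in REQUIRED_DISCOVERY_SETTINGS)


if __name__ == "__main__":
    print(has_discovery_setting({"cluster.name": "demo-cluster"}))
    print(has_discovery_setting({"discovery.seed_hosts": ["10.0.0.1:9300"]}))
```

The first call reports a configuration that would not satisfy the requirement; the second one does.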

2.9 Master-ineligible nodes are ignored during discovery

In versions before 7.x, it was possible to use master-ineligible nodes as seed nodes during discovery, or to pass information between master-eligible nodes indirectly through them.

Clusters that rely on master-ineligible nodes in this way are very fragile and cannot recover automatically from certain failures.

From 7.x on, discovery involves only the master-eligible nodes in the cluster and no longer relies on master-ineligible nodes as earlier versions did.

How should this be configured in 7.x? Set discovery.seed_hosts or discovery.seed_providers to the addresses of all master-eligible nodes.

2.10 The fault detection timeout

In 7.x, by default, the cluster fault detection subsystem treats a node as failed if it fails to respond to three consecutive pings, each of which times out after 10 seconds.

Therefore, a node that is unresponsive for more than 30 seconds may be removed from the cluster.

Before 7.x, the default timeout for each ping was 30 seconds, so an unresponsive node could remain in the cluster for more than 90 seconds.
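The 30 and 90 second figures are simply the per-ping timeout multiplied by the number of consecutive failures. A tiny sketch follows; the function name is mine, and the interval between pings is ignored.

```python
def seconds_until_marked_failed(ping_timeout_seconds: int,
                                consecutive_failures: int) -> int:
    """Lower bound on how long an unresponsive node stays in the cluster."""
    return ping_timeout_seconds * consecutive_failures


if __name__ == "__main__":
    print("7.x defaults:", seconds_until_marked_failed(10, 3), "seconds")
    print("before 7.x:", seconds_until_marked_failed(30, 3), "seconds")
```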

2.11 Removing master-eligible nodes sometimes requires voting exclusions

If you want to remove half or more of the master-eligible nodes from the cluster, you must first exclude the affected nodes from the voting configuration using the voting configuration exclusions API.

The exclusion APIs are as follows:

POST _cluster/voting_config_exclusions?node_names=<node_names>
POST _cluster/voting_config_exclusions?node_ids=<node_ids>
DELETE _cluster/voting_config_exclusions

If you remove fewer than half of the master-eligible nodes at the same time, no voting exclusion is needed.

If only master-ineligible nodes are removed (for example, data-only nodes or coordinating-only nodes), no voting exclusion is needed either.

Similarly, adding a node to the cluster requires no voting exclusion.
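The rule of thumb in this section boils down to one comparison. The helper below is a sketch of that rule only; needs_voting_exclusion is a hypothetical name, and it counts master-eligible nodes exclusively because removing other node types never needs an exclusion.

```python
def needs_voting_exclusion(master_eligible_total: int,
                           master_eligible_removed: int) -> bool:
    """True when half or more of the master-eligible nodes go away at once."""
    if not 0 <= master_eligible_removed <= master_eligible_total:
        raise ValueError("removed count must be between 0 and the total")
    return master_eligible_removed > 0 and 2 * master_eligible_removed >= master_eligible_total


if __name__ == "__main__":
    print(needs_voting_exclusion(7, 3))  # fewer than half
    print(needs_voting_exclusion(7, 4))  # more than half
    print(needs_voting_exclusion(4, 2))  # exactly half
```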

3. Practice

3.1 Scenario 1: one master node and one data node

Data node configuration:

  • discovery.seed_hosts and cluster.initial_master_nodes contain only the master node.

The results are as follows:

3.2 Scenario 2: one master node and two data nodes

Configuration of the two data nodes:

  • discovery.seed_hosts and cluster.initial_master_nodes contain only the master node.

3.3 Scenario 3: three nodes, each both master-eligible and data node

All three nodes use the same configuration, as follows:

At this point, suppose I forcibly kill node-1. Guess what happens?

If you guessed that the cluster goes down, you are wrong! The cluster has elected a new master:

The result is as follows: node-2 becomes the master node.

This is not recommended in real production scenarios: a data node became the master node even though node.master: true was never set explicitly.

3.4 Scenario 4: three nodes, each both master-eligible and data node

All three nodes use the same configuration, as follows:

Kill node-2 and then node-1, one after the other, and see what happens.

Kill node-2 first: node-1 becomes the master node.

Then kill node-1: the cluster can no longer be accessed or used.

At this point, the error log is as follows:

Core error: an election requires at least two master-eligible nodes.

an election requires at least 2 nodes  …..  which is not a quorum.

The detailed error is as follows:

[node-3] master not discovered or elected yet, an election requires at least 2 nodes with ids from [0bozQB4VRZWB4TuzjRahAw, Z7PxWN_bQEeeI6KOlQT8pw, AWDZHrxaTd2qmOB1e8kadQ], have discovered [] which is not a quorum; discovery will continue using [172.21.0.14:9300, 172.21.0.14:9302] from hosts providers and [{node-3}{Z7PxWN_bQEeeI6KOlQT8pw}{ejRvW9egTrum3G5DrwmrIA}{172.21.0.14}{172.21.0.14:9303}{ml.machine_memory=8200851456, xpack.installed=true, ml.max_open_jobs=20}, {node-1}{AWDZHrxaTd2qmOB1e8kadQ}{FFIDQcynQMGHqPwCW9lFfw}{172.21.0.14}{172.21.0.14:9300}{ml.machine_memory=8200851456, ml.max_open_jobs=20, xpack.installed=true}] from last-known cluster state; node term 8, last-accepted version 92 in term 8

As the official documentation says:

To ensure that the cluster remains available, you must not stop half or more of the nodes in the voting configuration at the same time. The cluster can keep working as long as more than half of the voting nodes are available.

As mentioned earlier: the voting configuration is made up of master-eligible nodes!
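The log message can be checked against the three node ids it lists. In the sketch below, is_quorum is my own helper. Mapping the id 0bozQB4VRZWB4TuzjRahAw to node-2 is an assumption made by elimination, since the log only names node-1 and node-3.

```python
from typing import Iterable

# Node ids of the voting configuration, copied from the log line above.
VOTING_IDS = {
    "0bozQB4VRZWB4TuzjRahAw",  # assumed to be node-2 (by elimination)
    "Z7PxWN_bQEeeI6KOlQT8pw",  # node-3
    "AWDZHrxaTd2qmOB1e8kadQ",  # node-1
}


def is_quorum(available_ids: Iterable[str], voting_ids: Iterable[str]) -> bool:
    """True when more than half of the voting nodes are available."""
    voters = set(voting_ids)
    votes = len(set(available_ids) & voters)
    return 2 * votes > len(voters)


if __name__ == "__main__":
    after_killing_node_2 = {"Z7PxWN_bQEeeI6KOlQT8pw", "AWDZHrxaTd2qmOB1e8kadQ"}
    after_killing_node_1_too = {"Z7PxWN_bQEeeI6KOlQT8pw"}
    print("node-2 down:", is_quorum(after_killing_node_2, VOTING_IDS))
    print("node-2 and node-1 down:", is_quorum(after_killing_node_1_too, VOTING_IDS))
```

With only node-3 left, one vote out of three is not more than half, which is why the election cannot complete.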

3.5 Scenario 5: three nodes, each both master-eligible and data node

Starting from the previous configuration, comment out the cluster.initial_master_nodes setting. The configuration is as follows:

The cluster result is as follows:

The cluster still starts normally. Re-read section 2.6 with this in mind for a better understanding.

4. Answering the questions from the Tencent engineer

  • 1. Can master-eligible nodes be configured in seed_hosts?

Yes, and they must be master-eligible nodes.

  • 2. Can master-eligible data nodes be configured there?

In theory yes; in practice it is not recommended.

  • 3. How are potential machines found?

Through the discovery process described in section 2.7.

  • 4. Are the nodes in initial_master_nodes master-eligible nodes?

Yes.

  • 5. Are these nodes always present when the cluster is initially started?

A note for large clusters: plan for an odd number of master-eligible nodes at the cluster planning stage.

The master-eligible nodes must be configured properly before the cluster is started for the first time.

  • 6. Once the cluster is up, can just one of these nodes be configured? Once the cluster is up, can master-eligible nodes be added?

Yes, but it is not recommended.

For multi-node clusters, it is better to get the configuration right in one step.

  • 7. After adding a few more, is it OK to remove several of the nodes from initial_master_nodes?

This setting is used only when the cluster is started for the first time. See scenario 5 for details.

However, for the sake of consistent management, it is fine to leave the configuration untouched.

  • 8. If a cluster currently has 7 master-eligible nodes, its quorum is 4. Does ES support removing nodes gradually so that the quorum gradually drops too?

Put another way: with seven master-eligible nodes, the cluster stays available as long as more than half of them, that is at least four, are alive.

In other words, the cluster is still alive after three master-eligible nodes have been killed.

  • 9. If 3 nodes are removed gradually and the original cluster keeps working normally, and those three nodes are then restarted together inside their own network partition, will they form a cluster by themselves?

No. Once restarted, the 3 removed nodes will go on to rejoin the original cluster!

5. Summary

Concepts like these are hard to understand from the documentation alone, and you may still be a little confused even after reading this article.

Don’t distinguish concepts merely for the sake of distinguishing them. Put the theory into practice, then relate the practice back to the theory, and you will understand it better.

Feel free to write down your thoughts in the comments!

