Today the main study analysis of the main process, first look at the overall flow chart:

The theoretical knowledge

As a distributed system, there will be a coordinator to ensure data consistency and some governance work throughout the system. There are two ways of thinking about this coordination strategy:

  1. Try to avoid inconsistency
  2. How are definitions reconciled once they occur

ES is the first way of thinking and solves how to deal with network failures. Common models are master-slave and distributed hash table (DHT)

DHT is a distributed storage method. A type of information that can be uniquely identified by key value is stored on multiple nodes according to a certain convention/protocol. In this way, it can effectively avoid the entire network breakdown caused by a single failure of a centralized server (such as Tracker). It can support thousands of nodes per hour to leave and join, it can work in heterogeneous networks without understanding the underlying network topology, query response time is about 4 to 10 hops (transit times)

In general, the number of nodes in an ES cluster is much smaller than the number of connections that a single node can maintain, and the grid environment does not have to deal with node joining and leaving frequently, so the master-slave mode is more suitable for ES.

ES election algorithm

Commonly used election algorithms include Bully, which is relatively simple, and Paxos, which is complex and powerful.

Leant algorithm

Each node has a unique ID, and then sorts all node ids in the cluster. The node with the smallest ID is selected as the Master. Bully algorithm: Suppose the current Master faked death due to overload, then the second largest ID is elected as the new Master, then the old Master recovers, is elected as Master again and then faked death due to overload……

Paxos algorithm

Paxos is very complex to implement but very powerful, especially in terms of flexibility in when and how elections are conducted, which gives it a big advantage over the simple Bully algorithm because in real life, there are more failure patterns than network connection anomalies.

ES uses Bully and has made some optimizations to it:

  1. Each node counts the smallest ID, elects it as a temporary Master, and then votes on that Master;
  2. Each node collects the number of votes. When the number of votes exceeds the specified number, the node becomes the Master and broadcasts cluster information to the nodes that join the cluster.

Legal number: Number of nodes qualified for Master: (n/2 + 1)

The process description

After understanding the above related principles, it is detailed election description

  1. After the service is started, ping all the nodes and get a fullPingResponses. This node is added to the list.
  2. Create two lists
    • ActiveMasters: Iterating through the fullPingResponses we just did and then saving each node’s perceived Master into the list, normally there is only one Master;
    • CandidateMasters: Go through the fullPingResponses and add the qualified Master nodes to this list;
  3. If activeMasters are not empty, choose from activeMasters; otherwise, choose from candidateMasters. When selecting from the candidate list, judge whether the size of the candidate list is greater than the quorum; otherwise, it will fail.
  4. When selecting from a list, sort the list directly, and then select the node with the smallest ID as a temporary node;
  5. Each node sends a joinRequest request to its Master. There are two scenarios
    • When this node is Master, the node collects statistics and counts the number of joinRequest requests. If the number of joinRequest requests reaches the quorum within the specified time (30s by default, configurable), the node releases cluster information and replies to the joinRequest request, and the election is completed. Otherwise, the election fails.
    • If the node is not the Master node, the node rejects the joinRequest from other nodes and sends the joinRequest request to the node that it considers the Master node. Then, the node waits. If no reply is received within the specified time (1 minute, which can be configured) or three abnormal retries fail, the election fails and the node tries again. If the response does not contain Master information or the Master information is not the selected temporary Master node, the election fails.

The node to check

After going through the above five processes, the election is successful

  • The Master node enables NodesFaultDetection (NodesDF) to detect whether the node that joins the cluster is active.
  • The Node enables MasterFaultDetection (MasterDF) to detect whether the Master Node in the cluster is active.

When NodesDF is triggered, if the number of nodes in the current cluster is less than the quorum, the system automatically discards the Master identity (to prevent brain splitting). Then the election is triggered. When MasterDF is triggered, the election is triggered again. These probes resolve the data node and Master node exceptions.

The last

The above is all of this analysis, if there is any wrong place hope readers friends can correct, very grateful.

To be continued !!!!