Cluster

An Elasticsearch cluster consists of one or more nodes that share the same cluster name.

Node

An instance of Elasticsearch is a node. A single machine can run multiple instances, but in production each instance should normally be deployed on its own machine. A node's role is set with node.master and node.data.

node.master: indicates whether the node is eligible to become the master node. node.data: indicates whether the node stores data.

Note: setting node.master to true does not mean the node is the master node. The actual master is elected from among all master-eligible nodes, so this setting only indicates whether the node may participate in the master election.

Master-eligible node + data node (default)

node.master: true

node.data: true


A node with this configuration is master-eligible and also stores data. If it is elected as the actual master node, it must still serve data, which increases the load on it. This is the default configuration for every Elasticsearch node and is fine in a test environment, but it is not recommended in production because it mixes the master and data roles together.

Data nodes

node.master: false

node.data: true


Such a node is not master-eligible, does not participate in elections, and only stores data. A cluster needs several nodes of this type to hold the data and serve storage and query requests. They mainly consume disk and memory.

The master node

node.master: true

node.data: false


Stores no data but is master-eligible: it participates in elections and may become the actual master node. An ordinary server is sufficient (CPU and memory consumption is moderate).

Client node

node.master: false

node.data: false


Neither master-eligible nor a data node; it is mainly used to load-balance large volumes of incoming requests. An ordinary server is sufficient (if it will coordinate grouping and aggregation operations, consider giving it more memory).

In production, if every node is left with the default roles, the cluster is prone to split-brain under high data volume and high concurrency.

Index

A cluster can contain multiple indexes; each index is a collection of documents of the same format (since Elasticsearch 6.x, an index no longer supports multiple types).
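The number of primary shards is fixed when an index is created, while the number of replicas can be changed later. A hypothetical request creating an index named test with the 6.x defaults (5 primary shards, 1 replica per primary) might look like:

```
PUT /test
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
```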

Shard

Each index has one or more shards, each holding a different portion of the data. A shard is either a primary shard or a replica shard; a replica is a copy of a primary. The number of replicas per index can be adjusted dynamically, and a replica is never placed on the same node as its primary (to avoid a single point of failure).
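This is also why the number of primary shards cannot change after index creation: Elasticsearch routes each document with shard = hash(_routing) % number_of_primary_shards, where the routing value defaults to the document id. A minimal Python sketch of the idea, using zlib.crc32 as a stand-in for Elasticsearch's actual murmur3 hash (so the shard numbers here are illustrative only):

```python
import zlib

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick the primary shard for a document.

    Elasticsearch hashes the _routing value (the document id by default)
    with murmur3; zlib.crc32 stands in for it here, so the resulting
    shard numbers differ from what a real cluster would compute.
    """
    routing_hash = zlib.crc32(doc_id.encode("utf-8"))
    return routing_hash % num_primary_shards

# Every document id maps deterministically onto one of the 5 primary shards.
for doc_id in ["1", "2", "42"]:
    shard = route_to_shard(doc_id, num_primary_shards=5)
    assert 0 <= shard < 5
```

Because the modulus is the primary-shard count, changing that count would re-route existing documents to different shards, which is why it is fixed at creation time.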

Replica shards serve two purposes:

  1. Fault tolerance: if a primary shard fails, a replica shard is promoted to primary.
  2. Performance: GET and search requests can be served by either primary or replica shards.

Cluster health

  1. Green: all primary and replica shards are available
  2. Yellow: all primary shards are available, but some replica shards are not
  3. Red: some primary shards are unavailable

    Even when the cluster state is red, it keeps serving requests against the shards that are still alive. The failed shards should be repaired as soon as possible to avoid losing query results.
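The three states can be summarized as a simple rule. A minimal illustrative helper (not part of the Elasticsearch API) that derives the health color from shard availability:

```python
def cluster_health(active_primaries: int, total_primaries: int,
                   active_replicas: int, total_replicas: int) -> str:
    """Derive the health color from primary/replica shard availability.

    Mirrors the green/yellow/red rules described above; illustrative
    only, not an Elasticsearch API.
    """
    if active_primaries < total_primaries:
        return "red"      # some primary shards are unavailable
    if active_replicas < total_replicas:
        return "yellow"   # all primaries up, but some replicas missing
    return "green"        # everything is available

# One node running with the article's 5-primary index and no replica
# assigned: 5 of 5 primaries active, 0 of 5 replicas -> yellow.
print(cluster_health(5, 5, 0, 5))  # prints "yellow"
```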

Building a cluster on Windows (3 nodes, all master-eligible + data nodes)

Download the installation package

www.elastic.co/downloads/e…

Unzip it and make 3 copies (one instance per copy)

Edit config/elasticsearch.yml in each copy

Adjust elasticsearch.yml as shown below. Apart from the settings listed for node2 and node3, their parameters are the same as node1's.

node1

# Cluster name; the default is elasticsearch
cluster.name: es

# Node name
node.name: node1

# Whether this node is master-eligible; the default is true
node.master: true

# Whether this node stores data; the default is true
node.data: true

# Data path; the default is the data folder under the ES root directory
path.data: E:\elasticsearch\node1\data

# Log path; the default is the logs folder under the ES root directory
path.logs: E:\elasticsearch\node1\logs

# Address this node binds to
network.host: 0.0.0.0

# HTTP port; the default is 9200
http.port: 9200

# TCP port for inter-node communication; the default is 9300
transport.tcp.port: 9300

# Addresses of all the machines that form the cluster
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300", "127.0.0.1:9301", "127.0.0.1:9302"]

# Minimum number of master-eligible nodes; for a cluster with more than two nodes, a value greater than 1 is recommended
discovery.zen.minimum_master_nodes: 2
node2

node.name: node2

path.data: E:\elasticsearch\node2\data

path.logs: E:\elasticsearch\node2\logs

http.port: 9201

transport.tcp.port: 9301
node3

node.name: node3

path.data: E:\elasticsearch\node3\data

path.logs: E:\elasticsearch\node3\logs

http.port: 9202

transport.tcp.port: 9302

Now that the cluster is configured, let’s start each instance separately.

According to the comments in the configuration file:

To prevent split-brain, set minimum_master_nodes to a majority of the master-eligible nodes (total master-eligible nodes / 2 + 1).

With three master-eligible nodes we therefore configured discovery.zen.minimum_master_nodes: 2, so at least two master-eligible nodes must be running before the cluster becomes operational.
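The majority formula can be checked with a few lines of Python (illustrative only):

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum size to avoid split-brain: a majority of master-eligible nodes."""
    return master_eligible // 2 + 1

# Our 3-node cluster: 3 // 2 + 1 = 2, matching the configured value.
print(minimum_master_nodes(3))  # prints 2
```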

Test

Go to elasticsearch-6.2.1-1 and run bin\elasticsearch.bat to start the first node. The log shows that it does not yet form a cluster because not enough master-eligible nodes were found.

When the second master-eligible node starts successfully, the cluster status becomes normal.

You can view the cluster status with the elasticsearch-head plugin. If the cluster health value is green, the cluster is running properly. There is no data in the cluster yet, so no indexes or shards are visible.

Elasticsearch can be paired with Kibana + X-Pack for cluster data analysis and monitoring. Here we use the lighter-weight elasticsearch-head plugin instead; for installation details, see the elasticsearch-head installation guide.

Add test data:
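For example, a document can be indexed through any node's HTTP port (the type doc and the document body here are hypothetical; the test index matches the screenshot below):

```
PUT http://127.0.0.1:9200/test/doc/1
{
  "title": "hello cluster"
}
```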



As the screenshot shows, there are currently 3 nodes and one index, test. The test index has 5 primary shards (bold border) and 5 replica shards (thin border), evenly distributed across the nodes.

We try killing node3. After node3 leaves the cluster, the cluster briefly redistributes shards, honoring the rule that a primary and its replica never sit on the same node.

If we then kill node2, the whole cluster goes down and the cluster health shows "not connected", because the number of running master-eligible nodes (1) is below the discovery.zen.minimum_master_nodes setting of 2.

We then set discovery.zen.minimum_master_nodes to 1 and restart a single node. There are unassigned shards and the cluster health is yellow (5 of 10): all primary shards are available, but the 5 replica shards cannot be allocated to any node. The cluster is still usable and can handle any request, but every operation falls on a primary shard, creating a potential single point of failure. Once we start a second node, everything returns to normal and the primary shards' data is synchronized to the replicas.


In a real production environment, nodes usually have different roles. We add two nodes to the original three and adjust node.master and node.data so that we end up with two master-eligible nodes, two data nodes, and one client node.

Node1 and node2 are the master-eligible nodes, and node1 is elected master. Node3 and node4 are data nodes, so shards are allocated only to these two nodes. Node5 is the client node, acting as a load balancer for requests.

For Elasticsearch 5.x and 6.x installation problems on Linux, see the troubleshooting guide.