• Let’s start with an example: a bunch of candy that we want to sort by color

  • Then let’s take some maxima, minima, and averages (note: aggregations can be nested)

Hot and cold cluster architecture

  • You can specify which Elastic nodes are hot and which are cold in any of the following ways

Configure it in elasticsearch.yml

node.attr.hotwarm_type: hot   # hot node
node.attr.hotwarm_type: warm  # cold node
  • Specified when the index is created
PUT /logs_2019-10-01
{
  "settings": {
    "index.routing.allocation.require.hotwarm_type": "hot",
    "number_of_replicas": 0
  }
}


PUT /logs_2019-08-01
{
  "settings": {
    "index.routing.allocation.require.hotwarm_type": "warm",
    "number_of_replicas": 0
  }
}
  • Specify by template
PUT _template/logs_2019-08-template
{
  "index_patterns": "logs_2019-08-*",
  "settings": {
    "index.number_of_replicas": "0",
    "index.routing.allocation.require.hotwarm_type": "warm"
  }
}
PUT _template/logs_2019-10-template
{
  "index_patterns": "logs_2019-10-*",
  "settings": {
    "index.number_of_replicas": "0",
    "index.routing.allocation.require.hotwarm_type": "hot"
  }
}

Re-learn the Elastic cluster

  • cluster

Cluster: An ES cluster consists of one or more nodes, and each cluster is identified by a cluster name

  • node

Node: an ES instance is a node. One machine can run multiple instances, so a machine is not necessarily a single node; still, in most deployments each node runs on its own machine or virtual machine.

  • index

Index, which is a collection of documents

  • shard

ES is a distributed search engine. Each index has one or more shards, and the index’s data is distributed across them, like a bucket of water poured into N cups. Shards are reallocated automatically as the cluster changes: for example, with 2 nodes and 4 primary shards (not counting replicas), each node is allocated 2 shards; if you later add 2 more nodes, each of the 4 nodes ends up with 1 shard. This process is called relocation, and ES completes it automatically. Shards are independent of one another; for a single search request, every shard executes that request. In addition, each shard is a Lucene index, so a single shard can store at most Integer.MAX_VALUE – 128 = 2,147,483,519 documents, i.e. the default cap is roughly 2 billion documents per shard. You are advised to keep a single shard between 30 GB and 50 GB. (For the JVM heap size, take the minimum of half the host’s memory and 31 GB.)
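Because the primary shard count cannot be changed after the fact, it has to be chosen when the index is created. A minimal sketch in the same console style as the examples above (the index name and values are illustrative):

```
PUT /my_logs
{
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 1
  }
}
```

If 4 shards later prove too few, the only way out is to create a new index with more shards and reindex into it.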

  • replica

A primary shard and its replica never live on the same node (to prevent a single point of failure). By default each index gets 5 primary shards with one replica each (5 primary + 5 replica = 10 shards). If you have only one node, none of the 5 replicas can be allocated (they stay unassigned) and the cluster status turns yellow. Replicas serve two main purposes: 1. Disaster recovery: if a primary shard is lost, its replica is promoted to become the new primary and new replicas are created from it, so the cluster’s data stays safe. 2. Query performance: a replica holds the same data as its primary, so a query can be served by either; within a reasonable range, more replicas mean better search performance (but resource usage also increases: CPU, disk, heap). Note that index (write) requests can only be executed on primary shards; replicas cannot handle them. For a given index, the number of primary shards (number_of_shards) cannot be adjusted without rebuilding the index, but the number of replicas (number_of_replicas) can be adjusted at any time.

The storage space

If your Elasticsearch cluster nodes run out of disk space, cluster performance suffers. Once available storage falls below a certain threshold, Elasticsearch starts blocking writes, which in turn stops data from entering the cluster. Many of you may have run into the following error: ElasticsearchStatusException[Elasticsearch exception [type=cluster_block_exception, reason=blocked by: [FORBIDDEN/12/index read-only / allow delete (api)]]]. It indicates that the disk is nearly full.

Elasticsearch defines three default disk watermarks.

  • Low watermark

The default value is 85% of the disk capacity. Elasticsearch does not allocate new shards to nodes whose disk usage exceeds 85%. It can also be set to an absolute byte value (such as 500MB) to prevent Elasticsearch from allocating shards when less than the specified amount of free space is available. This setting does not affect the primary shards of newly created indexes, especially those that have never been allocated before.

cluster.routing.allocation.disk.watermark.low
  • High watermark

The default value is 90% of the disk capacity. Elasticsearch will attempt to relocate shards (move data from the current node to another node) away from nodes whose disk usage exceeds 90%. It can also be set to an absolute byte value, relocating shards away from a node when it has less than the specified amount of free space. This setting affects the allocation of all shards, whether or not they were previously allocated.

cluster.routing.allocation.disk.watermark.high
  • Flood-stage watermark

The default value is 95% of the disk capacity. Elasticsearch enforces a read-only index block (index.blocks.read_only_allow_delete) on every affected index. This is a last resort to prevent nodes from running out of disk space entirely. The read-only block has to be removed manually once enough disk space is available again. Monitoring the available storage space in the cluster is therefore critical.

cluster.routing.allocation.disk.watermark.flood_stage
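All three watermarks can be adjusted dynamically through the cluster settings API. A sketch (the percentages shown are just the defaults; adjust to your environment):

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%"
  }
}
```

Once disk space has been freed, the read-only block can be lifted per index by resetting index.blocks.read_only_allow_delete to null in that index’s settings.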

Document deletion

Segments in ElasticSearch are immutable, so documents cannot be modified in place: an update writes a new version of the document and marks the old one as deleted, and a delete likewise only marks the document; nothing is removed immediately. At search time, all candidate documents are matched and those carrying the delete mark are filtered out. The disk space an index occupies is therefore not freed immediately by a delete API call; it is only released when the next segment merge physically removes the marked documents. On the contrary, the delete marks themselves take up disk space, so right after the API call you may see disk usage grow rather than shrink.

In a typical production environment, indexes operated on with this API are large, with tens or even hundreds of millions of documents and sizes of several hundred gigabytes or even several terabytes. You are therefore advised to perform the operation during off-peak hours or at night, because deleting a large amount of data consumes significant I/O and CPU resources and may adversely affect the production cluster. Also make sure the cluster disks have some spare capacity during the delete, since mark-deletes take up disk space; with insufficient disk space this operation has a high failure rate.

Elasticsearch has a background thread that periodically merges segments according to Lucene’s merge policy; normally the user does not have to care or take any action. Deleted documents are not actually removed until their segments are merged; until then they still consume the JVM heap, the operating system’s file cache, disk, and other resources. In certain cases we need ES to force a segment merge to free these resources: POST /index_name/_forcemerge.

The _forcemerge command forces segment merging and purges all documents marked for deletion. Segment merging is CPU-intensive and consumes large amounts of I/O, so only do it when your ElasticSearch cluster is in a maintenance window and has adequate I/O capacity (e.g., SSDs); otherwise cluster crashes and data loss may occur.
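If the goal is only to reclaim the space held by deleted documents, _forcemerge can be told to expunge deletes rather than fully merge everything; the index name below is a placeholder:

```
POST /index_name/_forcemerge?only_expunge_deletes=true
```

The heavier variant, max_num_segments=1, merges each shard down to a single segment and is correspondingly more expensive.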

Document refresh

The Elasticsearch refresh operation is what makes a document searchable. By default it runs once per second. If your primary goal is to tune the index for ingestion speed, consider raising the refresh interval from 1 second to 30 seconds: documents then become visible to search only after up to 30 seconds, in exchange for better indexing speed.

  • You can also specify the refresh rate for an index
PUT my_index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}

Replicas

Replicas: to avoid data loss when a primary shard fails, a primary shard and its replica never exist on the same node. This does, however, hurt write throughput, because every replica has to be synchronized from the primary shard. So during bulk data initialization you can disable replicas first, and re-enable them once the initialization is done.
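In console terms, the bulk-load pattern sketched above looks like this (my_index is a placeholder): drop the replica count to 0 before the import and restore it afterwards:

```
PUT my_index/_settings
{ "number_of_replicas": 0 }

# run the bulk import here

PUT my_index/_settings
{ "number_of_replicas": 1 }
```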

Field type selection

Enable slow query logs

It is recommended that you enable slow query logging in your Elasticsearch cluster to address performance issues and catch queries that take longer to run or exceed a set threshold. For example, if your search SLA is 2 seconds, you can configure the search query so that any query that exceeds this threshold will be logged.

PUT my_index/_settings
{
    "index.search.slowlog.threshold.query.warn" : "2s"
}
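Slow logs also exist for the fetch phase and for indexing, and their thresholds can be set alongside the query threshold. A sketch (the values are illustrative):

```
PUT my_index/_settings
{
    "index.search.slowlog.threshold.query.warn": "2s",
    "index.search.slowlog.threshold.fetch.warn": "1s",
    "index.indexing.slowlog.threshold.index.warn": "5s"
}
```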

Set username and password

  • Use Docker to start the standalone service
zhangguofu@zhangguofudeMacBook-Pro Downloads $ docker run -d --name es -p 9212:9200 -p 9313:9300 -e "discovery.type=single-node" elasticsearch:7.2.0
be9b65b0aa5770c5421752e935c54343c17cd9b86edfbe9f09b721a6e2d3bbca
  • Go to the container configuration file
zhangguofu@zhangguofudeMacBook-Pro Downloads $ docker exec -it be bash
[root@be9b65b0aa57 elasticsearch]# ls
LICENSE.txt  NOTICE.txt  README.textile  bin  config  data  jdk  lib  logs  modules  plugins 
  • Restart the container after adding the username and password (you need to restart if you had not set it before; if elasticsearch.yml has not been changed, no restart is needed)

Multi-node Configuration

  • To configure usernames and passwords for multiple Elastic nodes, we need to issue certificates, as described in the steps below

  • Download the Elastic package, unzip it, and make a copy of it to form an Elastic cluster. Let’s look at the directory

  • We configure node 1 first, i.e. elasticsearch1/config/elasticsearch.yml

# Cluster name
cluster.name: cluster2
# Node name
node.name: node1
# May act as master node
node.master: true
# May act as data node
node.data: true
# Disable remote (cross-cluster) connections
cluster.remote.connect: false
# 127.0.0.1 is not reachable from the LAN, so bind the LAN address
network.host: 172.16.131.4
# Port for external (HTTP) services
http.port: 9200
# Transport port for inter-node data exchange
transport.port: 9300
# Seed nodes; 9201/9301 are the ports of the other node
discovery.seed_hosts: ["172.16.131.4:9300", "172.16.131.4:9301"]
# Initial master-eligible nodes (note: this is changed later)
cluster.initial_master_nodes: ["172.16.131.4:9300", "172.16.131.4:9301"]
  • Configuring Node 2
# Cluster name
cluster.name: cluster2
# Node name
node.name: node2
# May act as master node
node.master: true
# May act as data node
node.data: true
# Disable remote (cross-cluster) connections
cluster.remote.connect: false
# 127.0.0.1 is not reachable from the LAN, so bind the LAN address
network.host: 172.16.131.4
# Port for external (HTTP) services
http.port: 9201
# Transport port for inter-node data exchange
transport.port: 9301
# Seed nodes
discovery.seed_hosts: ["172.16.131.4:9300", "172.16.131.4:9301"]
# Initial master-eligible nodes (note: this is changed later)
cluster.initial_master_nodes: ["172.16.131.4:9300", "172.16.131.4:9301"]

After the configuration is complete, we start the nodes separately

  • Note: because I had started the cluster before without this configuration and then added a new node to it, I had to delete the elastic1/data/nodes directory and restart for the change to take effect. Presumably a node that is first started standalone keeps its old cluster state and will not detect nodes added later.

Generate a certificate

  • Shut down each service node
  • Add the following configuration to each node’s elasticsearch.yml
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true

  • In the cluster, we first add xpack.security.enabled: true; this configuration indicates that the cluster uses secure encryption.
  • Generate a certificate
# Create a CA
bin/elasticsearch-certutil ca
# Generate a certificate and private key signed by the CA
bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
  • This generates elastic-certificates.p12 in the elastic1 directory. Copy this file to the config directory of each node, add the following to each elasticsearch.yml, and start each node
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
# The path may be relative or absolute (relative paths resolve against the config directory)
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12
  • When I started up, I found an error
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}
  • The master node could not be discovered, so we designate it explicitly. I made node2 the master of the 2-node cluster; here is node 2’s configuration
cluster.name: cluster2
node.name: node2
node.master: true
node.data: true
cluster.remote.connect: false
network.host: 172.16.131.4
http.port: 9201
transport.port: 9301
discovery.seed_hosts: ["172.16.131.4:9300", "172.16.131.4:9301"]
# Designate the master node directly here
cluster.initial_master_nodes: node2
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12
  • After all nodes have started, generate the passwords in the master node’s directory. In my testing, if the master changes, for example to node 1, then the username and password have to be set up on node 1 instead
bin/elasticsearch-setup-passwords interactive


  • Once you’ve set your password, the next time you visit port 9200 you have to enter a username and password: the username is elastic and the password is the one you set.
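From the command line this means passing the credentials with every request, for example via curl’s -u flag (the address is taken from this walkthrough, and the password shown is a placeholder):

```shell
curl -u elastic:your_password http://172.16.131.4:9200/_cluster/health?pretty
```

Without valid credentials the cluster now answers 401 Unauthorized instead of serving the request.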