Original text: elasticsearch.cn/article/620…

If you ask how to tune Elasticsearch, the answer is always case by case. What follows is a brief overview of Elasticsearch tuning drawn from my day-to-day experience; if anything is wrong, please help correct it.

1. Tuning configuration files

elasticsearch.yml

Memory locking

bootstrap.memory_lock: true allows the JVM to lock its memory, preventing the OS from swapping it out.
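As a minimal sketch, the corresponding elasticsearch.yml entry looks like this (the memlock ulimit for the ES user must also be raised, or the lock will fail at startup):

```yaml
# elasticsearch.yml -- keep the JVM heap from being swapped out
bootstrap.memory_lock: true
```

Whether the lock actually took effect can be checked with `GET _nodes?filter_path=**.mlockall`.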

zen.discovery

By default, Elasticsearch uses unicast discovery to prevent nodes from inadvertently joining the cluster. Multicast discovery should never be used in production; otherwise a node can end up accidentally joining your production cluster simply because it received an errant multicast signal. ES is a peer-to-peer distributed system that uses a gossip protocol: any cluster request can be sent to any node, and ES will locate the node that should handle it and forward the request. In ES 1.x, multicast was enabled by default, so instances on the same LAN with the same cluster name and default port could quickly assemble into one large cluster. From ES 2.x on, the default was changed to unicast to avoid security problems and network storms.

Unicast discovery: for discovery.zen.ping.unicast.hosts, it is recommended to list all nodes in the cluster together with their ports. A new instance only needs to list the current cluster members to join the cluster automatically; the existing instances' configuration can be updated afterwards, and they do not need to be restarted for the new instance to join. Related zen settings: discovery.zen.ping_timeout is the timeout for detecting whether other nodes are alive during a master election and mainly affects election time; it takes effect only when a node joins or a master is being elected, and defaults to 3s. discovery.zen.minimum_master_nodes is the minimum number of master-eligible nodes required to elect a master; if fewer master-eligible nodes are available, the cluster cannot elect a master.
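A sketch of these settings in elasticsearch.yml (host names and the quorum value of 2 are illustrative, assuming three master-eligible nodes):

```yaml
# elasticsearch.yml -- zen unicast discovery (pre-7.x settings)
discovery.zen.ping.unicast.hosts: ["node1:9300", "node2:9300", "node3:9300"]
discovery.zen.ping_timeout: 3s          # master-election ping timeout (default)
discovery.zen.minimum_master_nodes: 2   # (master-eligible nodes / 2) + 1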

Fault detection

Fault detection runs in two directions. First, the master pings all other nodes in the cluster to check whether they are alive. Second, every node in the cluster pings the master to check whether it is alive and whether an election should be initiated. The relevant settings are: discovery.zen.fd.ping_interval, how often a node is pinged, default 1s; discovery.zen.fd.ping_timeout, how long to wait for a ping response, default 30s; discovery.zen.fd.ping_retries, how many ping failures or timeouts cause a node to be considered failed, default 3.
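Shown explicitly in elasticsearch.yml (these are the default values, repeated here for reference):

```yaml
# elasticsearch.yml -- zen fault-detection defaults
discovery.zen.fd.ping_interval: 1s   # how often nodes are pinged
discovery.zen.fd.ping_timeout: 30s   # how long to wait for a ping response
discovery.zen.fd.ping_retries: 3     # failures/timeouts before a node is considered dead
```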

www.elastic.co/guide/en/el…

The queue number

Do not blindly increase the ES queue sizes. If a queue backs up only occasionally because of a data spike, enlarging it lets memory buffer the burst. But if data is persistently blocked in the queue, enlarging it only increases the memory footprint without improving the write rate, and it increases the amount of in-memory data that can be lost if ES goes down. So in what cases should the queue size be increased? Call GET /_cat/thread_pool and observe the queue and rejected columns in the output; if there are rejections or a persistently full queue, adjust the queue size as appropriate.
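A sketch of the check described above, using the cat API's column selection:

```
GET /_cat/thread_pool?v&h=node_name,name,active,queue,rejected
```

A persistently non-zero rejected count on the write or search pools is the signal to investigate; occasional spikes alone do not justify a larger queue.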

www.elastic.co/guide/en/el…

Memory usage

Set the indices circuit-breaker parameters, adjusted to actual conditions, to prevent OOM when write or query pressure is too high: indices.breaker.total.limit: 50%, the parent circuit breaker, default 70% of the JVM heap; indices.breaker.request.limit: 10%, the per-request circuit breaker, default 60% of the JVM heap; indices.breaker.fielddata.limit: 10%, the fielddata circuit breaker, default 60% of the JVM heap.
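As a config-file sketch (the percentages are the article's suggested values, not the defaults):

```yaml
# elasticsearch.yml -- circuit-breaker limits (defaults noted in comments)
indices.breaker.total.limit: 50%       # default 70% of the JVM heap
indices.breaker.request.limit: 10%     # default 60% of the JVM heap
indices.breaker.fielddata.limit: 10%   # default 60% of the JVM heap
```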

www.elastic.co/guide/en/el…

Adjust the cache usage to actual conditions to prevent the cache from occupying too much JVM memory. The parameter is static and must be configured on every data node: indices.queries.cache.size: 5%, which controls the query (filter) cache memory size, default 10%. It accepts either a percentage value such as 5% or an exact value such as 512mb.

www.elastic.co/guide/en/el…

Create a shard

If the cluster is large, you can prevent the metadata of all shards in the cluster from being scanned when a shard is created, which speeds up shard allocation: cluster.routing.allocation.disk.include_relocations: false (default true).

www.elastic.co/guide/en/el…

2. System-level tuning

JDK version

Select the JDK version that matches the official recommendation.

JDK Memory Configuration

First, set -Xms and -Xmx to the same value to avoid reallocating memory at runtime. If the machine has less than 64 GB of memory, it is recommended to give ES a little less than half of it, leaving the rest to the system. Also, the JVM heap should not exceed 32 GB (the exact threshold varies slightly across JDK versions); otherwise the JVM loses compressed ordinary pointers (compressed oops) and wastes memory. See:
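An illustrative jvm.options fragment for a 64 GB machine (31g is an example value chosen to stay under the compressed-oops cutoff; the right number depends on your JDK):

```
# jvm.options -- equal min/max heap, below the ~32 GB compressed-oops threshold
-Xms31g
-Xmx31g
```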

www.elastic.co/guide/cn/el…

Swap partition

Turn off swap partitions to prevent performance degradation due to memory swapping (in some cases a slow node is worse than a dead one).

File handle

Lucene uses a lot of files. At the same time, Elasticsearch uses a lot of sockets to communicate between nodes and with HTTP clients. All of this requires a sufficient number of file descriptors. By default, Linux limits a single process to 1024 open file handles, which is clearly not enough, so the limit must be raised: ulimit -n 65536

www.elastic.co/guide/en/el…

mmap

Elasticsearch uses a mixture of NIOFS and MMapFS for its various files. Make sure the maximum map count is configured so that there is enough virtual address space available for the mmapped files. This can be set temporarily with sysctl -w vm.max_map_count=262144, or permanently by changing vm.max_map_count in /etc/sysctl.conf.

www.elastic.co/guide/cn/el…

disk

If you are using SSDs, make sure your system I/O scheduler is configured correctly. When you write data, the I/O scheduler decides when the data is actually sent to the disk. The default scheduler on most *nix distributions is CFQ (Completely Fair Queuing), which is optimized for rotating media: the physical properties of a mechanical disk make it more efficient to write according to physical layout. For SSDs, which have no moving parts, this is inefficient; deadline or noop should be used instead. The deadline scheduler optimizes around write wait time, while noop is just a simple FIFO queue. echo noop > /sys/block/sd*/queue/scheduler

Disk mount

mount -o noatime,data=writeback,barrier=0,nobh /dev/sd* /esdata* — data=writeback journals metadata only, not data; barrier=0 disables write barriers, which is acceptable because data journaling is already off; nobh disables buffer heads so the kernel interferes less with data I/O.

Other precautions for disks

Use RAID 0. Striped RAID increases disk I/O, at the obvious cost that when one drive fails, the whole array fails. Do not use mirrored or parity RAID, because replicas already provide that capability. Alternatively, use multiple disks and let Elasticsearch distribute data across them via multiple path.data directories. Do not use remotely mounted storage such as NFS or SMB/CIFS; the latency it introduces is completely antithetical to performance.

3. Optimize how Elasticsearch is used

When there is no apparent problem with the Elasticsearch configuration but ES is still very slow, use the following commands to locate the problem:

hot_threads

GET /_nodes/hot_threads?interval=30s

Check whether resource consumption is normal by looking at the top threads that use the most resources. Generally, bulk and search threads are driven by the workload itself. But if the merge threads consume a large share of resources, consider whether the refresh interval is too small or the batch write size is too small.

www.elastic.co/guide/en/el…

pending_tasks

GET /_cluster/pending_tasks

Some tasks can only be handled by the master node, such as creating a new index or moving shards across the cluster. Since a cluster has only one master node, only it can process cluster-level metadata changes. 99.9999% of the time this is fine and the metadata-change queue stays essentially at zero. In rare clusters, metadata changes arrive faster than the master can process them, and pending operations pile up into a queue. The pending_tasks API shows which operations are blocked in this queue. For example, if the cluster is in an abnormal state there will be a large number of shards in recovery; if a large number of new fields are being created there will be a large number of put_mapping tasks, in which case dynamic mapping should be disabled.

www.elastic.co/guide/en/el…

Fields to store

ES currently has three storage types: doc_values, fielddata, and stored fields. In most cases you do not need all three, and you can adjust them to the actual scenario. doc_values is the most commonly used columnar storage; for fields that do not need analysis, doc_values can be enabled (keeping only keyword fields) to save memory. Enabling doc_values has a certain impact on query performance, but the cost is relatively small and worthwhile.

Fielddata is built and managed entirely in memory and is resident in the JVM heap, so it supports fast queries, but that also means it is inherently unscalable and has many edge cases to watch out for. You can turn fielddata off if you do not need it on analyzed fields.

Stored fields are mainly used for the _source field. By default, when data is written, ES stores the document body as the _source field, from which the original document structure can be quickly retrieved. If it is not needed, the _source field can be disabled.

_all: in versions before 6.x, ES by default concatenated all written fields into one large string and analyzed it to support full-text search over the whole document. If the document field names are known, it is recommended to disable this field to save storage space and avoid key-less full-text queries.

norms: relevance scoring is usually not needed in log scenarios, so you are advised to disable norms on searched fields to save space.
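Putting the above together, a minimal mapping sketch (the index and field names are hypothetical; in 6.x and earlier a mapping type name is also required):

```
PUT test-index
{
  "mappings": {
    "_source": { "enabled": false },
    "properties": {
      "status":  { "type": "keyword" },
      "message": { "type": "text", "norms": false }
    }
  }
}
```

Note that disabling _source also removes the ability to reindex or use update-by-query on this index, so weigh that trade-off carefully.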

translog

Since Elasticsearch 2.0, an index, bulk, delete, or update request only returns 200 OK after the translog has been fsynced to disk, which ensures data is not lost. This change improves data safety but, of course, costs some performance. If you do not care about that small possibility of loss and want performance first, you can set the following parameter in the index template:

{
    "index.translog.durability": "async"
}

index.translog.sync_interval: for high-throughput clusters where occasionally losing a few seconds of data is not a serious problem, asynchronous fsync can be beneficial; written data is buffered in memory and fsynced every sync_interval (default 5s; values below 100ms are not allowed). index.translog.flush_threshold_size: the translog stores all operations that have not yet been safely persisted in Lucene; although they can be read, if the shard is stopped and has to be recovered, these operations must be replayed. This setting caps their total size to prevent long recovery times: when the maximum is reached, a flush occurs and a new Lucene commit point is created. The default is 512MB.
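A sketch combining the three translog settings (the index name is hypothetical; depending on the ES version, some of these may only be settable at index creation rather than dynamically):

```
PUT test-index/_settings
{
  "index.translog.durability": "async",
  "index.translog.sync_interval": "5s",
  "index.translog.flush_threshold_size": "512mb"
}
```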

refresh_interval

The default value is 1s; setting it to -1 disables refresh. For scenarios that require a high write rate, you can increase the interval to reduce disk I/O and segment generation.

Disabling Dynamic Mapping

Disadvantages of dynamic mapping:

  1. The metadata of the cluster is constantly changed, which causes the cluster to be unstable.
  2. The data type may be inconsistent with the actual type.
  3. For some abnormal fields or scanning fields, the mapping is frequently modified, causing uncontrollable services.

The optional values of dynamic mapping are: true — dynamic expansion is supported; when new data carries new fields, they are automatically added to the mapping. false — dynamic expansion is not supported; new fields are silently ignored (kept in _source but not indexed or searchable). strict — when new data carries new fields, an error is reported and the write fails.
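A sketch of locking down a mapping with strict mode (index and field names are hypothetical; 6.x and earlier also require a type name):

```
PUT test-index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "message": { "type": "text" }
    }
  }
}
```

With this mapping, writing a document containing any field other than message is rejected instead of silently mutating the cluster metadata.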

Batch write

Batch requests obviously improve the write rate, and the effect can be quantified. The official recommendation is 5–15 MB of physical data per batch; document count is not a good measure of batch size. For example, if you batch-index 1000 documents at a time: 1000 1 KB documents add up to 1 MB, while 1000 100 KB documents add up to 100 MB — entirely different batch sizes. Batch requests must be loaded into memory on the coordinating node, so the physical size of the batch matters far more than the document count. Start with a batch size of 5–15 MB and slowly increase it until you see no further performance improvement; then start increasing the concurrency of your batch writes (multi-threading, etc.). Monitor your nodes with iostat, top, and ps to see when a resource becomes a bottleneck. If you start receiving EsRejectedExecutionException, your cluster cannot keep up: at least one resource has hit a bottleneck. Either reduce concurrency, provide more of the constrained resource (such as switching from mechanical disks to SSDs), or add more nodes.
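The shape of a bulk request, as a sketch (index name hypothetical; no _id is given, so ES auto-generates one, per the recommendation below):

```
POST /_bulk
{ "index": { "_index": "test-index" } }
{ "message": "first log line" }
{ "index": { "_index": "test-index" } }
{ "message": "second log line" }
```

Size the whole newline-delimited body toward the 5–15 MB range rather than targeting a fixed document count.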

Index and shard

Index metadata is stored on the master node and synchronized to all nodes in the cluster. Whenever a new field or index is created, the metadata must be fetched, changed, and synchronized, which affects cluster responsiveness. Therefore, pay attention to the number of indices and shards in the cluster. Suggestions: 1. Use the shrink and rollover APIs to produce a reasonably sized number of data shards. 2. Choose index names according to data volume and performance requirements, e.g. one index per month, test-YYYYMM, or one index per day, test-YYYYMMDD. 3. Control the size of a single shard; normally it is recommended to keep a shard under 50 GB in log scenarios and under 20 GB for online services.

segment merge

Segment merging is computationally heavy and consumes a lot of disk I/O. Merges run in the background because they can take a long time to complete, especially for large segments. This is usually fine, because the probability of a large merge is small. If merging is found to consume excessive resources, you can set index.merge.scheduler.max_thread_count: 1. Mechanical disks in particular handle concurrent I/O poorly, so the number of threads concurrently accessing the disk per index should be reduced. This setting allows max_thread_count + 2 threads to perform disk operations simultaneously, i.e. a value of 1 allows three threads. For SSDs you can ignore this setting; the default is Math.min(3, Runtime.getRuntime().availableProcessors() / 2), which works fine for SSDs. In addition, the force_merge command can be used during off-peak periods to force-merge segments, reducing the segment count and memory consumption; close cold indices and reopen them when the business needs them; and indices that are never used can be periodically deleted or backed up to a Hadoop cluster.
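Sketches of the two knobs mentioned above (index name hypothetical; run force merge only off-peak and only on indices that are no longer being written to):

```
PUT test-index/_settings
{ "index.merge.scheduler.max_thread_count": 1 }

POST test-index/_forcemerge?max_num_segments=1
```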

Automatically generate _id

When the writer supplies a specific ID, ES must check whether a document with the same ID already exists in the index, and this check becomes more expensive as the number of documents grows. Therefore, unless there is a strong business need, it is recommended to let ES auto-generate IDs to speed up the write rate.

routing

For services that query large amounts of data, ES creates multiple shards and distributes them across the instances of the cluster to share the load. Normally a query has to fan out to all shards and merge their results before returning. If a routing key is specified at both write time and query time, ES queries only the corresponding shard, which greatly reduces the overhead of merging data and scheduling across the full set of shards.
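A sketch of routed writes and queries (index, field, and routing values are hypothetical; the query must pass the same routing value used at write time, or it will miss the document):

```
PUT test-index/_doc/1?routing=user123
{ "user": "user123", "message": "hello" }

GET test-index/_search?routing=user123
{ "query": { "term": { "user": "user123" } } }
```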

Using the alias

For indexes that serve production traffic, remember to use an alias rather than exposing the index name directly, to avoid service interruption caused by business changes or by reindexing the data.
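A sketch of switching an alias from an old index to a new one (index and alias names are hypothetical):

```
POST /_aliases
{
  "actions": [
    { "remove": { "index": "test-v1", "alias": "test" } },
    { "add":    { "index": "test-v2", "alias": "test" } }
  ]
}
```

The two actions are applied atomically, so clients querying the alias never see a moment with no backing index.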

Avoid wide table

Defining too many fields in an index is one way a mapping can explode, which can lead to out-of-memory errors and conditions that are hard to recover from. This problem may be more common than expected; the index.mapping.total_fields.limit setting (default 1000) caps the number of fields in a mapping.

Avoid sparse indexes

When an index is sparse, the deltas between adjacent document IDs become large. Lucene compresses data with delta encoding based on document IDs, so sparsity reduces the compression ratio and enlarges the index files. Meanwhile, the keyword and array types in ES use the doc_values structure, in which every document occupies some space even when the field is null. Sparse indexes therefore increase disk usage and reduce query and write efficiency.
