Elasticsearch performance optimization

Hardware level

Physical memory: ES query performance depends heavily on the operating system's page cache, so the more physical memory is left over after the Java heap is allocated, the better. Ideally all index data, or at least the hot part of it, fits in memory so that most queries are served from memory rather than from disk.
Disable swap on the Linux server. A swap partition lets the OS move memory pages out to disk when memory runs short, which protects the system from OOM failures or crashes. For ES, however, pages that have been swapped out must be read back from disk when they are touched again, and that severely degrades performance.
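
One way to confirm that swap is off is to inspect /proc/meminfo on the host. The following is a minimal sketch, not an official ES tool; the function names and the sample text are illustrative assumptions.

```python
def swap_total_kb(meminfo_text: str) -> int:
    """Return the SwapTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line format: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    raise ValueError("SwapTotal entry not found")


def swap_is_disabled(meminfo_text: str) -> bool:
    """Swap counts as disabled when no swap space is configured."""
    return swap_total_kb(meminfo_text) == 0


# Hypothetical sample; on a real host, read the text from /proc/meminfo.
sample = (
    "MemTotal:       65806340 kB\n"
    "SwapTotal:              0 kB\n"
    "SwapFree:               0 kB\n"
)
print(swap_is_disabled(sample))
```

On a real server, `swapoff -a` turns swap off until the next reboot; making it permanent usually also means removing the swap entries from /etc/fstab.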

When writing data in bulk, you can set the number of replicas (number_of_replicas) to 0 to speed up indexing. After the data has been written, restore the replica count.
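
The replica toggle can be sketched as two index settings updates wrapped around the load. This is an illustration under assumptions: `put_settings` and `load_data` are hypothetical callables supplied by the caller (for example thin wrappers around an HTTP client that sends PUT /<index>/_settings), and the original replica count of 1 is only an example.

```python
import json


def replica_settings(count: int) -> dict:
    """Body for PUT /<index>/_settings that sets the replica count."""
    if count < 0:
        raise ValueError("replica count must be non-negative")
    return {"index": {"number_of_replicas": count}}


def bulk_load(put_settings, load_data, original_replicas: int) -> None:
    """Drop replicas, run the load, then restore replicas even on failure."""
    put_settings(replica_settings(0))
    try:
        load_data()
    finally:
        # Restore redundancy whether or not the load succeeded.
        put_settings(replica_settings(original_replicas))


# Dry run that records the request bodies instead of calling a cluster.
sent = []
bulk_load(sent.append, lambda: None, original_replicas=1)
print(json.dumps(sent))
```

The `finally` clause is a design choice here: an index left at zero replicas has no redundancy, so the restore step should not be skipped when the load fails.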

Hot/cold data separation: hot and cold data can be stored in different clusters that use different hardware configurations. Hot data can be given machines with large memory and high-performance SSDs, while cold data, which is rarely accessed and almost never written again, can live on cheaper machines, which saves cost. Separation also prevents occasional access to cold data from disturbing the cache, which protects the query performance of hot data.

ES should store only the fields that need to be indexed. Fields that do not need an index can be kept in another store such as HBase: query ES for the primary key (rowkey), then fetch the detailed data from HBase. Keep each document as small as possible so that more documents fit in memory.
For complex join-style queries, the pre-aggregated results can be written directly into ES, so that they are returned as is and no extra computation is needed at query time (this does not conflict with the previous point about storing only required fields; it is a trade-off driven by business requirements).
Unless there is a specific business requirement, let ES generate document IDs automatically. With custom IDs, write performance drops sharply once the index reaches a certain size, because every insert must first check whether the ID already exists (along with related operations), and on a very large index this check is expensive. The recommended approach is therefore to use auto-generated document IDs.
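
To illustrate the auto-generated ID advice: in a Bulk API request, an `index` action that omits `_id` leaves the ID assignment to ES. The following is a minimal sketch that only builds the request body; the helper `bulk_index_body` and the index name `logs` are hypothetical and not part of any ES client.

```python
import json


def bulk_index_body(index: str, docs: list) -> str:
    """Build an NDJSON body for POST /_bulk with auto-generated IDs."""
    lines = []
    for doc in docs:
        # No "_id" in the action line, so ES assigns the ID itself.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The Bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = bulk_index_body("logs", [{"msg": "a"}, {"msg": "b"}])
print(body, end="")
```

The body would be sent with the `Content-Type: application/x-ndjson` header. Each document takes two lines: an action line followed by the document source.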

Data preheating: hot data can be warmed up before it is queried, so that as much of it as possible is already cached in memory, which speeds up queries.
Read and write large volumes of data in batches (the Bulk API covers the write side), which reduces the number of I/O round trips and the bandwidth used.
Avoid deep paging in paginated queries; if you need to traverse the full result set, use the Scroll API.
If near-real-time visibility of new data is not critical, increase refresh_interval. This avoids generating a large number of small segments, which hurt search performance, and it reduces the amount of segment merging, which also costs performance.
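
Two of the points above, scroll-based traversal and a longer refresh_interval, can be sketched as follows. This is an illustration under assumptions rather than a definitive implementation: `search` and `next_page` are hypothetical callables that would send POST /<index>/_search?scroll=... and POST /_search/scroll and return the decoded JSON response, and the page size, keep-alive, and "30s" interval are example values only.

```python
def refresh_settings(interval: str) -> dict:
    """Body for PUT /<index>/_settings, for example interval="30s"."""
    return {"index": {"refresh_interval": interval}}


def scroll_all(search, next_page, page_size: int = 1000, keep_alive: str = "1m"):
    """Yield every hit by following scroll IDs instead of from/size paging."""
    resp = search({"size": page_size, "query": {"match_all": {}}}, keep_alive)
    while resp["hits"]["hits"]:
        yield from resp["hits"]["hits"]
        # Each response carries the scroll ID to use for the next page.
        resp = next_page({"scroll": keep_alive, "scroll_id": resp["_scroll_id"]})


# Dry run against canned responses instead of a live cluster.
pages = [
    {"_scroll_id": "s1", "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}},
    {"_scroll_id": "s2", "hits": {"hits": [{"_id": "3"}]}},
    {"_scroll_id": "s3", "hits": {"hits": []}},
]
responses = iter(pages)
hits = list(scroll_all(lambda body, scroll: next(responses),
                       lambda body: next(responses)))
print([h["_id"] for h in hits])
print(refresh_settings("30s"))
```

The loop stops at the first empty page. Against a real cluster it would also be reasonable to clear the scroll context (DELETE /_search/scroll) once traversal finishes, so that its resources are released before the keep-alive expires.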