ES performance tuning is no silver bullet: don't expect to tune a single parameter that will handle every slow-performance scenario. The filesystem cache is a powerful tool in ES performance optimization.

Data written to ES is persisted in files on disk. When that data is queried, the operating system automatically caches the contents of those disk files in the filesystem cache.

The ES search engine relies heavily on the underlying filesystem cache. If the filesystem cache is given enough memory to hold all of the index segment files, searches are served almost entirely from memory and performance is very high: reading from the filesystem cache is roughly an order of magnitude faster than reading from disk, on the order of a few milliseconds instead of hundreds of milliseconds. For ES to perform well, the machine's memory should, in the best case, hold at least half of the total data. Better still, store only a small amount of data in ES: just the fields you actually search on. If the filesystem cache has 100 GB of memory, keep the index data under 100 GB so that almost every search is answered from memory, which gives very high performance. Write only a few fields to ES, such as the ID, name, and age, and keep the other fields in MySQL/HBase. A recommended combination is ES + HBase: HBase is suited to online storage of massive data, so the bulk of the data is written to HBase, which handles simple lookups by ID or by range rather than complex search. Search ES by name and age to obtain the doc IDs, then fetch the complete record for each doc ID from HBase and return the combined result to the front end. In short, the data written to ES should ideally be no larger than the filesystem cache.
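As a rough sketch of this ES + HBase split (the user_index name, the fields, and the fetch_full_record_from_hbase helper are hypothetical placeholders; only the ES _search REST call is standard):

```python
import requests

ES = "http://localhost:9200"   # assumed ES endpoint
INDEX = "user_index"           # hypothetical index holding only the searchable fields

def search_doc_ids(name, min_age):
    # Query only the small searchable fields kept in ES; we just need the doc IDs.
    query = {
        "_source": False,
        "size": 10,
        "query": {
            "bool": {
                "must": [
                    {"match": {"name": name}},
                    {"range": {"age": {"gte": min_age}}},
                ]
            }
        },
    }
    resp = requests.post(f"{ES}/{INDEX}/_search", json=query).json()
    return [hit["_id"] for hit in resp["hits"]["hits"]]

def fetch_full_record_from_hbase(doc_id):
    # Placeholder: fetch the complete row by ID with your HBase client;
    # only a simple get-by-rowkey is needed here, no complex search.
    raise NotImplementedError

def search_users(name, min_age):
    return [fetch_full_record_from_hbase(doc_id) for doc_id in search_doc_ids(name, min_age)]
```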

Data preheating

Even if you follow the approach above, the amount of data written to each machine in the ES cluster may still be, say, twice the size of the filesystem cache. In that case you can warm the data: for data you know is accessed frequently, it is best to build a dedicated cache-preheating subsystem that queries the hot data at regular intervals so it stays in the filesystem cache. The next time that data is actually requested, it is served from memory and performance is much better.
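A minimal sketch of such a warming subsystem, assuming a hypothetical list of hot queries that is replayed on a fixed interval so the relevant segment files stay in the filesystem cache:

```python
import time
import requests

ES = "http://localhost:9200"    # assumed ES endpoint
HOT_QUERIES = [                 # hypothetical queries known to hit frequently accessed data
    ("user_index", {"query": {"term": {"vip": True}}}),
    ("order_index", {"query": {"range": {"created_at": {"gte": "now-1d"}}}}),
]

def warm_up_once():
    # Touch the hot data so the OS keeps those segment files cached in memory.
    for index, query in HOT_QUERIES:
        requests.post(f"{ES}/{index}/_search", json={**query, "size": 0})

if __name__ == "__main__":
    while True:
        warm_up_once()
        time.sleep(60)          # re-warm every minute; tune to your access pattern
```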

Hot and cold separation

ES can be split horizontally in a way similar to MySQL: put the large volume of rarely accessed cold data in one index and the frequently accessed hot data in a separate index. Writing cold data to one index and hot data to another ensures that, once warmed up, the hot data stays in the filesystem cache as much as possible and is not evicted by cold-data queries.
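One possible way to route writes, sketched below with hypothetical index names (orders_hot / orders_cold) and an assumed "accessed within 7 days" rule for what counts as hot:

```python
from datetime import datetime, timedelta, timezone
import requests

ES = "http://localhost:9200"    # assumed ES endpoint

def target_index(doc):
    # Hypothetical rule: documents touched within the last 7 days are "hot";
    # everything else goes to the cold index so it cannot evict hot segments.
    last_access = datetime.fromisoformat(doc["last_access"])  # ISO-8601 with timezone
    if datetime.now(timezone.utc) - last_access < timedelta(days=7):
        return "orders_hot"
    return "orders_cold"

def write_doc(doc_id, doc):
    requests.put(f"{ES}/{target_index(doc)}/_doc/{doc_id}", json=doc)
```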

Document model design

MySQL applications often rely on complex join queries. How do you handle that in ES? Avoid complex associative queries inside ES as much as possible; their performance is generally poor. It is better to perform the association in the Java application first and write the already-joined data directly into ES, so that at search time you do not need join-like search syntax at all. Document model design is very important: do not plan on having ES perform complicated operations at query time. If such an operation is needed, try to handle it at document-model-design time, that is, at write time, and avoid expensive operations such as join, nested, and parent-child searches.
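A small sketch of doing the association at write time (the order/user field names and the orders index are hypothetical); the point is that the "join" happens in application code and ES only ever stores a flat document:

```python
import requests

ES = "http://localhost:9200"    # assumed ES endpoint

def index_order(order, user):
    # Perform the association here, not in ES: copy the user fields the search
    # needs into the order document instead of using join/nested/parent-child.
    doc = {
        "order_id": order["id"],
        "amount": order["amount"],
        "created_at": order["created_at"],
        # denormalized user fields, duplicated at write time
        "user_id": user["id"],
        "user_name": user["name"],
        "user_city": user["city"],
    }
    requests.put(f"{ES}/orders/_doc/{order['id']}", json=doc)
```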

Paging performance optimization

ES paging gets slower the deeper you go. Suppose each page holds 10 documents and you want page 100: each shard has to send its top 1,000 documents to a coordinating node, so with 5 shards that is 5,000 documents. The coordinating node then merges, sorts, and filters those 5,000 documents to finally produce the 10 documents of page 100. In a distributed system you cannot simply fetch 2 documents from each of the 5 shards and merge them into the 10 you need; every shard must return its first 1,000 entries, and the coordinating node sorts, filters, and pages through the merged result. The deeper the page, the more data every shard returns and the longer the coordinating node takes, which is why paging in ES gets noticeably slower the further you go.
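To make the cost concrete, here is roughly what the page-100 request above looks like with from/size (the orders index and created_at sort field are hypothetical):

```python
import requests

ES = "http://localhost:9200"          # assumed ES endpoint
PAGE, PAGE_SIZE, SHARDS = 100, 10, 5

# Each shard must produce its top (from + size) = 1000 hits, so the coordinating
# node has to merge roughly 5 * 1000 = 5000 hits just to return these 10.
query = {
    "from": (PAGE - 1) * PAGE_SIZE,   # 990
    "size": PAGE_SIZE,                # 10
    "sort": [{"created_at": "desc"}],
    "query": {"match_all": {}},
}
resp = requests.post(f"{ES}/orders/_search", json=query).json()
print(len(resp["hits"]["hits"]))      # 10 documents returned
```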

Are there solutions? The first is simply not to allow deep paging (by default, deep paging performs poorly). Tell the product manager that the system does not support jumping to deep pages; the deeper the default page-turning goes, the worse the performance. It is a bit like the recommendation feeds in apps that just keep loading the next page as you pull down.

For something like a microblog (Weibo) feed that you scroll through page by page, the scroll API can be used. Scroll takes a snapshot of the matching data when the search is initialized, and each subsequent request moves backward through it via the scroll_id cursor, so performance is much higher than the from/size paging described above, basically milliseconds per page. It is only suitable for scenarios where you scroll through pages sequentially rather than jumping to an arbitrary page, like a Twitter-style pull-down feed. The scroll parameter must be specified when the search is initialized to tell ES how long to keep the context of this search alive, and you need to make sure users do not keep scrolling for hours, or the context may expire and the request fail. Besides the scroll API, search_after can also be used: the idea is to use the sort values of the last result of the previous page to retrieve the next page. It likewise does not allow arbitrary page jumps, and initialization requires a field with unique values as the sort field.
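A sketch of both approaches against a hypothetical orders index (the process handler and the created_at/order_id sort fields are placeholders; the scroll and search_after parameters themselves are standard ES request options):

```python
import requests

ES = "http://localhost:9200"            # assumed ES endpoint

def process(hits):                       # hypothetical handler for one page of results
    print(f"got {len(hits)} hits")

# Scroll: snapshot the result set once, then walk it forward via scroll_id.
resp = requests.post(
    f"{ES}/orders/_search?scroll=1m",    # keep the search context alive for 1 minute
    json={"size": 100, "sort": ["_doc"], "query": {"match_all": {}}},
).json()
while resp["hits"]["hits"]:
    process(resp["hits"]["hits"])
    resp = requests.post(
        f"{ES}/_search/scroll",
        json={"scroll": "1m", "scroll_id": resp["_scroll_id"]},
    ).json()

# search_after: use the sort values of the previous page's last hit to get the next page.
body = {
    "size": 100,
    "sort": [{"created_at": "desc"}, {"order_id": "asc"}],  # unique tiebreaker field
    "query": {"match_all": {}},
}
page = requests.post(f"{ES}/orders/_search", json=body).json()["hits"]["hits"]
while page:
    process(page)
    body["search_after"] = page[-1]["sort"]   # sort values of the last result
    page = requests.post(f"{ES}/orders/_search", json=body).json()["hits"]["hits"]
```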