What are clusters, nodes, indices, documents, and types in Elasticsearch?

A cluster is a collection of one or more nodes (servers) that together hold all of the data and provide indexing and search capabilities across all nodes. A cluster is identified by a unique name, which is "elasticsearch" by default. This name matters because a node can only join a cluster if it is configured with that cluster's name. A node is a single server that is part of a cluster; it stores data and participates in the cluster's indexing and search functions. An index is like a "database" in a relational database: it has a mapping that defines multiple types. An index is a logical namespace that maps to one or more primary shards and can have zero or more replica shards. A MySQL database corresponds to an Elasticsearch index, and a document is like a row in a relational database. The difference is that each document in an index can have a different structure (fields), but common fields should have the same data type. The analogy: MySQL => Databases => Tables => Columns/Rows; Elasticsearch => Indices => Types => Documents => Fields.

What are shards in Elasticsearch?

In most environments, each node runs on a separate machine or virtual machine. Index: in Elasticsearch, an index is a collection of documents. Shard: because Elasticsearch is a distributed search engine, an index is usually split into pieces called shards, which are distributed across multiple nodes.

Describe your ES cluster architecture, the size of the index data, and how many shards you use.

"number_of_shards": "5",
"number_of_replicas": "1"
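For example, these settings can be applied when the index is created. A minimal sketch, assuming the official `elasticsearch` Python client (8.x-style keyword arguments) and a local cluster; the index name is illustrative:

```python
# Create an index with 5 primary shards and 1 replica per primary.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="blog",  # illustrative index name
    settings={"number_of_shards": 5, "number_of_replicas": 1},
)
```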

Tuning measures

1.1 Optimization during the design phase

  • Based on how the business data grows, create indices from a date-based template and roll them over with the Rollover API;
  • Use aliases for index management;
  • Run force_merge every day during off-peak hours (for example, in the early morning) to release space;
  • Store hot data on SSDs to improve retrieval efficiency; periodically shrink cold data to reduce storage;
  • Use Curator for index lifecycle management;
  • Configure analyzers (word segmentation) only on the fields that actually need them, and choose the analyzers carefully;
  • When designing mappings, set each field's attributes deliberately, such as whether it needs to be indexed (searchable) and stored (see the mapping sketch after this list).
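A minimal sketch of the design-phase points above (date-based index name, alias, keyword vs. analyzed fields, and per-field index attributes), assuming the 8.x-style `elasticsearch` Python client; all names are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="blog_index_2024-01-01",           # date-based index name
    aliases={"blog_read": {}},               # manage the index through an alias
    settings={"number_of_shards": 5, "number_of_replicas": 1},
    mappings={
        "properties": {
            # full-text field: the only one that gets an analyzer
            "title": {"type": "text", "analyzer": "standard"},
            # exact-value field: keyword, no word segmentation
            "author": {"type": "keyword"},
            # kept in _source but never indexed/searched
            "raw_payload": {"type": "keyword", "index": False},
        }
    },
)
```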

1.2 Write Tuning

  • Before writing, set the number of replicas to 0;

  • Before writing, disable refresh by setting refresh_interval to -1;

  • Use bulk requests during the write process;

  • Restore the number of replicas and the refresh interval after writing;

  • Use automatically generated document IDs whenever possible (see the sketch after this list).
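A minimal sketch of this write-tuning sequence, assuming the 8.x-style `elasticsearch` Python client; the index name and documents are illustrative:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
index = "blog"

# 1. Before writing: no replicas, no periodic refresh
es.indices.put_settings(
    index=index,
    settings={"number_of_replicas": 0, "refresh_interval": "-1"},
)

# 2. Bulk write; omitting "_id" lets Elasticsearch auto-generate document IDs
actions = ({"_index": index, "_source": {"title": f"doc {i}"}} for i in range(10_000))
helpers.bulk(es, actions)

# 3. After writing: restore replicas and the refresh interval
es.indices.put_settings(
    index=index,
    settings={"number_of_replicas": 1, "refresh_interval": "1s"},
)
```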

1.3 Query Tuning

  • Avoid wildcard queries;

  • Avoid terms queries with very large value lists (hundreds or thousands of terms);

  • Make full use of the inverted index: use the keyword type wherever full-text analysis is not needed;

  • When the data volume is large, narrow the target indices by time range before searching;

  • Set up a proper routing mechanism (see the sketch after this list).
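A minimal sketch of these query-tuning ideas (time-scoped index pattern, routing, keyword/term filters), assuming the 8.x-style `elasticsearch` Python client; the index pattern, routing value, and fields are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="blog_index_2024-01-*",   # narrow the search to time-based indices
    routing="user_42",              # hit only the shard(s) for this routing value
    query={
        "bool": {
            "filter": [
                # exact match on a keyword field uses the inverted index directly
                {"term": {"author": "alice"}},
                {"range": {"published_at": {"gte": "2024-01-01"}}},
            ]
        }
    },
    size=20,
)
print(resp["hits"]["total"])
```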

1.4 Other Tuning

Deployment-level tuning and business-level tuning.

What is Elasticsearch's inverted index?

Traditional retrieval scans each article to find the positions of the given keywords. An inverted index, built with a word-segmentation (analysis) strategy, forms a mapping from terms to the articles that contain them; this dictionary plus mapping table is the inverted index. With an inverted index, looking up the articles that contain a term becomes roughly O(1), which greatly improves retrieval efficiency (a toy sketch appears after the list below). In contrast to asking which words an article contains, an inverted index starts from a word and records which documents that word appears in. It consists of two parts: a dictionary (term dictionary) and postings (inverted) lists. The underlying implementation of the term dictionary is based on the FST (Finite State Transducer) data structure, which Lucene has used extensively since version 4. FST has two advantages:

  1. Small space footprint. By reusing the prefixes and suffixes of words in the dictionary, the storage space is reduced.
  2. Fast lookups. The query time complexity is O(len(term)).
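A toy sketch of the term-to-documents mapping described above; it only illustrates the concept and is not how Lucene/Elasticsearch stores data internally:

```python
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "a quick red fox",
}

# "word segmentation": here, a trivial whitespace tokenizer
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted_index[term].add(doc_id)

# Looking up a term is a single dictionary access, independent of document length
print(sorted(inverted_index["fox"]))    # [1, 3]
print(sorted(inverted_index["brown"]))  # [1, 2]
```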

What should you do when the amount of Elasticsearch index data keeps growing? How do you tune and deploy it?

Index data should be planned well in the early stage, as the saying goes, "design first, code later", so that a sudden surge of data does not exceed the cluster's processing capacity and affect online search or other services. Tuning is as follows:

  1. Dynamic index level

Create indices from a template + time + Rollover API rolling. Example: in the design phase, define a blog index template in the form blog_index_<timestamp>, with data growing every day. The benefit is that the data volume of a single index never grows so large that it approaches the per-shard document limit of roughly 2^31 - 1, or that index storage reaches TB+ or more. Once a single index becomes that large, storage and other risks follow, so plan ahead and avoid the problem early (see the sketch below).
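A minimal sketch of alias plus Rollover API management, assuming the 8.x-style `elasticsearch` Python client; the index name, alias, and rollover conditions are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Bootstrap the first index in the series and point a write alias at it
es.indices.create(
    index="blog_index-000001",
    aliases={"blog_write": {"is_write_index": True}},
)

# Roll over to a new index once the current one becomes too big or too old
es.indices.rollover(
    alias="blog_write",
    conditions={"max_age": "1d", "max_docs": 50_000_000, "max_primary_shard_size": "50gb"},
)
```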

  2. Storage level

Store hot data (for example, data from the last three days or one week) separately from other data. If cold data no longer receives writes, you can periodically run force_merge followed by shrink to save storage space and improve search efficiency (see the sketch below).
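A minimal sketch of the cold-data handling above (force_merge, then shrink a read-only index), assuming the 8.x-style `elasticsearch` Python client; index and node names are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
cold_index = "blog_index_2023-12-01"

# Merge segments of an index that no longer receives writes
es.indices.forcemerge(index=cold_index, max_num_segments=1)

# Shrink requires the index to be read-only and all its shards on one node
es.indices.put_settings(
    index=cold_index,
    settings={
        "index.blocks.write": True,
        "index.routing.allocation.require._name": "cold-node-1",
    },
)
es.indices.shrink(
    index=cold_index,
    target=cold_index + "-shrunk",
    settings={"index.number_of_shards": 1, "index.number_of_replicas": 0},
)
```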

  3. Deployment level

If there was no planning up front, this is a contingency strategy. Combined with ES's own dynamic scaling capability, adding machines dynamically can relieve cluster pressure. Note: if master nodes and the rest of the layout were planned reasonably, new machines can be added dynamically without restarting the cluster.

How does Elasticsearch implement master election?

Prerequisites:

  • Only master-eligible nodes (node.master: true) can become the master node.

  • The purpose of the minimum number of master nodes (discovery.zen.minimum_master_nodes) is to prevent split brain.

  1. In Elasticsearch, the ZenDiscovery module is responsible for node discovery: Ping (the RPC nodes use to find each other) and Unicast (the unicast module holds a host list that controls which nodes need to be pinged).

  2. All master-eligible nodes (node.master: true) are sorted by nodeId in dictionary order. In each election, every node ranks the nodes it knows about in this order and votes for the first one (position 0).

  3. If a node receives a quorum of votes (n/2 + 1, where n is the number of master-eligible nodes) and it also votes for itself, it becomes the master. Otherwise a new election is held until these conditions are met (see the toy simulation after this list).

  4. Note: the master node is responsible for cluster, node, and index management, not document-level management. Data nodes can have the HTTP interface disabled.
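A toy simulation of the election logic described above (sort by nodeId, vote for the first known candidate, require a quorum of n/2 + 1); this is only an illustration, not Elasticsearch's actual implementation:

```python
from collections import Counter

master_eligible = ["node-c", "node-a", "node-b"]   # node.master: true
quorum = len(master_eligible) // 2 + 1             # n/2 + 1

# Every node votes for the first master-eligible node in nodeId (dictionary) order
votes = Counter(sorted(master_eligible)[0] for _ in master_eligible)

candidate, count = votes.most_common(1)[0]
if count >= quorum:
    print(f"{candidate} becomes master with {count}/{len(master_eligible)} votes")
else:
    print("no quorum reached; hold a new election")
```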

Describe the Elasticsearch document indexing (write) process in detail.

By default, the coordinating node uses the document ID (custom routing is also supported) to compute which primary shard the request is routed to: shard = hash(document_id) % (num_of_primary_shards).
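A conceptual sketch of that routing formula; Elasticsearch actually hashes the routing value (which defaults to the document _id) with murmur3, so Python's built-in hash() here is purely illustrative:

```python
def pick_shard(document_id: str, num_of_primary_shards: int) -> int:
    # murmur3 in Elasticsearch; Python's built-in hash() stands in for it here
    return hash(document_id) % num_of_primary_shards

for doc_id in ["doc-1", "doc-2", "doc-3"]:
    print(doc_id, "->", pick_shard(doc_id, 5))
```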

  1. When the node holding the shard receives the request from the coordinating node, it writes the request to the in-memory buffer, which is then written to the filesystem cache periodically (every second by default). This move from memory buffer to filesystem cache is called refresh;
  2. Of course, data that exists only in the memory buffer or filesystem cache can still be lost. ES guarantees reliability through the translog mechanism: the request is also written to the translog, and the translog is only cleared once the data in the filesystem cache has been flushed to disk;
  3. During flush, the in-memory buffer is cleared, the contents are written to a new segment, the segments are fsynced to disk and a new commit point is created, then the old translog is deleted and a new translog is started;
  4. Flush is triggered on a timer (default: 30 minutes) or when the translog becomes too large (default: 512 MB). (See the settings sketch after this list.)
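A minimal sketch of the knobs mentioned above (refresh interval and translog flush threshold), assuming the 8.x-style `elasticsearch` Python client; the index name and values are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.put_settings(
    index="blog",
    settings={
        "index.refresh_interval": "1s",                  # memory buffer -> filesystem cache
        "index.translog.flush_threshold_size": "512mb",  # translog size that triggers a flush
    },
)
```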

Describe the Elasticsearch search (query) process.

  1. The search is performed as a two-phase process called Query Then Fetch;
  2. During the initial query phase, the query is broadcast to every shard copy (primary or replica) in the index. Each shard executes the search locally and builds a priority queue of matching documents of size from + size.

PS: searches query the filesystem cache, but some data may still be in the memory buffer, so search is near real time.

  3. Each shard returns the IDs and sort values of all the documents in its priority queue to the coordinating node, which merges them into its own priority queue to produce a globally sorted list of results.
  4. Next comes the fetch phase: the coordinating node identifies which documents need to be fetched and issues multiple GET requests to the relevant shards. Each shard loads and, if necessary, enriches the documents, then returns them to the coordinating node. Once all documents have been retrieved, the coordinating node returns the results to the client.
  5. Supplement: Query Then Fetch scores document relevance using each shard's local term and document frequencies, which may be inaccurate when the number of documents is small. DFS Query Then Fetch adds a pre-query phase that asks every shard for its term and document frequencies; the scores are more accurate, but performance is worse (see the sketch after this list).
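A minimal sketch of selecting the search type mentioned in the supplement, assuming the 8.x-style `elasticsearch` Python client; the index and query are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="blog",
    search_type="dfs_query_then_fetch",   # pre-fetch global term/document frequencies
    query={"match": {"title": "elasticsearch tuning"}},
    size=10,
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_id"])
```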

Describe how Elasticsearch updates and deletes documents.

  1. Delete and update are also write operations, but documents in Elasticsearch are immutable and therefore cannot be deleted or modified in place.
  2. Each segment on disk has a corresponding .del file. When a delete request is issued, the document is not actually removed; it is marked as deleted in the .del file. It will still match queries, but is filtered out of the results. When segments are merged, documents marked as deleted in the .del file are not written to the new segment.
  3. When a new document is indexed, Elasticsearch assigns it a version number. When an update is performed, the old version of the document is marked as deleted in the .del file and the new version is indexed into a new segment. The old version still matches queries but is filtered out of the results.

How do you optimize Linux settings when deploying Elasticsearch?

  1. Disable memory swapping;

  2. Set heap memory to min(half of the node's memory, 32 GB);

  3. Set the maximum number of file handles;

  4. Adjust thread pool and queue sizes according to business needs;

  5. For disk storage, RAID 10 can be used to improve single-node performance and guard against single-node storage failures.

How do you avoid the split-brain problem?

  1. When the cluster has at least three master-eligible candidates, the split-brain problem can be solved by setting the minimum number of master votes (discovery.zen.minimum_master_nodes) to more than half of all candidate nodes;
  2. When there are only two candidates, keep only one of them master-eligible and use the other as a data node, which avoids the split-brain problem.

What should you watch out for regarding GC when using Elasticsearch?

  1. The inverted-index term dictionary needs to stay resident in memory and cannot be garbage-collected, so the segment-memory growth trend on data nodes must be monitored.
  2. To determine whether the heap is sufficient, the cluster's heap usage must be continuously monitored against the actual application scenario.
  3. Set reasonable sizes for all kinds of caches (field data cache, filter cache, indexing cache, bulk queue, etc.), and check whether the heap is still sufficient in the worst case, i.e., when all caches are full: is there heap left over for other tasks? Avoid relying on clearing caches to free memory.
  4. Avoid searches and aggregations that return very large result sets. For scenarios that need to pull large amounts of data, use the scan & scroll API (see the sketch after this list).
  5. Cluster stats reside in memory and cannot scale horizontally. A very large cluster can be split into multiple clusters connected by a tribe node.
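A minimal sketch of pulling a large result set with scan & scroll instead of deep pagination, assuming the official `elasticsearch` Python client (helpers.scan wraps the scroll API); the index and query are illustrative:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# helpers.scan streams hits batch by batch via the scroll API
for hit in helpers.scan(
    es,
    index="blog",
    query={"query": {"match_all": {}}},
    size=1000,                      # documents per scroll batch
):
    doc = hit["_source"]            # handle each document here
```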

How does Elasticsearch implement aggregation over large data sets (tens of millions of documents)?

The first approximate aggregation Elasticsearch provides is the cardinality metric, which returns the cardinality of a field, i.e., the number of distinct (unique) values. It is based on the HLL (HyperLogLog) algorithm: HLL hashes the input and estimates the cardinality probabilistically from the bits of the hash results. Its features: precision is configurable to control memory usage (more precision means more memory); it is very accurate on small data sets; and the memory required for deduplication is fixed by the configured precision. Whether there are thousands or billions of unique values, the amount of memory used depends only on the configured precision (see the sketch below).
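A minimal sketch of a cardinality aggregation with a configurable precision_threshold, assuming the 8.x-style `elasticsearch` Python client; the index and field names are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="blog",
    size=0,  # only the aggregation result is needed, not the hits
    aggs={
        "unique_authors": {
            "cardinality": {
                "field": "author",
                "precision_threshold": 3000,  # trades memory for accuracy (max 40000)
            }
        }
    },
)
print(resp["aggregations"]["unique_authors"]["value"])
```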

How does Elasticsearch guarantee read/write consistency in concurrent cases?

  1. Optimistic concurrency control with version numbers ensures that a newer version of a document is not overwritten by an older one; resolving specific conflicts is left to the application layer;
  2. In addition, for write operations the consistency level supports quorum/one/all and defaults to quorum, i.e., a write is only allowed when a majority of shard copies are available. Even when a majority is available, a write to a replica may still fail for network reasons; that replica is then considered faulty and the shard is rebuilt on another node;
  3. For read operations, replication can be set to sync (the default), so the operation only returns after both primary and replica shards have completed. If replication is set to async, you can query the primary shard by setting the search request parameter _preference to primary, to ensure the document is the latest version (see the sketch after this list).
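A minimal sketch of optimistic concurrency control with external version numbers, assuming the 8.x-style `elasticsearch` Python client; the index, id, and versions are illustrative:

```python
from elasticsearch import Elasticsearch, ConflictError

es = Elasticsearch("http://localhost:9200")

# External versioning: a write succeeds only if its version is higher than the
# stored one, so an old version can never overwrite a newer one.
es.index(index="blog", id="1", document={"title": "v2"}, version=2, version_type="external")

try:
    es.index(index="blog", id="1", document={"title": "stale"}, version=1, version_type="external")
except ConflictError:
    print("stale write rejected; the application layer resolves the conflict")
```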

What is a dictionary tree (trie)?

The core idea of a trie is to use the common prefixes of strings to reduce query time and improve efficiency. It has three basic properties:

  1. The root node contains no character; every node other than the root contains exactly one character.

  2. Concatenating the characters along the path from the root to a node gives the string that node corresponds to.

  3. All children of a node contain different characters.

  4. As you can see, the number of nodes at level i of the trie can be up to 26^i. To save space, we can use dynamic linked lists, or use arrays to simulate dynamic allocation; the space cost is then no more than the number of words times the word length.

  5. Implementation: allocate an alphabet-sized array for each node, hang a linked list off each node, or record the tree with the left-child right-sibling representation;

  6. For a Chinese dictionary tree, each node's children are stored in a hash table, so that space is not wasted while lookups keep the hash table's O(1) complexity (see the sketch after this list).
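A minimal trie sketch that stores each node's children in a hash table (a Python dict), as described above; the words and the API are illustrative:

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # char -> TrieNode (hash table per node)
        self.is_word = False   # marks the end of a complete word


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


trie = Trie()
for w in ["tea", "ten", "team"]:
    trie.insert(w)
print(trie.search("tea"), trie.search("te"))   # True False
```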

How is spelling correction implemented?

Spelling correction is based on edit distance. Edit distance is a standard way of measuring the minimum number of insert, delete, and replace operations required to transform one string into another (see the sketch below).
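A minimal Levenshtein edit-distance sketch (insert, delete, replace) matching that definition; the example words are illustrative:

```python
def edit_distance(a: str, b: str) -> int:
    m, n = len(a), len(b)
    # dp[i][j] = minimum edits to turn a[:i] into b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                      # delete all characters of a[:i]
    for j in range(n + 1):
        dp[0][j] = j                      # insert all characters of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # delete
                dp[i][j - 1] + 1,         # insert
                dp[i - 1][j - 1] + cost,  # replace (or keep)
            )
    return dp[m][n]


print(edit_distance("elasticsaerch", "elasticsearch"))  # 2
```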

This article is compiled from articles found online.