This is the first day of my participation in Gwen Challenge

This article is participating in “Java Theme Month – Java Development in Action”, see the activity link for details

Have you ever used ElastricSearch in your project? How much do you know about it? Elasticsearch is a distributed, scalable, near-real-time search and analytics engine. Today we will unveil its mystery (hereinafter referred to as ES).

ES is an open source search engine written in Java. It uses Lucene to index and search internally. By encapsulating Lucene, IT hides the complexity of Lucene and provides a simple and consistent RESTful API instead. However, ES is more than Lucene, and more than just a full-text search engine, it has the following features:

  • A distributed real-time document store where each field can be indexed and searched;
  • A distributed real-time analysis search engine;
  • Capable of extending hundreds of service nodes and supporting PB level of structured or unstructured data.

I. Structured data and unstructured data

First of all, we will data can be divided into two kinds, one kind is “structured data” : also known as the row data, refers to a fixed format or limited length of the data, is a two-dimensional table structure to express the logic and implementation of data, strictly follow the length of data format and specification, mainly through the relational database to store and manage, such as database, metadata, etc; One type is “unstructured data” : also known as full-text data, variable length or no fixed format, not suitable for two-dimensional table performance by the database, including all formats of office documents, XML, HTML, Word documents, emails, all kinds of reports, pictures and audio and video information.

If you want to be more specific, XML and HTML can be classified as semi-structured data. Because they also have their own specific tag formats, they can be processed as structured data as needed, or plain text can be extracted as unstructured data.

According to two kinds of data classification, search is also divided into two kinds:

(1) Structured data search: because of their specific structure, we can generally store and search through two-dimensional tables of relational databases (MySQL, Oracle, etc.), and also build indexes.

(2) Unstructured data search: that is, full-text data search, there are mainly two methods: sequential scanning, full-text search;

  • Sequential scan: Searches for specific keywords in sequential scan mode. For example, I give you a newspaper and let you find out where the words “AH Q” appeared in the newspaper. You definitely need to scan the newspaper from cover to cover and mark where the keyword appears and where it appears. This method is undoubtedly the most time-consuming and inefficient.
  • Full-text search: Extract some information from unstructured data, reorganize it, and make it into a certain structure (this structure is called “index”), and then search the structured data, so as to achieve the purpose of searching relatively fast. The main work of this approach is the creation of the index in the early stage, but it is fast and efficient for the late search.

Second, the Lucene

Lucene, a subproject of apache Software Foundation 4 Jakarta, is an open source full text search engine toolkit, but it is not a full text search engine, but a full text search engine architecture, providing a complete query engine and indexing engine, Partial text analysis engine (In English and German). Lucene’s goal is to provide software developers with an easy-to-use toolkit to facilitate the implementation of full-text search functions in the target system, or to build a complete full-text search engine on this basis. Lucene is an open source library for full text retrieval and searching, supported and provided by the Apache Software Foundation.

The open source available full-text search engines based on Lucene that I am familiar with mainly include “Solr and ES” : ES itself has the characteristics of distribution and easy installation and use, while Solr’s distribution needs to be realized by a third party, for example, ZooKeeper is used to achieve distributed coordination management.

Next, let’s talk about an important query structure that Lucene uses to enable full-text search: the “inverted index.”

For example,

  • Java is the best programming language.
  • PHP is the best programming language.
  • Javascript is the best programming language.

Use a word splitter to break the content field of each document into separate terms (we’ll call them terms or terms), create a sorted list of all non-repeating terms, and then list which document each Term appears in.The results of

This structure consists of a list of all non-repeating words in a document, with a document list associated with each word. This structure of “attribute values to determine the location of records is called an inverted index.” A file with an inverted index is called an inverted file.

Convert the above into a graph to illustrate the structure of the inverted index

The concept is as follows:

  • Term: The smallest unit of storage and query in an index. In English, it is a word, but in Chinese, it usually refers to the word after the participle.
  • Term Dictionary: or Dictionary, which is a collection of terms. The usual unit of index for search engines is words. A dictionary is a collection of strings made up of all the words that have appeared in a collection of documents. Each index item in the dictionary contains some information about the word itself and a pointer to an inverted list.
  • Post list: A document is usually composed of several words. An inversion list records which documents and where a word appears. Each record is called a Posting. Inversion lists not only record document numbers, but also store information such as word frequency.
  • Inverted files: An Inverted list of all Inverted words is often stored sequentially in a File on disk called an Inverted File, which is a physical File that stores Inverted indexes.

Dictionary and inversion list are two important data structures in Lucene, which are the cornerstone of fast retrieval. The dictionary and the inverted file are stored in two parts, the dictionary in memory and the inverted file on disk.

3. Concept explanation in ES

In order to better understand ES, the concept of ES is introduced below:

(1) Near Realtime (NRT) : Near Realtime. First, there is a small delay (default: 1 second) from writing data to data can be searched. Second, search and analysis based on ES can reach the second level;

(2) Cluster: An ES Cluster consists of one or more ES nodes. You can add each node to the Cluster by setting the same cluster.name (default value: ElasticSearch). Ensure that different cluster names are used in different environments, otherwise you will end up with nodes joining the wrong cluster. The cluster construction of ES is very simple. It does not need to rely on third-party coordination and management components, and it realizes the cluster management function internally.

(3) Node: An ES service startup instance is a Node. The node name is set by node.name. If this is not set, a random universal unique identifier (randomly assigned by default) is assigned to the node at startup. The default node is added to a cluster named “ElasticSearch”. If you start a bunch of nodes, they will automatically form a ElasticSearch cluster, but a node can also form a ElasticSearch cluster.

(4) Index: a simple understanding is a database, Index has a name, contains a pile of similar structure of document data, an Index contains many documents, an Index represents a class of similar or the same document. For example, if you create a product index, a product index, you might have all of the product data in it, all of the product documents. An index is a place where relevant data is kept. An index is actually a logical namespace that points to one or more physical shards.

(5) What Type of food do you like? Each index can have one or more types. Type is a logical data classification in index, and document under a type has the same field. For example, blog system has an index that can define user data type, blog data type, Comment on the data type.

(6) Document &field: Document is a line of data, the smallest data unit in ES, a Document can be a customer data, a commodity classification data, usually represented by JSON data structure, each type under index, can store multiple documents. A document contains multiple fields, each of which is a data field.

4. Election principle of ES cluster

① Node role

The nodes are divided into master node, candidate master node, data node, and coordination node. The candidate master node and data node are specified in the configuration file, while the master node is elected by the candidate master node, and the coordination node can be any node. The following describes their functions:

Data node: stores Data and performs related operations, such as adding, deleting, modifying, querying, and aggregating Data. Therefore, Data nodes have high requirements on machine configurations and consume large AMOUNTS of CPU, memory, and I/O resources. Often as a cluster grows, more data nodes need to be added to improve performance and availability.

“Candidate primary node” : can be elected as the Master node. Only the candidate primary node has the right to vote and be elected in the cluster. Other nodes do not participate in the election.

“Master node” : responsible for creating indexes, dropping indexes, tracking which nodes are part of the cluster, deciding which shards to assign to related nodes, tracking the status of nodes in the cluster, and so on. The active node and other nodes check each other by Ping each other. The active node pings all other nodes to check whether any node is down. Other nodes also use the Ping command to check whether the active node is available. A stable primary node is very important to the health of the cluster.

“Coordinating node” : although do to node role to distinguish, but the user’s request can be sent to any one node, and collected by the node is responsible for distributing requests, results of operations, and do not need to forward the master node, this node can be called coordinate nodes, coordinating node does not need to specify and configuration, any node in the cluster can act as the role of coordinating node.

Each node in ES can be either a candidate primary node or a data node. / config/elasticsearch. Yml can be set, the default is true.

Node. master: true // Whether to be a candidate primary node node.data: true // Whether to be a data nodeCopy the code

However, because data nodes consume a lot of CPU and memory CORE I/O, if a node is both a data node and a master node, the master node may be affected and the status of the entire cluster may be affected. Therefore, in order to improve the health of the cluster, we should divide and isolate the nodes in the ES cluster in terms of roles, and several machine groups with low configuration can be used as candidate primary node groups.

② Discovery and election mechanism

Discovery Mechanism (Zen Discovery)

How can ES connect different nodes to the same cluster with the same cluster.name setting? The answer is Zen Discovery. Zen Discovery is a built-in default Discovery module for ES (the Discovery module is responsible for discovering nodes in the cluster and electing Master nodes). It provides unicast and file-based discovery, and can be extended to support cloud environments and other forms of discovery through plug-ins. Zen Discovery is integrated with other modules: for example, all communication between nodes is done using the Transport module, and nodes use the Discovery mechanism to Ping other nodes.

ES is configured to use unicast discovery by default to prevent nodes from inadvertently joining the cluster, and only nodes running on the same machine will automatically cluster. If the cluster’s nodes are running on different machines and use unicast, you can provide the ES node with a list of nodes it should try to connect to — a unicast list. When the node contacts a member of the unicast list, it gets the status of all nodes in the entire cluster, and then it contacts the Master node and joins the cluster. This means that the unicast list does not need to contain all the nodes in the cluster, it just needs enough nodes when a new node contacts one and talks to it. If you use Master candidate nodes as a unicast list, you only need to list three. This configuration is in the elasticSearch.yml file:

discovery.zen.ping.unicast.hosts: ["host1"."host2:port"]

Copy the code

ES node starts to Ping, if discovery. Zen. Ping. Unicast. The hosts are set, the set of Host Ping, or try to Ping the local ports. ES supports the startup of multiple nodes on the same host. The Response of Ping contains the basic information about the node and the node that the node considers to be the Master node.

Master the election

The distributed system approach ensures that the elected master is recognized by quorum’s master-eligible nodes, thus ensuring that there is only one master. This quorum is configured through the following configuration:

conf/elasticsearch.yml:
    discovery.zen.minimum_master_nodes: 2
Copy the code

(1) Master elects who to initiate it and when?

A master election is, of course, initiated by a master-eligible node when a master-eligible node finds that it meets the following criteria:

  • The current status of the master-eligible node is not Master;
  • The master-eligible node queries other nodes in the cluster through the Ping operation of the ZenDiscovery module. None of the nodes is connected to the master node.
  • There are more than minimum_master_nodes that are not connected to master.

A master election can be initiated when a node finds that the majority of master-eligible nodes, including itself, believe that the cluster does not have a master.

(2) When it is necessary to elect master, who is to be elected?

If none of the nodes has a considered Master, the candidate primary node is selected from all the candidate primary nodes in a simple lexicographical order of ID (ID is randomly generated when the node is first started), taking the first one. The reason for this design should be to make the election result as stable as possible, do not want to be master but can not be elected.

(3) When will the election be successful?

Minimum_master_nodes: If the number of primary candidates does not reach the minimum limit, the process is repeated until there are enough primary candidates to start the election. If there is only one Local node, you elect yourself. If the current node is selected as Master, wait until the number of nodes reaches discovery.zen.minimum_master_nodes and then service is available. If the current node is not the Master, try to join the Master.

For example:

When a master-eligible node(we assume Node_A) initiates an election:

Master: Node_A sends a join request to Node_B. Master: Node_A sends a join request to Node_B

  • If Node_B becomes Master, Node_B adds Node_A to the cluster and advertises the latest Cluster_state. The latest Cluster_state contains Node_A’s information, which is equivalent to a normal new node joining. For Node_A, when the new Cluster_state is published to Node_A, Node_A will complete the join.
  • If Node_B is running for Master, Node_B will treat the join as a vote. In this case, Node_A waits to see if Node_B can become a true Master until it times out or another Master is selected.
  • If Node_B does not think it is the Master(it is not and will not be), then Node_B will reject the join. In this case, Node_A initiates the next round of elections.

B. Suppose Node_A selects itself as Master:

In this case, NodeA will wait for other nodes to join, that is, wait for the votes of other nodes. When more than half of the votes are collected, NodeA considers itself as the master node, and then changes the master node in the Cluster_state to itself and announces the message to the cluster.

③ Split brain phenomenon

If multiple Master nodes are elected in a cluster due to network or other reasons, data update is inconsistent, which is called split brain. In this case, different nodes in the cluster have different choices for Master nodes, resulting in multiple Master contention.

Split-brain problems can be caused by several factors:

  • “Network problem” : Some nodes cannot access the Master due to the network delay between clusters. The Master is considered to have died and a new Master is elected. The shards and replicas on the Master are marked red and new Master shards are assigned.
  • “Node load” : The role of the Master node is both Master and Data. When there is a large number of visits, ES may stop responding (in a state of suspended state), resulting in a large area of delay. At this time, other nodes can not get the response from the Master node, and consider that the Master node has hung down, so they will re-select the Master node.
  • Memory reclamation: The primary node serves as both Master and Data. If the ES process on the Data node occupies a large amount of memory, the JVM reclaims a large amount of memory, causing the ES process to lose response.

In order to avoid the occurrence of brain split phenomenon, we can start from the reasons to make optimization measures through the following aspects:

  • Adjust response time appropriately to reduce misjudgment: Run discovery.zen.ping_timeout to set the response time of the node status. The default value is 3s, and you can adjust the response time appropriately. If the Master does not respond within the response time range, the node is considered dead.
  • “Election trigger” : We need to set the value of the discovery.zen.munimum_master_nodes parameter in the configuration file of the nodes in the candidate cluster. The default value is 1. The official recommended value is (master_eligibel_nodes/2)+1, where master_eligibel_nodes is the number of candidate primary nodes. This can not only prevent the occurrence of split brain, but also maximize the high availability of the cluster, because the election can be normal as long as there are not more than discovery.zen.munimum_master_nodes candidate nodes alive. When the value is smaller than this value, the election behavior cannot be triggered, the cluster cannot be used, and the fragmentation disorder will not be caused.
  • “Role separation” : that is, the role separation between the candidate master node and the data node mentioned above can reduce the burden of the master node, prevent the pseudo-death of the master node, and reduce the misjudgment of the “dead” master node.

Ah Q plans to make a systematic study and explanation of ES knowledge, so we will continue to output the relevant knowledge of ES in the future. If you are interested, you can follow GZH “Ah Q Said code”, or add a friend of Ah Q qingqing-4132, ah Q is looking forward to your arrival!

Background message to get Java dry goods materials: study notes and big factory interview questions