Introduction to ElasticSearch

The Spring Security series has been running for a while now, and I will find the time to organize it into a complete set of tutorials for you. I have been taking a bit of a break recently. Going to bed early with no tutorial to write is nice, but I get bored and need something to do, so I feel it is time to start a new journey.

I wanted to write an ES tutorial back during the National Day holiday in 2018, when ElasticSearch’s parent company went public, but it never happened, and it has nagged at me ever since. Recently I happened to have a gap in my schedule, and I wondered whether I could finally finish the series.

Unlike previous tutorials, this one combines video with text and pictures. The video is the main content, and the text and pictures supplement it. I will upload the videos to Baidu network disk, and there will be a download link for the corresponding video at the end of the article.

ElasticSearch is very popular at the moment. It can be used for site search and log analysis, and it can also serve as a NoSQL database.

Let’s start our ES journey with the following brief introduction.

Songo recorded a video to accompany this article:

Video download link: pan.baidu.com/s/1bIvtBv9O… Extraction code: PM94

1. Lucene

Lucene is an open-source, free, high-performance full-text search engine library written in pure Java. It is widely regarded as the best full-text search toolkit in the open-source world.

In practice, Lucene suits almost any scenario that needs full-text retrieval, so it has been ported to many other languages, such as C++, C#, and Python.

Lucene became a top-level Apache project back in 2005. It was written by Doug Cutting. Some of you may not have heard of him, but you have certainly heard of his other great work, Hadoop.

However, it is important to note that Lucene is only a toolkit, not a complete search engine; developers build complete search engines on top of it. Solr and ElasticSearch are the best-known examples, and of the two, ElasticSearch generally does better in distributed and big data environments.

Lucene has the following features:

Simple
Cross-language
Powerful search engine
Fast indexing
Index files are compatible across platforms

2. ElasticSearch

ElasticSearch is a distributed, scalable, near-real-time, high-performance search and data analysis engine. ElasticSearch is written in Java. By wrapping Lucene further, it hides the complexity of search and allows developers to perform full-text searches using a simple RESTful API.

ElasticSearch works well in distributed environments, which is one reason it is so popular. It can process massive amounts of structured or unstructured data at the PB (petabyte) level.

Overall, ElasticSearch has three main functions:

Data collection
Data analysis
Data storage

ElasticSearch features:

Distributed file storage.
Distributed search engine with real-time analysis.
High scalability.
Plug-in support.

3. Installation

3.1 Installing a Single Node

Go to the Elastic official website and find Elasticsearch:

www.elastic.co/cn/elastics…

Then click the download button and choose the appropriate version to download directly.

Unpack the downloaded file. The directories it contains are as follows:

Directory	Meaning
modules	Modules that ES depends on
lib	Third-party dependency libraries
logs	Log output
plugins	Plug-ins
bin	Executable files
config	Configuration files
data	Data storage

Starting ES:

Go to the bin directory and run the ./elasticsearch command to start ElasticSearch.

If "started" appears in the output, the startup succeeded.

The default listening port is 9200, so open localhost:9200 in a browser to view the node information.
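
The same node information can also be read from a script. The following is a minimal Python sketch, assuming a node on the default localhost:9200. The helper names are made up for illustration, and the sample response is trimmed to a few fields with invented values, so the parsing can be tried without a running node.

```python
import json
from urllib.request import urlopen

# Assumption: a local node listening on the default port mentioned above.
ES_URL = "http://localhost:9200"


def summarize_node_info(info: dict) -> str:
    """Turn the JSON returned by the root endpoint into a one-line summary."""
    version = info.get("version", {}).get("number", "unknown")
    return f"node={info.get('name')} cluster={info.get('cluster_name')} version={version}"


def fetch_node_info(base_url: str = ES_URL) -> dict:
    """GET the root endpoint. Needs a running node, so it is not called here."""
    with urlopen(base_url, timeout=5) as resp:
        return json.load(resp)


# Illustrative response only: real output has more fields and different values.
sample = {
    "name": "my-node",
    "cluster_name": "elasticsearch",
    "version": {"number": "7.9.3"},
    "tagline": "You Know, for Search",
}
print(summarize_node_info(sample))
```

With a node running, `summarize_node_info(fetch_node_info())` would print the summary for the live node instead of the sample.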

The node name and the cluster name (the default is elasticsearch) are both configurable.

Open the config/elasticsearch.yml file to set the cluster name and node name, as follows:

cluster.name: javaboy-es
node.name: master

After the configuration is complete, save the configuration file and restart es. After the restart, refresh the localhost:9200 page to view the latest information.

ES support matrix:

www.elastic.co/cn/support/…

3.2 Installing the head Plug-in

Elasticsearch-head allows you to view cluster information visually.

This section describes the two installation methods.

3.2.1 Installing the Browser Plug-in

Search for ElasticSearch-head in the Chrome Web Store and click to install it.


3.2.2 Downloading and Installing the Plug-in

There are four steps:

git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
npm install
npm run start

When it starts, open the elasticsearch-head page in a browser (it listens on localhost:9100 by default).

Notice that no cluster data is visible at this point. The page requests the cluster data with cross-origin requests, and by default ES does not allow cross-origin requests, so nothing is shown.

Add the following to the ES config/elasticsearch.yml file to enable cross-origin support:

http.cors.enabled: true
http.cors.allow-origin: "*"

After the configuration is complete, restart ES, and head will show the cluster data.
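
One way to confirm that the cross-origin setting took effect is to send a request with an Origin header and look for the Access-Control-Allow-Origin response header. The following is a rough Python sketch of that check. The two addresses are assumptions (ES on localhost:9200, head served from localhost:9100), and the function names are invented for illustration.

```python
from urllib.request import Request, urlopen

ES_URL = "http://localhost:9200"       # assumed: default ES address
HEAD_ORIGIN = "http://localhost:9100"  # assumed: where elasticsearch-head is served


def build_cors_probe(es_url: str = ES_URL, origin: str = HEAD_ORIGIN) -> Request:
    """Build a GET request that carries an Origin header, as a browser would."""
    return Request(es_url, headers={"Origin": origin})


def cors_allows(headers: dict, origin: str = HEAD_ORIGIN) -> bool:
    """Return True if the response headers permit requests from `origin`."""
    # Header names are case-insensitive, so normalize them before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    allowed = lowered.get("access-control-allow-origin")
    return allowed in ("*", origin)


def probe(es_url: str = ES_URL) -> bool:
    """Send the probe to a running node. Needs ES to be up, so it is not called here."""
    with urlopen(build_cors_probe(es_url), timeout=5) as resp:
        return cors_allows(dict(resp.headers.items()))
```

If `probe()` returns False after the restart, the configuration change has probably not been picked up.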

3.3 Distributed Installation

Assumptions:

One master and two slaves
The master uses port 9200, and the slaves use ports 9201 and 9202

First, edit the master’s config/elasticsearch.yml:

node.master: true
network.host: 127.0.0.1

After the configuration, restart the master.

Unpack the ES archive twice more and name the resulting directories slave01 and slave02, to represent the two slave nodes.

Configure them separately.

slave01/config/elasticsearch.yml:

# The cluster name must be the same as the master's
cluster.name: javaboy-es
node.name: slave01
network.host: 127.0.0.1
http.port: 9201
discovery.zen.ping.unicast.hosts: ["127.0.0.1"]

slave02/config/elasticsearch.yml:

# The cluster name must be the same as the master's
cluster.name: javaboy-es
node.name: slave02
network.host: 127.0.0.1
http.port: 9202
discovery.zen.ping.unicast.hosts: ["127.0.0.1"]

Then start slave01 and slave02. Once they are running, you can view the cluster information in the head plug-in.
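
Besides the head plug-in, the node list can be read from the `_cat/nodes` API, which returns JSON when called with `?format=json`. The following is a small Python sketch that summarizes such a response. The sample rows are invented and trimmed to three columns, and which node shows up as the elected master depends on the election, so it will not necessarily be the one named master.

```python
def describe_nodes(rows: list) -> list:
    """Summarize rows parsed from GET /_cat/nodes?format=json, sorted by node name."""
    return [
        f"{row['name']} ({'elected master' if row.get('master') == '*' else 'member'})"
        for row in sorted(rows, key=lambda r: r["name"])
    ]


# Illustrative rows only: a real response carries more columns (heap, cpu, roles, ...).
sample_rows = [
    {"name": "slave02", "ip": "127.0.0.1", "master": "-"},
    {"name": "master", "ip": "127.0.0.1", "master": "*"},
    {"name": "slave01", "ip": "127.0.0.1", "master": "-"},
]

for line in describe_nodes(sample_rows):
    print(line)
```

If fewer than three rows come back, one of the slaves has probably not joined the cluster, and the cluster name and discovery settings are the first things to check.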

3.4 Kibana Installation

Kibana is an analytics and data visualization platform from Elastic that allows you to search and view data stored in ES.

The installation steps are as follows:

Download Kibana: www.elastic.co/cn/download…
Unpack the archive
Configure the ES address (optional: if ES uses the default address and port, no configuration is needed; the configuration file is config/kibana.yml)
Run ./bin/kibana to start Kibana
Open localhost:5601 in a browser

When you open Kibana for the first time after installation, you can choose whether or not to load the sample data provided.

4. Core concepts of ElasticSearch

4.1 Top 10 Core ElasticSearch Concepts

4.1.1 Cluster

One or more servers running ES nodes, organized together, form a cluster. Together these nodes hold the data and provide search services.

A cluster has a name that uniquely identifies it, called the cluster name. The default cluster name is elasticsearch.

You can configure the cluster name in the config/elasticsearch.yml file:

cluster.name: javaboy-es

A cluster can be in one of three health states: green, yellow, or red:

Green: the cluster is healthy. All primary and replica shards are working properly.
Yellow: the cluster is in a warning state. All primary shards are working, but at least one replica shard is not.
Red: the cluster cannot work properly, because at least one primary shard is not available.
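
The health state is reported in the `status` field of the `GET /_cluster/health` response. As a rough illustration, the Python sketch below maps that field to the three meanings listed above. The sample response is trimmed and its values are invented, and the helper name is made up for this example.

```python
HEALTH_MEANING = {
    "green": "all primary and replica shards are working",
    "yellow": "all primary shards are working, but at least one replica is not",
    "red": "at least one primary shard is not available",
}


def explain_health(health: dict) -> str:
    """Describe a parsed GET /_cluster/health response in one sentence."""
    status = health["status"]
    meaning = HEALTH_MEANING.get(status, "unknown status")
    return f"{health['cluster_name']} is {status}: {meaning}"


# Illustrative response only, trimmed to a few fields.
sample_health = {
    "cluster_name": "javaboy-es",
    "status": "yellow",
    "number_of_nodes": 1,
    "unassigned_shards": 1,
}
print(explain_health(sample_health))
```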

4.1.2 Node

Each server in a cluster is a node. A node stores data and takes part in the cluster’s indexing and search functions. For a node to join a cluster, it only needs to be configured with that cluster’s name. By default, if we start several nodes that can discover each other, they automatically form a cluster. ES provides this behavior out of the box, but it is not reliable and may lead to split-brain, so you are advised to configure the cluster information manually.

4.1.3 Index

The word "index" can be understood in two ways:

As a noun

A collection of documents with similar characteristics.

As a verb

To index data, that is, to perform the indexing operation that stores data in an index.

4.1.4 Type

A type is a logical category or partition within an index. Before ES 6, an index could contain multiple types. In ES 6.x, a newly created index can contain only one type; multi-type indexes created in 5.x still work for compatibility, but that structure is no longer recommended. In ES 7, types are deprecated, and an index effectively has a single type, _doc.

4.1.5 Document

A document is a unit of data that can be indexed, for example a document for a user or a document for a product. Documents are in JSON format.
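
To make this concrete, here is a small Python sketch that turns an object into the kind of JSON document ES would index. The `UserDoc` class and its fields are hypothetical, invented only for this example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class UserDoc:
    """Hypothetical user document; the field names are invented for illustration."""
    name: str
    age: int
    tags: list = field(default_factory=list)


def to_json_document(user: UserDoc) -> str:
    """Serialize the object to the JSON text that would be sent to ES."""
    return json.dumps(asdict(user), ensure_ascii=False)


print(to_json_document(UserDoc("javaboy", 99, ["java", "es"])))
```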

4.1.6 Shards

Indexes are stored on nodes. However, a single node is limited in disk space and processing power, so it may not handle a large index well on its own. In that case, we can split the index into shards. The number of shards is specified when the index is created. Each shard is itself a fully functional, independent index.

By default, an index is created with one shard, and one replica is created for each shard. (This is the ES 7 default; earlier versions created five shards by default.)

4.1.7 Replicas

A replica is a backup copy of a primary shard.
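
A quick piece of arithmetic shows how shards and replicas add up. An index with P primary shards and R replicas per primary has P × (1 + R) shard copies in total, and because ES does not place a replica on the same node as its primary, at least R + 1 nodes are needed before every copy can be allocated. The Python sketch below just encodes these two formulas; the function names are made up for illustration.

```python
def total_shard_copies(primaries: int, replicas_per_primary: int) -> int:
    """Primaries plus all of their replicas."""
    if primaries < 1 or replicas_per_primary < 0:
        raise ValueError("need at least one primary and a non-negative replica count")
    return primaries * (1 + replicas_per_primary)


def min_nodes_for_all_copies(replicas_per_primary: int) -> int:
    """Each copy of a given shard must live on a different node."""
    if replicas_per_primary < 0:
        raise ValueError("replica count must be non-negative")
    return replicas_per_primary + 1


print(total_shard_copies(1, 1))      # 2
print(total_shard_copies(3, 2))      # 9
print(min_nodes_for_all_copies(1))   # 2
```

This is why a single-node installation with one replica per shard typically reports a yellow cluster state: the replica has nowhere to go.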

4.1.8 Settings

Settings are the definition of an index in the cluster, such as its number of shards and number of replicas.
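
As an illustration, settings are supplied in the request body when an index is created. The following Python sketch builds such a request without sending it. The index name blog and the helper name are assumptions made up for this example; `number_of_shards` and `number_of_replicas` are the two index settings referred to above.

```python
import json


def create_index_request(index: str, shards: int, replicas: int) -> tuple:
    """Return (method, path, body) for creating an index with the given settings."""
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return "PUT", f"/{index}", json.dumps(body)


method, path, body = create_index_request("blog", 3, 1)
print(method, path)
print(body)
```

The number of shards is fixed when the index is created, while the number of replicas can be changed later.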

4.1.9 Mapping

A mapping records how the fields of an index are stored: each field’s data type, how it is analyzed (word segmentation), whether it is stored, and so on.
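
A mapping is also expressed as JSON. The Python sketch below assembles a small mapping body in the typeless style used by ES 7 and later. The field names title, content, and views are invented for illustration, and the body is only built here, not sent to a cluster.

```python
import json


def text_field(analyzer: str = "standard", store: bool = False) -> dict:
    """A full-text field: its type, its analyzer, and whether the value is stored separately."""
    return {"type": "text", "analyzer": analyzer, "store": store}


def build_mapping_body() -> dict:
    """Body for creating an index with an explicit mapping (hypothetical fields)."""
    return {
        "mappings": {
            "properties": {
                "title": text_field(store=True),
                "content": text_field(),
                "views": {"type": "integer"},
            }
        }
    }


print(json.dumps(build_mapping_body(), indent=2))
```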

4.1.10 Analyzer

An analyzer defines how the text of a field is segmented into words.
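
The effect of an analyzer can be inspected with the `_analyze` API, which takes an analyzer name and some text and returns the resulting tokens. The Python sketch below builds such a request and extracts the tokens from a response. The sample response is written by hand to show the shape of the output for the built-in standard analyzer, and the helper names are made up for illustration.

```python
import json


def analyze_request(text: str, analyzer: str = "standard") -> tuple:
    """Return (method, path, body) for an _analyze call."""
    return "POST", "/_analyze", json.dumps({"analyzer": analyzer, "text": text})


def tokens_from_response(response: dict) -> list:
    """Pull just the token strings out of a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


# Hand-written sample for the text "Hello ElasticSearch" (standard analyzer lowercases terms).
sample_response = {
    "tokens": [
        {"token": "hello", "start_offset": 0, "end_offset": 5,
         "type": "<ALPHANUM>", "position": 0},
        {"token": "elasticsearch", "start_offset": 6, "end_offset": 19,
         "type": "<ALPHANUM>", "position": 1},
    ]
}
print(tokens_from_response(sample_response))
```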

4.2 ElasticSearch vs. Relational Databases

Relational database	ElasticSearch
Database	Index
Table	Type
Row	Document
Column	Field
Table structure	Mapping
SQL	DSL (Domain Specific Language)
Select * from xxx	GET http://
update xxx set xx=xxx	PUT http://
Delete xxx	DELETE http://
Index	Full-text index
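
The SQL-to-HTTP rows of the table can be sketched as a small lookup. The Python example below is only an approximation under stated assumptions: it uses the ES 7 style `_doc` and `_search` endpoints, a node at localhost:9200, and an invented helper name. The correspondence is loose, since a PUT replaces the whole document rather than updating single columns.

```python
BASE_URL = "http://localhost:9200"  # assumed: default ES address


def rest_equivalent(sql_kind: str, index: str, doc_id: str = "") -> tuple:
    """Map a kind of SQL statement to a roughly equivalent (HTTP method, URL)."""
    if sql_kind == "select":
        return "GET", f"{BASE_URL}/{index}/_search"
    if sql_kind == "update":
        return "PUT", f"{BASE_URL}/{index}/_doc/{doc_id}"
    if sql_kind == "delete":
        return "DELETE", f"{BASE_URL}/{index}/_doc/{doc_id}"
    raise ValueError(f"unsupported statement kind: {sql_kind}")


for kind in ("select", "update", "delete"):
    print(kind, "->", *rest_equivalent(kind, "blog", "1"))
```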

This article is really a set of notes taken while recording the ES video tutorial, so parts of it are fairly brief; you can also refer to the video. Video download link: pan.baidu.com/s/1bIvtBv9O… Extraction code: PM94