Sharing the Basics of Elasticsearch!

(1) Introduction

Elasticsearch (ES) exists to search. When the amount of data is small, we can search it in a relational database through an index, but as the data grows that approach becomes very inefficient, and at that point we need a distributed search engine. Elasticsearch is a search server built on Lucene. It provides a distributed, multi-user-capable full-text search engine with a RESTful web interface.

ES is mainly used for full-text search, structured search, and analytics, and it is widely adopted: Wikipedia and GitHub, for example, both use ES for search.

(2) Understanding of core concepts

2.1 Data Structure

Since ES is built for searching, it also has to store data. In a relational database such as MySQL, data is stored according to the following logic:

A database has multiple tables. Each table has multiple rows of data. Each row consists of multiple columns.

The storage in ES looks like this:

An index (plural: indices) is equivalent to a database. Each index has multiple types (equivalent to tables), each type has multiple documents (equivalent to rows), and each document consists of multiple fields (equivalent to columns).

You can think of ES as a document-oriented database whose concepts line up with those of a relational database as described above.

It is worth noting that types are deprecated in the ES 7.x releases and removed completely in 8.x.

2.2 Indices and Documents

An index in ES is not the same thing as an index in MySQL. An index in ES is a collection of documents, so in the analogy above it plays the role of a database.

As mentioned earlier, ES is document-oriented, and the document is its most important unit: each document is one record of data. Two points about documents matter most:

1. A document contains multiple key-value pairs.

2. A document is represented as JSON.

2.3 Shards

ES is a distributed search engine. Sharding splits the data of an index across multiple shards. A replica is a copy of a shard, and it can also serve query requests.

Now assume that the cluster has two nodes and that the index is configured with 5 shards and 1 replica. ES then distributes the 5 primary shards and their 5 replicas so that a replica and the primary it copies are always on different nodes.
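The two-node layout described above can be sketched in a few lines of Python. This is only a toy model: the `toy_allocate` function and the node names are hypothetical, and the real ES allocator takes many more factors into account. The `number_of_shards` and `number_of_replicas` keys are the index settings that correspond to the 5 shards and 1 replica in the text.

```python
import json

# Index settings matching the text: 5 primary shards, 1 replica of each.
settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}
print(json.dumps(settings))


def toy_allocate(num_shards, nodes):
    """Toy placement: put each primary and its replica on different nodes.

    This is NOT Elasticsearch's allocator, only an illustration of the rule.
    It needs at least two nodes for the rule to hold.
    """
    placement = {node: [] for node in nodes}
    for shard in range(num_shards):
        primary_node = nodes[shard % len(nodes)]
        replica_node = nodes[(shard + 1) % len(nodes)]
        placement[primary_node].append(f"P{shard}")
        placement[replica_node].append(f"R{shard}")
    return placement


layout = toy_allocate(5, ["node-1", "node-2"])
for node, shards in layout.items():
    print(node, shards)
```

Each node ends up holding five shards, and no node holds both `P0` and `R0`, `P1` and `R1`, and so on, so losing one node still leaves a full copy of the data on the other.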

2.4 Inverted index

Part of the reason ES searches so fast is its use of an inverted index. An inverted index maps each term produced by word segmentation to the documents that contain it. A simple example, documents that carry labels, shows the idea.

In the original data, labels are reached through the document ID, so finding the documents for a given label means traversing every document. Once the index is inverted, each keyword points directly at its matching documents.
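As a rough illustration of that inversion, here is a minimal Python sketch. The sample documents and the `build_inverted_index` helper are made up for this example, and a real Lucene index stores far more (positions, frequencies, and so on).

```python
from collections import defaultdict

# Hypothetical sample data: document ID -> labels.
docs = {
    1: ["python", "search"],
    2: ["java", "search"],
    3: ["python", "linux"],
}


def build_inverted_index(documents):
    """Map every label to the sorted list of document IDs that carry it."""
    index = defaultdict(list)
    for doc_id in sorted(documents):
        for label in documents[doc_id]:
            if doc_id not in index[label]:
                index[label].append(doc_id)
    return dict(index)


inverted = build_inverted_index(docs)
print(inverted["python"])  # [1, 3]
print(inverted["search"])  # [1, 2]
```

Looking up a label is now a single dictionary access instead of a scan over every document.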

(3) Basic operations of ES

ES exposes its operations in a RESTful style, which makes it easy to pick up for programmers who are used to writing CRUD code. Because every operation is ultimately an HTTP request, you can run it from Kibana or call it directly with Postman. Here I call ES directly with the ES plugin for IntelliJ IDEA.

3.1 Creating a Document

PUT http://ip:port/index_name/type_name/document_id

{
  "key": "value"
}

Since the type name will be removed in a later release, we can use _doc to represent the default type:

PUT http://ip:port/index_name/_doc/document_id

After the PUT request succeeds, the new index and the corresponding data can be seen in the head plugin.
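For readers who prefer a script to a GUI client, the same request can be assembled with Python's standard library. This is a sketch under assumptions: the host `http://localhost:9200`, the index name `test_index`, and the helper `build_put_document` are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_put_document(base_url, index, doc_id, document):
    """Build (but do not send) a PUT request that creates or replaces a document."""
    url = f"{base_url}/{index}/_doc/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, index name, and document.
request = build_put_document("http://localhost:9200", "test_index", 1, {"key": "value"})
print(request.get_method(), request.full_url)

# To actually send it against a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```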

3.2 Creating Indexes with Data Types

In 3.1 we created data without specifying any data types, but we can also declare a data type for each field when creating the index:

PUT http://ip:port/index_name

Parameter example:

{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "address": {
        "type": "text"
      }
    }
  }
}

The core data types in ES are as follows:

(1) String: text, keyword

(2) Numeric: long, integer, short, byte, double, float, half_float, scaled_float

(3) Date: date

(4) Date nanosecond: date_nanos

(5) Boolean: boolean

(6) Binary: binary

(7) Range: integer_range, float_range, long_range, double_range, date_range
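To make the list more concrete, the sketch below builds a mapping body that uses several of these types. The field names are invented for illustration. The comments on `text` and `keyword` reflect the usual distinction: `text` fields are analyzed for full-text search, while `keyword` fields are kept as one exact value.

```python
import json

# Hypothetical fields, one per core type family from the list above.
mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},        # analyzed, full-text searchable
            "city": {"type": "keyword"},     # stored as one exact value
            "age": {"type": "integer"},
            "birthday": {"type": "date"},
            "active": {"type": "boolean"},
        }
    }
}

# This string would be the request body of: PUT http://ip:port/index_name
body = json.dumps(mapping, indent=2)
print(body)
```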

3.3 Viewing Index or Document Data

Index and document information can be viewed with a GET request:

GET http://ip:port/index_name    # view an index

GET http://ip:port/index_name/type_name/document_id    # view a document

3.4 Modifying Data

Sending a PUT request to an existing document ID, exactly as when creating it, overwrites the original data:

PUT http://ip:port/index_name/type_name/document_id

{
  "key": "value"
}

Each time the document is modified, the version number in the response increases.

Another way is to use a POST request:

POST http://ip:port/index_name/type_name/document_id/_update

Parameter example:

{
  "doc": {
    "name": "javayz4"
  }
}

This approach is preferred: PUT replaces the whole document, so any key you forget to include is lost, whereas _update changes only the fields listed under doc.
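The difference between the two update styles can be mimicked locally with plain dictionaries. This is a simplified model, not ES code: `put_replace` and `post_update` are hypothetical helpers, and the merge shown here only covers top-level fields.

```python
def put_replace(stored, new_doc):
    """PUT semantics: the stored document is replaced wholesale."""
    return dict(new_doc)


def post_update(stored, partial_doc):
    """_update semantics with "doc": listed fields are merged into the stored document."""
    merged = dict(stored)
    merged.update(partial_doc)
    return merged


# Hypothetical stored document.
stored = {"name": "javayz", "address": "hz"}

after_put = put_replace(stored, {"name": "javayz4"})
after_update = post_update(stored, {"name": "javayz4"})

print(after_put)     # {'name': 'javayz4'}
print(after_update)  # {'name': 'javayz4', 'address': 'hz'}
```

With PUT the `address` field disappears because it was not resent, while the partial update keeps it.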

3.5 Deleting Data

Data is deleted with a DELETE request:

DELETE http://ip:port/index_name/type_name/document_id    # delete a specific document

DELETE http://ip:port/index_name    # delete an index

(4) ES search operation

The most important thing about ES is its search operation.

4.1 Simple Search

Put the search parameters directly in the URL:

GET http://ip:port/index_name/_search?q=key:value
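When the URL is assembled in code, the `q` parameter should be percent-encoded so that special characters in the value do not break the request. A small sketch, assuming a placeholder host and index name and a hypothetical `build_search_url` helper:

```python
from urllib.parse import urlencode


def build_search_url(base_url, index, key, value):
    """Build the URI-search URL, percent-encoding the q parameter."""
    query = urlencode({"q": f"{key}:{value}"})
    return f"{base_url}/{index}/_search?{query}"


# Hypothetical host and index name.
url = build_search_url("http://localhost:9200", "test_index", "name", "javayz")
print(url)  # http://localhost:9200/test_index/_search?q=name%3Ajavayz
```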


4.2 Passing Parameters in the Request Body

Besides putting the parameters in the URL, you can pass them in a JSON request body. Here from and size are the paging parameters, query carries the query conditions, and _source lists the fields to return in the result. If _source is omitted, all fields are returned.

GET http://ip:port/index_name/_search

Parameter example:

{
  "from": 0,
  "size": 20,
  "query": {
    "match": {
      "name": "javayz2"
    }
  },
  "_source": ["name", "address"]
}
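Since from is an offset rather than a page number, a small helper is handy when paging through results. The sketch below is one possible way to build the body above for an arbitrary page. The function name and the 1-based page numbering are choices made for this example.

```python
import json


def build_search_body(field, text, page, page_size, source_fields):
    """Build a match query for one page of results (pages are numbered from 1)."""
    return {
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,               # number of hits to return
        "query": {"match": {field: text}},
        "_source": source_fields,
    }


body = build_search_body("name", "javayz2", page=3, page_size=20,
                         source_fields=["name", "address"])
print(json.dumps(body, indent=2))
```

For page 3 with 20 hits per page, the body skips the first 40 hits.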

Besides the parameters in the example above, many others are available, such as sorting:

"sort": [
  {
    "age": {
      "order": "desc"
    }
  }
]

Multi-condition queries: must means that all of the listed conditions have to be met. You can use should instead, meaning that at least one of the conditions has to be met, or must_not, meaning that none of the listed conditions may match.

"query": {
  "bool": {
    "must": [
      {
        "match": {
          "name": "javayz"
        }
      },
      {
        "match": {
          "address": "hz"
        }
      }
    ]
  }
}
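Nested bool queries become hard to read when written by hand, so it can help to build them from small pieces. A minimal sketch: the `bool_query` and `match` helpers are hypothetical, and the `must_not` condition is an invented extra to show how the clauses combine.

```python
import json


def match(field, text):
    """One match condition on a single field."""
    return {"match": {field: text}}


def bool_query(must=(), should=(), must_not=()):
    """Assemble a bool query, leaving out clauses that were not supplied."""
    clauses = {"must": list(must), "should": list(should), "must_not": list(must_not)}
    return {"query": {"bool": {name: items for name, items in clauses.items() if items}}}


body = bool_query(
    must=[match("name", "javayz"), match("address", "hz")],
    must_not=[match("address", "sh")],  # hypothetical extra condition
)
print(json.dumps(body, indent=2))
```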

If a field holds a collection of values, you can query for several of them at once by separating the terms with spaces.

Search results can also be highlighted as part of the query:

"highlight": {
  "pre_tags": "",
  "post_tags": "",
  "fields": {
    "name": {}
  }
}
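The tag values in the fragment above are empty, so the sketch below assumes `<em>` and `</em>`, which are the tags ES falls back to by default. `highlight_section` builds the highlight part of a request body, and `toy_highlight` is a purely local imitation of the effect, not how ES computes highlights.

```python
import re


def highlight_section(field, pre_tag="<em>", post_tag="</em>"):
    """Build the "highlight" part of a search body for one field."""
    return {"pre_tags": [pre_tag], "post_tags": [post_tag], "fields": {field: {}}}


def toy_highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    """Local illustration only: wrap whole-word matches of term in the tags."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)


print(highlight_section("name"))
print(toy_highlight("javayz lives in hz", "javayz"))
# <em>javayz</em> lives in hz
```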

(5) Word segmentation

Word segmentation splits a passage of text into keywords, and searches are then run against those keywords. A good segmenter for Chinese is the IK analyzer.

Basic usage

Download link: github.com/medcl/elast…

Download the release that matches your ES version, create a new ik folder in the plugins directory, decompress the downloaded files into the ik folder, and restart ES.

The IK analyzer provides two algorithms:

1. ik_smart: coarsest-grained segmentation

2. ik_max_word: finest-grained segmentation

ik_smart produces the coarsest segmentation, the fewest tokens the dictionary allows.

ik_max_word produces the finest segmentation and returns the most tokens:

{
  "analyzer": "ik_max_word",
  "text": "I'm a Java engineer"
}

Results (the offsets and CN_* token types refer to the original Chinese sentence):

{
  "tokens": [
    { "token": "I", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0 },
    { "token": "", "start_offset": 1, "end_offset": 2, "type": "CN_CHAR", "position": 1 },
    { "token": "java", "start_offset": 2, "end_offset": 6, "type": "ENGLISH", "position": 2 },
    { "token": "engineer", "start_offset": 6, "end_offset": 9, "type": "CN_WORD", "position": 3 },
    { "token": "engineering", "start_offset": 6, "end_offset": 8, "type": "CN_WORD", "position": 4 },
    { "token": "Teacher", "start_offset": 8, "end_offset": 9, "type": "CN_CHAR", "position": 5 }
  ]
}
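To compare the two analyzers side by side, the same text can be submitted once per analyzer to the `_analyze` endpoint. The sketch below only builds the request bodies and shows how the token strings might be pulled out of a response. The helper names are hypothetical, and `example_response` is a shortened, hand-written stand-in in the shape shown above, not real output.

```python
import json


def build_analyze_body(analyzer, text):
    """Body for the _analyze endpoint: which analyzer to run on which text."""
    return {"analyzer": analyzer, "text": text}


def token_texts(response):
    """Pull just the token strings out of an _analyze response."""
    return [entry["token"] for entry in response["tokens"]]


sample = "I'm a Java engineer"
for analyzer in ("ik_smart", "ik_max_word"):
    # Each body would be sent as: POST http://ip:port/_analyze
    print(json.dumps(build_analyze_body(analyzer, sample)))

# Shortened, hand-written response in the shape shown above.
example_response = {"tokens": [{"token": "java", "position": 2},
                               {"token": "engineer", "position": 3}]}
print(token_texts(example_response))  # ['java', 'engineer']
```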

Well, that's all for today's article. I hope it helps those of you who were confused!