Elasticsearch: from installation to basic usage

1. Download the image

docker pull docker.elastic.co/elasticsearch/elasticsearch:7.3.2

2. Run

Run a single-node instance for a development environment:

docker run -it -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.3.2

For working with Elasticsearch, the official tool Kibana is recommended; install it with Docker as well.

docker run -it --link d69781008e33:elasticsearch -p 5601:5601 kibana:7.3.2

(Here d69781008e33 is the ID of the Elasticsearch container.)

When finished, open localhost:5601

3. Operating on data

PUT creates a document. Example: add a document with id=1 to the index named twitter. Format: PUT INDEX/_doc/ID

PUT twitter/_doc/1
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}

response:
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Parse the returned data structure

| field | description |
| --- | --- |
| _index | the index, analogous to a database |
| _type | the type, analogous to a table |
| _id | the document's ID (primary key) |
| _version | document version, incremented by 1 on each update |
| result | the result of the operation |
| _shards | shard information |
| _seq_no | sequence number |
| _primary_term | primary term, incremented when a primary shard is reassigned |

GET INDEX returns the index's information, including document mappings and shard settings.

GET query

Get the document with id=1 from the twitter index:

GET twitter/_doc/1

response:
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "GB",
    "uid" : 1,
    "city" : "Beijing",
    "province" : "Beijing",
    "country" : "China"
  }
}
| field | description |
| --- | --- |
| _source | the source, i.e. the document's data |

Get only the user field from the source:

GET twitter/_doc/1?_source=user

response:
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "GB"
  }
}

Batch retrieval: when fetching documents in bulk, specify the INDEX and ID of each document to query; you can also request only part of the source.

GET _mget
{
  "docs": [
    { "_index": "twitter", "_id": 1 },
    { "_index": "twitter", "_id": 2, "_source": ["user", "city"] }
  ]
}

response:
{
  "docs" : [
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 0,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "user" : "GB",
        "uid" : 1,
        "city" : "Beijing",
        "province" : "Beijing",
        "country" : "China"
      }
    },
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 1,
      "_seq_no" : 1,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "city" : "Beijing",
        "user" : "GB"
      }
    }
  ]
}

Another way of writing mget

Query an array of IDs within one index:

GET twitter/_mget
{
  "ids": ["1", "2", "3"]
}

POST creates with an auto-generated ID. When adding data we usually use PUT and specify the ID, but if we want the ID generated automatically, we use POST.

POST twitter/_doc
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}

response:
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "ranetnYB1rIShBts5iqO",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}

When we need a partial update, we use _update. Syntax: POST INDEX/_update/ID

POST twitter/_update/1
{
  "doc": {
    "city": "chengdu"
  }
}

response:
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "noop",
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "failed" : 0
  }
}

Upsert = insert or update: the document is updated if it exists and inserted if it does not. Add doc_as_upsert to the request body:

POST twitter/_update/5
{
  "doc": {
    "user": "GB",
    "uid": 1,
    "city": "Beijing",
    "province": "Beijing",
    "country": "China"
  },
  "doc_as_upsert": true
}

response:
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "5",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 8,
  "_primary_term" : 1
}

The result is "created", i.e. a new record was inserted.

HEAD: a simple existence check

HEAD twitter/_doc/1
200 - OK


DELETE removes a document: DELETE INDEX/_doc/ID

DELETE twitter/_doc/5

response:
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "5",
  "_version" : 2,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 9,
  "_primary_term" : 1
}

Delete by query uses POST INDEX/_delete_by_query:

POST  twitter/_delete_by_query
{
  "query":{
    "match":{
      "city":"Changsha"
    }
  }
}


Note: Elasticsearch's REST API has no PATCH method; partial updates are done with POST _update as shown above.

Bulk: _bulk packs many requests into a single request for batch processing, improving execution efficiency. The payload should not be too large; roughly 5 MB to 15 MB is a good range.

POST _bulk
{"index":{"_index":"twitter","_id":1}}
{"user":"Shuangyushu - Zhang San","message":"Nice weather today, going out for a walk","uid":2,"age":20,"city":"Beijing","province":"Beijing","country":"China","address":"Haidian District, Beijing, China","location":{"lat":"39.970718","lon":"116.325747"}}
{"index":{"_index":"twitter","_id":2}}
{"user":"Dongcheng - Lao Liu","message":"Happy in Yunnan, next stop!","uid":3,"age":30,"city":"Beijing","province":"Beijing","country":"China","address":"3 Taijichang 3rd Alley, Dongcheng District, Beijing, China","location":{"lat":"39.904313","lon":"116.412754"}}
{"index":{"_index":"twitter","_id":3}}
{"user":"Dongcheng - Li Si","message":"Happy birthday!","uid":4,"age":30,"city":"Beijing","province":"Beijing","country":"China","address":"Dongcheng District, Beijing, China","location":{"lat":"39.893801","lon":"116.408986"}}
{"index":{"_index":"twitter","_id":4}}
{"user":"Chaoyang - Lao Jia","message":"123, gogogo","uid":5,"age":35,"city":"Beijing","province":"Beijing","country":"China","address":"Jianguomen, Chaoyang District, Beijing, China","location":{"lat":"39.718256","lon":"116.367910"}}
{"index":{"_index":"twitter","_id":5}}
{"user":"Chaoyang - Lao Wang","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"Beijing","province":"Beijing","country":"China","address":"China World Trade Center, Chaoyang District, Beijing, China","location":{"lat":"39.918256","lon":"116.467910"}}
{"index":{"_index":"twitter","_id":6}}
{"user":"Hongqiao - Lao Wu","message":"Good friends, today is my birthday, everyone come and wish me a happy birthday!","uid":7,"age":90,"city":"Shanghai","province":"Shanghai","country":"China","address":"Minhang District, Shanghai, China","location":{"lat":"31.175927","lon":"121.383328"}}
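The _bulk body is newline-delimited JSON: an action line followed by the document source, and the whole payload must end with a newline. A minimal sketch of assembling such a payload client-side (the helper name is illustrative):

```python
import json

def build_bulk_body(index, docs):
    """Assemble an NDJSON _bulk payload: one "index" action line per
    document, followed by the document source itself."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

body = build_bulk_body("twitter", [
    (1, {"user": "Zhang San", "city": "Beijing"}),
    (2, {"user": "Lao Liu", "city": "Beijing"}),
])
print(body)
```

The resulting string can be sent as the request body of POST _bulk with the Content-Type header set to application/x-ndjson.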

Other commands

  1. Open/close Index

An open index consumes resources; once an index is closed, read and write operations on it are blocked.

  2. Freeze/unfreeze index

A frozen index blocks writes until it is unfrozen.

Query classification

There are two types of queries in ES: queries and aggregations. Queries perform full-text search; aggregations perform statistics and analysis. You can also run a query and an aggregation together in a single request.

query

_search

GET INDEX/_search. In the response, hits.total.value is the number of matching documents, and relation describes how that count relates to the true total (eq means exact); max_score is the highest relevance score: the more closely a document matches the search, the higher its score.

GET twitter/_search

response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "user" : "GB",
          "uid" : 1,
          "city" : "Beijing",
          "province" : "Beijing",
          "country" : "China"
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "CRx_x3YBqDTqoEh67ELY",
        "_score" : 1.0,
        "_source" : {
          "user" : "GB",
          "uid" : 1,
          "city" : "Beijing",
          "province" : "Beijing",
          "country" : "China"
        }
      }
    ]
  }
}

Paged queries are also supported, in the format GET INDEX/_search?size=PAGE_SIZE&from=OFFSET. Note that from is the number of documents to skip, not a page number.

GET twitter/_search?size=2&from=1
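Since from is an offset rather than a page index, page numbers must be converted before querying. A small hypothetical helper:

```python
def page_params(page, page_size):
    """Convert a 1-based page number to Elasticsearch from/size parameters.
    `from` is the number of documents to skip, not a page index."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"size": page_size, "from": (page - 1) * page_size}

print(page_params(1, 2))   # first page skips nothing
print(page_params(3, 10))  # third page skips two full pages
```

These values can then be appended to the _search URL as query parameters.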

Source filtering lets you specify which fields of the document to return; for example, to return only the user field:

GET twitter/_search
{
  "_source": ["user"],
  "query": {
    "match_all": {}
  }
}

You can also specify which fields to include or exclude, via includes and excludes:

GET twitter/_search
{
  "_source": {
    "includes": [
      "user*",
      "location*"
    ],
    "excludes": [
      "*.lat"
    ]
  },
  "query": {
    "match_all": {}
  }
}

_count counts the matching documents:

GET twitter/_count
{
  "query": {
    "match": {
      "user": "GB"
    }
  }
}

response:
{
  "count" : 7,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}

The following are further query features in ES, not covered in detail here; consult the official documentation and blog if you need them.

1. Match query: full-text match on a given field

2. Ids query: query by ID

3. Multi_match query: match across multiple fields

4. Prefix query: prefix match on a given field

5. Term query: exact match on a given field

Compound query: combines the above query types together

Location query: ES's geo queries based on map coordinates, supporting range and fuzzy searches

Wildcards

Wildcard supported by ES

| character | description | example |
| --- | --- | --- |
| * | matches zero or more characters | `*sea` matches any term ending in "sea" |

SQL support

GET /_sql {"query":"SQL statement"}

GET /_sql
{
  "query":"select * from twitter where user='GB'"
}


aggregation

In real production scenarios we often do not need the individual documents but a dashboard or statistical summary, typically the data a BI team analyzes to support decisions. Analyzing the data requires an aggregation framework that produces aggregated results on top of a search query; multiple aggregations can also be combined.

Bucketing: bucket aggregations build a series of buckets, each with a document criterion. When the aggregation runs, every document in the context is tested and dropped into the bucket it matches; at the end you get a list of buckets, each with its own set of documents. Bucket aggregations can contain sub-aggregations, which means aggregations can be nested.

Metrics, aggregations that track and calculate metrics for a set of documents.

Matrix: a family of aggregations that operate on multiple fields and produce a matrix result from the values extracted from the requested document fields.

Pipeline: aggregations that consume the output of other aggregations and their metrics.

Aggregation operations

The general format of an aggregation request:

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "file_name": {
      "aggs_type": {
        <aggs_body>
      }
    }
  }
}
| field | name | description |
| --- | --- | --- |
| size | result size | if we only care about the aggregation results, not the hits, set it to 0 |
| aggs | aggregation | short for aggregations |
| file_name | aggregation name | a user-defined name for the aggregation result |
| aggs_type | aggregation type | common types include range, max, min, avg |
| aggs_body | aggregation parameters | the parameters differ per aggregation type |
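As a sketch, that skeleton can also be produced programmatically; the helper below merely fills in the generic structure (the function and argument names are illustrative):

```python
def aggs_request(name, agg_type, params, size=0):
    """Build the generic aggregation request body: size=0 suppresses hits,
    and the aggregation parameters sit under a user-chosen name."""
    return {"size": size, "aggs": {name: {agg_type: params}}}

req = aggs_request("ageGroup", "range",
                   {"field": "age", "ranges": [{"from": 20, "to": 30}]})
print(req)
```

Serialized with json.dumps, this dict is exactly the body of a GET INDEX/_search aggregation request.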

Data preparation

DELETE twitter

PUT twitter
{
  "mappings": {
    "properties": {
      "DOB": { "type": "date" },
      "address": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "age": { "type": "long" },
      "city": { "type": "keyword" },
      "country": { "type": "keyword" },
      "location": { "type": "geo_point" },
      "message": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "province": { "type": "keyword" },
      "uid": { "type": "long" },
      "user": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      }
    }
  }
}

POST _bulk
{"index":{"_index":"twitter","_id":1}}
{"user":"Zhang San","message":"Nice weather today, going out for a walk","uid":2,"age":20,"city":"Beijing","province":"Beijing","country":"China","address":"Haidian District, Beijing, China","location":{"lat":"39.970718","lon":"116.325747"},"DOB":"1999-04-01"}
{"index":{"_index":"twitter","_id":2}}
{"user":"Lao Liu","message":"Happy in Yunnan, next stop!","uid":3,"age":22,"city":"Beijing","province":"Beijing","country":"China","address":"3 Taijichang 3rd Alley, Dongcheng District, Beijing, China","location":{"lat":"39.904313","lon":"116.412754"},"DOB":"1997-04-01"}
{"index":{"_index":"twitter","_id":3}}
{"user":"Li Si","message":"Happy birthday!","uid":4,"age":25,"city":"Beijing","province":"Beijing","country":"China","address":"Dongcheng District, Beijing, China","location":{"lat":"39.893801","lon":"116.408986"},"DOB":"1994-04-01"}
{"index":{"_index":"twitter","_id":4}}
{"user":"Lao Jia","message":"123, gogogo","uid":5,"age":30,"city":"Beijing","province":"Beijing","country":"China","address":"Jianguomen, Chaoyang District, Beijing, China","location":{"lat":"39.718256","lon":"116.367910"},"DOB":"1989-04-01"}
{"index":{"_index":"twitter","_id":5}}
{"user":"Lao Wang","message":"Happy BirthDay My Friend!","uid":6,"age":26,"city":"Beijing","province":"Beijing","country":"China","address":"China World Trade Center, Chaoyang District, Beijing, China","location":{"lat":"39.918256","lon":"116.467910"},"DOB":"1993-04-01"}
{"index":{"_index":"twitter","_id":6}}
{"user":"Lao Wu","message":"Good friends, today is my birthday, everyone come and wish me a happy birthday!","uid":7,"age":28,"city":"Shanghai","province":"Shanghai","country":"China","address":"Minhang District, Shanghai, China","location":{"lat":"31.175927","lon":"121.383328"},"DOB":"1991-04-01"}

Several common types of aggregation are described below

Range aggregation

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "ageGroup": {
      "range": {
        "field": "age",
        "ranges": [
          { "from": 20, "to": 22 },
          { "from": 22, "to": 25 },
          { "from": 25, "to": 30 }
        ]
      }
    }
  }
}

response:
{
  "aggregations" : {
    "ageGroup" : {
      "buckets" : [
        {
          "key" : "20.0-22.0",
          "from" : 20.0,
          "to" : 22.0,
          "doc_count" : 1
        },
        {
          "key" : "22.0-25.0",
          "from" : 22.0,
          "to" : 25.0,
          "doc_count" : 1
        },
        {
          "key" : "25.0-30.0",
          "from" : 25.0,
          "to" : 30.0,
          "doc_count" : 3
        }
      ]
    }
  }
}
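To make the bucket semantics concrete, this local sketch reproduces the range bucketing over the ages from the data-preparation step (in a range aggregation, from is inclusive and to is exclusive):

```python
def range_buckets(values, ranges):
    """Count values falling in each [from, to) range, like a range aggregation."""
    buckets = []
    for r in ranges:
        count = sum(1 for v in values if r["from"] <= v < r["to"])
        buckets.append({"key": f'{r["from"]:.1f}-{r["to"]:.1f}', "doc_count": count})
    return buckets

ages = [20, 22, 25, 30, 26, 28]  # ages of the six sample documents
buckets = range_buckets(ages, [{"from": 20, "to": 22},
                               {"from": 22, "to": 25},
                               {"from": 25, "to": 30}])
print(buckets)
```

The resulting doc_count values, 1, 1, and 3, match the aggregation response above.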

In the response we can see that the aggregation result contains several buckets, which makes the bucket definition from the concepts above concrete.

As mentioned above, bucket aggregations can be nested, i.e. we can aggregate again within an aggregation. After computing the range buckets we can calculate the maximum, minimum, and average within each range.

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "range": {
        "field": "age",
        "ranges": [
          { "from": 20, "to": 22 },
          { "from": 22, "to": 25 },
          { "from": 25, "to": 30 }
        ]
      },
      "aggs": {
        "avg_age": { "avg": { "field": "age" } },
        "min_age": { "min": { "field": "age" } },
        "max_age": { "max": { "field": "age" } }
      }
    }
  }
}

response:
{
  "aggregations" : {
    "age" : {
      "buckets" : [
        {
          "key" : "20.0-22.0",
          "from" : 20.0,
          "to" : 22.0,
          "doc_count" : 1,
          "max_age" : { "value" : 20.0 },
          "avg_age" : { "value" : 20.0 },
          "min_age" : { "value" : 20.0 }
        },
        {
          "key" : "22.0-25.0",
          "from" : 22.0,
          "to" : 25.0,
          "doc_count" : 1,
          "max_age" : { "value" : 22.0 },
          "avg_age" : { "value" : 22.0 },
          "min_age" : { "value" : 22.0 }
        },
        {
          "key" : "25.0-30.0",
          "from" : 25.0,
          "to" : 30.0,
          "doc_count" : 3,
          "max_age" : { "value" : 28.0 },
          "avg_age" : { "value" : 26.333333333333332 },
          "min_age" : { "value" : 25.0 }
        }
      ]
    }
  }
}

In the request, the top-level aggregation uses the range type; nested inside it, avg_age uses the avg type, min_age uses min, and max_age uses max.

Filters aggregation

Each bucket is associated with a filter, and each bucket collects the documents matched by its filter.

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "by_cities": {
      "filters": {
        "filters": {
          "beijing": { "match": { "city": "Beijing" } },
          "shanghai": { "match": { "city": "Shanghai" } }
        }
      }
    }
  }
}

response:
{
  "aggregations" : {
    "by_cities" : {
      "buckets" : {
        "beijing" : { "doc_count" : 5 },
        "shanghai" : { "doc_count" : 1 }
      }
    }
  }
}

In the aggregation request above, we add two filters, one of which is Beijing and one of which is Shanghai

Filter aggregation

A single-filter aggregation can be understood as a special form of filters.

Compute the average age in Beijing:

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "beijing": {
      "filter": { "match": { "city": "Beijing" } },
      "aggs": {
        "avg_age": { "avg": { "field": "age" } }
      }
    }
  }
}

Date_range aggregation

range works on numeric fields; date_range is its counterpart for time fields.

Query dates of birth from January 1989 to January 1990:

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "birth_range": {
      "date_range": {
        "field": "DOB",
        "format": "yyyy-MM",
        "ranges": [
          { "from": "1989-01", "to": "1990-01" }
        ]
      }
    }
  }
}

Terms aggregation

Use a terms aggregation to count how often each keyword occurs.

GET twitter/_search
{
  "query": {
    "match": {
      "message": "happy birthday"
    }
  },
  "size": 0,
  "aggs": {
    "city": {
      "terms": {
        "field": "city",
        "size": 10
      }
    }
  }
}

response:
{
  "aggregations" : {
    "city" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Beijing",
          "doc_count" : 2
        },
        {
          "key" : "Shanghai",
          "doc_count" : 1
        }
      ]
    }
  }
}

In the terms request body, size=10 means return at most the top 10 buckets, not terms that occur 10 times. You can also specify an order for the buckets.
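Conceptually a terms aggregation is a count per distinct value, sorted by frequency. A local sketch over the city values of all six sample documents (the aggregation above counts only the documents matching the query, so its numbers are smaller):

```python
from collections import Counter

# city values of the six sample documents from the data-preparation step
cities = ["Beijing", "Beijing", "Beijing", "Beijing", "Beijing", "Shanghai"]

# most_common(10) mirrors size=10: at most 10 buckets, highest count first
top = Counter(cities).most_common(10)
print(top)
```

Each (key, count) pair corresponds to one bucket in the terms response.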

Histogram Aggregation

The histogram aggregation takes its name from the bar chart: it buckets documents into fixed-width intervals.

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "age_histogram": {
      "histogram": {
        "field": "age",
        "interval": 2
      }
    }
  }
}

response:
{
  "aggregations" : {
    "age_histogram" : {
      "buckets" : [
        { "key" : 20.0, "doc_count" : 1 },
        { "key" : 22.0, "doc_count" : 1 },
        { "key" : 24.0, "doc_count" : 1 },
        { "key" : 26.0, "doc_count" : 1 },
        { "key" : 28.0, "doc_count" : 1 },
        { "key" : 30.0, "doc_count" : 1 }
      ]
    }
  }
}

interval is the bucket width; the example above aggregates ages into two-year columns.
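The bucketing rule is key = floor(value / interval) * interval; reproduced locally over the sample ages:

```python
from collections import Counter

def histogram_buckets(values, interval):
    """Bucket values like a histogram aggregation: each value lands in the
    bucket keyed by floor(value / interval) * interval."""
    keys = [(v // interval) * interval for v in values]
    return sorted(Counter(keys).items())

ages = [20, 22, 25, 30, 26, 28]
print(histogram_buckets(ages, 2))
```

This reproduces the buckets in the response above: 25 falls into the 24 bucket, while the other ages land on their own even keys.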

Date_histogram aggregation

Performs histogram aggregation over date values or ranges.

GET /_search
{
  "size": 0,
  "aggs": {
    "age_aggs": {
      "date_histogram": {
        "field": "DOB",
        "interval": "year"
      }
    }
  }
}

Cardinality aggregation

It counts the number of distinct values of a field. For example, the city field has only two values: Beijing and Shanghai.

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "city_num": {
      "cardinality": {
        "field": "city"
      }
    }
  }
}

response:
{
  "aggregations" : {
    "city_num" : {
      "value" : 2
    }
  }
}
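Cardinality is simply the distinct-value count (for large data sets Elasticsearch approximates it rather than counting exactly). A local sketch over the sample cities:

```python
# city values of the six sample documents
cities = ["Beijing", "Beijing", "Beijing", "Beijing", "Beijing", "Shanghai"]

# distinct-value count, the exact equivalent of the cardinality result
city_num = len(set(cities))
print(city_num)
```

On this small data set the exact count matches the aggregation's value of 2.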

Stats aggregation

Get overall statistics for the age field:

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "city_num": {
      "stats": {
        "field": "age"
      }
    }
  }
}

response:
{
  "aggregations" : {
    "city_num" : {
      "count" : 6,
      "min" : 20.0,
      "max" : 30.0,
      "avg" : 25.166666666666668,
      "sum" : 151.0
    }
  }
}

You can see count, min, max, avg, sum, and so on. extended_stats additionally reports the sum of squares, variance, standard deviation, and standard deviation bounds.
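The stats numbers can be checked directly against the sample ages:

```python
ages = [20, 22, 25, 30, 26, 28]  # ages of the six sample documents

# recompute every field of the stats aggregation locally
stats = {
    "count": len(ages),
    "min": float(min(ages)),
    "max": float(max(ages)),
    "avg": sum(ages) / len(ages),
    "sum": float(sum(ages)),
}
print(stats)
```

The values line up with the response above, including avg = 151 / 6.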

Percentile aggregation

Percentage aggregation, which computes one or more percentiles from fields in a document. Percentiles are often used to find outliers.

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "percentiles": {
        "field": "age",
        "percents": [25, 50, 75, 99]
      }
    }
  }
}

response:
{
  "aggregations" : {
    "NAME" : {
      "values" : {
        "25.0" : 22.0,
        "50.0" : 25.5,
        "75.0" : 28.0,
        "99.0" : 30.0
      }
    }
  }
}

The aggregation result shows that 25% of the people are at or below age 22, 50% at or below 25.5, 75% at or below 28, and 99% at or below 30. Sometimes we instead need to know what proportion falls at or below a given value; for that we use the percentile_ranks aggregation.

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "age_40_percentage": {
      "percentile_ranks": {
        "field": "age",
        "values": [24]
      }
    }
  }
}

response:
{
  "aggregations" : {
    "age_40_percentage" : {
      "values" : {
        "24.0" : 37.5
      }
    }
  }
}

The percentile rank of age 24 is 37.5%, i.e. 37.5% of the documents have an age at or below 24.
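As a rough local check: Elasticsearch computes percentile ranks with the approximate, interpolating TDigest algorithm, so its 37.5 differs slightly from a naive count over six values. A naive sketch:

```python
def percentile_rank(values, v):
    """Naive percentile rank: the share of values <= v, as a percentage.
    Elasticsearch's TDigest interpolates, so its result differs slightly."""
    return 100.0 * sum(1 for x in values if x <= v) / len(values)

ages = [20, 22, 25, 30, 26, 28]
print(percentile_rank(ages, 24))
```

Two of six ages are at or below 24, so the naive rank is about 33.3%, close to but not equal to the interpolated 37.5 returned by the aggregation.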

Missing aggregation

Fields in ES are not as rigidly tied to a schema as columns in a relational database. If we add a new field and want to find the documents that do not have it, we use the missing aggregation.

Analyzer

When a new document is stored, ES passes it through an analyzer with three stages: first, character filters preprocess the raw characters (for example, stripping HTML tags); then the tokenizer splits the string into tokens; finally, token filters normalize or modify the tokens (for example, lowercasing them).

You can use _analyze to inspect how an analyzer processes a piece of text:

GET twitter/_analyze
{
  "text": ["Happy Birthday"],
  "analyzer": "standard"
}

response:
{
  "tokens" : [
    {
      "token" : "happy",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "birthday",
      "start_offset" : 6,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

"Happy Birthday" is parsed with the standard analyzer; the result is that happy and birthday are split and indexed separately.
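The standard analyzer's core behavior, splitting on word boundaries and lowercasing, can be approximated in a few lines. This is a rough sketch, not the real Unicode segmentation rules Elasticsearch uses:

```python
import re

def standard_like_tokenize(text):
    """Rough approximation of the standard analyzer: split on
    non-alphanumeric characters and lowercase each token."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

print(standard_like_tokenize("Happy Birthday"))
```

For plain ASCII text this matches the _analyze output above: the tokens happy and birthday.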

Several commonly used analyzers:

| analyzer | description | example |
| --- | --- | --- |
| standard | the default analyzer | "Happy Birthday" produces happy, birthday |
| english | English analyzer with stemming | produces happi and birthdai |
| whitespace | splits on whitespace only | "Happy Birthday" produces Happy, Birthday |
| simple | splits on non-letter delimiters such as "." | |
| keyword | treats the entire text as a single token | |

You can also define a custom analyzer.

Reference: Elastic Chinese community blog, "Elastic: Beginner's Guide": elasticstack.blog.csdn.net/article/det…