Three, an introduction to `Elasticsearch`

1. Basic concepts of `Elasticsearch`

1.1 Document

Elasticsearch is document-oriented; a document is the smallest unit of searchable data
- A log entry in a log file
- The details of a movie or a record
- A song in an MP3 player, or the contents of a PDF document
Documents are serialized to JSON format and stored in Elasticsearch
- A JSON object consists of fields
- Each field has a corresponding field type (string/numeric/boolean/date/binary/range)
Each document has a unique ID
- You can specify the ID yourself
- Or let Elasticsearch generate it automatically

1.1.1 JSON document

A document contains a series of fields, similar to a record in a database table
JSON documents have a flexible format; the format does not need to be defined in advance
- The type of a field can be specified, or inferred automatically by Elasticsearch
- Arrays and nesting are supported

1.1.2 Document metadata

{
    "_index" : "movies",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 14.69302,
    "_source" : {
        "year" : 1995,
        "@version" : "1",
        "genre" : [
            "Adventure",
            "Animation",
            "Children",
            "Comedy",
            "Fantasy"
        ],
        "id" : "1",
        "title" : "Toy Story"
    }
},

Metadata annotates a document with information about the document itself
- _index: the name of the index the document belongs to
- _type: the name of the type the document belongs to
- _id: the unique ID of the document
- _source: the original JSON data of the document
- _all: consolidates the contents of all fields into one field; deprecated
- _version: the version of the document
- _score: the relevance score

1.2 Index

{
  "movies" : {
    "settings" : {
      "index" : {
        "creation_date" : "1604218204918",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "F9-uy1bUTemm1Hs_LaDMQQ",
        "version" : {
          "created" : "7010099"
        },
        "provided_name" : "movies"
      }
    }
  }
}

An index is a container for documents: a collection of documents of the same kind
- An index embodies the concept of logical space: each index has its own Mapping definition, which defines the field names and field types of the documents it contains
- A shard embodies the concept of physical space: the data in an index is spread across shards
The Mapping and Settings of an index
- The Mapping defines the types of the document fields
- The Settings define how the data is distributed

1.2.1 Different semantics of indexes

Noun: an Elasticsearch cluster can hold many different indexes
Verb: saving a document to Elasticsearch is also called indexing
- This is the process by which ES builds the inverted index
Noun: a B-tree index or an inverted index

1.2.2 `Type`(I don’t know what that is.)

1.2.3 `ES` vs. `RDBMS`

RDBMS	Elasticsearch
Table	Index (Type)
Row	Document
Column	Field
Schema	Mapping
SQL	DSL

Differences between traditional relational databases and Elasticsearch
- Elasticsearch
  - Schemaless: data organization is more flexible
  - Relevance: computes how relevant each result is
  - High-performance full-text search
- RDBMS
  - Transactions
  - Joins
1.3 `REST API`

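1.4 Index management in `Kibana`

// Count the documents in the index
GET movies/_count

// Search the index with an empty request body
POST movies/_search
{
}

// _cat indices API
// List indices, sorted by index name
GET /_cat/indices?v&s=index

// List indices whose health is green
GET /_cat/indices?v&health=green

// Sort indices by document count
GET /_cat/indices?v&s=docs.count:desc

// Show selected columns only
GET /_cat/indices/movies*?pri&v&h=health,index,pri,rep,docs.count,mt

// How much memory is used per index?
GET /_cat/indices?v&h=i,tm&s=tm:desc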

1.5 Availability and scalability of distributed systems

1.5.1 High availability

Service availability – the service stays up even when some nodes stop
Data availability – no data is lost even when some nodes are lost

1.5.2 Scalability

Copes with growth in requests and growth in data (data is distributed across all nodes)

1.6 Distributed characteristics of `ES`

1.6.1 Benefits of the distributed architecture of `ES`

Storage capacity scales horizontally
System availability improves: if some nodes stop serving, the service of the cluster as a whole is not affected

1.6.2 Distributed architecture of `ES`

Different clusters are distinguished by name; the default name is elasticsearch
The name is set in the configuration file, or on the command line with -E cluster.name=node1
A cluster can have one or more nodes

1.7 `ES` nodes

A node is an instance of Elasticsearch
- It is essentially a Java process
- You can run more than one Elasticsearch process on a machine, but in production it is generally recommended to run only one Elasticsearch instance per machine
Each node has a name, set in the configuration file or at startup with -E node.name=node1
Each node is assigned a UID after it starts, which is stored in the data directory

1.8 `Master-eligible nodes` and `Master Node`

Every node is a Master-eligible node by default when it starts
- This can be disabled by setting node.master: false
Master-eligible nodes can take part in the master election and become the Master node
When the first node starts, it elects itself as the Master node
The cluster state is stored on every node, but only the Master node can modify it
- The Cluster State maintains the necessary information about a cluster
  - Information about all nodes
  - All indexes and their Mapping and Setting information
  - Shard routing information
- If any node could modify this information, the data would become inconsistent

1.8 `Data Node & Coordinating Node`

Data Node
- A node that can store data is called a Data Node. It is responsible for holding shard data and plays a crucial role in scaling data
Coordinating Node
- Responsible for accepting Client requests, distributing them to the appropriate nodes, and finally aggregating the results
- Every node acts as a Coordinating Node by default

1.9 Other Node Types

Hot & Warm Node
- Data Nodes with different hardware configurations, used to implement a Hot & Warm architecture and reduce the cost of cluster deployment
Machine Learning Node
- Runs machine learning Jobs, used for anomaly detection
Tribe Node
- (From 5.3 onward, use Cross Cluster Search instead) A Tribe Node connects to different Elasticsearch clusters and supports treating them as a single cluster

1.10 Configuring the Node Type

A node can play multiple roles in a development environment
In a production environment, each node should have a single role (dedicated node)

Node type	Configuration parameter	Default value
master eligible	node.master	true
data	node.data	true
ingest	node.ingest	true
coordinating only	None	Every node is a `coordinating` node by default. Set all other types to `false`
machine learning	node.ml	true (requires x-pack to be enabled)

1.11 Shards (`Primary Shard` & `Replica Shard`)

Primary shards solve the problem of scaling data horizontally. With primary shards, data can be distributed across all nodes in the cluster
- A shard is a running instance of Lucene
- The number of primary shards is specified at index creation time and cannot be changed later, except by Reindex
Replica shards solve the problem of high availability of data. A replica shard is a copy of a primary shard
- The number of replica shards can be adjusted dynamically
- Increasing the number of replicas can also improve service availability to some extent (read throughput)
Shard distribution of the blogs index in a three-node cluster
- Consider: how does adding a node, or increasing the number of primary shards, affect the system?
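
One way to see why the number of primary shards is fixed at creation time: Elasticsearch chooses the shard for a document from a hash of its routing value (the `_id` by default) modulo the number of primary shards. The sketch below only illustrates that idea. `route_to_shard` is a hypothetical helper, not an Elasticsearch API, and `zlib.crc32` is an assumed stand-in for the real hash function (Elasticsearch uses Murmur3).

```python
import zlib


def route_to_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard: hash(routing) % number_of_primary_shards."""
    # crc32 is a stand-in hash, used here for illustration only
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


doc_ids = [str(i) for i in range(1, 1001)]
with_3_shards = {d: route_to_shard(d, 3) for d in doc_ids}
with_4_shards = {d: route_to_shard(d, 4) for d in doc_ids}

# Documents that would be looked up on a different shard if the
# primary shard count changed from 3 to 4 without reindexing
moved = [d for d in doc_ids if with_3_shards[d] != with_4_shards[d]]
print(f"{len(moved)} of {len(doc_ids)} documents would land on a different shard")
```

Because most documents map to a different shard once the divisor changes, existing data would no longer be found where the routing formula points, which is why a Reindex is needed.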

1.12 Shard settings (the number of primary shards is set at creation time and cannot be changed later, except by `reindex`)

The number of shards needs capacity planning in a production environment
- Too few shards
  - Nodes cannot be added to scale horizontally
  - A single shard holds too much data, so redistributing data takes a long time (data skew)
- Too many shards. Since 7.0 the default number of primary shards is 1, which addresses the over-sharding problem
  - It affects the relevance scoring of search results and the accuracy of statistical results
  - Too many shards on a single node waste resources and hurt performance

1.13 Checking the Cluster Health Status

GET _cluster/health

Green: primary shards and replica shards are all allocated normally
Yellow: primary shards are all allocated normally, but some replica shards are not
Red: some primary shards could not be allocated
- For example, a new index is created while the server's disk usage is above the 85% threshold
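
The three colours can be read as a small decision rule over shard states. The sketch below is a simplified model under that reading: `cluster_health` is a hypothetical function, and it only distinguishes `STARTED` from `UNASSIGNED`, whereas a real cluster has more shard states.

```python
def cluster_health(shards):
    """shards: list of (kind, state) tuples.

    kind is "primary" or "replica"; state is "STARTED" or "UNASSIGNED"
    (a simplified, assumed model of shard allocation).
    """
    primaries_ok = all(state == "STARTED" for kind, state in shards if kind == "primary")
    replicas_ok = all(state == "STARTED" for kind, state in shards if kind == "replica")
    if not primaries_ok:
        return "red"      # at least one primary shard is not allocated
    if not replicas_ok:
        return "yellow"   # all primaries fine, at least one replica is not
    return "green"


# A single-node cluster with number_of_replicas = 1: the replica has nowhere to go
print(cluster_health([("primary", "STARTED"), ("replica", "UNASSIGNED")]))
```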

2. Basic CRUD and batch operation of documents

2.1 CRUD of documents

Type	API
Index	`PUT my_index/_doc/1` {"user":"mike","comment":"xxxx"}
Create	`PUT my_index/_create/1` {"user":"mike","comment":"xxxx"} or `POST my_index/_doc` (no ID specified, automatically generated) {"user":"mike","comment":"xxxx"}
Read	`GET my_index/_doc/1`
Update	`POST my_index/_update/1` {"doc":{"user":"mike","comment":"xxxx"}}
Delete	`DELETE my_index/_doc/1`

The Type name is _doc by convention
Create: fails if the ID already exists
Index: if the ID does not exist, a new document is created. Otherwise the existing document is deleted and a new one is indexed, and the version is incremented
Update: the document must already exist; the update only applies incremental changes to the given fields
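
The difference between Create, Index and Update is easier to see in a toy model. The sketch below is an in-memory stand-in, not Elasticsearch code: `TinyIndex` is a hypothetical class, errors are plain `KeyError`s, and versioning is simplified to "every successful write adds 1".

```python
class TinyIndex:
    """In-memory stand-in for one index: maps _id -> (version, source)."""

    def __init__(self):
        self.docs = {}

    def create(self, doc_id, source):
        # Create fails if the ID already exists
        if doc_id in self.docs:
            raise KeyError(f"document {doc_id} already exists")
        self.docs[doc_id] = (1, dict(source))

    def index(self, doc_id, source):
        # Index replaces the whole document and bumps the version
        version = self.docs[doc_id][0] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = (version, dict(source))

    def update(self, doc_id, partial):
        # Update requires an existing document and merges the given fields
        if doc_id not in self.docs:
            raise KeyError(f"document {doc_id} does not exist")
        version, source = self.docs[doc_id]
        self.docs[doc_id] = (version + 1, {**source, **partial})


idx = TinyIndex()
idx.create("1", {"user": "mike", "comment": "xxxx"})
idx.index("1", {"user": "mike"})        # the comment field is gone after this
idx.update("1", {"comment": "yyyy"})    # user is kept, comment is added
print(idx.docs["1"])
```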

2.1.1 `Create` a document

Two ways are supported: letting the system generate the document Id, or specifying the document Id yourself
By calling POST users/_doc
- The system generates the document Id automatically
When creating with HTTP PUT users/_create/1, _create is specified explicitly in the URI; if a document with that id already exists, the operation fails

// Create a document; the _id is generated automatically
POST users/_doc
{
  "user" : "Mike",
  "post_date" : "2020-11-15T19:38:42",
  "message" : "trying out Kibana"
}

// Create a document with a specified Id
PUT users/_doc/1?op_type=create
{
  "user" : "Mike",
  "post_date" : "2020-11-15T19:38:42",
  "message" : "trying out Elasticsearch"
}

2.1.2 `Get` a document

If the document is found, HTTP 200 is returned
- Document metadata
  - _index/_type/
  - Version information: for the same Id, the Version number keeps increasing even if the document is deleted
  - _source contains all the original information of the document by default
If the document cannot be found, HTTP 404 is returned

//Get document by Id
GET users/_doc/1

2.1.3 `Index` a document

// Index the document; the version is incremented by 1
PUT users/_doc/1
{
  "user" : "Mike"
}

The difference between Index and Create: if the document does not exist yet, Index indexes a new document. Otherwise the existing document is deleted and the new document is indexed, and the version is incremented by 1

2.1.4 `Update` a document

The Update method does not delete the original document; it actually updates the data
It uses the POST method, and the Payload must be wrapped in doc

POST users/_update/1/
{
  "doc": {
    "post_date" : "2020-11-15T19:59:42",
    "message" : "trying out Elasticsearch"
  }
}

2.2 Batch Operations

2.2.1 `Bulk API`

Re-establishing a network connection for every REST request is expensive
Operations on different indexes are supported in a single API call
Four types of operations are supported
- Index
- Create
- Update
- Delete
The Index can be specified in the URI, or in the request Payload
If a single operation fails, other operations will not be affected
The return result contains the result of each operation

// Run the Bulk operation twice and check the results each time
// First execution
POST _bulk
{"index":{"_index":"test","_id":"1"}}
{"field1":"value1"}
{"delete":{"_index":"test","_id":"2"}}
{"create":{"_index":"test2","_id":"3"}}
{"field1":"value3"}
{"update":{"_id":"1","_index":"test"}}
{"doc":{"field2":"value2"}}
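
The body of a Bulk request is newline-delimited JSON: one action line, optionally followed by one source line, and a final newline at the end. As a minimal sketch of how a client might assemble that body, the hypothetical helper `build_bulk_body` below only builds the string; it does not send anything to a cluster.

```python
import json


def build_bulk_body(operations):
    """operations: list of (action, metadata, source_or_None) tuples."""
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))   # action line
        if source is not None:                         # delete has no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"                     # body must end with a newline


body = build_bulk_body([
    ("index",  {"_index": "test",  "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test",  "_id": "2"}, None),
    ("create", {"_index": "test2", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test",  "_id": "1"}, {"doc": {"field2": "value2"}}),
])
print(body)
```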

2.2.2 Batch reading – `mget`

Batch operations can reduce the cost of network connections and improve performance

// mget operation
GET /_mget
{
  "docs": [
    {"_index": "test", "_id": "1"},
    {"_index": "test", "_id": "2"}
  ]
}

2.2.3 Batch query – `msearch`

### msearch operation
POST kibana_sample_data_ecommerce/_msearch
{}
{"query" : {"match_all" : {}},"size":1}
{"index" : "kibana_sample_data_flights"}
{"query" : {"match_all" : {}},"size":2}

2.2.4 Common errors returned

Problem	Cause
Unable to connect	The network is faulty or the cluster is down
Connection cannot be closed	The network or node is faulty
429	The cluster is too busy
4xx	Malformed request
500	Cluster internal error

3. Forward and inverted index

3.1 Forward index

3.2 Inverted index

If you want to find the specific pages of a book on which a word appears, the forward index is not enough. An index structure that maps content -> Document Id meets this need

3.2.1 Core composition of inverted index

The inverted index contains two parts

The Term Dictionary records all the terms that occur in the documents, and the association from each term to its posting list
- The term dictionary is usually large; it can be implemented with a B+ tree or a hash table with chaining to support high-performance inserts and lookups
The Posting List records the set of documents that contain a term, and consists of posting entries
- Posting entry
  - Document Id
  - Term frequency (TF): the number of times the term appears in the document, used for relevance scoring
  - Position: the position of the term within the document, used for phrase queries
  - Offset: the start and end positions of the term, used for highlighting
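
To make the two parts concrete, here is a small sketch that builds a term dictionary whose values are posting lists, each entry carrying the document Id, term frequency, positions and offsets described above. It is an illustration only: the three sample titles are made up, a plain `dict` stands in for the B+ tree or hash structure, and tokenization is a simple `\w+` split plus lowercasing.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """docs: dict of doc_id -> text. Returns term -> list of posting entries."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        per_term = defaultdict(lambda: {"positions": [], "offsets": []})
        for position, match in enumerate(re.finditer(r"\w+", text)):
            term = match.group().lower()
            per_term[term]["positions"].append(position)          # token position
            per_term[term]["offsets"].append((match.start(), match.end()))  # char offsets
        for term, info in per_term.items():
            index[term].append({
                "doc_id": doc_id,
                "tf": len(info["positions"]),   # term frequency in this document
                "positions": info["positions"],
                "offsets": info["offsets"],
            })
    return dict(index)


# Hypothetical sample documents
docs = {
    1: "Mastering Elasticsearch",
    2: "Elasticsearch Server",
    3: "Elasticsearch Essentials",
}
index = build_inverted_index(docs)
for entry in index["elasticsearch"]:
    print(entry)
```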

3.2.2 An example in `Elasticsearch`

3.2.3 The inverted index in `Elasticsearch`

Each field of a JSON document in Elasticsearch has its own inverted index
You can specify that certain fields are not indexed
- Advantage: saves storage space
- Disadvantage: the field cannot be searched

4. Word segmentation with an `Analyzer`

4.1 `Analysis` and `Analyzer`

Analysis: text analysis is the process of converting full text into a series of words (term/token), also called tokenization
Analysis is implemented by an Analyzer
- You can use the analyzers built into Elasticsearch, or customize analyzers as needed
Besides converting text into terms when data is written, the same analyzer must also be applied to the query string when a Query is matched

4.2 Components of an `Analyzer`

An analyzer is the component dedicated to tokenization. An Analyzer consists of three parts
- Character Filters (process the raw text, e.g. strip html)
- Tokenizer (split the text into words according to rules)
- Token Filter (process the resulting tokens: lowercase them, remove stopwords, add synonyms)
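
The three stages run in a fixed order: character filters first, then the tokenizer, then token filters. The sketch below mimics that pipeline in plain Python as an illustration only. All function names and the tiny stopword list are assumptions, and the regular expressions are much cruder than what Elasticsearch actually does.

```python
import re

STOPWORDS = {"the", "a", "is", "in"}  # tiny sample list for the sketch


def strip_html(text):
    """Character filter: remove html tags from the raw text."""
    return re.sub(r"<[^>]+>", "", text)


def tokenize(text):
    """Tokenizer: split the text into words."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Token filters: lowercase, then drop stopwords."""
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in STOPWORDS]


def analyze(text):
    return token_filters(tokenize(strip_html(text)))


print(analyze("<b>The Quick</b> brown fox is in the garden"))
```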

5. Built-in analyzers in `Elasticsearch`

Standard Analyzer: the default analyzer; splits on word boundaries and lowercases
Simple Analyzer: splits on non-letters (symbols are dropped) and lowercases
Stop Analyzer: lowercases and filters stop words (the, a, is)
Whitespace Analyzer: splits on whitespace; does not lowercase
Keyword Analyzer: no tokenization; the input is emitted as-is
Pattern Analyzer: splits by regular expression; the default is \W+ (splits on non-word characters)
Language: analyzers for more than 30 common languages
Custom Analyzer: a user-defined analyzer

5.1 Using the `_analyze API`

5.1.1 Testing by specifying the `Analyzer` directly

# Specify the Analyzer directly
GET _analyze
{
  "analyzer": "standard",
  "text": "Mastering Elasticsearch, Elasticsearch in Action"
}

5.2.2 Testing against a field of an index

POST users/_analyze
{
  "field": "message",
  "text": "Mastering Elasticsearch"
}

5.2.3 Testing a custom analyzer

POST _analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "Mastering Elasticsearch"
}

5.2 `Standard Analyzer` (the default analyzer of `Elasticsearch`)

The default analyzer
Splits on word boundaries
Lowercases

5.3 `Simple Analyzer`

Splits on non-letters; all non-letter characters are removed
Lowercases

# Simple Analyzer
GET _analyze
{
  "analyzer": "simple",
  "text":"2 runing Quick brown-foxes leap over lazy dogs in the summer evening"
}

5.3 `Whitespace Analyzer`

Splits on whitespace
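
To contrast this with the Simple Analyzer, the following sketch approximates both rules in plain Python on the sample sentence used throughout this chapter. These are rough stand-ins written for illustration, not the real analyzers; the function names are hypothetical.

```python
import re

TEXT = "2 runing Quick brown-foxes leap over lazy dogs in the summer evening"


def whitespace_tokens(text):
    # Split on whitespace only; case, digits and punctuation are kept
    return text.split()


def simple_tokens(text):
    # Split on anything that is not a letter, then lowercase
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


print(whitespace_tokens(TEXT))
print(simple_tokens(TEXT))
```

The whitespace version keeps `2`, `Quick` and `brown-foxes` untouched, while the simple version drops the digit, lowercases, and splits `brown-foxes` in two.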

5.4 `Stop Analyzer`

Compared with the Simple Analyzer
It adds a stop filter
- Words such as the, a, is are removed

5.5 `Keyword Analyzer`

No tokenization; the input is emitted directly as a single term

# Keyword Analyzer
GET _analyze
{
  "analyzer": "keyword",
  "text":"2 runing Quick brown-foxes leap over lazy dogs in the summer evening"
}

5.5 `Pattern Analyzer`

Tokenizes with a regular expression
The default is \W+, which splits on non-word characters

# Pattern Analyzer
GET _analyze
{
  "analyzer": "pattern",
  "text":"2 runing Quick brown-foxes leap over lazy dogs in the summer evening"
}

5.6 `Language Analyzer`

Analyzers for different languages

# Language Analyzer
POST _analyze
{
  "analyzer": "english",
  "text":"2 runing Quick brown-foxes leap over lazy dogs in the summer evening"
}

5.7 Difficulties in Chinese word segmentation

Chinese sentences must be segmented into words (not into individual characters)
In English, words are naturally separated by spaces
The same Chinese sentence can be understood differently in different contexts
- "This apple is not very tasty" / "This apple is not big, but it is tasty!"
Example
- There is a point in what he says

5.8 `ICU Analyzer`

You need to install a plugin
- elasticsearch-plugin install analysis-icu
Provides Unicode support, with better support for Asian languages

POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "He's right"
}

GET /movies/_search

5. Overview of the `Search API`

URI Search
- Uses query parameters in the URL
Request Body Search
- Uses the more complete, JSON-based Query Domain Specific Language (DSL) provided by Elasticsearch

5.1 Specifying the indexes to query

Syntax	Scope
/_search	All indexes in the cluster (`_search` marks this as a search request)
/index1/_search	The index named index1
/index1,index-2/_search	Multiple indexes: index1 and index-2
/index*/_search	Indexes whose names start with index

5.2 `URI` query

Use "q" to specify the query string
"query string syntax", as KV key/value pairs

curl -XGET "http://elasticsearch:9200/kibana_sample_data_ecommerce/_search?q=customer_first_name:Eddie"

5.3 `Request Body` query

5.4 Search Result (Response)

5.5 Relevance

5.5.1 Relevance of searches

Search is a conversation between the user and the search engine
What users care about is the relevance of the search results
- Was all the relevant content found
- How much irrelevant content was returned
- Are the documents scored reasonably
- Balancing result ranking against business requirements

5.5.2 Measuring relevance

Information Retrieval (a field of computer science)
- Precision: return as few irrelevant documents as possible
- Recall: return as many relevant documents as possible
- Ranking: whether the results can be sorted by relevance
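
Precision and Recall can be computed directly from two sets: the documents a search returned and the documents that are actually relevant. The sketch below uses the standard definitions; the function name and the sample document ids are made up for illustration.

```python
def precision_recall(returned, relevant):
    """Precision = hits / returned, Recall = hits / relevant."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)                 # relevant documents that were returned
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents returned, 2 of them relevant; 5 relevant documents exist in total
precision, recall = precision_recall(returned=[1, 2, 3, 4], relevant=[2, 4, 6, 8, 10])
print(precision, recall)
```

Returning more documents tends to raise Recall and lower Precision, which is why the two are usually tuned together.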

5.5.3 `Precision` and `Recall`

5.6 `URI Search`: searching through a `URI query`

GET /movies/_search?q=2012&df=title&sort=year:desc&from=0&size=10&timeout=1s
{
  "profile": true
}

q: specifies the query statement, using Query String Syntax
df: the default field. If it is not specified, all fields are queried
sort: sorting; from and size are used for paging
profile: shows how the query is executed

5.6.1 Field Query VS Normal Query

q=title:2012 / q=2012

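# Field query: search for 2012 in the title field
GET /movies/_search?q=2012&df=title
{
  "profile": true
}

# Normal query: search for 2012 in all fields
GET /movies/_search?q=2012
{
  "profile": true
}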

5.6.2 `Term` VS `Phrase`

Beautiful Mind is equivalent to Beautiful OR Mind: a document is returned as long as it contains either term
"Beautiful Mind" is equivalent to Beautiful AND Mind, with the additional requirement that the terms appear in the same order. This is a Phrase query (note: a Phrase query requires the content to be enclosed in quotes)
Grouping and quotation marks
- title:(Beautiful AND Mind) is grouping
- title:"Beautiful Mind" is a phrase

# Term VS Phrase
# PhraseQuery: exact phrase lookup
GET /movies/_search?q=title:"Beautiful Mind"
{
  "profile": "true"
}

GET /movies/_search?q=title:Beautiful Mind
{
  "profile": "true"
}

# TermQuery with grouping: Bool Query
GET /movies/_search?q=title:(Mind Beautiful)
{
  "profile": "true"
}

A TermQuery uses parentheses () for grouping; without them you get the case above. Two terms grouped together default to an OR relationship. A PhraseQuery requires all terms to be present, in the same order

5.6.3 Boolean operation and grouping operation

AND / OR / NOT, or && / || / !
- Must be uppercase
- title:(matrix NOT reloaded)
Grouping
- + means must
- - means must_not
- title:(+matrix -reloaded)

GET /movies/_search?q=title:(Mind Beautiful)
{
  "profile": "true"
}

# AND
GET /movies/_search?q=title:(Mind AND Beautiful)
{
  "profile": "true"
}

# NOT
GET /movies/_search?q=title:(Beautiful NOT Mind)
{
  "profile": "true"
}

# +
GET /movies/_search?q=title:(Beautiful %2BMind)
{
  "profile": "true"
}
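
The last request writes `%2B` rather than a literal `+`. In form-style URL decoding a bare `+` in a query string is read as a space, so the must operator would be lost before the query is parsed. The sketch below shows that effect with Python's standard `urllib.parse`, used here as an assumed stand-in for the server-side decoding.

```python
from urllib.parse import parse_qs, quote

raw_query = "title:(Beautiful +Mind)"

# Percent-encode the query; ':' '(' ')' are left readable on purpose
encoded = quote(raw_query, safe=":()")
print(encoded)

# A literal '+' in the URL is decoded as a space, so the operator disappears
print(parse_qs("q=title:(Beautiful+Mind)")["q"][0])

# The encoded form round-trips with the '+' intact
print(parse_qs("q=" + encoded)["q"][0])
```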

5.6.4 Range queries and arithmetic symbols

Range queries
- Interval notation: [] is a closed interval, {} is an open interval
  - year:{2019 TO 2018}
  - year:[* TO 2018]
Arithmetic comparison operators
- year:>2010
- year:(>2010 && <=2018)
- year:(+>2010 +<=2018)

# Range query: all movies whose year is >= 1980
GET /movies/_search?q=year:>=1980
{
  "profile": "true"
}

5.6.5 Wildcard query, regular expression, fuzzy matching and approximate query

Wildcard queries (not recommended: they are inefficient and use a lot of memory, especially with a leading wildcard)
- ? stands for 1 character, * stands for 0 or more characters
  - title:mi?d
  - title:be*
Regular expressions
- title:[bt]oy
Fuzzy matching and proximity queries
- title:befutifl~1
- title:"lord rings"~2

# Wildcard query
GET /movies/_search?q=title:b*
{
  "profile": "true"
}

# Fuzzy matching
GET /movies/_search?q=title:beautifl~1
{
  "profile": "true"
}

# Proximity query
GET /movies/_search?q=title:"lord Ring"~2
{
  "profile": "true"
}

5.7 Introduction to `Request Body` and `Query DSL`

5.7.1 `Request Body Search`

The query statement is sent to Elasticsearch in the HTTP Request Body
Query DSL

5.7.2 Paging

From starts at 0; 10 results are returned by default
The deeper the page, the higher the cost of retrieving it
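
The cost grows because every shard has to return its own top `from + size` hits, and the coordinating node then merges them and throws most of them away. The sketch below models that with plain lists. `fetch_page` and the synthetic scores are assumptions made for illustration, not how Elasticsearch is implemented internally.

```python
import heapq


def fetch_page(shards, from_, size):
    """shards: list of lists of (score, doc_id). Returns (page, hits gathered)."""
    gathered = []
    for shard_hits in shards:
        # Each shard returns its own top (from + size) hits
        gathered.extend(heapq.nlargest(from_ + size, shard_hits))
    # The coordinating node merges them and keeps only the requested page
    ranked = heapq.nlargest(from_ + size, gathered)
    return ranked[from_:from_ + size], len(gathered)


# Three shards with 1000 synthetic hits each
shards = [[(float(i), f"shard{s}-doc{i}") for i in range(1000)] for s in range(3)]

first_page, cost_first = fetch_page(shards, from_=0, size=10)
deep_page, cost_deep = fetch_page(shards, from_=990, size=10)
print(cost_first, cost_deep)
```

Both calls return 10 hits, but the deep page makes the coordinating node collect a hundred times more candidates first.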

5.7.3 Sorting

It is best to sort on numeric and date fields
Because when sorting on multi-valued or analyzed fields, the system picks one of the values, and you cannot know which one

POST kibana_sample_data_ecommerce/_search
{
  "profile": "true",
  "sort": [{"order_date": "desc"}],
  "query": {
    "match_all": {}
  }
}

5.7.4 `_source filtering`

If _source is not stored, only the metadata of the matching documents is returned
_source supports wildcards: "_source": ["name*", "desc"]

// _source filtering
POST kibana_sample_data_ecommerce/_search
{
  "_source": ["order_date"],
  "query": {
    "match_all": {}
  }
}

5.7.5 Script Fields

Use case: orders carry different exchange rates, and the order prices need to be sorted after applying the exchange rate

// Script field
GET kibana_sample_data_ecommerce/_search
{
  "script_fields": {
    "new_field": {
      "script": {
        "lang": "painless",
        "source": "doc['order_date'].value+'_hello'"
      }
    }
  },
  "query": {
    "match_all": {}
  }
}

5.7.6 Using query expressions: `Match Query` (we covered `Term Query` and `Phrase Query` earlier)

//matchQuery OR
POST movies/_search
{
  "query": {
    "match": {
      "title": "last Christmas"
    }
  }
}
//matchQuery AND
POST movies/_search
{
  "query": {
    "match": {
      "title": {
        "query": "last Christmas",
        "operator": "and"
      }
    }
  }
}

5.7.7 Phrase search: `Match Phrase` query

//matchPhrase
POST movies/_search
{
  "query": {
    "match_phrase": {
      "title": "one love"
    }
  }
}
//matchPhrase slop:1
POST movies/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "one love",
        "slop": 1
      }
    }
  }
}

5.8 Summary of Several Query methods

5.9 `Query String` and `Simple Query String` queries

5.8.1 `Query String Query`

Similar to URI Query

5.8.1 `Simple Query String Query`

Similar to Query String, but it ignores invalid syntax and supports only part of the query syntax
AND, OR and NOT are not supported; they are treated as plain strings
The default relationship between Terms is OR; an Operator can be specified
Some logic operators are supported
- + instead of AND
- | instead of OR
- - instead of NOT

PUT /users/_doc/1
{
  "name": "Ruan Yiming",
  "about": "java,golang,node,swift,elasticsearch"
}

PUT /users/_doc/2
{
  "name": "Li Yiming",
  "about": "Hadoop"
}

# Query String
POST users/_search
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "Ruan AND Yiming"
    }
  }
}

POST users/_search
{
  "query": {
    "query_string": {
      "fields": ["name", "about"],
      "query": "(Ruan AND Yiming) OR (Hadoop)"
    }
  }
}

# Simple Query String: the default operator is OR
POST users/_search
{
  "query": {
    "simple_query_string": {
      "query": "Ruan AND Yiming",
      "fields": ["name"]
    }
  }
}

POST users/_search
{
  "query": {
    "simple_query_string": {
      "query": "Ruan Yiming",
      "fields": ["name"],
      "default_operator": "AND"
    }
  }
}

5.10 Differences between ES match, match_phrase, query_string, and term

www.cnblogs.com/chenmz1995/…

6. `Dynamic Mapping` and common field types

6.1 What is `Mapping`

A Mapping is like a schema in a database. It does the following
- Defines the names of the fields in the index
- Defines the data types of the fields, such as strings, numbers, booleans...
- Configures the inverted-index settings of each field (Analyzed or Not Analyzed, Analyzer)
A Mapping maps JSON documents to the flat format that Lucene requires
A Mapping belongs to a Type of an index
- Each document belongs to one Type
- A Type has one Mapping definition
- Since 7.0, type information no longer needs to be specified in the Mapping definition

6.2 Field data types in `Elasticsearch`

Simple types
- Text/Keyword
- Date
- Integer/Floating
- Boolean
- IPv4 & IPv6
Complex types: objects and nested objects
- Object type/nested type
Special types
- geo_point & geo_shape/percolator

6.3 What is `Dynamic Mapping`

When a document is written, the index is created automatically if it does not exist
With Dynamic Mapping we do not need to define Mappings by hand; Elasticsearch infers the field types automatically from the document
But the inference is sometimes wrong, for example for geographic location information
When a type is set incorrectly, some features do not work properly, for example Range queries

6.4 Automatic type identification

`JSON` type	`Elasticsearch` type
String	Set to `Date` if it matches a date format; set to `float` or `long` if it is a numeric string (this option is disabled by default); otherwise set to `Text` with a `keyword` sub-field
Boolean	`boolean`
Floating-point number	`float`
Integer	`long`
Object	`Object`
Array	Determined by the type of the first non-null value
Null	Ignored

// Write a document, then view the Mapping
PUT mapping_test/_doc/1
{
  "firstName": "Chan",
  "lastName": "Jackie",
  "loginDate": "2018-07-24T10:29:48.103Z"
}

// View the Mapping
GET mapping_test/_mapping

// Delete the index
DELETE mapping_test

PUT mapping_test/_doc/1
{
  "uid": "123",
  "isVip": false,
  "isAdmin": "true",
  "age": 19,
  "heigh": 180
}

// View the Mapping
GET mapping_test/_mapping
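
The inference rules in the table can be sketched as a small function over JSON values. This is an approximation for illustration only: `infer_type` is a hypothetical helper, date detection is reduced to one ISO-like pattern, and numeric-string detection is left out because the table says it is disabled by default.

```python
import re

# Simplified date pattern; real date detection supports more formats
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?Z?)?$")


def infer_type(value):
    """Return the type Dynamic Mapping would roughly pick for a JSON value."""
    if value is None:
        return None                       # null values are ignored
    if isinstance(value, bool):           # check bool before int: True is an int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        for item in value:                # type of the first non-null value
            if item is not None:
                return infer_type(item)
        return None
    if isinstance(value, str):
        return "date" if ISO_DATE.match(value) else "text"
    raise TypeError(f"not a JSON value: {value!r}")


doc = {"uid": "123", "isVip": False, "isAdmin": "true", "age": 19, "heigh": 180}
print({field: infer_type(value) for field, value in doc.items()})
```

Note that `"123"` and `"true"` stay `text`: they are JSON strings, so the quotes decide the type, not the content.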

6.5 Can the field type of a `Mapping` be changed?

Two cases
- New fields
  - Dynamic set to true: once a document with a new field is written, the Mapping is updated as well
  - Dynamic set to false: the Mapping is not updated and the new field's data cannot be indexed, but the information still appears in _source
  - Dynamic set to strict: writing the document fails
- Existing fields: once data has been written, changing the field definition is no longer supported
  - The inverted index implemented by Lucene cannot be modified once it is generated
- If you want to change a field type, you must use the Reindex API to rebuild the index
Why
- If the data type of a field were changed, already-indexed data could no longer be searched
- Adding new fields has no such effect

6.6 Controlling `Dynamic Mappings`

	`"true"`	`"false"`	`"strict"`
Document can be indexed	`YES`	`YES`	`NO`
Field is indexed	`YES`	`NO`	`NO`
`Mapping` is updated	`YES`	`NO`	`NO`

When dynamic is set to false and a document with a new field is written, the document can be indexed, but the new field is discarded from the index
When dynamic is set to strict, writing the data fails with an error

# Dynamic Mapping is supported by default; write a document with a new field
PUT dynamic_mapping_test/_doc/1
{
  "newField": "someValue"
}

POST dynamic_mapping_test/_search
{
  "query": {
    "match": {
      "newField": "someValue"
    }
  }
}

# Change dynamic to false
PUT dynamic_mapping_test/_mapping
{
  "dynamic": false
}

# Add a new field, anotherField
PUT dynamic_mapping_test/_doc/10
{
  "anotherField": "someValue"
}

# However, this field cannot be searched, because dynamic is set to false
POST dynamic_mapping_test/_search
{
  "query": {
    "match": {
      "anotherField": "someValue"
    }
  }
}

GET dynamic_mapping_test/_mapping

# Change dynamic to strict
PUT dynamic_mapping_test/_mapping
{
  "dynamic": "strict"
}

# Writing the data fails with HTTP Code 400
PUT dynamic_mapping_test/_doc/12
{
  "lastField": "value"
}
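
The three settings can be summarised as a toy model of what happens to a document that brings a new field. The sketch is an illustration under simplifying assumptions: `index_document` and `StrictMappingError` are hypothetical names, and the Mapping is reduced to a set of known field names.

```python
class StrictMappingError(Exception):
    """Raised when a document brings a new field and dynamic is 'strict'."""


def index_document(mapping, doc, dynamic="true"):
    """mapping: set of known field names (grows when dynamic is 'true')."""
    new_fields = [name for name in doc if name not in mapping]
    if new_fields and dynamic == "strict":
        raise StrictMappingError(f"new fields are not allowed: {new_fields}")
    if dynamic == "true":
        mapping.update(new_fields)        # the Mapping is updated with the document
    # Only mapped fields end up searchable; _source always keeps the full document
    searchable = {name: value for name, value in doc.items() if name in mapping}
    return {"_source": dict(doc), "searchable": searchable}


mapping = {"newField"}
result = index_document(mapping, {"newField": "a", "anotherField": "b"}, dynamic="false")
print(result)
print(mapping)
```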

7. Explicit `Mapping` settings and common parameters

7.1 How to define an explicit `Mapping`

7.2 Some suggestions for a custom `Mapping`

You can write it entirely by hand, referring to the API manual
To reduce typing and the chance of errors, you can follow these steps
- Create a temporary index and write some sample data
- Call the Mapping API to get the dynamic Mapping definition of the temporary index
- Modify it, then use that configuration to create your index
- Delete the temporary index

7.3 Controlling whether the current field is indexed

index: controls whether the current field is indexed. The default is true. If set to false, the field is not searchable

DELETE dynamic_mapping_test

# Set index to false
DELETE users
PUT users
{
  "mappings": {
    "properties": {
      "firstName": { "type": "text" },
      "lastName": { "type": "text" },
      "mobile": { "type": "text", "index": false }
    }
  }
}

PUT users/_doc/1
{
  "firstName": "Ruan",
  "lastName": "Yiming",
  "mobile": "123456789"
}

POST /users/_search
{
  "query": {
    "match": {
      "mobile": "123456789"
    }
  }
}

7.3 `index Options`

For building the inverted index, ES provides four levels of index_options configuration, which control what the inverted index records
- docs: records the doc id
- freqs: records the doc id and term frequencies
- positions: records the doc id / term frequencies / term positions
- offsets: records the doc id / term frequencies / term positions / character offsets
The Text type records positions by default; other types default to docs
The more is recorded, the more storage space is used

7.4 `null_value`

Needed when you want to search for null values
Only the keyword type supports setting null_value

# Set null_value
DELETE users
PUT users
{
  "mappings": {
    "properties": {
      "firstName": { "type": "text" },
      "lastName": { "type": "text" },
      "mobile": { "type": "keyword", "null_value": "NULL" }
    }
  }
}

PUT users/_doc/1
{
  "firstName": "Ruan",
  "lastName": "Yiming",
  "mobile": null
}

GET users/_search
{
  "query": {
    "match": {
      "mobile": "NULL"
    }
  }
}

7.5 `copy_to` setting

_all is replaced by copy_to in 7
It meets certain specific search requirements
copy_to copies the value of a field into a target field, achieving an effect similar to _all
The target field of copy_to does not appear in _source

# Set copy_to
DELETE users
PUT users
{
  "mappings": {
    "properties": {
      "firstName": { "type": "text", "copy_to": "fullName" },
      "lastName": { "type": "text", "copy_to": "fullName" }
    }
  }
}

PUT users/_doc/1
{
  "firstName": "Ruan",
  "lastName": "Yiming"
}

GET users/_search
{
  "query": {
    "match": {
      "fullName": {
        "query": "Ruan Yiming",
        "operator": "and"
      }
    }
  }
}

When a document containing firstName and lastName is indexed, both values are copied to the fullName field, which can then be used in queries

7.6 Array Types

Elasticsearch does not provide a dedicated array type. However, any field can contain multiple values of the same type

PUT users/_doc/1
{
  "name": "onebird",
  "interests": "reading"
}

PUT users/_doc/1
{
  "name": "twobirds",
  "interests": ["reading", "music"]
}

POST users/_search
{
  "query": {
    "match_all": {}
  }
}

GET users/_mapping

8. Multi-field features and configuring a custom `Analyzer` in the `Mapping`

8.1 Multi-field Type

Multi-field feature
- Exact matching on a vendor name
  - Add a keyword sub-field
- Use different analyzers
  - Different languages
  - Pinyin field search
  - Different analyzers can also be specified for search and for indexing

8.2 `Exact Values` (exact values) vs `Full Text` (full text)

Exact Values vs Full Text
- Exact Value: includes numbers / dates / a specific string (e.g. "Apple Store")
  - keyword in Elasticsearch
- Full text: unstructured text data
  - text in Elasticsearch

8.2.1 `Exact Values` do not need to be tokenized

Elasticsearch creates an inverted index for each field
- Exact Values need no special tokenization when they are indexed

8.3 Custom analyzers

When the analyzers built into Elasticsearch cannot meet your requirements, you can define a custom analyzer by combining different components (the three components of an Analyzer; see 4.2 above)
- Character Filter
- Tokenizer
- Token Filter

8.3.1 `Character Filters`

Process the text before it reaches the Tokenizer, for example by adding, deleting, or replacing characters. Multiple Character Filters can be configured. They affect the position and offset information of the Tokenizer
Some built-in Character Filters
- HTML strip: removes HTML tags
- Mapping: string replacement
- Pattern replace: regular-expression replacement

POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<b>hello world</b>"
}

# Use a char filter for replacement (replace '-' with '_')
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": ["- => _"]
    }
  ],
  "text": "123-456,I-test!"
}

# Use a char filter to replace emoticons
# (the text value is missing in the source; the sample below is illustrative)
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [":) => happy", ":( => sad"]
    }
  ],
  "text": ["I am feeling :)", "Feeling :( today"]
}

# Regular-expression replacement
GET _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "http://(.*)",
      "replacement": "$1"
    }
  ],
  "text": ["http://www.elastic.co"]
}
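
What the three character filters do to the raw text can be approximated with Python's standard library. This is a rough sketch only: the function names are hypothetical, the real html_strip handles more cases than a tag regex, and the real mapping filter matches rules greedily rather than one rule after another.

```python
import re
from html import unescape


def html_strip(text):
    """Rough stand-in for html_strip: drop tags, decode HTML entities."""
    return unescape(re.sub(r"<[^>]+>", "", text))


def mapping_filter(text, mappings):
    """Apply 'key => value' rules in order (a simplification)."""
    for rule in mappings:
        key, value = (part.strip() for part in rule.split("=>", 1))
        text = text.replace(key, value)
    return text


def pattern_replace(text, pattern, replacement):
    """Regex replacement; Java-style $1 references become Python group refs."""
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return re.sub(pattern, python_replacement, text)


print(html_strip("<b>hello world</b>"))
print(mapping_filter("123-456,I-test!", ["- => _"]))
print(pattern_replace("http://www.elastic.co", r"http://(.*)", "$1"))
```

All three run on the character stream, before any tokens exist, which is why they can shift the offsets the Tokenizer later reports.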

8.3.2 `Tokenizer`

Splits the original text into terms (term or token) according to certain rules
Tokenizers built into Elasticsearch
- whitespace / standard / uax_url_email / pattern / keyword / path hierarchy
You can develop a plugin in Java to implement your own Tokenizer

# Tokenizer
POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text": "/user/ymruan/a/b/c/d/e"
}
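
The path_hierarchy request above emits one token per directory level. A minimal Python sketch of that idea follows; `path_hierarchy` here is a hypothetical function that only models the default "/" delimiter, not the tokenizer's other options.

```python
def path_hierarchy(text, delimiter="/"):
    """Emit one token per path level, each extending the previous one."""
    tokens = []
    parts = text.split(delimiter)
    for end in range(1, len(parts) + 1):
        token = delimiter.join(parts[:end])
        if token:  # skip the empty prefix produced by a leading delimiter
            tokens.append(token)
    return tokens


for token in path_hierarchy("/user/ymruan/a/b/c/d/e"):
    print(token)
```

Indexing every prefix lets a term query on a parent directory match all documents stored below it.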

8.3.3 `Token Filters`

Add, modify, or delete the terms output by the Tokenizer
Built-in Token Filters
- lowercase / stop / synonym (adds synonyms)

# Token Filters
# whitespace and stop
# Split on whitespace and filter out specific words such as on, in, the (stop words)
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["stop"],
  "text": ["The rain in Spain falls mainly on The plain."]
}

# With lowercase added before stop, "The" is also removed as a stop word
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["lowercase", "stop"],
  "text": ["The girls in China are playing this game!"]
}

# How to customize an Analyzer to meet your specific needs
DELETE my_index
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["emoticons"],
          "tokenizer": "punctuation",
          "filter": ["lowercase", "english_stop"]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": [":) => _happy_", ":( => _sad_"]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "I'm a :) person,and you?"
}
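
The first two requests differ only in filter order, and the difference matters because the stop filter compares tokens case-sensitively by default. The following Python sketch imitates a whitespace tokenizer followed by a filter chain; `analyze` is a hypothetical helper and `STOP_WORDS` is a small assumed subset of the English stop list.

```python
# Assumed subset of the English stop words, for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "in", "on", "the", "this"}


def analyze(text, filters):
    """Whitespace tokenizer followed by token filters applied in order."""
    tokens = text.split()
    for name in filters:
        if name == "lowercase":
            tokens = [token.lower() for token in tokens]
        elif name == "stop":
            # Case-sensitive comparison: "The" is not the stop word "the".
            tokens = [token for token in tokens if token not in STOP_WORDS]
        else:
            raise ValueError(f"unsupported filter: {name}")
    return tokens


sentence = "The rain in Spain falls mainly on The plain."
print(analyze(sentence, ["stop"]))
print(analyze(sentence, ["lowercase", "stop"]))
```

With stop alone, the capitalized "The" survives; placing lowercase first normalizes it so the stop filter can drop it.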

9. `Index Template` and `Dynamic Template`

9.1 What is an `Index Template`

As time goes on, your cluster will hold more and more indices. For example, if the cluster is used for log management, you generate a new index for the logs every day, because this makes the data easier to manage and gives the cluster better performance.
Index Template: helps you define Mappings and Settings and automatically applies them to newly created indices according to certain rules
- Templates only take effect when an index is created. Modifying a template does not affect indices that already exist
- You can define multiple index templates; their settings are merged together
- You can specify an order value to control the merging process

9.1.1 How an `Index Template` works

When an index is created
- The default Elasticsearch settings and mappings are applied
- The settings from Index Templates with a low order value are applied
- The settings from Index Templates with a high order value are applied, overriding the earlier ones
- The Settings and Mappings specified by the user when creating the index are applied, overriding the settings from the templates
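
The layering described above can be sketched as a sequence of dictionary merges. This is a toy model in plain Python, not the Elasticsearch implementation: `resolve_index_config` and `deep_merge` are hypothetical helpers, the defaults shown are assumed, and `fnmatchcase` stands in for index pattern matching.

```python
import copy
from fnmatch import fnmatchcase


def deep_merge(base, override):
    """Return a copy of base with override applied key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_index_config(index_name, defaults, templates, request_body=None):
    """Defaults, then matching templates by ascending order, then the request."""
    config = copy.deepcopy(defaults)
    matching = [
        template for template in templates
        if any(fnmatchcase(index_name, p) for p in template["index_patterns"])
    ]
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        body = {k: template[k] for k in ("settings", "mappings") if k in template}
        config = deep_merge(config, body)
    if request_body:
        config = deep_merge(config, request_body)
    return config


defaults = {"settings": {"number_of_shards": 1, "number_of_replicas": 1}}
templates = [
    {"index_patterns": ["test*"], "order": 1,
     "settings": {"number_of_replicas": 2},
     "mappings": {"date_detection": False}},
    {"index_patterns": ["*"], "order": 0,
     "settings": {"number_of_replicas": 1}},
]

print(resolve_index_config("testtemplate", defaults, templates))
print(resolve_index_config(
    "testmy", defaults, templates,
    {"settings": {"number_of_replicas": 5}},
))
```

The higher-order template wins over the lower one, and an explicit request body wins over both.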

9.1.2 Demo

Create 2 Index Templates
View a Template by name
View all templates: _template/*
Create a temporary index and check its replica count and data type inference
Name an index so that it matches an Index Template, then view the Mappings and Settings of the generated Index

# Create a default template
PUT _template/template_default
{
  "index_patterns": ["*"],
  "order": 0,
  "version": 1,
  "settings": {
    "number_of_replicas": 1
  }
}

# Create a second template
# (the numeric_detection value is missing in the source; true is assumed)
PUT /_template/template_test
{
  "index_patterns": ["test*"],
  "order": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  },
  "mappings": {
    "date_detection": false,
    "numeric_detection": true
  }
}

# View the template information
GET /_template/template_default
GET /_template/temp*

# Write new data; the index name starts with test
PUT testtemplate/_doc/1
{
  "someNumber": "1",
  "someDate": "2020/12/05"
}

GET testtemplate/_mapping
GET testtemplate/_settings

# Settings given at index creation override the templates
PUT testmy
{
  "settings": {
    "number_of_replicas": 5
  }
}

PUT testmy/_doc/1
{
  "key": "value"
}

GET testmy/_settings

9.2 What is a `Dynamic Template`

Based on the data type detected by Elasticsearch, combined with the field name, the field type is set dynamically
- Set all string types to keyword, or turn off the keyword subfield
- Fields whose names start with is are set to boolean
- Fields whose names start with long_ are set to long

9.2.1 `Dynamic Template`

A Dynamic Template is defined in the Mapping of a specific index
Each Template has a name
The matching rules are an array
It sets the Mapping for the fields that match

GET my_index/_mapping
DELETE my_index
PUT my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "string_as_boolean": {
          "match_mapping_type": "string",
          "match" : "is*",
          "mapping": {
            "type": "boolean"
          }
        } 
      },
      {
        "string_as_keywords": {
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "keyword"
          }
        }
      }
    ]
  }
}

PUT my_index/_doc/1
{
  "firstName" : "Ruan",
  "isVip" : "true"
}

GET my_index/_mapping
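
The rules in the array are tried in order and the first match decides the mapping, which is why isVip ends up as boolean while firstName becomes keyword. A minimal Python sketch of that selection follows; `resolve_mapping` and `detect_type` are hypothetical helpers, and only `match_mapping_type` and `match` are modelled.

```python
from fnmatch import fnmatchcase


def detect_type(value):
    """Very rough stand-in for the JSON type detected by dynamic mapping."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return "object"


def resolve_mapping(field, value, dynamic_templates):
    """Return (template_name, mapping) of the first matching rule."""
    detected = detect_type(value)
    for entry in dynamic_templates:
        (name, rule), = entry.items()  # each entry holds one named rule
        if rule.get("match_mapping_type", "*") not in ("*", detected):
            continue
        if "match" in rule and not fnmatchcase(field, rule["match"]):
            continue
        return name, rule["mapping"]
    return None, None  # no rule matched: default dynamic mapping applies


dynamic_templates = [
    {"string_as_boolean": {"match_mapping_type": "string", "match": "is*",
                           "mapping": {"type": "boolean"}}},
    {"string_as_keywords": {"match_mapping_type": "string",
                            "mapping": {"type": "keyword"}}},
]

print(resolve_mapping("isVip", "true", dynamic_templates))
print(resolve_mapping("firstName", "Ruan", dynamic_templates))
```

Swapping the two rules would send every string, including isVip, to keyword, so the more specific rule has to come first.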

10. Introduction to `Elasticsearch` aggregation analysis

10.1 What is Aggregation?

Besides search, Elasticsearch provides statistical analysis of the data stored in ES
- Highly real-time
- Hadoop (T+1)
Through aggregation we get an overview of the data, analyzing and summarizing the whole data set rather than looking for individual documents
- Number of rooms in Tsim Sha Tsui and on Hong Kong Island
- Number of economy hotels and five-star hotels available for booking in different price ranges
High performance: a single statement is enough to get the analysis results from Elasticsearch
- There is no need to implement the analysis logic on the client side
Many of the visual reports in Kibana are built on aggregation analysis

10.2 Classification of aggregation

Bucket Aggregation: a set of documents that meet certain criteria (similar to GROUP BY)
Metric Aggregation: mathematical operations for statistical analysis of document fields (maximum, minimum, average)
Pipeline Aggregation: performs a second aggregation on the results of other aggregations
Matrix Aggregation: supports operations on multiple fields and provides a result matrix

10.3 `Bucket` & `Metric`

Bucket: can be thought of as GROUP BY in SQL
Metric: can be thought of as COUNT in SQL; it covers a series of statistical methods

10.3.1 `Bucket`

Some examples
- Hangzhou belongs to Zhejiang / an actor is male or female
- Nested relationship: Hangzhou belongs to Zhejiang, which belongs to China, which belongs to Asia
Elasticsearch provides many types of Bucket to help you divide documents in a variety of ways
- Term & Range (time / age range / geographic location)

10.3.2 `Metric`

A Metric computes results over a set of documents. Besides computing on fields, it also supports computing on the results of scripts (painless script)
Most Metrics are mathematical calculations that output a single value
- min / max / sum / avg / cardinality
Some metrics output multiple values
- stats / percentiles / percentile_ranks
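
How a Bucket and a Metric combine can be sketched in plain Python: documents are first grouped by a field value (like a terms bucket), then statistics are computed per group (like min / max / avg). The hotel records and the name `terms_with_metrics` are invented for illustration; a real aggregation runs inside Elasticsearch, not on the client.

```python
from collections import defaultdict


def terms_with_metrics(docs, bucket_field, metric_field):
    """Group docs by bucket_field, then compute metrics on metric_field."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc[metric_field])
    return {
        key: {
            "doc_count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for key, values in buckets.items()
    }


# Hypothetical sample data.
hotels = [
    {"district": "Tsim Sha Tsui", "price": 100},
    {"district": "Tsim Sha Tsui", "price": 300},
    {"district": "Hong Kong Island", "price": 200},
]

for district, metrics in terms_with_metrics(hotels, "district", "price").items():
    print(district, metrics)
```

Each bucket carries its own document count, and the metrics are evaluated only over the documents that fell into that bucket.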
