Part Three: An Introduction to Elasticsearch

1. Basic Concepts of Elasticsearch

1.1 Documents

  • Elasticsearch is document-oriented; a document is the smallest unit of searchable data, for example:
    • A log entry in a log file
    • The details of a movie, or a single record
    • A song in an MP3 player, or the details of a PDF document
  • Documents are serialized to JSON and stored in Elasticsearch
    • A JSON object consists of fields
    • Each field has a type (string/numeric/Boolean/date/binary/range type)
  • Every document has a unique ID
    • You can specify the ID yourself
    • Or let Elasticsearch generate it automatically

1.1.1 JSON documents

  • A document contains a series of fields, similar to a row in a database table
  • JSON documents have a flexible format; you do not need to define the format in advance
    • A field's type can be specified explicitly or inferred automatically by Elasticsearch
    • Arrays and nested structures are supported

1.1.2 Document metadata

{
    "_index" : "movies",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 14.69302,
    "_source" : {
        "year" : 1995,
        "@version" : "1",
        "genre" : [
            "Adventure",
            "Animation",
            "Children",
            "Comedy",
            "Fantasy"
        ],
        "id" : "1",
        "title" : "Toy Story"
    }
}
  • Metadata is used to annotate information about a document
    • _index: the name of the index the document belongs to
    • _type: the name of the type the document belongs to
    • _id: the document's unique ID
    • _source: the document's original JSON data
    • _all: merged the contents of all fields into one field; it has been removed
    • _version: the version of the document
    • _score: the relevance score

1.2 Indexes

{
  "movies" : {
    "settings" : {
      "index" : {
        "creation_date" : "1604218204918",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "F9-uy1bUTemm1Hs_LaDMQQ",
        "version" : {
          "created" : "7010099"
        },
        "provided_name" : "movies"
      }
    }
  }
}
  • An index is a container for documents: a collection of documents of a similar kind
    • An index embodies the concept of logical space: each index has its own Mapping, which defines the names and types of the fields its documents contain
    • A shard embodies the concept of physical space: the data in an index is distributed across its shards
  • An index has a Mapping and Settings
    • The Mapping defines the types of the document fields
    • The Settings define how the data is distributed (for example, the number of shards and replicas)

1.2.1 The different meanings of "index"

  • Noun: an Elasticsearch cluster can contain many different indexes
  • Verb: saving a document to Elasticsearch is also called indexing
    • That is, the process by which ES builds the inverted index
  • Noun: an index structure such as a B-tree index or an inverted index

1.2.2 Type

  • Since 7.0, an index can contain only one type, named _doc

1.2.3 ES vs. RDBMS

RDBMS | Elasticsearch
--- | ---
Table | Index (Type)
Row | Document
Column | Field
Schema | Mapping
SQL | DSL

  • Differences between traditional relational databases and Elasticsearch
    • Elasticsearch
      • Schemaless: data organization is more flexible
      • Relevance: computes how relevant each result is
      • High-performance full-text search
    • RDBMS
      • Transactions
      • Joins

1.3 REST API

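1.4 Index management in Kibana

// Count the documents in the index
GET movies/_count
// Return the first 10 documents to see their format
POST movies/_search
{
}
// _cat indices API
// List indices, sorted by name
GET /_cat/indices?v&s=index
// List indices whose health is green
GET /_cat/indices?v&health=green
// Sort by document count, descending
GET /_cat/indices?v&s=docs.count:desc
// Show selected columns for indices matching movies*
GET /_cat/indices/movies*?pri&v&h=health,index,pri,rep,docs.count,mt
// How much memory is used per index?
GET /_cat/indices?v&h=i,tm&s=tm:desc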

1.5 Availability and scalability of distributed systems

1.5.1 High availability

  • Service availability: the service keeps running even if some nodes stop
  • Data availability: no data is lost if some nodes are lost

1.5.2 Scalability

  • Handles growth in requests and data (data is distributed across all nodes)

1.6 Distributed characteristics of ES

1.6.1 Benefits of the ES distributed architecture

  • Storage capacity scales horizontally
  • Higher system availability: if some nodes stop, the cluster as a whole keeps serving requests

1.6.2 The ES distributed architecture

  • Clusters are distinguished by name; the default name is elasticsearch
  • The name is set in the configuration file or on the command line with -E cluster.name=node1
  • A cluster can have one or more nodes

1.7 ES nodes

  • A node is an instance of Elasticsearch
    • Essentially, it is a Java process
    • A machine can run several Elasticsearch processes, but in production it is generally recommended to run only one Elasticsearch instance per machine
  • Each node has a name, set in the configuration file or at startup with -E node.name=node1
  • Each node is assigned a UID after it starts, which is stored in the data directory

1.8 Master-eligible nodes and the Master node

  • By default, every node starts as a master-eligible node
    • You can disable this with node.master: false
  • Master-eligible nodes can take part in the master election and become the master node
  • When the first node starts, it elects itself as the master node
  • Every node stores the cluster state, but only the master node can modify it
    • The cluster state holds the information a cluster needs
      • Information about all nodes
      • All indexes and their Mapping and Settings
      • Shard routing information
    • If any node could modify this information, the data would become inconsistent

1.8 Data Node & Coordinating Node

  • Data Node
    • A node that can store data is called a data node. It holds shard data and plays a crucial role in scaling out data
  • Coordinating Node
    • Receives client requests, distributes them to the appropriate nodes, and finally aggregates the results
    • By default, every node acts as a coordinating node

1.9 Other node types

  • Hot & Warm Node
    • Data nodes with different hardware configurations, used to implement a hot-warm architecture and reduce the cost of deploying the cluster
  • Machine Learning Node
    • Runs machine learning jobs, for example for anomaly detection
  • Tribe Node
    • (Being replaced by Cross Cluster Search starting in 5.3) A tribe node connects to different Elasticsearch clusters and lets them be treated as a single cluster

1.10 Configuring the node type

  • In a development environment, one node can play multiple roles
  • In a production environment, each node should have a single role (dedicated nodes)

Node type | Configuration parameter | Default value
--- | --- | ---
master eligible | node.master | true
data | node.data | true
ingest | node.ingest | true
coordinating only | none | Every node is a coordinating node by default; set all the other types to false
machine learning | node.ml | true (requires X-Pack)

1.11 Shards (Primary Shard & Replica Shard)

  • Primary shards solve the problem of scaling data horizontally. With primary shards, data can be distributed across all nodes in the cluster
    • A shard is a running Lucene instance
    • The number of primary shards is specified when the index is created and cannot be changed later, except by reindexing
  • Replicas solve the problem of data high availability. A replica shard is a copy of a primary shard
    • The number of replica shards can be adjusted dynamically
    • Adding replicas can also improve service availability to some extent (read throughput)
  • Shard distribution of the blogs index in a three-node cluster
    • Question: how does adding a node, or increasing the number of primary shards, affect the system?
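The reason the primary shard count is fixed can be seen from document routing. Elasticsearch's documentation describes the target shard as hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id. The toy sketch below assumes that formula and uses zlib.crc32 as a stand-in for the Murmur3 hash Elasticsearch actually uses; the document IDs are hypothetical.

```python
import zlib


def route(routing: str, num_primary_shards: int) -> int:
    """Toy shard routing: hash(_routing) % number_of_primary_shards.

    Assumption: zlib.crc32 stands in for Elasticsearch's Murmur3 hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Hypothetical document IDs (the _id is the default routing value)
doc_ids = [f"doc-{i}" for i in range(1000)]

with_3 = {d: route(d, 3) for d in doc_ids}
with_4 = {d: route(d, 4) for d in doc_ids}

moved = sum(1 for d in doc_ids if with_3[d] != with_4[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Going from 3 to 4 primary shards sends most documents (about three quarters, if the hash is uniform) to a different shard in this model, so a lookup by ID would be routed to the wrong place; rebuilding the index avoids that.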

1.12 Shard settings (the number of primary shards is set at creation time and cannot be changed later, except by reindexing)

  • Shards need capacity planning in production environments
    • Too few shards
      • Adding nodes does not achieve horizontal scaling
      • Each shard holds too much data, so redistributing data takes a long time
    • Too many shards: starting with 7.0, the default number of primary shards is 1, which addresses the over-sharding problem
      • Too many shards affect the relevance scoring of search results and the accuracy of statistical results
      • Too many shards on a single node waste resources and hurt performance

1.13 Checking the Cluster Health Status

GET _cluster/health

  • Green: all primary and replica shards are allocated
  • Yellow: all primary shards are allocated, but some replica shards are not
  • Red: some primary shards could not be allocated
    • For example, creating a new index when the servers' disk usage is above the high disk watermark (90% by default), so the new primary shards cannot be allocated; above the 85% low watermark, only replica allocation is blocked
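The three colours follow directly from shard allocation. The sketch below encodes the rules listed above for one index; the ShardCopy class and the sample allocation are hypothetical. At the cluster level, Elasticsearch reports the worst status among its indexes, which cluster_health mirrors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShardCopy:
    shard: int      # shard number within the index
    primary: bool   # True for a primary shard, False for a replica
    assigned: bool  # whether this copy is allocated to a node


def index_health(copies: List[ShardCopy]) -> str:
    """Return green/yellow/red for one index using the rules above."""
    if any(c.primary and not c.assigned for c in copies):
        return "red"     # at least one primary shard is unassigned
    if any(not c.primary and not c.assigned for c in copies):
        return "yellow"  # primaries are fine, but a replica is unassigned
    return "green"       # every primary and replica is assigned


def cluster_health(indices: List[List[ShardCopy]]) -> str:
    """The cluster status is the worst status among its indexes."""
    order = ["green", "yellow", "red"]
    return max((index_health(i) for i in indices), key=order.index, default="green")


# Hypothetical single-node cluster: a replica is never placed on the same node
# as its primary, so with 1 replica it stays unassigned.
single_node = [ShardCopy(0, True, True), ShardCopy(0, False, False)]
print(index_health(single_node))  # yellow
```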

2. Basic CRUD and batch operations on documents

2.1 Document CRUD

Operation | API
--- | ---
Index | PUT my_index/_doc/1 {"user":"mike","comment":"xxxx"}
Create | PUT my_index/_create/1 {"user":"mike","comment":"xxxx"}
Create | POST my_index/_doc {"user":"mike","comment":"xxxx"} (no ID specified; one is generated automatically)
Read | GET my_index/_doc/1
Update | POST my_index/_update/1 {"doc":{"user":"mike","comment":"xxxx"}}
Delete | DELETE my_index/_doc/1

  • The type name is _doc by convention
  • Create: fails if the ID already exists
  • Index: if the ID does not exist, a new document is created; otherwise the existing document is deleted, the new document is indexed, and the version is incremented
  • Update: the document must already exist; the update only changes the specified fields (a partial update)
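The rules above differ only in how each operation treats an existing ID and how the version changes. The in-memory class below is a hypothetical toy that models just those rules (not storage, refresh, or no-op detection), which makes the contrast between Index, Create and Update easy to check.

```python
class ToyIndex:
    """In-memory toy that mimics how Index/Create/Update/Delete treat an ID.

    Not Elasticsearch: it only models the existence checks and the _version
    counter described above. In this model the counter is kept after a delete,
    so re-creating the same ID continues from the previous version.
    """

    def __init__(self):
        self.docs = {}      # _id -> _source
        self.versions = {}  # _id -> latest _version

    def _next_version(self, doc_id):
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return self.versions[doc_id]

    def index(self, doc_id, source):
        """Create the document, or replace it entirely if it exists."""
        self.docs[doc_id] = dict(source)
        return self._next_version(doc_id)

    def create(self, doc_id, source):
        """Fail if the ID already exists."""
        if doc_id in self.docs:
            raise KeyError(f"document {doc_id!r} already exists")
        self.docs[doc_id] = dict(source)
        return self._next_version(doc_id)

    def update(self, doc_id, partial):
        """The document must exist; only the given fields change."""
        if doc_id not in self.docs:
            raise KeyError(f"document {doc_id!r} not found")
        self.docs[doc_id].update(partial)
        return self._next_version(doc_id)

    def delete(self, doc_id):
        if doc_id not in self.docs:
            raise KeyError(f"document {doc_id!r} not found")
        del self.docs[doc_id]
        return self._next_version(doc_id)


idx = ToyIndex()
idx.create("1", {"user": "mike", "comment": "xxxx"})  # version 1
idx.update("1", {"comment": "yyyy"})                  # version 2, user kept
idx.index("1", {"user": "mike"})                      # version 3, comment gone
print(idx.docs["1"], idx.versions["1"])
```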

2.1.1 Creating a document

  • Both automatically generated and explicitly specified document IDs are supported
  • Calling POST users/_doc
    • The system generates the document ID automatically
  • When using PUT users/_create/1, the URI explicitly specifies _create; if a document with that ID already exists, the operation fails

// Create a document; _id is generated automatically
POST users/_doc
{
  "user" : "Mike",
  "post_date" : "2020-11-15T19:38:42",
  "message" : "trying out Kibana"
}
// Create a document with a specified ID; fails if the ID already exists
PUT users/_doc/1?op_type=create
{
  "user" : "Mike",
  "post_date" : "2020-11-15T19:38:42",
  "message" : "trying out Elasticsearch"
}

2.1.2 Getting a document

  • If the document is found, HTTP 200 is returned
    • Document metadata
      • _index/_type/
      • Version information: for the same ID, the version number keeps increasing even after the document is deleted
      • _source contains all the original data of the document by default
  • If the document is not found, HTTP 404 is returned

// Get a document by ID
GET users/_doc/1

2.1.3 Indexing a document

// If the document exists, it is deleted and the new one is indexed; the version is incremented by 1
PUT users/_doc/1
{
  "user" : "Mike"
}
  • Index vs. Create: if the document does not exist, Index indexes it as a new document; otherwise the existing document is deleted and the new document is indexed, and the version is incremented by 1

2.1.4 Updating a document

  • The Update API does not delete the original document; it performs a real update of the data
  • Use POST; the payload must wrap the fields in "doc"

// Add fields to the existing document
POST users/_update/1
{
  "doc": {
    "post_date" : "2020-11-15T19:59:42",
    "message" : "trying out Elasticsearch"
  }
}

2.2 Batch Operations

2.2.1 Bulk API

  • Opening a new network connection for every REST request is expensive
  • A single API call can operate on different indexes
  • Four types of operations are supported
    • Index
    • Create
    • Update
    • Delete
  • The index can be specified in the URI or in the request payload
  • If a single operation fails, the other operations are not affected
  • The response contains the result of each operation

// Run the bulk request twice and check the results each time
// First execution
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test2", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : { "_id" : "1", "_index" : "test" } }
{ "doc" : { "field2" : "value2" } }
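The _bulk body is newline-delimited JSON: one action/metadata line, followed by a source line for index, create and update (delete has none), and the whole body must end with a newline. The hypothetical helper below builds the same body as the request above; when sending it with curl, use --data-binary and the Content-Type application/x-ndjson so the newlines are preserved.

```python
import json


def bulk_body(actions):
    """Build an NDJSON body for the _bulk API.

    `actions` is a list of (action_metadata, source_or_None) pairs:
    delete has no source line; index/create/update have one.
    """
    lines = []
    for meta, source in actions:
        lines.append(json.dumps(meta))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ({"index": {"_index": "test", "_id": "1"}}, {"field1": "value1"}),
    ({"delete": {"_index": "test", "_id": "2"}}, None),
    ({"create": {"_index": "test2", "_id": "3"}}, {"field1": "value3"}),
    ({"update": {"_id": "1", "_index": "test"}}, {"doc": {"field2": "value2"}}),
])
print(body)
```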

2.2.2 Batch reading: mget

  • Batch operations reduce network connection overhead and improve performance

// mget
GET /_mget
{
  "docs" : [
    { "_index" : "test", "_id" : "1" },
    { "_index" : "test", "_id" : "2" }
  ]
}

2.2.3 Batch query: msearch

### msearch
POST kibana_sample_data_ecommerce/_msearch
{}
{"query" : {"match_all" : {}},"size":1}
{"index" : "kibana_sample_data_flights"}
{"query" : {"match_all" : {}},"size":2}

2.2.4 Common error responses

Problem | Cause
--- | ---
Unable to connect | Network failure, or the cluster is down
Connection cannot be closed | Network failure, or a node has failed
429 | The cluster is too busy
4xx | The request is malformed
500 | Internal cluster error
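Of the errors above, 429 is the one that usually calls for resending the same request later: the cluster is temporarily overloaded. A common client-side pattern, sketched here on the assumption that retries are handled by your own code, is exponential backoff with jitter; the base and cap values are illustrative, not Elasticsearch settings.

```python
import random
from typing import List, Optional


def should_retry(status: int) -> bool:
    """429 means "too busy, try later"; other 4xx errors mean the request
    itself must be fixed, so resending it unchanged will not help."""
    return status == 429


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Exponential backoff with full jitter: attempt n waits a random time
    between 0 and min(cap, base * 2**n) seconds."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(retries)]


delays = backoff_delays(5, rng=random.Random(42))
print([round(d, 2) for d in delays])
```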

3. Forward and inverted indexes

3.1 Forward index

3.2 Inverted index

A forward index maps each document (or page) to its content. To find the specific pages of a book on which a word appears, that is not enough; instead we build an index structure that maps content to document IDs, which meets this need

3.2.1 Core components of an inverted index

An inverted index consists of two parts

  • Term Dictionary: records all the terms that appear in the documents and links each term to its posting list
    • The term dictionary is usually very large and is implemented with a B+ tree or a hash table with chaining to support high-performance inserts and lookups
  • Posting List: records the set of documents containing each term; it is made up of posting entries
    • Posting entry
      • Document ID
      • Term frequency (TF): how many times the term appears in the document, used for relevance scoring
      • Position: where the term appears in the document, used for phrase queries
      • Offset: the start and end character positions of the term, used for highlighting
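The two parts can be modelled in a few lines. In the toy below, the dictionary keys play the role of the term dictionary and each value is a posting list recording the document ID, term frequency, positions and character offsets. Tokenization is a simplified assumption (lowercase runs of word characters), and the three sample titles are illustrative.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map term -> {doc_id: {"tf": ..., "positions": [...], "offsets": [...]}}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Simplified tokenizer: runs of word characters, lowercased
        for position, match in enumerate(re.finditer(r"\w+", text)):
            term = match.group().lower()
            posting = index[term].setdefault(
                doc_id, {"tf": 0, "positions": [], "offsets": []}
            )
            posting["tf"] += 1                                       # term frequency
            posting["positions"].append(position)                    # for phrase queries
            posting["offsets"].append((match.start(), match.end()))  # for highlighting
    return index


docs = {
    1: "Mastering Elasticsearch",
    2: "Elasticsearch Server",
    3: "Elasticsearch Essentials",
}
index = build_inverted_index(docs)

print(sorted(index))                   # the term dictionary
print(sorted(index["elasticsearch"]))  # [1, 2, 3]
print(index["elasticsearch"][1])       # {'tf': 1, 'positions': [1], 'offsets': [(10, 23)]}
```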

3.2.2 An Elasticsearch example

3.2.3 Inverted indexes in Elasticsearch

  • Every field in an Elasticsearch JSON document has its own inverted index
  • You can specify that certain fields are not indexed
    • Advantage: saves storage space
    • Disadvantage: those fields cannot be searched

4. Tokenization with an Analyzer

4.1 Analysis and Analyzer

  • Analysis: text analysis turns a full text into a series of terms (term/token); it is also called tokenization
  • Analysis is performed by an Analyzer
    • You can use Elasticsearch's built-in analyzers or define custom analyzers as needed
  • Besides converting text into terms when data is written, the same analyzer must also be applied to the query string when a Query is matched

4.2 Components of an Analyzer

  • An analyzer is a component dedicated to tokenization. An Analyzer consists of three parts
    • Character Filters (process the raw text, e.g. strip HTML)
    • Tokenizer (splits the text into terms according to rules)
    • Token Filters (process the resulting terms: lowercase them, remove stopwords, add synonyms)
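The three stages always run in the same order: character filters on the raw string, then one tokenizer, then token filters on the token list. The sketch below composes plain Python functions in that order; the regex-based stages and the tiny stop-word set are simplified stand-ins, not Elasticsearch's implementations.

```python
import re


def strip_html(text):
    """Character filter: remove HTML tags."""
    return re.sub(r"<[^>]+>", "", text)


def word_tokenizer(text):
    """Tokenizer: split into runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


STOP_WORDS = {"a", "an", "is", "the"}  # tiny illustrative subset


def remove_stop_words(tokens):
    """Token filter: drop stop words."""
    return [t for t in tokens if t not in STOP_WORDS]


def make_analyzer(char_filters, tokenizer, token_filters):
    """Compose the stages: char filters -> tokenizer -> token filters."""
    def analyze(text):
        for char_filter in char_filters:
            text = char_filter(text)
        tokens = tokenizer(text)
        for token_filter in token_filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


analyzer = make_analyzer([strip_html], word_tokenizer, [lowercase, remove_stop_words])
print(analyzer("<b>The</b> Quick brown-foxes"))  # ['quick', 'brown', 'foxes']
```

Token filter order matters: if stop-word removal ran before lowercasing, "The" would not match the lowercase stop list and would survive.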

5. Built-in analyzers in Elasticsearch

  • Standard Analyzer: the default analyzer; splits text into words and lowercases them
  • Simple Analyzer: splits on non-letter characters (which are discarded) and lowercases
  • Stop Analyzer: lowercases and filters out stop words (the, a, is)
  • Whitespace Analyzer: splits on whitespace; does not lowercase
  • Keyword Analyzer: does no tokenization; outputs the input unchanged as a single term
  • Pattern Analyzer: splits with a regular expression, \W+ (non-word characters) by default
  • Language: analyzers for more than 30 common languages
  • Custom Analyzer: a user-defined analyzer
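To get a feel for how the built-in analyzers differ, the functions below roughly imitate several of them with regular expressions. They are approximations only (the real Standard tokenizer follows Unicode text segmentation rules, so it is left out), and the stop-word set is a small subset chosen for illustration. The sample sentence is the one used with _analyze in this section.

```python
import re

STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}  # small subset


def simple(text):
    """~ Simple Analyzer: split on non-letters (dropping them), lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def stop(text):
    """~ Stop Analyzer: Simple Analyzer plus stop-word removal."""
    return [t for t in simple(text) if t not in STOP_WORDS]


def whitespace(text):
    """~ Whitespace Analyzer: split on whitespace, keep case."""
    return text.split()


def keyword(text):
    """~ Keyword Analyzer: the whole input is a single term."""
    return [text]


def pattern(text, regex=r"\W+"):
    """~ Pattern Analyzer: split on the regex (default \\W+), lowercase."""
    return [t.lower() for t in re.split(regex, text) if t]


sample = "2 runing Quick brown-foxes leap over lazy dogs in the summer evening"
for analyzer in (simple, stop, whitespace, keyword, pattern):
    print(analyzer.__name__, analyzer(sample))
```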

5.1 Using the _analyze API

5.1.1 Testing with a specified Analyzer

# Test by specifying an analyzer directly
GET _analyze
{
  "analyzer": "standard",
  "text": "Mastering Elasticsearch, Elasticsearch in Action"
}

5.1.2 Testing with a field of an index

POST users/_analyze
{
  "field": "message",
  "text": "Mastering Elasticsearch"
}

5.1.3 Testing a custom combination of tokenizer and filters

POST _analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "Mastering Elasticsearch"
}

5.2 Standard Analyzer (the default analyzer in Elasticsearch)

  • The default analyzer
  • Splits text into words
  • Lowercases the terms

5.3 Simple Analyzer

  • Splits on non-letter characters; all non-letter characters are removed
  • Lowercases the terms

# Simple Analyzer
GET _analyze
{
  "analyzer": "simple",
  "text":"2 runing Quick brown-foxes leap over lazy dogs in the summer evening"
}

5.3 Whitespace Analyzer

  • Splits on whitespace

5.4 Stop Analyzer

  • Compared with the Simple Analyzer
  • It adds a stop filter
    • Removes words such as the, a, is

5.5 Keyword Analyzer

  • Does no tokenization; outputs the input as a single term

# Keyword Analyzer
GET _analyze
{
  "analyzer": "keyword",
  "text":"2 runing Quick brown-foxes leap over lazy dogs in the summer evening"
}

5.5 Pattern Analyzer

  • Splits text with a regular expression
  • The default pattern is \W+, i.e. it splits on non-word characters

# Pattern Analyzer
GET _analyze
{
  "analyzer": "pattern",
  "text":"2 runing Quick brown-foxes leap over lazy dogs in the summer evening"
}

5.6 Language Analyzer

  • Tokenization tailored to different languages

# Language Analyzer
POST _analyze
{
  "analyzer": "english",
  "text":"2 runing Quick brown-foxes leap over lazy dogs in the summer evening"
}

5.7 Challenges of Chinese word segmentation

  • A Chinese sentence must be split into words, not into individual characters
  • In English, words are naturally separated by spaces
  • A Chinese sentence can mean different things depending on how it is segmented
    • 这个苹果，不大好吃 / 这个苹果，不大，好吃！ ("This apple doesn't taste very good" / "This apple isn't big, but it's tasty!")
  • Example
    • 他说的确实在理 ("What he says is indeed reasonable")
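The ambiguity can be reproduced with the simplest dictionary-based technique, forward maximum matching: at each position, take the longest dictionary word that matches. The dictionary below is a tiny hypothetical one, and real segmenters such as the ICU analyzer use much richer dictionaries and rules. With it, the sentence 他说的确实在理 ("What he says is indeed reasonable") comes out as 他/说/的确/实在/理 instead of the intended 他/说/的/确实/在理.

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Greedy forward maximum matching.

    At each position take the longest dictionary word (up to max_len chars);
    fall back to a single character when nothing matches.
    """
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical mini dictionary: it contains both the right and the wrong words
DICTIONARY = {"他", "说", "的确", "确实", "实在", "在理"}
print(forward_max_match("他说的确实在理", DICTIONARY))  # ['他', '说', '的确', '实在', '理']
```

Greedy matching grabs 的确 first and can never recover 确实 and 在理, which is why practical Chinese analyzers add statistics or context on top of dictionaries.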

5.8 ICU Analyzer

  • Requires installing a plugin
    • elasticsearch-plugin install analysis-icu
  • Provides Unicode support and better support for Asian languages

POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "他说的确实在理"
}

5. Search API overview

  • URI Search
    • Passes query parameters in the URL
  • Request Body Search
    • Uses the more complete, JSON-based Query Domain Specific Language (DSL) provided by Elasticsearch

5.1 Specifying the indexes to query

Syntax | Scope
--- | ---
/_search | All indexes in the cluster (_search indicates that this is a search request)
/index1/_search | The index named index1
/index1,index-2/_search | Multiple indexes: index1 and index-2
/index*/_search | Indexes whose names start with index
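The index part of the path can be read as a comma-separated list of names or wildcard patterns. The resolver below is a simplified, hypothetical model: it uses Python's fnmatchcase as a stand-in for Elasticsearch's own wildcard matching (fnmatch also treats ? and [ ] specially, which Elasticsearch index patterns do not) and ignores details such as hidden indexes and exclusions.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def resolve_indices(target: str, existing: Iterable[str]) -> List[str]:
    """Resolve the <target> in /<target>/_search to concrete index names.

    An empty target (plain /_search) means every index in the cluster.
    """
    existing = list(existing)
    if not target:
        return sorted(existing)
    selected = set()
    for part in target.split(","):
        selected.update(name for name in existing if fnmatchcase(name, part))
    return sorted(selected)


indices = ["index1", "index-2", "movies", "kibana_sample_data_ecommerce"]
print(resolve_indices("index*", indices))          # ['index-2', 'index1']
print(resolve_indices("index1,index-2", indices))  # ['index-2', 'index1']
```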

5.2 URI query

  • Use "q" to specify the query string
  • "Query string syntax": key/value pairs
curl -XGET "http://elasticsearch:9200/kibana_sample_data_ecommerce/_search?q=customer_first_name:Eddie"

5.3 Request Body query

5.4 Search results (Response)

5.5 Relevance

5.5.1 Search relevance

  • Search is a conversation between the user and the search engine
  • What users care about is the relevance of the search results
    • Can all the relevant content be found?
    • How much irrelevant content is returned?
    • Are the documents scored reasonably?
    • Can the ranking be balanced against business requirements?

5.5.2 Measuring relevance

  • Information Retrieval (a field of computer science)
    • Precision: return as few irrelevant documents as possible
    • Recall: return as many relevant documents as possible
    • Ranking: whether results can be sorted by relevance

5.5.3 Precision and Recall
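In terms of counts, Precision = relevant results returned / all results returned, and Recall = relevant results returned / all relevant documents. A short sketch with hypothetical relevance judgments:

```python
def precision_recall(returned, relevant):
    """Precision and recall of one result list against a set of relevant docs."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)  # relevant documents that were returned
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 4 documents returned, 3 of them relevant,
# out of 6 relevant documents in total.
p, r = precision_recall(["d1", "d2", "d3", "d9"], ["d1", "d2", "d3", "d4", "d5", "d6"])
print(p, r)  # 0.75 0.5
```

Returning more documents tends to raise recall and lower precision, which is the trade-off the relevance goals above are balancing.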

5.6 URI Search: searching with a URI query

GET /movies/_search?q=2012&df=title&sort=year:desc&from=0&size=10&timeout=1s
{
  "profile": true
}
  • q: the query statement, written in Query String Syntax
  • df: the default field; if it is not specified, all fields are queried
  • sort: sorting; from and size are used for paging
  • profile: shows how the query is executed
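Because the query string travels inside the URL, characters such as spaces, parentheses and + must be percent-encoded (a bare + in a URL is read as a space, which is why + is written as %2B in the examples below). Building the URL with Python's urllib.parse.urlencode, as in this sketch, handles that automatically; the host is the one from the curl example above, and the parameter values are illustrative.

```python
from urllib.parse import urlencode


def uri_search_url(host: str, index: str, params: dict) -> str:
    """Build a URI Search URL; urlencode percent-encodes each value."""
    return f"{host}/{index}/_search?{urlencode(params)}"


url = uri_search_url("http://elasticsearch:9200", "movies", {
    "q": "title:(Beautiful +Mind)",  # '+' must reach Elasticsearch as %2B
    "sort": "year:desc",
    "from": 0,   # "from" is a Python keyword, so the params go in a dict
    "size": 10,
    "timeout": "1s",
})
print(url)
```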

5.6.1 Field query vs. generic query

  • q=title:2012 / q=2012

# Search the title field only (df sets the default field)
GET /movies/_search?q=2012&df=title
{
  "profile": true
}
# Generic query: searches all fields
GET /movies/_search?q=2012
{
  "profile": true
}

5.6.2 Term vs. Phrase

  • Beautiful Mind is equivalent to Beautiful OR Mind: a document is returned as long as it contains either term
  • "Beautiful Mind" is equivalent to Beautiful AND Mind with the additional requirement that the terms appear in the same order. This is a phrase query (note: the content of a phrase query must be enclosed in quotes)
  • Grouping and quotation marks
    • title:(Beautiful AND Mind): grouping
    • title:"Beautiful Mind": phrase

# Term vs. Phrase
# Phrase query: matches the exact phrase
GET /movies/_search?q=title:"Beautiful Mind"
{
  "profile": "true"
}
# Only Beautiful is restricted to title; Mind is a generic query over all fields
GET /movies/_search?q=title:Beautiful Mind
{
  "profile": "true"
}
# Grouped term query: a Bool query over both terms in title
GET /movies/_search?q=title:(Mind Beautiful)
{
  "profile": "true"
}

A term query uses parentheses () to form a group; without them, as in the second example, Mind is not restricted to the title field. Terms within a group are combined with OR by default. A phrase query requires all terms to appear, in the same order.
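The difference between the three forms can be reproduced on a handful of hypothetical titles: OR needs any term, AND needs every term, and a phrase needs the terms next to each other and in order. This toy ignores scoring and field handling and uses a simple lowercase word tokenizer.

```python
import re


def tokenize(text):
    return [t.lower() for t in re.findall(r"\w+", text)]


def match_any(title, terms):     # Beautiful OR Mind
    tokens = set(tokenize(title))
    return any(t in tokens for t in terms)


def match_all(title, terms):     # Beautiful AND Mind
    tokens = set(tokenize(title))
    return all(t in tokens for t in terms)


def match_phrase(title, terms):  # "Beautiful Mind": adjacent and in order
    tokens = tokenize(title)
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


titles = ["A Beautiful Mind", "Beautiful Creatures", "Mind Game", "The Mind Is Beautiful"]
terms = tokenize("Beautiful Mind")
for name, matcher in [("OR", match_any), ("AND", match_all), ("Phrase", match_phrase)]:
    print(name, [t for t in titles if matcher(t, terms)])
```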

5.6.3 Boolean operators and grouping

  • AND / OR / NOT, or && / || / !
    • Must be uppercase
    • title:(matrix NOT reloaded)
  • Grouping
    • + means must
    • - means must_not
    • title:(+matrix -reloaded)

# Default: terms in a group are combined with OR
GET /movies/_search?q=title:(Mind Beautiful)
{
  "profile": "true"
}
# AND
GET /movies/_search?q=title:(Mind AND Beautiful)
{
  "profile": "true"
}
# NOT
GET /movies/_search?q=title:(Beautiful NOT Mind)
{
  "profile": "true"
}
# + must be URL-encoded as %2B in a URL
GET /movies/_search?q=title:(Beautiful %2BMind)
{
  "profile": "true"
}

5.6.4 Range queries and arithmetic operators

  • Range queries
    • Interval notation: [] is a closed interval, {} is an open interval
      • year:{2019 TO 2018}
      • year:[* TO 2018]
  • Arithmetic operators
    • year:>2010
    • year:(>2010 && <=2018)
    • year:(+>2010 +<=2018)

# All movies with year >= 1980
GET /movies/_search?q=year:>=1980
{
  "profile": "true"
}
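The bracket rules can be captured in a small predicate builder. This is a simplified sketch for integer bounds only: [ and ] include the bound, { and } exclude it, * means unbounded, and the two brackets may be mixed (as in [1 TO 5}). It models the notation, not Elasticsearch's parser.

```python
import re

RANGE = re.compile(r"^([\[{])\s*(\S+)\s+TO\s+(\S+?)\s*([\]}])$")


def parse_range(expr):
    """Turn '[a TO b]' / '{a TO b}' (brackets may be mixed) into a predicate.

    '[' / ']' include the bound, '{' / '}' exclude it, '*' means unbounded.
    Simplified: integer bounds only.
    """
    m = RANGE.match(expr)
    if m is None:
        raise ValueError(f"not a range expression: {expr!r}")
    open_b, lo, hi, close_b = m.groups()
    low = None if lo == "*" else int(lo)
    high = None if hi == "*" else int(hi)

    def contains(x):
        if low is not None and (x < low if open_b == "[" else x <= low):
            return False
        if high is not None and (x > high if close_b == "]" else x >= high):
            return False
        return True

    return contains


closed = parse_range("[2010 TO 2018]")
open_ = parse_range("{2010 TO 2018}")
print(closed(2010), open_(2010))  # True False
```

In this model a range whose lower bound is above its upper bound, such as {2019 TO 2018}, contains nothing.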

5.6.5 Wildcard queries, regular expressions, fuzzy matching and proximity queries

  • Wildcard queries (avoid them: they are inefficient and use a lot of memory, especially with a leading wildcard)
    • ? stands for exactly one character, * for zero or more characters
      • title:mi?d
      • title:be*
  • Regular expressions
    • title:/[bt]oy/
  • Fuzzy matching and proximity queries
    • title:befutifl~1
    • title:"Lord Rings"~2

# Wildcard query
GET /movies/_search?q=title:b*
{
  "profile": "true"
}
# Fuzzy matching
GET /movies/_search?q=title:beautifl~1
{
  "profile": "true"
}
# Proximity query
GET /movies/_search?q=title:"lord Ring"~2
{
  "profile": "true"
}
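The number after ~ in a fuzzy query is a maximum edit distance. Lucene counts insertions, deletions, substitutions and, by default, transpositions of two adjacent characters as one edit each; the function below computes that distance (the optimal string alignment variant) so you can check whether a misspelling falls within the limit. It is a sketch for reasoning about ~N, not Lucene's automaton-based implementation.

```python
def osa_distance(a, b):
    """Optimal-string-alignment distance: insertions, deletions, substitutions
    and adjacent transpositions each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


print(osa_distance("beautifl", "beautiful"))  # 1
print(osa_distance("befutifl", "beautiful"))  # 2
```

So beautifl~1 can match beautiful, while befutifl is two edits away and would need ~2.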

5.7 Introduction to Request Body and Query DSL

5.7.1 Request Body Search

  • Send the query statement to Elasticsearch in the HTTP request body
  • Query DSL

5.7.2 Paging

  • from starts at 0; 10 results are returned by default
  • The deeper the page, the more expensive it is to retrieve
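Why deep pages cost more: for a request with from and size, each shard returns its own top from + size hits, and the coordinating node sorts all of them to keep just size results. The arithmetic is sketched below; the 10,000 limit is the default of the index.max_result_window setting, beyond which Elasticsearch rejects from + size. The parameter is named from_ because from is a Python keyword.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def hits_sorted_by_coordinator(num_shards: int, from_: int, size: int) -> int:
    """Each shard returns its top (from + size) hits; the coordinating node
    merges all of them and keeps only `size` documents."""
    return num_shards * (from_ + size)


def check_window(from_: int, size: int) -> None:
    """Mirror the max_result_window check: pages that are too deep are refused."""
    if from_ + size > MAX_RESULT_WINDOW:
        raise ValueError(f"from + size = {from_ + size} exceeds {MAX_RESULT_WINDOW}")


print(hits_sorted_by_coordinator(5, 0, 10))    # 50 hits for page 1
print(hits_sorted_by_coordinator(5, 990, 10))  # 5000 hits for page 100
```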

5.7.3 Sorting

  • It is best to sort on numeric and date fields
  • For multi-valued fields or analyzed fields, the system picks one of the values, and you cannot know which

POST kibana_sample_data_ecommerce/_search
{
  "profile": "true",
  "sort": [{"order_date": "desc"}],
  "query": {
    "match_all": {}
  }
}

5.7.4 _source filtering

  • If _source is not stored, only the metadata of matching documents is returned
  • _source supports wildcards: "_source": ["name*", "desc"]

// _source filtering
POST kibana_sample_data_ecommerce/_search
{
  "_source": ["order_date"],
  "query": {
    "match_all": {}
  }
}

5.7.5 Script Fields

  • Use case: orders use different exchange rates, and prices must be converted by exchange rate before sorting

// Script fields
GET kibana_sample_data_ecommerce/_search
{
  "script_fields": {
    "new_field": {
      "script": {
        "lang": "painless",
        "source": "doc['order_date'].value+'_hello'"
      }
    }
  },
  "query": {
    "match_all": {}
  }
}

5.7.6 Query expressions: Match Query (Term Query and Phrase Query were covered earlier)

//matchQuery OR
POST movies/_search
{
  "query": {
    "match": {
      "title": "last Christmas"
    }
  }
}
//matchQuery AND
POST movies/_search
{
  "query": {
    "match": {
      "title": {
        "query": "last Christmas",
        "operator": "and"
      }
    }
  }
}

5.7.7 Phrase search: Match Phrase query

//matchPhrase
POST movies/_search
{
  "query": {
    "match_phrase": {
      "title": "one love"
    }
  }
}
//matchPhrase slop:1
POST movies/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "one love",
        "slop": 1
      }
    }
  }
}

5.8 Summary of the query methods

5.9 Query String and Simple Query String queries

5.9.1 Query String Query

  • Similar to URI Query

5.9.2 Simple Query String Query

  • Similar to Query String, but ignores invalid syntax and supports only part of the query syntax
  • AND, OR, NOT are not supported; they are treated as plain strings
  • Terms are combined with OR by default; you can set default_operator
  • Supports some logical operators
    • + instead of AND
    • | instead of OR
    • - instead of NOT

PUT /users/_doc/1
{
  "name": "Ruan Yiming",
  "about": "java,golang,node,swift,elasticsearch"
}
PUT /users/_doc/2
{
  "name": "Li Yiming",
  "about": "Hadoop"
}
# Query String
POST users/_search
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "Ruan AND Yiming"
    }
  }
}
POST users/_search
{
  "query": {
    "query_string": {
      "fields": ["name", "about"],
      "query": "(Ruan AND Yiming) OR (Hadoop)"
    }
  }
}
# Simple Query String: the default operator is OR, and AND is treated as a plain term
POST users/_search
{
  "query": {
    "simple_query_string": {
      "query": "Ruan AND Yiming",
      "fields": ["name"]
    }
  }
}
POST users/_search
{
  "query": {
    "simple_query_string": {
      "query": "Ruan Yiming",
      "fields": ["name"],
      "default_operator": "AND"
    }
  }
}

5.10 Differences between match, match_phrase, query_string, and term in ES

www.cnblogs.com/chenmz1995/…

6. Dynamic Mapping and common field types

6.1 What is a Mapping

  • A Mapping is similar to a schema definition in a database; it
    • Defines the names of the fields in an index
    • Defines the data types of the fields, e.g. strings, numbers, booleans...
    • Configures field settings related to the inverted index (analyzed or not analyzed, which analyzer)
  • A Mapping maps JSON documents into the flat format required by Lucene
  • A Mapping belongs to a Type of an index
    • Each document belongs to a Type
    • A Type has one Mapping definition
    • Starting with 7.0, you do not need to specify type information in the Mapping definition

6.2 Field data types in Elasticsearch

  • Simple types
    • Text/Keyword
    • Date
    • Integer/Floating point
    • Boolean
    • IPv4 & IPv6
  • Complex types: objects and nested objects
    • Object type / Nested type
  • Special types
    • geo_point & geo_shape / percolator

6.3 What is Dynamic Mapping

  • When a document is written to an index that does not exist, the index is created automatically
  • With Dynamic Mapping, you don't need to define Mappings manually; Elasticsearch infers field types from the document
  • But the inference is sometimes wrong, e.g. for geolocation data
  • When a type is set incorrectly, some features do not work properly, e.g. Range queries

6.4 Automatic type identification

JSON type | Elasticsearch type
--- | ---
String matching a date format | date
String, when numeric detection is enabled (disabled by default) | float or long
Other strings | text, with a keyword sub-field
Boolean | boolean
Floating-point number | float
Integer | long
Object | object
Array | Determined by the type of the first non-null value
Null | Ignored

// Write a document, then view the Mapping
PUT mapping_test/_doc/1
{
  "firstName": "Chan",
  "lastName": "Jackie",
  "loginDate": "2018-07-24T10:29:48.103Z"
}
// View the Mapping
GET mapping_test/_mapping
// Delete the index
DELETE mapping_test
// Dynamic Mapping: infer the field types
PUT mapping_test/_doc/1
{
  "uid": "123",
  "isVip": false,
  "isAdmin": "true",
  "age": 19,
  "height": 180
}
// View the Mapping
GET mapping_test/_mapping
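The type-detection table can be read as a function from a JSON value to a field type. The sketch below is a simplified approximation: its date check is a hypothetical ISO-8601-style regex rather than Elasticsearch's dynamic_date_formats, and numeric detection only tries int and float parsing. Applied to a document like the second mapping_test one, it shows why uid and isAdmin end up as text: they are JSON strings.

```python
import re

# Hypothetical, simplified date check (yyyy-MM-dd with an optional time part)
DATE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?)?$"
)


def infer_type(value, numeric_detection=False):
    """Approximate dynamic-mapping type inference for one JSON value."""
    if value is None:
        return None                      # null: ignored, no field is added
    if isinstance(value, bool):          # check before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):          # the first non-null element decides
        first = next((v for v in value if v is not None), None)
        return infer_type(first, numeric_detection)
    if DATE_RE.match(value):
        return "date"
    if numeric_detection:
        for cast, es_type in ((int, "long"), (float, "float")):
            try:
                cast(value)
                return es_type
            except ValueError:
                pass
    return "text (+ keyword sub-field)"


doc = {"uid": "123", "isVip": False, "isAdmin": "true", "age": 19,
       "height": 180, "loginDate": "2018-07-24T10:29:48.103Z"}
for field, value in doc.items():
    print(field, infer_type(value))
```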

6.5 Can the field type in a Mapping be changed?

  • Two cases
    • Adding new fields
      • With dynamic set to true, once a document with a new field is written, the Mapping is updated as well
      • With dynamic set to false, the Mapping is not updated and the new field cannot be searched, but its data still appears in _source
      • With dynamic set to strict, the document write fails
    • For existing fields, once data has been written, changing the field definition is no longer supported
      • The inverted index implemented by Lucene cannot be modified once it has been generated
    • To change a field type, you must use the Reindex API to rebuild the index
  • Why
    • If a field's data type were changed, the terms that are already indexed could no longer be searched
    • Adding new fields has no such effect

6.6 Controlling Dynamic Mappings

dynamic | "true" | "false" | "strict"
--- | --- | --- | ---
Document can be indexed | YES | YES | NO
Field can be indexed | YES | NO | NO
Mapping is updated | YES | NO | NO

  • When dynamic is set to false and a document with a new field is written, the document is indexed but the new field is discarded (not indexed)
  • When set to strict, writing the document fails with an error

# Dynamic Mapping is enabled by default; the new field is added to the Mapping
PUT dynamic_mapping_test/_doc/1
{
  "newField": "someValue"
}
POST dynamic_mapping_test/_search
{
  "query": {
    "match": {
      "newField": "someValue"
    }
  }
}
# Change dynamic to false
PUT dynamic_mapping_test/_mapping
{
  "dynamic": false
}
# Write a document with a new field, anotherField
PUT dynamic_mapping_test/_doc/10
{
  "anotherField": "someValue"
}
# The write succeeds, but the field cannot be searched
POST dynamic_mapping_test/_search
{
  "query": {
    "match": {
      "anotherField": "someValue"
    }
  }
}
# anotherField does not appear in the Mapping
GET dynamic_mapping_test/_mapping
# Change dynamic to strict
PUT dynamic_mapping_test/_mapping
{
  "dynamic": "strict"
}
# Writing a document with a new field fails with HTTP code 400
PUT dynamic_mapping_test/_doc/12
{
  "lastField": "value"
}
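The three settings in the table can be summarised as a small decision function. This toy treats a Mapping as a set of field names and returns the fields that would be searchable after a write; it models the behaviour described above, not Elasticsearch's internals.

```python
from typing import Dict, Set


def write_document(mapping: Set[str], doc: Dict[str, object],
                   dynamic: str = "true") -> Set[str]:
    """Apply the dynamic setting ("true", "false" or "strict") to one write.

    Mutates `mapping` when new fields are added; returns the searchable fields.
    """
    new_fields = set(doc) - mapping
    if new_fields and dynamic == "strict":
        # Elasticsearch rejects the whole document (HTTP 400)
        raise ValueError(f"unknown fields rejected: {sorted(new_fields)}")
    if dynamic == "true":
        mapping |= new_fields  # the Mapping is updated in place
    # With "false", new fields stay in _source but are not indexed
    return set(doc) & mapping


mapping = {"newField"}
print(write_document(mapping, {"anotherField": "someValue"}, dynamic="false"))  # set()
print(mapping)  # {'newField'}
```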

7. Explicit Mapping settings and common parameters

7.1 How to define an explicit Mapping

7.2 Suggestions for defining a custom Mapping

  • You can refer to the API manual and write it entirely by hand
  • To reduce typing and the chance of errors, you can follow these steps
    • Create a temporary index and write some sample data
    • Use the Mapping API to get the temporary index's dynamic Mapping definition
    • Modify it, then use that configuration to create your index
    • Delete the temporary index

7.3 Controlling whether the current field is indexed

  • index: controls whether the current field is indexed. The default is true. If set to false, the field cannot be searched

DELETE dynamic_mapping_test
# Set index to false
DELETE users
PUT users
{
  "mappings": {
    "properties": {
      "firstName": { "type": "text" },
      "lastName": { "type": "text" },
      "mobile": { "type": "text", "index": false }
    }
  }
}
PUT users/_doc/1
{
  "firstName": "Ruan",
  "lastName": "Yiming",
  "mobile": "123456789"
}
# Searching the mobile field fails because it is not indexed
POST /users/_search
{
  "query": {
    "match": {
      "mobile": "123456789"
    }
  }
}

7.3 Index Options

  • To build the inverted index, ES offers four levels of index_options that control what the inverted index records
    • docs: records the doc id
    • freqs: records the doc id and term frequencies
    • positions: records the doc id, term frequencies, and term positions
    • offsets: records the doc id, term frequencies, term positions, and character offsets
  • text fields record positions by default; other types default to docs
  • Recording more information takes more storage space
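Each index_options level records everything the previous level does plus one more piece of information. The sketch below trims a full posting entry down to what a given level keeps, which also shows why a field indexed with docs or freqs cannot serve phrase queries: those need positions. The entry values are hypothetical.

```python
LEVELS = ["docs", "freqs", "positions", "offsets"]


def trim_posting(level, doc_id, tf, positions, offsets):
    """Keep only the information recorded at the given index_options level."""
    rank = LEVELS.index(level)  # raises ValueError for an unknown level
    entry = {"doc_id": doc_id}
    if rank >= 1:
        entry["tf"] = tf
    if rank >= 2:
        entry["positions"] = positions
    if rank >= 3:
        entry["offsets"] = offsets
    return entry


def supports_phrase_queries(level):
    """Phrase queries need term positions."""
    return LEVELS.index(level) >= LEVELS.index("positions")


for level in LEVELS:
    print(level, trim_posting(level, 1, 2, [0, 5], [(0, 4), (27, 31)]))
```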

7.4 null_value

  • Needed when you want to search for null values
  • Of the string types, only keyword supports null_value (text does not)

Null_value DELETE users PUT users {"mappings": {"properties": {"firstName":{"type": "text" }, "lastName":{ "type": "text" }, "mobile":{ "type": "keyword", "null_value": "NULL" } } } } PUT users/_doc/1 { "firstName":"Ruan", "lastName":"Yiming", "mobile":null } GET users/_search { "query": { "match": { "mobile": "NULL" } } }Copy the code

7.5 copy_toSet up the

  • _allin7In thecopy_toreplaced
  • Meet some specific search requirements
  • copy_toCopy the value of the field to the target field to achieve similar results_allThe role of
  • copy_toThe target field does not appear in_sourceIn the

# set Copy to DELETE users PUT users {"mappings": {"properties": {"firstName":{"type": "text", "copy_to": "fullName" }, "lastName":{ "type": "text", "copy_to": "fullName" } } } } PUT users/_doc/1 { "firstName":"Ruan", "lastName":"Yiming" } GET users/_search { "query": { "match": { "fullName": { "query": "Ruan Yiming", "operator": "and" } } } }Copy the code

When a document containing firstName and lastName is indexed, both values are copied into the fullName field, so a query can search fullName directly.
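The copy_to behaviour can be modelled in a few lines of Python: the stored source is unchanged, while the indexed fields gain an extra fullName entry. This is a toy sketch; the simplified mapping shape (field to target name), the function names, and the whitespace tokenizer are assumptions.

```python
def apply_copy_to(doc, mapping):
    """Return (source, indexed_fields) for a toy copy_to model.

    mapping: field name -> copy_to target (hypothetical simplified shape).
    The source is left untouched; copied values only exist in indexed_fields.
    """
    indexed = {}
    for field, value in doc.items():
        indexed.setdefault(field, []).append(value)
        target = mapping.get(field)
        if target:
            indexed.setdefault(target, []).append(value)
    return dict(doc), indexed


def match_and(indexed, field, query):
    """Toy match query with operator "and": every query term must be present."""
    terms = {t for v in indexed.get(field, []) for t in v.lower().split()}
    return all(q in terms for q in query.lower().split())


source, indexed = apply_copy_to(
    {"firstName": "Ruan", "lastName": "Yiming"},
    {"firstName": "fullName", "lastName": "fullName"},
)
print(match_and(indexed, "fullName", "Ruan Yiming"))  # True
```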

7.6 Array Types

  • Elasticsearch does not provide a dedicated array type. However, any field can contain multiple values of the same type

PUT users/_doc/1
{
  "name": "onebird",
  "interests": "reading"
}

PUT users/_doc/1
{
  "name": "twobirds",
  "interests": ["reading", "music"]
}

POST users/_search
{
  "query": {
    "match_all": {}
  }
}

# interests is mapped as text; there is no separate array type
GET users/_mapping
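A toy Python version of dynamic type inference shows why no array type is needed: an array simply takes the type of its elements. The type names and the function infer_field_type are simplified assumptions. This toy rejects mixed types outright, whereas real Elasticsearch uses the first value to pick the type and then tries to coerce later values to it.

```python
def infer_field_type(value):
    """Toy dynamic-mapping inference: an array takes the type of its elements.

    Returns None for an empty array (no type can be inferred) and rejects
    arrays whose elements have different types.
    """
    values = value if isinstance(value, list) else [value]
    if not values:
        return None
    kinds = set()
    for v in values:
        if isinstance(v, bool):  # check bool first: bool is a subclass of int
            kinds.add("boolean")
        elif isinstance(v, int):
            kinds.add("long")
        elif isinstance(v, float):
            kinds.add("float")
        elif isinstance(v, str):
            kinds.add("text")
        else:
            raise TypeError(f"unsupported value: {v!r}")
    if len(kinds) != 1:
        raise ValueError(f"mixed types in one field: {sorted(kinds)}")
    return kinds.pop()


print(infer_field_type("reading"))             # text
print(infer_field_type(["reading", "music"]))  # text
```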

8. Multi-field features and configuring a custom Analyzer in Mapping

8.1 Multi-field Type

  • Multi-field feature
    • Exact matching on a manufacturer name
      • Add a keyword sub-field
    • Use a different analyzer
      • Different languages
      • Pinyin field search
      • Different analyzers can also be specified for search and for indexing

8.2 Exact Values vs Full Text

  • Exact Values vs Full Text
    • Exact Value: numbers, dates, and specific strings (e.g. "Apple Store")
      • keyword in Elasticsearch
    • Full Text: unstructured text data
      • text in Elasticsearch

8.2.1 Exact Values do not need to be analyzed

  • Elasticsearch creates an inverted index for every field
    • Exact Values need no special tokenization when indexed
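The difference can be illustrated with a toy Python model of a multi-field, not Elasticsearch code: the text field is split into lowercase terms, while the keyword sub-field keeps the whole string as one term. The regex tokenizer is a rough stand-in for the standard analyzer, and the function names are hypothetical.

```python
import re


def analyze_text(value):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", value.lower())


def analyze_keyword(value):
    """Keyword fields are not analyzed: the whole value is a single term."""
    return [value]


def index_multi_field(value):
    """Toy multi-field: a text field plus a keyword sub-field."""
    return {"text": analyze_text(value), "keyword": analyze_keyword(value)}


print(index_multi_field("Apple Store"))
# {'text': ['apple', 'store'], 'keyword': ['Apple Store']}
```

A full-text query for "apple" finds the text terms, while only the exact string "Apple Store" matches the keyword sub-field.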

8.3 Custom Analyzers

  • When Elasticsearch's built-in analyzers cannot meet your requirements, you can define a custom analyzer by combining different components (the three below make up an Analyzer; see 4.2 above)
    • Character Filter
    • Tokenizer
    • Token Filter

8.3.1 Character Filters

  • Processes the text before the Tokenizer, e.g. adding, deleting, or replacing characters. Multiple Character Filters can be configured. They affect the position and offset information produced by the Tokenizer
  • Some built-in Character Filters
    • HTML strip: removes HTML tags
    • Mapping: string replacement
    • Pattern replace: regex match and replace

# html_strip removes HTML tags
POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<b>hello world</b>"
}

# Use a mapping char filter to replace '-' with '_'
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    { "type": "mapping", "mappings": ["- => _"] }
  ],
  "text": "123-456,I-test!"
}

# Use a mapping char filter to replace emoticons
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    { "type": "mapping", "mappings": [":) => happy", ":( => sad"] }
  ],
  "text": "…"
}

# Use pattern_replace (regular expression) to drop the URL scheme
GET _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    { "type": "pattern_replace", "pattern": "http://(.*)", "replacement": "$1" }
  ],
  "text": ["http://www.elastic.co"]
}
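The three character filters can be approximated in Python to show what each does to the raw text before tokenization. These are toy versions under stated assumptions: the real html_strip also decodes HTML entities, the mapping filter's rule parsing is simplified, and only $N group references are translated for pattern_replace.

```python
import re


def html_strip(text):
    """Very rough HTML tag removal (the real filter also decodes entities)."""
    return re.sub(r"<[^>]+>", "", text)


def mapping_filter(text, mappings):
    """Apply 'key => value' rules, trying longer keys first at each position."""
    rules = {}
    for rule in mappings:
        key, value = (part.strip() for part in rule.split("=>"))
        rules[key] = value
    keys = sorted(rules, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:  # no rule matched at this position: keep the character
            out.append(text[i])
            i += 1
    return "".join(out)


def pattern_replace(text, pattern, replacement):
    """Regex replace; translate Java-style $1 group references to Python's \\1."""
    python_replacement = re.sub(r"\$(\d+)", r"\\\1", replacement)
    return re.sub(pattern, python_replacement, text)


print(html_strip("<b>hello world</b>"))                          # hello world
print(mapping_filter("123-456,I-test!", ["- => _"]))             # 123_456,I_test!
print(pattern_replace("http://www.elastic.co", "http://(.*)", "$1"))  # www.elastic.co
```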

8.3.2 Tokenizer

  • Splits the original text into terms (term or token) according to certain rules
  • Built-in Tokenizers in Elasticsearch
    • whitespace / standard / uax_url_email / pattern / keyword / path_hierarchy
  • You can develop a plugin in Java to implement your own Tokenizer

# Tokenizer
POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text": "/user/ymruan/a/b/c/d/e"
}
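What path_hierarchy emits for the request above can be sketched in Python: one token per ancestor prefix of the path. This toy ignores the tokenizer's options (such as reverse, skip, or a replacement character) and treats a leading delimiter simply; the function name is hypothetical.

```python
def path_hierarchy(path, delimiter="/"):
    """Emit each ancestor prefix of the path, as path_hierarchy does by default."""
    parts = path.split(delimiter)
    tokens = []
    for i in range(1, len(parts) + 1):
        prefix = delimiter.join(parts[:i])
        if prefix:  # skip the empty prefix before a leading "/"
            tokens.append(prefix)
    return tokens


print(path_hierarchy("/user/ymruan/a"))
# ['/user', '/user/ymruan', '/user/ymruan/a']
```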

8.3.3 Token Filters

  • Add, modify, or delete the terms output by the Tokenizer
  • Built-in Token Filters
    • lowercase / stop / synonym (adds synonyms)

# Token Filters
# whitespace + stop: split on whitespace, then remove stop words such as "in", "on", and "the"
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["stop"],
  "text": ["The rain in Spain falls mainly on The plain."]
}

# add lowercase before stop, so the capitalized "The" is lowercased and then removed too
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["lowercase", "stop"],
  "text": ["The girls in China are playing this game!"]
}

# Define a custom Analyzer to meet specific needs
DELETE my_index
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["emoticons"],
          "tokenizer": "punctuation",
          "filter": ["lowercase", "english_stop"]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": [":) => _happy_", ":( => _sad_"]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "I'm a :) person,and you?"
}
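The token filter chain in the first two requests can be modelled as a Python pipeline: tokenize on whitespace, then apply each filter in order. This is a toy sketch; the stop word set below is a copy of the English list and should be treated as illustrative. The stop filter here is case-sensitive, matching the default ignore_case: false, which is why "The" survives unless lowercase runs first.

```python
# English stop words (copied for illustration; treat as an assumption).
ENGLISH_STOP = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def whitespace_tokenizer(text):
    return text.split()


def lowercase(tokens):
    return [t.lower() for t in tokens]


def stop(tokens, stopwords=ENGLISH_STOP):
    """Case-sensitive removal, like the stop filter with ignore_case=false."""
    return [t for t in tokens if t not in stopwords]


def analyze(text, filters):
    """Tokenize, then run each token filter in order."""
    tokens = whitespace_tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("The girls in China are playing this game!", [lowercase, stop]))
# ['girls', 'china', 'playing', 'game!']
```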

9. Index Template and Dynamic Template

9.1 What is an Index Template

  • Over time, your cluster accumulates more and more indexes. For example, if the cluster is used for log management, you might create a new index for each day's logs, which makes the data easier to manage and tends to give better cluster performance
  • Index Template: helps you set Mappings and Settings, and automatically applies them to newly created indexes that match certain rules
    • Templates only take effect when an index is created. Modifying a template does not affect indexes that already exist
    • You can define multiple index templates; their settings are merged together
    • You can specify an order value to control the merging process

9.1.1 How Index Templates Work

  • When an index is created
    • Elasticsearch's default settings and mappings are applied
    • Settings from Index Templates with a lower order value are applied
    • Settings from Index Templates with a higher order value are applied, overriding the earlier ones
    • Settings and Mappings specified by the user when creating the index are applied last and override those from the templates
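The order of application above can be sketched as a series of deep merges in Python. This is a toy model of legacy _template behaviour, not Elasticsearch's code: the template dict shape mirrors the demo below, glob matching uses fnmatch, and the defaults argument is an assumption standing in for Elasticsearch's built-in settings.

```python
import copy
import fnmatch


def deep_merge(base, override):
    """Recursively merge dicts; values in override win."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def resolve_index_config(index_name, templates, request_body=None, defaults=None):
    """Defaults, then matching templates by ascending order, then the request."""
    config = copy.deepcopy(defaults or {})
    matching = [
        t for t in templates
        if any(fnmatch.fnmatchcase(index_name, p) for p in t["index_patterns"])
    ]
    for template in sorted(matching, key=lambda t: t["order"]):
        body = {k: v for k, v in template.items() if k in ("settings", "mappings")}
        config = deep_merge(config, body)
    return deep_merge(config, request_body or {})


templates = [
    {"index_patterns": ["*"], "order": 0, "settings": {"number_of_replicas": 1}},
    {"index_patterns": ["test*"], "order": 1,
     "settings": {"number_of_shards": 1, "number_of_replicas": 2}},
]
print(resolve_index_config("testtemplate", templates)["settings"])
# {'number_of_replicas': 2, 'number_of_shards': 1}
```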

9.1.2 Demo

  • Create two Index Templates
  • View a template by name
  • View all templates with _template/*
  • Create a temporary index and check its replicas and the inferred data types
  • When an index name matches an Index Template, view the generated index's Mappings and Settings

# create a default template that matches every index
PUT _template/template_default
{
  "index_patterns": ["*"],
  "order": 0,
  "version": 1,
  "settings": {
    "number_of_replicas": 1
  }
}

# create a second template
PUT /_template/template_test
{
  "index_patterns": ["test*"],
  "order": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  },
  "mappings": {
    "date_detection": false,
    "numeric_detection": true
  }
}

# view template information
GET /_template/template_default
GET /_template/temp*

# write a document; the index name starts with test
PUT testtemplate/_doc/1
{
  "someNumber": "1",
  "someDate": "2020/12/05"
}
GET testtemplate/_mapping
GET testtemplate/_settings

# settings given when creating the index override the templates
PUT testmy
{
  "settings": {
    "number_of_replicas": 5
  }
}
PUT testmy/_doc/1
{
  "key": "value"
}
GET testmy/_settings

9.2 What is a Dynamic Template

  • Dynamically sets field types based on the data type Elasticsearch detects, combined with the field name
    • Map all string types to keyword, or disable the keyword sub-field
    • Map fields whose names start with is to boolean
    • Map fields whose names start with long_ to long

9.2.1 Dynamic Template

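  • A Dynamic Template is defined in the Mapping of a specific index
  • Each template has a name
  • The matching rules are given as an array; templates are checked in order and the first match wins
  • Each template sets the Mapping for the fields it matches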

GET my_index/_mapping
DELETE my_index
PUT my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "string_as_boolean": {
          "match_mapping_type": "string",
          "match" : "is*",
          "mapping": {
            "type": "boolean"
          }
        } 
      },
      {
        "string_as_keywords": {
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "keyword"
          }
        }
      }
    ]
  }
}

PUT my_index/_doc/1
{
  "firstName" : "Ruan",
  "isVip" : "true"
}

GET my_index/_mapping
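How a new field is matched against dynamic_templates can be sketched in Python: detect the JSON type, then walk the array and take the first template whose rules all match. This toy handles only match_mapping_type and match, and fnmatch patterns are slightly richer than Elasticsearch's simple wildcards; the function names are hypothetical.

```python
import fnmatch


def json_type(value):
    """Map a JSON value to the names used by match_mapping_type (simplified)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


def pick_mapping(field, value, dynamic_templates):
    """Return (name, mapping) of the first template whose rules all match."""
    detected = json_type(value)
    for entry in dynamic_templates:
        (name, rule), = entry.items()  # each entry holds exactly one named template
        if rule.get("match_mapping_type", "*") not in ("*", detected):
            continue
        if not fnmatch.fnmatchcase(field, rule.get("match", "*")):
            continue
        return name, rule["mapping"]
    return None, None  # fall back to default dynamic mapping


dynamic_templates = [
    {"string_as_boolean": {"match_mapping_type": "string", "match": "is*",
                           "mapping": {"type": "boolean"}}},
    {"string_as_keywords": {"match_mapping_type": "string",
                            "mapping": {"type": "keyword"}}},
]
print(pick_mapping("isVip", "true", dynamic_templates))
# ('string_as_boolean', {'type': 'boolean'})
```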

10. Introduction to Elasticsearch Aggregation Analysis

10.1 What is Aggregation?

  • Besides search, Elasticsearch provides statistical analysis of the data stored in ES
    • Highly real-time
    • By contrast, Hadoop analysis is typically T+1 (results arrive the next day)
  • Aggregations give an overview of the data: they analyze and summarize the whole dataset rather than finding individual documents
    • Number of guest rooms in Tsim Sha Tsui and on Hong Kong Island
    • Number of bookable budget hotels and five-star hotels in different price ranges
  • High performance: a single request returns the analysis results from Elasticsearch
    • The client does not need to implement the analysis logic itself
  • Many of Kibana's visual reports are built on aggregation analysis

10.2 Types of Aggregation

  • Bucket Aggregation: collections of documents that meet certain criteria (similar to GROUP BY in SQL)
  • Metric Aggregation: mathematical operations for statistical analysis of document fields (maximum, minimum, average)
  • Pipeline Aggregation: performs a second aggregation on the results of other aggregations
  • Matrix Aggregation: operates on multiple fields and produces a result matrix

10.3 Bucket & Metric

  • Bucket: can be understood as GROUP BY in SQL
  • Metric: can be understood as COUNT in SQL; a range of statistical functions can be applied

10.3.1 Bucket

  • Some examples
    • Hangzhou belongs to Zhejiang / an actor is male or female
    • Nested relationships: Hangzhou belongs to Zhejiang, which belongs to China, which belongs to Asia
  • Elasticsearch provides many types of Bucket aggregation to divide documents in different ways
    • Terms & Range (time / age range / geographic location)
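Terms and Range buckets can be imitated in Python to show the GROUP BY analogy: each bucket is a key plus the number of documents that fall into it. The hotel documents are made-up sample data, the functions are hypothetical, and real terms aggregations also apply a size limit and break doc_count ties by key.

```python
from collections import Counter


def terms_agg(docs, field):
    """Count documents per distinct value, like a terms bucket (no size limit)."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Buckets are ordered by doc_count, highest first.
    return [{"key": k, "doc_count": c} for k, c in counts.most_common()]


def range_agg(docs, field, ranges):
    """Bucket documents into ranges: "from" is inclusive, "to" is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(
            1 for d in docs
            if field in d
            and (lo is None or d[field] >= lo)
            and (hi is None or d[field] < hi)
        )
        buckets.append({**r, "doc_count": count})
    return buckets


hotels = [  # hypothetical sample documents
    {"city": "Hangzhou", "price": 300},
    {"city": "Hangzhou", "price": 800},
    {"city": "Shanghai", "price": 1500},
]
print(terms_agg(hotels, "city"))
# [{'key': 'Hangzhou', 'doc_count': 2}, {'key': 'Shanghai', 'doc_count': 1}]
```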

10.3.2 Metric

  • A Metric computes results over a set of documents. Besides computing on fields, it also supports computing with scripts (Painless scripts)
  • Most Metrics are mathematical calculations that output a single value
    • min / max / sum / avg / cardinality
  • Some Metrics output multiple values
    • stats / percentiles / percentile_ranks
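The single-value versus multi-value distinction can be shown with two toy Python metrics: stats returns several numbers at once, while cardinality returns one. These are simplified sketches; in particular, Elasticsearch's cardinality is an approximate count (HyperLogLog++), whereas this version counts distinct values exactly.

```python
def stats_agg(values):
    """Multi-value metric like stats: count, min, max, avg, sum."""
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


def cardinality_agg(values):
    """Single-value metric: exact distinct count (ES approximates this)."""
    return len(set(values))


print(stats_agg([100, 200, 600]))
# {'count': 3, 'min': 100, 'max': 600, 'avg': 300.0, 'sum': 900}
```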
