Elasticsearch Search APIs

URL Search API

Grammar:

GET /<index_name>/_search

POST /<index_name>/_search
{

}

Description:


  • The index name before /_search can be omitted; if it is omitted, all indexes in the cluster are searched
  • /_search supports wildcards; for example, user*/_search searches every index whose name starts with user
  • /_search supports multiple indexes separated by commas (,); for example, user1,user2/_search limits the search scope to user1 and user2 (see the example after this list)
  • GET requests can append request parameters to the URL, using Query String Syntax
  • GET/POST requests can carry a request body written in the Query Domain Specific Language (DSL)
  • For details, see the Search API
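
A minimal sketch of the URL forms described above; the user* and movies index names are only placeholders:

# Search every index in the cluster
GET /_search

# Search a single index
GET /movies/_search

# Search all indexes whose names start with user
GET /user*/_search

# Search two indexes at once
GET /user1,user2/_search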

Query String Syntax

Demo:

GET /movies/_search?q=2012&df=year&sort=year:desc&from=0&size=10&timeout=1s
  • q specifies the query statement, using Query String Syntax
  • df specifies the default field to query
  • sort specifies the sort rule
  • from and size are used for paging
  • q can express field queries, exact queries, and fuzzy queries (a consolidated example follows this list):
    • Single-field exact query: q=k:v, for example q=year:2012
    • Generic query against _all (all fields): q=v, for example GET /movies/_search?q=2012
    • Term query: Beautiful Mind is equivalent to Beautiful OR Mind
    • Phrase query: "Beautiful Mind" is equivalent to Beautiful AND Mind, and the terms must appear in the same order
    • Combined conditions:
      • Single-condition query: q=+k1:v1 -k2:v2 k3:v3. A + prefix means the condition must match, a - prefix means it must not match, and conditions with neither prefix are optional; the more optional conditions that match, the more relevant the document. For example: GET /movies/_search?q=+year:2012 -title:"Bullet to the Head"
      • Multi-condition query: AND / OR / NOT, or && / || / !; note that the keywords must be uppercase.
    • Range query:
      • Interval notation: [] is a closed interval, {} is an open interval
        • year:{2019 TO 2018]
        • year:[* TO 2018]
      • Arithmetic notation:
        • year:>2012
        • year:(>2012 && <=2018)
        • year:(+>2010 +<=2018)
    • Wildcard query (avoid it: it is inefficient and memory-hungry, especially with a leading wildcard)
      • ? matches 1 character, * matches 0 or more characters, for example GET /movies/_search?q=title:b*
    • Regular expression query (not recommended, because it is slow): GET /movies/_search?q=title:[bt]oy
    • Fuzzy query and proximity query:
      • Appending ~ to a term means the search word may have one or two letters misspelled; results are returned by similarity, with a maximum edit distance of 2. Appending ~N to a quoted phrase allows its terms to be up to N positions apart.
        • GET /movies/_search?q=title:beautifl~1
        • GET /movies/_search?q=title:"Lord Rings"~2
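
A few of the query-string forms above, collected into runnable requests against the movies index used throughout this article (assumed to have title and year fields):

# Exact field query
GET /movies/_search?q=year:2012

# Phrase query: the terms must appear in this order
GET /movies/_search?q=title:"Beautiful Mind"

# Combined conditions: year must be 2012, the phrase must not appear in title
GET /movies/_search?q=+year:2012 -title:"Bullet to the Head"

# Range query
GET /movies/_search?q=year:[2010 TO 2018]

# Fuzzy query with edit distance 1
GET /movies/_search?q=title:beautifl~1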

Query Domain Specific Language(DSL)

For example:

GET movies/_search?q=year:2005

POST movies/_search
{
  "query": {
    "match": { "year": 2005 }
  }
}
  • Paging query:

    • {
          "from": 10,
          "size": 20,
          "query": {
              "match_all": {}
          }
      }
    • from starts at 0, and 10 results are returned by default. The deeper into the result set you page, the higher the cost.

  • Sorting

    • It is best to sort on numeric or date fields

    • When sorting on multi-valued or analyzed fields, the system picks one of the values in an unpredictable way

    • {
          "sort": [{ "order_date": "desc" }]
      }
  • _source filtering

    • If _source is not stored, only the metadata of the matching documents is returned

    • _source supports wildcards: "_source": ["name*", "desc*"]

    • {
          "_source": ["order_date", "category_keyword"]
      }
  • Script field

    • {
          "script_fields": {
              "new_field": {
                  "script": {
                      "lang": "painless",
                      "source": "doc['order_date'].value + 'hello'"
                  }
              }
          }
      }
    • Use case: orders are priced with different exchange rates, and the order price needs to be sorted after applying the exchange rate (a sketch follows).
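
A minimal sketch of that use case, assuming hypothetical order documents in an orders index with price and exchange_rate fields; the converted amount is returned as a script field and reused for sorting:

POST /orders/_search
{
  "script_fields": {
    "converted_price": {
      "script": {
        "lang": "painless",
        "source": "doc['price'].value * doc['exchange_rate'].value"
      }
    }
  },
  "sort": [
    {
      "_script": {
        "type": "number",
        "script": {
          "lang": "painless",
          "source": "doc['price'].value * doc['exchange_rate'].value"
        },
        "order": "desc"
      }
    }
  ]
}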

Term-Level Queries

  • A term is the smallest unit of meaning; both search and natural language processing with statistical language models have to deal with terms
  • Term-level queries: Term Query / Range Query / Exists Query / Prefix Query / Wildcard Query
  • In ES, a Term Query does not analyze its input. The whole input is looked up as an exact term in the inverted index, and a relevance score is calculated for every document that contains that term
  • Constant Score lets you convert the query into a filter, skipping scoring and taking advantage of caching to improve performance

Case study:

Create a products index and insert three documents:

DELETE products
PUT products
{
  "settings": {
    "number_of_shards": 1
  }
}


POST /products/_bulk
{ "index": { "_id": 1 }}
{ "productID" : "XHDK-A-1293-#fJ3","desc":"iPhone" }
{ "index": { "_id": 2 }}
{ "productID" : "KDKE-B-9947-#kL5","desc":"iPad" }
{ "index": { "_id": 3 }}
{ "productID" : "JODL-X-1937-#pV7","desc":"MBP" }

Term Query

Use a Term Query to search for documents whose desc is iPhone:

POST /products/_search
{
  "query": {
    "term": {
      "desc": {
        "value":"iPhone"
      }
    }
  }
}

Results:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

**Thinking:** The document clearly has desc set to iPhone, so why is no data found?

The answer:

Because the document is analyzed when it is inserted: the Standard Analyzer splits the text into terms and lowercases them by default, so iPhone is indexed as iphone. A Term Query, however, does not analyze its input, so the uppercase P is never converted to lowercase and no term matches. If you query the value iphone instead, you get the result:

POST /products/_search
{
  "query": {
    "term": {
      "desc": {
        "value":"iphone"
      }
    }
  }
}
  • Use a Term Query to query by productID

    POST /products/_search
    {
      "query": {
        "term": {
          "productID": {
            "value": "XHDK-A-1293-#fJ3"
          }
        }
      }
    }

    Results:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 0,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      }
    }

    **Thinking:** Why is no data found?

    The answer:

    Because productID is a text field, the value XHDK-A-1293-#fJ3 was analyzed at index time and split into several lowercase terms instead of being stored as a single term. You can verify this with the _analyze API:

    POST _analyze
    {
      "analyzer": "standard",
      "text": "XHDK-A-1293-#fJ3"
    }

    Results:

    {
      "tokens" : [
        {
          "token" : "xhdk",
          "start_offset" : 0,
          "end_offset" : 4,
          "type" : "<ALPHANUM>",
          "position" : 0
        },
        {
          "token" : "a",
          "start_offset" : 5,
          "end_offset" : 6,
          "type" : "<ALPHANUM>",
          "position" : 1
        },
        {
          "token" : "1293",
          "start_offset" : 7,
          "end_offset" : 11,
          "type" : "<NUM>",
          "position" : 2
        },
        {
          "token" : "fj3",
          "start_offset" : 13,
          "end_offset" : 16,
          "type" : "<ALPHANUM>",
          "position" : 3
        }
      ]
    }

    Because the Term Query does not analyze its input, the whole string is searched as a single term, and the query does not return the expected result.

    If the following statement is executed:

    POST /products/_search
    {
      "query": {
        "term": {
          "productID": {
            "value": "xhdk"
          }
        }
      }
    }

    The corresponding result is returned.

    If you want to match the complete original value, query the keyword sub-field instead:

    POST /products/_search
    {
      "query": {
        "term": {
          "productID.keyword": {
            "value": "XHDK-A-1293-#fJ3"
          }
        }
      }
    }

    Why does the keyword sub-field match the whole value?

    Because of how the index mapping is configured:

    GET /products/_mapping

    Results:

    {
      "products" : {
        "mappings" : {
          "properties" : {
            "desc" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "productID" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        }
      }
    }
  • Because a Term Query also calculates a relevance score, which affects performance, you can skip the scoring step

    • Convert the query into a filter, skipping the TF-IDF calculation and avoiding the overhead of relevance scoring
    • Filters make good use of caching
    POST /products/_search
    {
      "explain": true,
      "query": {
        "constant_score": {
          "filter": {
            "term": {
              "productID.keyword": "XHDK-A-1293-#fJ3"
            }
          }
        }
      }
    }

Structured Search

  • Search for structured data
    • Dates, bool types, and numbers are all structured
  • Text can also be structured
    • For example, colored pens can have discrete color sets: red, green, blue
    • A blog might be tagged: distributed, search
    • Products on e-commerce sites have UPCs (Universal Product Codes) or other unique identifiers that are subject to strict, structured formats.
  • Structured data such as booleans, times, dates, and numbers: there are precise formats that we can logically manipulate.
  • Structured text can be matched exactly or partially
    • Term Query / Prefix Query
  • Structured results have only yes or no values
    • Depending on the scenario, you can decide whether structured search needs to be scored.
Boolean

Data preparation:

DELETE products
POST /products/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10,"avaliable":true,"date":"2018-01-01", "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20,"avaliable":true,"date":"2019-01-01", "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30,"avaliable":true, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30,"avaliable":false, "productID" : "QQPX-R-3956-#aD8" }

GET products/_mapping

Case study:

# Term query on a boolean field, with scoring
POST products/_search
{
  "profile": "true",
  "explain": true,
  "query": {
    "term": {
      "avaliable": true
    }
  }
}

# The same search wrapped in constant_score, without scoring
POST products/_search
{
  "profile": "true",
  "explain": true,
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "avaliable": true
        }
      }
    }
  }
}
Numeric Range
  • gt: greater than
  • lt: less than
  • gte: greater than or equal to
  • lte: less than or equal to
# Term query on a numeric field
POST products/_search
{
  "profile": "true",
  "explain": true,
  "query": {
    "term": {
      "price": 30
    }
  }
}

# Terms query on a numeric field, wrapped in constant_score
POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "price": ["20", "30"]
        }
      }
    }
  }
}

# Numeric range
GET products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "price": {
            "gte": 20,
            "lte": 30
          }
        }
      }
    }
  }
}
Date Range
  • y: years
  • M: months
  • w: weeks
  • d: days
  • h / H: hours
  • m: minutes
  • s: seconds

Suppose the current time (now) is 2021-07-04 12:00:00:

  • now+1h → 2021-07-04 13:00:00
  • now-1h → 2021-07-04 11:00:00
  • 2021.07.04||+1M/d → 2021-08-04 00:00:00

Case:

POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "date": {
            "gte": "now-5y"
          }
        }
      }
    }
  }
}
Exists

No result is returned from an exists query in the following case:

  • The field does not exist, or its corresponding value is null or []

  • The field is considered to exist if any of the following holds:

    • The value is an empty string "" or "-"
    • The array contains null together with other values, such as [null, "foo"]
    • A custom null_value is defined in the index mapping
POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "exists": {
          "field": "date"
        }
      }
    }
  }
}

POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must_not": {
            "exists": {
              "field": "date"
            }
          }
        }
      }
    }
  }
}
Terms

A terms query looks up documents that contain one or more exact values; note that this means containment, not equality.

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "color": { "type": "keyword" }
    }
  }
}

PUT my-index-000001/_bulk
{ "index": { "_id": 1 }}
{ "color": ["blue", "green"] }
{ "index": { "_id": 2 }}
{ "color": "blue" }

GET my-index-000001/_search?pretty
{
  "query": {
    "terms": {
      "color": {
        "index": "my-index-000001",
        "id": "2",
        "path": "color"
      }
    }
  }
}

POST movies/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "genre.keyword": "Comedy" }
      }
    }
  }
}

POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "productID.keyword": [ "QQPX-R-3956-#aD8", "JODL-X-1937-#pV7" ]
        }
      }
    }
  }
}

Full Text Query

  • Classification of Full Text Query
    • Match Query
    • Match Phrase Query
    • Query String Query
    • Multi Match Query
    • Simple Query String Query
  • Characteristics
    • Both indexing and searching analyze the text: the query string is passed to an appropriate analyzer, which generates the list of terms to query
    • At query time, the input is first analyzed into terms, each term is queried against the index individually, and the results are then merged; a score is generated for each document (see the sketch after this list)
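
A minimal sketch of that behavior against the movies index used elsewhere in this article: the match query below is analyzed into the terms matrix and reloaded, each term is looked up separately (OR by default), and the merged results are scored:

POST movies/_search
{
  "query": {
    "match": {
      "title": "Matrix reloaded"
    }
  }
}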

Query String Query

Similar to URL Search (see the URL Search API section above).

  • Query String Query

    • GET /movies/_search
      {
        "profile": true,
        "query": {
          "query_string": {
            "default_field": "title",
            "query": "Beautiful AND Mind"
          }
        }
      }
    • GET /movies/_search
      {
        "profile": true,
        "query": {
          "query_string": {
            "fields": ["title", "year"],
            "query": "2012"
          }
        }
      }

Simple Query String Query

  • Similar to Query String Query, but it ignores invalid syntax
  • Supports only part of the query syntax
  • AND / OR / NOT are not supported as operators; they are treated as ordinary terms
  • The default relationship between terms is OR; a default_operator can be specified
  • Supports partial logic:
    • + replaces AND
    • | replaces OR
    • - replaces NOT
GET /movies/_search
{
  "profile": true,
  "query": {
    "simple_query_string": {
      "query": "Beautiful +mind",
      "fields": ["title"]
    }
  }
}

Match Query

POST movies/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Beautiful Mind"
      }
    }
  }
}

POST movies/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Beautiful Mind",
        "operator": "AND"
      }
    }
  }
}

Match Phrase Query

Unlike Match Query, the query text is treated as a complete phrase: all of its terms must appear, and in the same order (within the allowed slop).

POST movies/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "one I love"
      }
    }
  }
}

POST movies/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "one love",
        "slop": 1
      }
    }
  }
}

This exact matching is too strict in many cases. Sometimes we want a document containing "I like swimming and riding!" to also match the query "I like riding". The slop parameter controls this flexibility.

The slop parameter tells match_phrase how far apart the query terms may be while the document is still considered a match. "How far apart" means: how many times would a term have to be moved for the query to line up with the document?
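
A small sketch of that idea, using a hypothetical docs index that contains the sentence above; without slop the phrase "I like riding" does not match, but with a slop of 2 the terms "swimming and" can be skipped:

PUT docs/_doc/1
{ "content": "I like swimming and riding!" }

# No match: "riding" is not adjacent to "like" in the document
POST docs/_search
{
  "query": {
    "match_phrase": {
      "content": { "query": "I like riding" }
    }
  }
}

# Matches: slop 2 lets "riding" sit two positions further away
POST docs/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "I like riding",
        "slop": 2
      }
    }
  }
}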

Multi Match Query

The multi_match query is built on top of the match query and, importantly, allows multiple field queries.

  • best_fields: finds documents that match any field, but uses the _score from the best-matching field. When fields compete with each other, the score comes from the field that matches best.
  • most_fields: when several fields contain the same text, the scores of all matching fields are combined. Common when handling English content: the main field uses the english analyzer (stemming) to match more documents, while the same text is indexed into a sub-field with the standard analyzer to provide more precise matching; the extra fields act as signals, and the more fields that match, the more relevant the document. The operator parameter cannot be used; copy_to can work around that, but it requires extra storage.
  • cross_fields: the query string is analyzed into a list of terms, and each term is then searched across all of the listed fields; the document matches as long as the terms are found. Suited to entities such as names, addresses, and book information, where the data spans several fields and a single field holds only part of the whole; you expect to find as many of the terms as possible in any of the listed fields. Supports the operator parameter, and compared with copy_to it can also raise the weight of individual fields in the search.
  • phrase: same as match_phrase + best_fields.
  • phrase_prefix: same as match_phrase_prefix + best_fields.
  • bool_prefix: same as match_bool_prefix + most_fields.
POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Quick pets" }},
        { "match": { "body": "Quick pets" }}
      ],
      "tie_breaker": 0.2
    }
  }
}

POST blogs/_search
{
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Quick pets",
      "fields": ["title", "body"],
      "tie_breaker": 0.2,
      "minimum_should_match": "20%"
    }
  }
}

POST books/_search
{
  "query": {
    "multi_match": {
      "query": "Quick brown fox",
      "fields": "*_title"
    }
  }
}

POST books/_search
{
  "query": {
    "multi_match": {
      "query": "Quick brown fox",
      "fields": [ "*_title", "chapter_title^2" ]
    }
  }
}

DELETE /titles

PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "std": { "type": "text", "analyzer": "standard" }
        }
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }

GET /titles/_search
{
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "type": "most_fields",
      "fields": [ "title", "title.std" ]
    }
  }
}

GET /titles/_search
{
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "type": "most_fields",
      "fields": [ "title^10", "title.std" ]
    }
  }
}

Compound queries

Query Context & Filter Context

  • Advanced search: supports entering multiple strings and searching across multiple fields
  • Search engines also generally provide filtering, for example by time or price
  • In ES, there are two different contexts: Query and Filter (see the sketch after this list)
    • Query Context: contributes to the relevance score
    • Filter Context: no scoring is needed, and the cache can be used for better performance
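
A minimal sketch of the two contexts, reusing the products test data from the Boolean section above: the match clause under must runs in query context and contributes to the score, while the clauses under filter run in filter context and only include or exclude documents:

POST /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "productID": "XHDK" } }
      ],
      "filter": [
        { "term": { "avaliable": true } },
        { "range": { "price": { "lte": 30 } } }
      ]
    }
  }
}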

Boolean Query

Case study:

  • Suppose you want to search for movies that satisfy the following criteria
    • The comments contain Guitar, the user rating is above 3, and the release date is between 1993 and 2000
  • This search involves three pieces of logic
    • Include Guitar in the comment field
    • The user rating field is higher than 3 points
    • The release date field needs to be in the given range

Features:

  • A Boolean Query is a combination of one or more query clauses
    • There are 4 kinds of clauses; 2 affect the score and 2 do not
      • must: must match; contributes to the score
      • should: optional match; contributes to the score
      • must_not: filter context; must not match; does not contribute to the score
      • filter: filter context; must match, but does not contribute to the score
  • Relevance is not only for full-text search; yes/no clauses count as well: the more matching clauses, the higher the relevance score. When multiple query clauses are combined into one compound query such as a Boolean Query, the score of each clause is added into the total relevance score
  • Competing fields at the same level carry the same weight
  • You can change a clause's influence on the score by nesting Boolean Queries
  • Nesting a must_not inside a should implements "should not" logic

Grammar:

  • Subqueries can appear in any order
  • Subqueries can be nested
  • If there is no must clause, at least one of the should clauses must match; should takes an array
POST /products/_search
{
  "query": {
    "bool": {
      "must": {
        "term": { "price": "30" }
      },
      "filter": {
        "term": { "avaliable": "true" }
      },
      "must_not": {
        "range": {
          "price": { "lte": 10 }
        }
      },
      "should": [
        { "term": { "productID.keyword": "JODL-X-1937-#pV7" } },
        { "term": { "productID.keyword": "XHDK-A-1293-#fJ3" } }
      ],
      "minimum_should_match": 1
    }
  }
}

How do you solve the problem left over from the Terms query, where matching means containment rather than equality?

Add a count field and combine it with a Boolean Query:

# Change the data model to add a genre_count field
POST /newmovies/_bulk
{ "index": { "_id": 1 }}
{ "title": "Father of the Bridge Part II", "year": 1995, "genre": "Comedy", "genre_count": 1 }
{ "index": { "_id": 2 }}
{ "title": "Dave", "year": 1993, "genre": ["Comedy", "Romance"], "genre_count": 2 }

# must: with scoring
POST /newmovies/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "genre.keyword": { "value": "Comedy" } } },
        { "term": { "genre_count": { "value": 1 } } }
      ]
    }
  }
}

# filter: no scoring, can use the cache
POST /newmovies/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "genre.keyword": { "value": "Comedy" } } },
        { "term": { "genre_count": { "value": 1 } } }
      ]
    }
  }
}

# Query Context
POST /products/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "productID.keyword": { "value": "JODL-X-1937-#pV7" } } },
        { "term": { "avaliable": { "value": true } } }
      ]
    }
  }
}

# Nested bool implementing "should not"
POST /products/_search
{
  "query": {
    "bool": {
      "must": {
        "term": { "price": "30" }
      },
      "should": [
        {
          "bool": {
            "must_not": {
              "term": { "avaliable": "false" }
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

# Control the precision
POST _search
{
  "query": {
    "bool": {
      "must": {
        "term": { "price": "30" }
      },
      "filter": {
        "term": { "avaliable": "true" }
      },
      "must_not": {
        "range": {
          "price": { "lte": 10 }
        }
      },
      "should": [
        { "term": { "productID.keyword": "JODL-X-1937-#pV7" } },
        { "term": { "productID.keyword": "XHDK-A-1293-#fJ3" } }
      ],
      "minimum_should_match": 2
    }
  }
}

Boosting Query

  • Boosting is a means of controlling relevance
    • It can be applied at the index, field, or query sub-condition level
  • The boost parameter
    • boost > 1: the relative weight of the score increases
    • 0 < boost < 1: the relative weight of the score decreases
    • boost < 0: the contribution is negative
  • Instead of excluding documents entirely, you sometimes just want documents containing certain content to rank lower

Case study:

DELETE blogs

POST /blogs/_bulk
{ "index": { "_id": 1 }}
{ "title": "Apple iPad", "content": "Apple iPad,Apple iPad" }
{ "index": { "_id": 2 }}
{ "title": "Apple iPad,Apple iPad", "content": "Apple iPad" }

POST blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "apple,ipad",
              "boost": 1.1
            }
          }
        },
        {
          "match": {
            "content": {
              "query": "apple,ipad",
              "boost": 2
            }
          }
        }
      ]
    }
  }
}

DELETE news

POST /news/_bulk
{ "index": { "_id": 1 }}
{ "content": "Apple Mac" }
{ "index": { "_id": 2 }}
{ "content": "Apple iPad" }
{ "index": { "_id": 3 }}
{ "content": "Apple employee like Apple Pie and Apple Juice" }

POST news/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "content": "apple" }
      }
    }
  }
}

POST news/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "content": "apple" }
      },
      "must_not": {
        "match": { "content": "pie" }
      }
    }
  }
}

POST news/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": { "content": "apple" }
      },
      "negative": {
        "match": { "content": "pie" }
      },
      "negative_boost": 0.5
    }
  }
}
  • positive: required; the query clause that documents must match in order to be returned
  • negative: required; the query clause used to lower the relevance score of matching documents
  • negative_boost: required; a floating-point number between 0 and 1.0 used to reduce the relevance score of documents that match the negative clause

Constant Score Query
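
Constant Score wraps a filter, skips relevance scoring, and gives every matching document the same constant score (adjustable with boost), as already used in the Term Query and Structured Search examples above. A minimal sketch against the products test data:

POST /products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "productID.keyword": "XHDK-A-1293-#fJ3" }
      },
      "boost": 1.2
    }
  }
}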

Disjunction Max Query

An example of a single-string query:

PUT /blogs/_doc/1
{
  "title": "Quick brown rabbits",
  "body": "Brown rabbits are commonly seen."
}

PUT /blogs/_doc/2
{
  "title": "Keeping pets healthy",
  "body": "My quick brown fox eats rabbits on a regular basis."
}

POST /blogs/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}

Expectation:

In the title field, Brown appears only in document 1.

In the body field, Brown appears in document 1, while Brown fox appears in document 2 in the same order as the query. Intuitively, document 2 should receive the highest relevance score.

Results:

The score for Document 1 is higher than for Document 2.

{  "took" : 0."timed_out" : false."_shards" : {    "total" : 1."successful" : 1."skipped" : 0."failed" : 0  },  "hits" : {    "total" : {      "value" : 2."relation" : "eq"    },    "max_score" : 0.90425634."hits": [{"_index" : "blogs"."_type" : "_doc"."_id" : "1"."_score" : 0.90425634."_source" : {          "title" : "Quick brown rabbits"."body" : "Brown rabbits are commonly seen."}}, {"_index" : "blogs"."_type" : "_doc"."_id" : "2"."_score" : 0.77041256."_source" : {          "title" : "Keeping pets healthy"."body" : "My quick brown fox eats rabbits on a regular basis."}}}}]Copy the code

Scoring process:

  • Run the two queries in the should clause
  • Add the scores of the two queries together
  • Multiply by the number of matching clauses
  • Divide by the total number of clauses

You can set "explain": true to see how the results were scored (a sketch follows).
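
A minimal sketch, adding explain to the bool/should query above to inspect how each document's score was computed:

POST /blogs/_search
{
  "explain": true,
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}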

Title and body compete with each other; instead of simply adding the scores together, we should use the score of the single field that matches best. A Disjunction Max Query returns any document that matches any of its queries, and uses the score of the best-matching field as the final score.

POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}

The result will be as expected.

The tie_breaker parameter (see the sketch after this list):

  • Take the score _score of the best-matching clause
  • Multiply the scores of the other matching clauses by tie_breaker
  • Sum the scores and normalize them
  • tie_breaker is a floating-point number between 0 and 1, where 0 means only the best match counts and 1 means all clauses are equally important
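
A minimal sketch on the same blogs data: the best-matching field dominates, and the other field's score is added after being multiplied by 0.2:

POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ],
      "tie_breaker": 0.2
    }
  }
}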

Function Score Query

Score and sort

  • Elasticsearch sorts documents by relevance score by default
  • You can specify one or more fields to sort by
  • Relevance-score ranking alone cannot satisfy some specific requirements
    • It offers no further control over how the results are ordered

Function Score Query

  • After the query, a series of recalculations can be applied to every matching document, and the documents are re-ranked according to the newly generated score
  • Functions
    • Weight: assigns a simple, unnormalized weight to each document
    • Field Value Factor: uses a field's value, such as "popularity" or "likes", to modify _score
    • Random Score: gives each user a different, random ranking
    • Decay functions: score based on a field's value; the closer to a target value, the higher the score
    • Script Score: a custom script with complete control over the scoring logic
  • Boost Mode
    • multiply: multiply the query score by the function value
    • sum: add the query score and the function value
    • min / max: take the minimum / maximum of the query score and the function value
    • replace: use the function value instead of the query score
  • max_boost caps the final score at a maximum value
  • Consistent random function
    • Usage scenario: ads on a website need exposure
    • Requirement: each user sees their own random ordering, but the relative order stays the same when the same user visits again
DELETE blogs

PUT /blogs/_doc/1
{
  "title": "About popularity",
  "content": "In this post we will talk about...",
  "votes": 0
}

PUT /blogs/_doc/2
{
  "title": "About popularity",
  "content": "In this post we will talk about...",
  "votes": 100
}

PUT /blogs/_doc/3
{
  "title": "About popularity",
  "content": "In this post we will talk about...",
  "votes": 1000000
}

POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes"
      }
    }
  }
}

POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p"
      }
    }
  }
}

POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 0.1
      }
    }
  }
}

POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 0.1
      },
      "boost_mode": "sum",
      "max_boost": 3
    }
  }
}

POST /blogs/_search
{
  "query": {
    "function_score": {
      "random_score": {
        "seed": 911119
      }
    }
  }
}

Search Template

  • Elasticsearch query statements
    • Are critical to both relevance scoring and query performance
  • Early in development, the query parameters may already be known, but the final structure of the query DSL often cannot be fixed yet
    • A Search Template defines a contract between the parties (a stored-template sketch follows the example below)
  • Everyone can work independently, decoupled from each other
    • Developers, search engineers, performance engineers
GET _search/template
{
  "source": {
    "query": {
      "match": { "{{my_field}}": "{{my_value}}" }
    },
    "size": "{{my_size}}"
  },
  "params": {
    "my_field": "message",
    "my_value": "foo",
    "my_size": 5
  }
}
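
A sketch of the "contract" idea: the template can also be stored under an id via the _scripts endpoint (the template name here is just an example), so callers only pass parameters:

PUT _scripts/my_search_template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": { "{{my_field}}": "{{my_value}}" }
      },
      "size": "{{my_size}}"
    }
  }
}

GET _search/template
{
  "id": "my_search_template",
  "params": {
    "my_field": "message",
    "my_value": "foo",
    "my_size": 5
  }
}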

Suggester API

  • What are search suggestions
    • Modern search engines generally provide "suggest as you type" functionality
    • They help users auto-complete or correct mistakes while typing, guiding them toward more precise keywords and improving how well documents match in the subsequent search phase
    • Google, for example, autocompletes as you start typing, and once the input reaches a certain length it also suggests similar words or phrases when there is a spelling mistake
  • API
    • Elasticsearch implements similar functionality through the Suggester API
    • How it works: the input text is split into tokens, and similar terms are looked up in the index's dictionary and returned
    • Term Suggester (corrects or completes individual words)
    • Phrase Suggester (completes an entire phrase from the words typed)
    • Completion Suggester (completes a word: type the first part, it completes the rest)
    • Context Suggester (completion with context)
  • Suggestion mode
    • missing: only offer suggestions for terms that are not already in the index
    • popular: recommend terms that appear more frequently
    • always: offer suggestions whether or not the term exists
  • Precision and recall
    • Precision
      • completion > phrase > term
    • Recall
      • term > phrase > completion
    • Performance
      • completion > phrase > term

Term Suggester && Phrase Suggester

The Term Suggester analyzes the search text into terms, compares each term with the data in the specified field, calculates the edit distance, and returns suggested words.

Edit distance: the Levenshtein edit distance algorithm is used. The idea is to count how many single-character changes are needed to turn one word into another. For example, to get Elasticsearch from Elasticseach you have to add the letter "r", so the edit distance between the two words is 1.

The Phrase Suggester adds some extra logic on top of the Term Suggester.

Its additional parameters include max_errors, the maximum number of terms that may be misspelled, and confidence, which limits the returned results; the default is 1.

DELETE articles

POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : { } }
{ "body": "elasticsearch is rock solid"}

POST _analyze
{
  "analyzer": "standard",
  "text": ["Elk stack rocks rock"]
}

POST /articles/_search
{
  "size": 1,
  "query": {
    "match": {
      "body": "lucen rock"
    }
  },
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}

POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "popular",
        "field": "body"
      }
    }
  }
}

POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "always",
        "field": "body"
      }
    }
  }
}

POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen hocks",
      "term": {
        "suggest_mode": "always",
        "field": "body",
        "prefix_length": 0,
        "sort": "frequency"
      }
    }
  }
}

POST /articles/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock hello world ",
      "phrase": {
        "field": "body",
        "max_errors": 2,
        "confidence": 0,
        "direct_generator": [{
          "field": "body",
          "suggest_mode": "always"
        }],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}

Completion Suggester

  • The Completion Suggester provides the auto-complete function: every character the user types sends a query to the backend to find matches
  • Elasticsearch uses a different data structure for this. Instead of the inverted index, it encodes the analyzed data into an FST and stores it together with the index. The FST is loaded into memory by ES, so lookups are very fast
  • An FST can only be used for prefix lookups
  • Define the mapping using the completion type
  • Index the data
  • Run a suggest query

Context Suggester

  • An extension of the Completion Suggester
  • You can add context information to the search; for example, when the user types "star"
    • In a coffee-related context: suggest Starbucks
    • In a movie-related context: suggest Star Wars
  • Two types of context can be defined
    • category: an arbitrary string
    • geo: geographic information
  • Define the mapping
    • type
    • name
  • Index the data, adding context information to each document
  • Run a suggestion query that carries the context
DELETE articles

PUT articles
{
  "mappings": {
    "properties": {
      "title_completion": {
        "type": "completion"
      }
    }
  }
}

POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}

POST articles/_search?pretty
{
  "size": 0,
  "suggest": {
    "article-suggester": {
      "prefix": "elk ",
      "completion": {
        "field": "title_completion"
      }
    }
  }
}

DELETE comments

PUT comments

PUT comments/_mapping
{
  "properties": {
    "comment_autocomplete": {
      "type": "completion",
      "contexts": [{
        "type": "category",
        "name": "comment_category"
      }]
    }
  }
}

POST comments/_doc
{
  "comment": "I love the star war movies",
  "comment_autocomplete": {
    "input": ["star wars"],
    "contexts": {
      "comment_category": "movies"
    }
  }
}

POST comments/_doc
{
  "comment": "Where can I find a Starbucks",
  "comment_autocomplete": {
    "input": ["starbucks"],
    "contexts": {
      "comment_category": "coffee"
    }
  }
}

POST comments/_search
{
  "suggest": {
    "MY_SUGGESTION": {
      "prefix": "sta",
      "completion": {
        "field": "comment_autocomplete",
        "contexts": {
          "comment_category": "coffee"
        }
      }
    }
  }
}

Cross Cluster Search

Pain points of horizontal scaling:

  • Single cluster:
    • When scaling horizontally, the number of nodes cannot grow without limit
    • When a cluster holds too much meta information (nodes, indexes, cluster state), the update pressure increases; the single active master becomes a performance bottleneck and the cluster may stop working properly
  • In early versions, the tribe node could meet the need for multi-cluster access, but it still had problems
    • The tribe node joins each cluster as a client node; master-node task changes in a cluster can only continue after the tribe node responds
    • The tribe node does not persist cluster state information; once restarted, initialization is slow
    • If several clusters have an index with the same name, only one of them can be chosen through a prefer rule

Cross Cluster Search

  • The early tribe-node approach had problems, so it was deprecated
  • ES 5.3 introduced the Cross Cluster Search feature
    • Any node can act as a federated node and proxy search requests in a lightweight way
    • No client node needs to be added to the other clusters

Case study:

# Start three local single-node clusters
bin/elasticsearch -E node.name=cluster0node -E cluster.name=cluster0 -E path.data=cluster0_data -E discovery.type=single-node -E http.port=9200 -E transport.port=9300
bin/elasticsearch -E node.name=cluster1node -E cluster.name=cluster1 -E path.data=cluster1_data -E discovery.type=single-node -E http.port=9201 -E transport.port=9301
bin/elasticsearch -E node.name=cluster2node -E cluster.name=cluster2 -E path.data=cluster2_data -E discovery.type=single-node -E http.port=9202 -E transport.port=9302

# Set the remote cluster settings dynamically on each cluster
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster0": {
          "seeds": ["127.0.0.1:9300"],
          "transport.ping_schedule": "30s"
        },
        "cluster1": {
          "seeds": ["127.0.0.1:9301"],
          "transport.compress": true,
          "skip_unavailable": true
        },
        "cluster2": {
          "seeds": ["127.0.0.1:9302"]
        }
      }
    }
  }
}

# The same settings via cURL, applied to all three clusters
curl -XPUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{"persistent":{"cluster":{"remote":{"cluster0":{"seeds":["127.0.0.1:9300"],"transport.ping_schedule":"30s"},"cluster1":{"seeds":["127.0.0.1:9301"],"transport.compress":true,"skip_unavailable":true},"cluster2":{"seeds":["127.0.0.1:9302"]}}}}}'
curl -XPUT "http://localhost:9201/_cluster/settings" -H 'Content-Type: application/json' -d '{"persistent":{"cluster":{"remote":{"cluster0":{"seeds":["127.0.0.1:9300"],"transport.ping_schedule":"30s"},"cluster1":{"seeds":["127.0.0.1:9301"],"transport.compress":true,"skip_unavailable":true},"cluster2":{"seeds":["127.0.0.1:9302"]}}}}}'
curl -XPUT "http://localhost:9202/_cluster/settings" -H 'Content-Type: application/json' -d '{"persistent":{"cluster":{"remote":{"cluster0":{"seeds":["127.0.0.1:9300"],"transport.ping_schedule":"30s"},"cluster1":{"seeds":["127.0.0.1:9301"],"transport.compress":true,"skip_unavailable":true},"cluster2":{"seeds":["127.0.0.1:9302"]}}}}}'

# Create test data
curl -XPOST "http://localhost:9200/users/_doc" -H 'Content-Type: application/json' -d '{"name":"user1","age":10}'
curl -XPOST "http://localhost:9201/users/_doc" -H 'Content-Type: application/json' -d '{"name":"user2","age":20}'
curl -XPOST "http://localhost:9202/users/_doc" -H 'Content-Type: application/json' -d '{"name":"user3","age":30}'

# Cross-cluster query
GET /users,cluster1:users,cluster2:users/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 20,
        "lte": 40
      }
    }
  }
}

Resources

REST APIs

Search APIs

Query DSL