
Preface

In Elasticsearch, term-level queries and full-text queries take two completely different approaches. The previous article briefly compared the term query with the phrase (match_phrase) full-text query; this article clarifies the relationship between the two kinds of query in more detail.

We create a new index, index_002, and insert the following data:

POST /_bulk
{"index":{"_index":"index_002"}}
{"id":"1","name":"lonely wolf","address":null,"count":1}
{"index":{"_index":"index_002"}}
{"id":"2","name":"lonely hello wolf","address":[],"count":3}
{"index":{"_index":"index_002"}}
{"id":"3","name":"lonely hello word wolf","address":"[guangdong]","count":1}
{"index":{"_index":"index_002"}}
{"id":"4","name":"Lonely Wolf","address":"['in guangdong','shenzhen']","count":2}
{"index":{"_index":"index_002"}}
{"id":"5","name":"wolf","address":null,"count":1}

Term-level queries

A term-level query treats the search keyword as the smallest unit of the query: the keyword we pass in is looked up as a whole and is not analyzed (split into words) first.

For example, the following term query against the text field name returns no results:

POST index_002/_search
{
  "query": {
    "term": {
      "name": {
        "value": "lonely wolf"
      }
    }
  }
}

This is because text fields are stored in an inverted index, and an analyzer splits the text into terms before they are indexed. We can use the _analyze API to view the result of the analysis:

POST /_analyze
{
  "analyzer": "standard",
  "text": ["lonely wolf"]
}

As we can see, lonely wolf is split into the two terms lonely and wolf, so naturally we cannot find lonely wolf as a single term.

One point to note: if we store capitalized words, such as Lonely Wolf, the analysis gives the same result, because uppercase letters are converted to lowercase before being indexed. A term query for the exact string Lonely Wolf on the text field therefore returns nothing either.
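
The behaviour described above can be imitated with a small Python sketch. It is not how Elasticsearch is implemented: the analyze function is a deliberately rough stand-in for the standard analyzer (lowercase, split on whitespace), and the names build_index and term_query are made up for this illustration.

```python
from collections import defaultdict

# Sample "name" values from index_002, keyed by the id field.
DOCS = {
    "1": "lonely wolf",
    "2": "lonely hello wolf",
    "3": "lonely hello word wolf",
    "4": "Lonely Wolf",
    "5": "wolf",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def build_index(docs, analyzer):
    """Map each indexed term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyzer(value):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# The text field is analyzed; the keyword sub-field keeps the whole value as one term.
text_index = build_index(DOCS, analyze)
keyword_index = build_index(DOCS, lambda value: [value])


def term_query(index, value):
    """A term query looks the value up as-is, without analyzing it."""
    return index.get(value, [])


print(term_query(text_index, "lonely wolf"))     # the text index has no such single term
print(term_query(text_index, "lonely"))          # documents containing the term "lonely"
print(term_query(keyword_index, "lonely wolf"))  # exact match on the unanalyzed value
```

In this model the keyword index still contains Lonely Wolf with its original capitalization, while the text index only contains the lowercased terms lonely and wolf.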

The exists query

Checks whether a field exists: it returns documents that contain any indexed value for the field.

GET index_002/_search
{
  "query": {
    "exists": {
      "field": "address"
    }
  }
}

This returns only the third and fourth documents. Documents whose address is null or an empty array [] are not returned.
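
As a rough illustration of which sample documents count as having a value for address, here is a minimal Python sketch. The helper names has_indexed_value and exists are hypothetical, and the rule is simplified to "at least one non-null value".

```python
# The sample documents from index_002, reduced to the address field.
DOCS = {
    "1": {"address": None},
    "2": {"address": []},
    "3": {"address": "[guangdong]"},
    "4": {"address": "['in guangdong','shenzhen']"},
    "5": {"address": None},
}


def has_indexed_value(doc, field):
    """True if the field holds at least one non-null value.

    null, a missing field, an empty array and an array of nulls all count as "no value".
    """
    value = doc.get(field)
    values = value if isinstance(value, list) else [value]
    return any(item is not None for item in values)


def exists(docs, field):
    """Ids of the documents that have a value for the field."""
    return [doc_id for doc_id, doc in docs.items() if has_indexed_value(doc, field)]


print(exists(DOCS, "address"))
```

Documents 3 and 4 qualify because their address is a non-empty string, even though the string happens to look like a list.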

To return the documents whose address is null or an empty array [], wrap the exists query in the must_not clause of a bool query:

GET index_002/_search
{
  "query": {
    "bool": {
      "must_not": [
        {"exists": {"field": "address"}}
      ]
    }
  }
}

Fuzzy queries

For example, when we mistype a word in a search engine such as Baidu, it often corrects the word for us.

A fuzzy query matches terms that are similar to the search term, so it can still find results when a word is misspelled. It mainly covers the following kinds of single-character edits:

  • Changing a character, as in box → fox.
  • Removing a character, as in black → lack.
  • Inserting a character, as in sic → sick.
  • Transposing two adjacent characters, as in act → cat.

To find such approximate terms, a fuzzy query builds the set of all variations of the search term within the allowed edit distance, and then looks up each variation with an exact match.
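
The four kinds of edit listed above can be measured with an edit distance that also counts a swap of two adjacent characters as one step. The following is a minimal Python sketch of that idea (the optimal string alignment variant); it only illustrates how the distance is counted and is not the code Elasticsearch runs. The function name osa_distance is made up for this example.

```python
def osa_distance(a, b):
    """Edit distance counting insert, delete, substitute and adjacent transposition as 1 each."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a character
                d[i][j - 1] + 1,         # insert a character
                d[i - 1][j - 1] + cost,  # change a character (or keep it)
            )
            # Two adjacent characters swapped, as in act -> cat.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


for source, target in [("box", "fox"), ("black", "lack"), ("sic", "sick"), ("act", "cat")]:
    print(source, "->", target, osa_distance(source, target))
```

Each of the four example pairs is exactly one edit apart, which is why a fuzzy query that allows one edit can treat them as matches.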

For example, misspelled values such as lonel or loneyl can still match documents that contain lonely:

GET index_002/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "lonel"
      }
    }
  }
}

The ids query

Returns documents based on their IDs. The ID here is the document's _id, not the id field in the document body.

GET index_002/_search
{
  "query": {
    "ids": {
      "values": ["id1", "id2"]
    }
  }
}

The prefix query

Returns documents in which the given field contains a term that starts with the specified prefix.

GET index_002/_search
{
  "query": {
    "prefix": {
      "name": {
        "value": "lo"
      }
    }
  }
}

Range queries

Queries by range.

GET index_002/_search
{
  "query": {
    "range": {
      "id": {
        "gte": 1,
        "lte": 2
      }
    }
  }
}

Where:

  • gt: greater than.
  • gte: greater than or equal to.
  • lt: less than.
  • lte: less than or equal to.

The range query also works on date fields, where dates are compared as millisecond timestamps. The following example matches everything from the start of yesterday up to (but not including) the start of today; a time zone can be specified with the time_zone parameter:

GET _search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  }
}

The regexp query

Queries using a regular expression. The following example matches all documents that contain a term starting with lon:

GET index_002/_search
{
  "query": {
    "regexp": {
      "name": "lon.*"
    }
  }
}

The term query

Returns documents that contain an exact term in the given field.

# Returns the first four documents
GET index_002/_search
{
  "query": {
    "term": {
      "name": {
        "value": "lonely"
      }
    }
  }
}

# Returns only the first document
GET index_002/_search
{
  "query": {
    "term": {
      "name.keyword": {
        "value": "lonely wolf"
      }
    }
  }
}

The terms query

terms works the same way as term; the only difference is that terms can exactly match several terms at once, and a document is returned if it contains any of them.
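
One way to picture this is as the union of several term lookups. The sketch below assumes a very simplified analyzer (lowercase, split on whitespace) for the text field name; the function names are hypothetical and only serve the illustration.

```python
# Sample "name" values from index_002, keyed by the id field.
DOCS = {
    "1": "lonely wolf",
    "2": "lonely hello wolf",
    "3": "lonely hello word wolf",
    "4": "Lonely Wolf",
    "5": "wolf",
}


def analyze(text):
    """Simplified analysis: lowercase, split on whitespace (an assumption of this sketch)."""
    return text.lower().split()


def term(docs, value):
    """Ids of documents whose indexed terms contain the value exactly."""
    return [doc_id for doc_id, name in docs.items() if value in analyze(name)]


def terms(docs, values):
    """Ids of documents that contain at least one of the values exactly."""
    wanted = set(values)
    return [doc_id for doc_id, name in docs.items() if wanted & set(analyze(name))]


print(term(DOCS, "lonely"))
print(terms(DOCS, ["lonely", "wolf"]))
```

Document 5 contains only wolf, so it is found by the terms lookup but not by the single term lookup for lonely.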

# Returns all five documents
GET index_002/_search
{
  "query": {
    "terms": {
      "name": ["lonely", "wolf"]
    }
  }
}

The terms_set query

The terms_set query follows the same rules as the terms query, except that it lets you define how many terms must match. That number can only be taken from a field in the document or computed by a script:

# The count field used for minimum_should_match_field must be numeric; it cannot be a text field.
# The first letter of 'Wolf' is uppercase, so it cannot exactly match the indexed term 'wolf'.
GET index_002/_search
{
  "query": {
    "terms_set": {
      "name": {
        "terms": ["lonely", "Wolf"],
        "minimum_should_match_field": "count"
      }
    }
  }
}

The type query

Queries documents of a specified type. Mapping types were deprecated in version 7.0 and removed in version 8.0.

Wildcard query

Queries using a wildcard pattern; it can be thought of as a simplified version of the regular expression query:

GET index_002/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "lone*"
      }
    }
  }
}

Full-text queries

Full-text queries are typically used on full-text fields of type text, such as the body of an e-mail. They apply analysis both at index time and at search time: before querying, the input is first split into terms, each term is then looked up, and finally the results are merged and returned in order of their relevance score.

There are many types of full-text query; here we mainly introduce the match query and the match_phrase query.

The match query

The match query is the standard query for full-text search, and it also includes options for fuzzy matching. Here is a basic match query:

# Returns all five documents
POST index_002/_search
{
  "query": {
    "match": {
      "name": "lonely wolf"
    }
  }
}

Compare this with term queries:

# No document satisfies the condition
POST index_002/_search
{
  "query": {
    "term": {
      "name": "lonely wolf"
    }
  }
}

# Returns the first document
POST index_002/_search
{
  "query": {
    "term": {
      "name.keyword": "lonely wolf"
    }
  }
}

From the results of the queries above, we can state the difference between a term query and a full-text match query:

  • The term query looks up the search keyword as a whole.
  • The match query analyzes the search keyword into terms, and by default the terms are combined with an or relationship.

From these two conclusions it is also clear that term queries are generally not used on text fields: a text field is analyzed before it is indexed, so a term query may fail to match it.
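
The difference can be sketched in a few lines of Python. This is only a model under simplifying assumptions: analyze is a crude stand-in for the standard analyzer, scoring is ignored, and the function names and the minimum_should_match parameter of the helper are chosen for this illustration.

```python
# Sample "name" values from index_002, keyed by the id field.
DOCS = {
    "1": "lonely wolf",
    "2": "lonely hello wolf",
    "3": "lonely hello word wolf",
    "4": "Lonely Wolf",
    "5": "wolf",
}


def analyze(text):
    """Simplified analysis: lowercase, split on whitespace (an assumption of this sketch)."""
    return text.lower().split()


def term(docs, value):
    """term: the keyword is looked up as a whole, without being analyzed."""
    return [doc_id for doc_id, name in docs.items() if value in analyze(name)]


def match(docs, query, minimum_should_match=1):
    """match: the keyword is analyzed first; by default one matching term is enough (or)."""
    query_terms = set(analyze(query))
    hits = []
    for doc_id, name in docs.items():
        matched = len(query_terms & set(analyze(name)))
        if matched >= minimum_should_match:
            hits.append(doc_id)
    return hits


print(term(DOCS, "lonely wolf"))                               # whole keyword, nothing found
print(match(DOCS, "lonely wolf"))                              # or: any of the terms
print(match(DOCS, "hello wolf lonely", minimum_should_match=3))  # all three terms required
```

Requiring as many matching terms as the query contains behaves like an and relationship, and the order of the terms in the query plays no role in this model.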

Look at the following example, which returns the second and third documents (after analysis, the order of the search terms does not matter):

# At least 3 of the query terms must match
POST index_002/_search
{
  "query": {
    "match": {
      "name": {
        "query": "hello wolf lonely",
        "operator": "or",
        "minimum_should_match": 3
      }
    }
  }
}

The match_phrase query

The match_phrase query treats the search keywords as a phrase, which looks similar to a term query. However, the input is still analyzed, and match_phrase has a slop parameter that defines how large a gap is allowed inside the phrase. The default is 0, meaning that no other words are allowed in between:

POST index_002/_search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "lonely wolf"
      }
    }
  }
}

This query returns the first and fourth documents. Note that although Lonely Wolf in the fourth document is capitalized, it is indexed in lowercase, so it is matched as well.
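
Phrase matching with a gap allowance can be modelled roughly as follows. This Python sketch is a simplification, not Lucene's actual sloppy phrase matcher: it compares how far each query term is shifted relative to its position in the query and accepts a document when the spread of those shifts does not exceed slop. The analyzer and all function names are assumptions made for the illustration.

```python
from itertools import product

# Sample "name" values from index_002, keyed by the id field.
DOCS = {
    "1": "lonely wolf",
    "2": "lonely hello wolf",
    "3": "lonely hello word wolf",
    "4": "Lonely Wolf",
    "5": "wolf",
}


def analyze(text):
    """Simplified analysis: lowercase, split on whitespace (an assumption of this sketch)."""
    return text.lower().split()


def phrase_matches(text, query, slop=0):
    """True if the query terms occur in the text as a phrase, tolerating `slop` positions of gap."""
    tokens = analyze(text)
    query_terms = analyze(query)
    if not query_terms:
        return False
    # For every query term, collect the positions at which it occurs in the text.
    candidates = [[i for i, token in enumerate(tokens) if token == term] for term in query_terms]
    if any(not positions for positions in candidates):
        return False  # every query term must be present
    for combo in product(*candidates):
        # Shift of each term relative to its position in the query; equal shifts mean an exact phrase.
        shifts = [position - offset for offset, position in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


def match_phrase(docs, query, slop=0):
    """Ids of the documents whose name matches the phrase."""
    return [doc_id for doc_id, name in docs.items() if phrase_matches(name, query, slop)]


print(match_phrase(DOCS, "lonely wolf"))          # exact phrase only
print(match_phrase(DOCS, "lonely wolf", slop=1))  # one position of gap allowed
```

With slop=0 only adjacent lonely wolf qualifies; with a slop of 1, a document with one word between lonely and wolf qualifies too.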

If we now add slop=1 to the query, a gap of one position is allowed inside the phrase, so the second document is matched as well:

POST index_002/_search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "lonely wolf",
        "slop": 1
      }
    }
  }
}

Conclusion

This article described the difference between term queries and the match family of full-text queries. The main points are:

  1. A term query does not analyze the search keyword; it looks the keyword up as a whole.
  2. Full-text queries such as match analyze the search keyword, search for each resulting term, merge the results with an or relationship by default, and return them ranked by the scoring algorithm.
  3. For text fields, values are analyzed at index time and uppercase letters are converted to lowercase, so when using term or match_phrase, pay attention to the effect of analysis on the query results.