This is my 16th day of The August Update Challenge

The normalizer parameter of the keyword field type is similar to analyzer, except that a normalizer is guaranteed to produce exactly one token.

The normalizer is applied before a value is indexed and again at search time, when the keyword field is queried through the match query or through term-level queries such as term.
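To make the "one token" point concrete, here is a minimal Python sketch. It is not how Elasticsearch implements it: normalize() and analyze() are hypothetical helpers, and the ASCII folding is only approximated with Unicode decomposition, which covers fewer characters than the real asciifolding filter.

```python
import unicodedata
from typing import List


def normalize(value: str) -> str:
    """Rough stand-in for a normalizer with lowercase + asciifolding.

    Assumption: accents are removed via NFKD decomposition, which only
    approximates the real asciifolding filter.
    """
    lowered = value.lower()
    decomposed = unicodedata.normalize("NFKD", lowered)
    # Drop the combining marks left behind by the decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(value: str) -> List[str]:
    """Rough stand-in for an analyzer: splits on whitespace, so it may
    return several tokens for one input value."""
    return [normalize(token) for token in value.split()]


if __name__ == "__main__":
    text = "Crème BRÛLÉE"
    print(normalize(text))  # the whole value stays a single token
    print(analyze(text))    # the analyzer-style helper yields a list of tokens
```

The normalizer-style helper always returns one string for one input value, whereas the analyzer-style helper may return several tokens.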

Test data:

PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}

PUT index/_doc/1
{
  "foo": "BAR"
}

PUT index/_doc/2
{
  "foo": "bar"
}

PUT index/_doc/3
{
  "foo": "baz"
}

POST index/_refresh

Execute query:

GET index/_search
{
  "query": {
    "term": {
      "foo": "BAR"
    }
  }
}

GET index/_search
{
  "query": {
    "match": {
      "foo": "BAR"
    }
  }
}

Both of the above queries return doc1 and doc2. The normalizer runs before indexing, so the values "BAR" and "bar" are both stored as the token bar. The query text is normalized in the same way (BAR --> bar), so it matches both documents.

Note: Normally, a term query compares its input text as is against the tokens in the inverted index, without any analysis. A keyword field that explicitly sets the normalizer parameter is an exception: the input text is passed through the normalizer before being compared with the tokens in the inverted index.
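The difference between a plain keyword field and one with a normalizer can be sketched with a toy inverted index in Python. This only illustrates the behaviour described above and says nothing about Elasticsearch internals: build_index(), term_query() and the lowercase-only normalize() are hypothetical helpers.

```python
from collections import defaultdict


def normalize(value):
    # Assumption: lowercase only, a simplified stand-in for my_normalizer.
    return value.lower()


def build_index(docs, normalizer=None):
    """Map each (optionally normalized) keyword token to the ids holding it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        token = normalizer(value) if normalizer else value
        index[token].add(doc_id)
    return index


def term_query(index, text, normalizer=None):
    """Look up the query text; normalize it first only if a normalizer is set."""
    token = normalizer(text) if normalizer else text
    return sorted(index.get(token, set()))


docs = {"1": "BAR", "2": "bar", "3": "baz"}

plain_index = build_index(docs)                  # keyword field, no normalizer
normalized_index = build_index(docs, normalize)  # keyword field with normalizer

if __name__ == "__main__":
    print(term_query(plain_index, "BAR"))                  # ['1']
    print(term_query(normalized_index, "BAR", normalize))  # ['1', '2']
```

Without a normalizer the lookup is an exact comparison, so only doc1 matches. With the normalizer applied on both the index side and the query side, doc1 and doc2 match.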

Both queries return the following response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.47000363,
    "hits" : [
      {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.47000363,
        "_source" : {
          "foo" : "BAR"
        }
      },
      {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.47000363,
        "_source" : {
          "foo" : "bar"
        }
      }
    ]
  }
}

Note that keyword values are passed through the normalizer before they are indexed. As a result, aggregations on the field return the normalized values as bucket keys. Example:

GET index/_search
{
  "size": 0,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo"
      }
    }
  }
}

Data returned:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "foo_terms" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "bar",
          "doc_count" : 2
        },
        {
          "key" : "baz",
          "doc_count" : 1
        }
      ]
    }
  }
}

In the preceding example, foo is a keyword field with the normalizer parameter set. When such a field is used in a terms aggregation, the normalized token serves as the bucket key: the values "BAR" and "bar" both fall into the single bucket bar.
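The bucket counts above can be reproduced with a small Python sketch that groups the test values by their normalized form. This is a rough illustration, not the aggregation framework itself: terms_buckets() is a hypothetical helper and the normalizer is reduced to lowercasing.

```python
from collections import Counter


def normalize(value):
    # Assumption: lowercase only, a simplified stand-in for my_normalizer.
    return value.lower()


def terms_buckets(values):
    """Group values by normalized token, most frequent bucket first."""
    counts = Counter(normalize(value) for value in values)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


buckets = terms_buckets(["BAR", "bar", "baz"])

if __name__ == "__main__":
    for bucket in buckets:
        print(bucket)
```

Because grouping happens on the normalized token, the original casing of "BAR" never appears as a bucket key.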