In previous articles we covered indexing, searching, and analysis (word segmentation) in Elasticsearch. Today we're going to talk about Mapping.

Mapping is the equivalent of a schema in a relational database. It defines the names of the fields in an index and their data types, and configures field-level settings. As of Elasticsearch 7.0, mapping types are deprecated, so you no longer need to define a type name in the mapping.

The data type of the field

We just mentioned that you can define the data types of fields in Mapping. This is probably the most common feature of Mapping, so let’s take a look at what data types Elasticsearch supports.

  • Simple types: text, keyword, date, long, double, boolean, ip
  • Complex types: object and nested
  • Special types: geo_point and geo_shape for geographic locations

Elasticsearch supports many more data types, which I won't cover here for reasons of space. Let me introduce the ones you are most likely to use at work.

Elasticsearch has two string types: text and keyword. A text field supports full-text search: its value is run through an analyzer, which splits it into individual terms (word segmentation) before indexing.

PUT my_index
{
  "mappings": {
    "properties": {
      "full_name": {
        "type": "text"
      }
    }
  }
}

When a field is of type text, you can customize it further with parameters such as:

  • index: whether the field is searchable. Defaults to true.
  • search_analyzer: the analyzer used at search time. By default it is the field's analyzer, which in turn falls back to the default analyzer configured in the index settings.
  • fielddata: whether the field can use in-memory fielddata for sorting, aggregations, or scripting. Defaults to false.
  • meta: metadata about the field.

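For fields like IDs, email addresses, and domain names, use the keyword type instead. A keyword field is not analyzed: the whole value is indexed as a single term, so it supports sorting and aggregations but can only be matched by exact-value queries.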
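To see why a match on "york" can find a text field but not a keyword field holding the same value, here is a toy Python model. The tokenizer is an assumption: a crude stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase), not the real implementation, which follows Unicode segmentation rules.

```python
import re


def analyze_text(value):
    """Toy stand-in for a text field's analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]


def analyze_keyword(value):
    """A keyword field indexes the whole value as one unchanged term."""
    return [value]


def matches(query, indexed_terms, analyzer):
    # A match-style query analyzes the query string the same way as the field
    # and succeeds if any resulting term is among the field's indexed terms.
    return any(term in indexed_terms for term in analyzer(query))


name = "New York City"
text_terms = analyze_text(name)        # ['new', 'york', 'city']
keyword_terms = analyze_keyword(name)  # ['New York City']

print(matches("york", text_terms, analyze_text))                 # True: full-text hit
print(matches("york", keyword_terms, analyze_keyword))           # False: not the exact value
print(matches("New York City", keyword_terms, analyze_keyword))  # True: exact value
```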

Some readers may prefer to map an ID as a numeric type, which is also fine. Each choice has its advantages: numeric types are suited to range queries, while keyword is usually more efficient for exact-match (term) queries. Which one to use depends on how the field will be queried.
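The range-query point is easy to underestimate: keyword values are compared as strings, so sorting and range queries on a keyword ID follow lexicographic rather than numeric order. The plain Python below illustrates the difference; Python string comparison stands in for keyword term ordering, and the two range helpers are hypothetical names used only for illustration.

```python
ids = ["9", "10", "100", "25"]

# keyword-style ordering: values compared as strings
print(sorted(ids))           # ['10', '100', '25', '9']
# numeric-style ordering: values compared as numbers
print(sorted(ids, key=int))  # ['9', '10', '25', '100']


def keyword_range(values, gte, lte):
    """Range filter using string comparison, as on a keyword field."""
    return [v for v in values if gte <= v <= lte]


def numeric_range(values, gte, lte):
    """Range filter using numeric comparison, as on a long field."""
    return [v for v in values if gte <= int(v) <= lte]


print(keyword_range(ids, "9", "25"))  # []: as strings, "9" sorts after "25"
print(numeric_range(ids, 9, 25))      # ['9', '10', '25']
```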

In Elasticsearch, a date can be supplied in three forms:

  1. A string containing a formatted date, such as "2020-07-26" or "2015/01/01 12:10:30"
  2. A long number representing milliseconds since the epoch
  3. An integer representing seconds since the epoch

Internally, Elasticsearch converts dates to UTC (if a time zone is specified) and stores them as a long number of milliseconds since the epoch.
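A sketch in Python of that normalization: all three input forms end up as one kind of value, milliseconds since the epoch in UTC. The two strptime patterns are assumptions chosen to match the string examples above (Python's pattern syntax is not Elasticsearch's), strings without a zone are read as UTC, and to_epoch_millis is a hypothetical helper, not an Elasticsearch API.

```python
from datetime import datetime, timezone


def to_epoch_millis(value, is_seconds=False):
    """Normalize a date value to milliseconds since the epoch (UTC)."""
    if isinstance(value, int):
        # numbers are epoch timestamps; seconds must be scaled to milliseconds
        return value * 1000 if is_seconds else value
    # assumed Python equivalents of the two string examples above
    for pattern in ("%Y-%m-%d", "%Y/%m/%d %H:%M:%S"):
        try:
            parsed = datetime.strptime(value, pattern)
        except ValueError:
            continue
        # no zone in the string: interpret it as UTC
        return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
    raise ValueError(f"unrecognized date: {value!r}")


print(to_epoch_millis("2020-07-26"))                 # 1595721600000
print(to_epoch_millis(1595721600000))                # 1595721600000
print(to_epoch_millis(1595721600, is_seconds=True))  # 1595721600000
```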

You can customize the date format. If you don't, the default format is strict_date_optional_time||epoch_millis.

strict_date_optional_time is a generic ISO date-time parser (strict_date_optional_time_nanos is its nanosecond-resolution counterpart). The date must include at least the year, and if a time is included it must be separated from the date by a T, for example yyyy-MM-dd'T'HH:mm:ss.SSSZ or yyyy-MM-dd.

To accept several date formats for the same field, list them in the format parameter, separated by ||:

PUT my_index
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
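Conceptually, the alternatives joined by || are tried in order and the first one that parses the value wins. The Python sketch below mirrors that behaviour under stated assumptions: the strptime patterns are hand translations of yyyy-MM-dd HH:mm:ss and yyyy-MM-dd, epoch_millis is modelled as plain integer parsing, and parse_date is a hypothetical helper rather than Elasticsearch's parser.

```python
from datetime import datetime, timezone


def make_strptime_parser(pattern):
    """Build a parser that turns a matching string into UTC epoch milliseconds."""
    def parse(value):
        dt = datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)
        return int(dt.timestamp() * 1000)
    return parse


def parse_epoch_millis(value):
    return int(value)  # raises ValueError if the string is not an integer


# Assumed Python equivalents of "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
FORMATS = [
    make_strptime_parser("%Y-%m-%d %H:%M:%S"),
    make_strptime_parser("%Y-%m-%d"),
    parse_epoch_millis,
]


def parse_date(value):
    """Try each alternative in order; the first successful parse wins."""
    for parser in FORMATS:
        try:
            return parser(value)
        except ValueError:
            continue
    raise ValueError(f"no configured format matches {value!r}")


print(parse_date("2020-07-26 08:00:00"))  # 1595750400000
print(parse_date("2020-07-26"))           # 1595721600000
print(parse_date("1595721600000"))        # 1595721600000
```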

The Mapping parameters

We just used the format parameter to configure a date field. Mapping also provides many other parameters:

  • analyzer
  • boost
  • coerce
  • copy_to
  • doc_values
  • dynamic
  • eager_global_ordinals
  • enabled
  • fielddata
  • fields
  • format
  • ignore_above
  • ignore_malformed
  • index_options
  • index_phrases
  • index_prefixes
  • index
  • meta
  • normalizer
  • norms
  • null_value
  • position_increment_gap
  • properties
  • search_analyzer
  • similarity
  • store
  • term_vector

Let's look at a few of the most commonly used ones.

fields

The first is fields, which lets you index the same field in several different ways for different purposes.

For example, we can map a string field as text for full-text search, and add a keyword sub-field under fields for sorting and aggregations.

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

When querying, we can use city for full text retrieval and city.raw for sorting and aggregation.

GET my-index-000001/_search
{
  "query": {
    "match": {
      "city": "york"
    }
  },
  "sort": {
    "city.raw": "asc"
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw"
      }
    }
  }
}
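What happens behind that request can be modelled in a few lines of Python. This is a toy under simplifying assumptions (a regex-based analyzer for city, an in-memory list instead of an inverted index, made-up sample documents), meant only to show that each value is indexed twice: as analyzed terms under city and as one exact value under city.raw.

```python
import re
from collections import Counter

docs = [{"city": "New York"}, {"city": "York"}, {"city": "Boston"}]


def analyze(value):
    """Simplified analyzer for the text field "city"."""
    return [token.lower() for token in re.findall(r"\w+", value)]


# Each value is indexed twice: analyzed terms for "city", exact value for "city.raw"
index = [{"city": analyze(d["city"]), "city.raw": d["city"]} for d in docs]

# match query on "city": analyze the query text and look for a shared term
hits = [row for row in index if set(analyze("york")) & set(row["city"])]
# sort on "city.raw" ascending, comparing whole values
hits.sort(key=lambda row: row["city.raw"])
# terms aggregation on "city.raw": count exact values among the hits
buckets = Counter(row["city.raw"] for row in hits)

print([row["city.raw"] for row in hits])  # ['New York', 'York']
print(dict(buckets))                      # {'New York': 1, 'York': 1}
```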

enabled

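If you only want to store a field and never search it, you can disable it. The enabled setting can be applied only to the top-level mapping definition and to object fields. Once a field is disabled, Elasticsearch skips parsing its contents entirely, so its value is not checked against any type in the mapping.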

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "user_id": {
        "type": "keyword"
      },
      "last_updated": {
        "type": "date"
      },
      "session_data": {
        "type": "object",
        "enabled": false
      }
    }
  }
}

In the example above, we disabled the session_data field. Because its contents are ignored, session_data can hold arbitrary data, whether a JSON object or a non-object value such as a string; the data is still kept in _source.

Besides disabling individual fields, you can disable the entire mapping. Let's create another index:

PUT my-index-000002
{
  "mappings": {
    "enabled": false
  }
}

Now none of the document's fields are indexed: the document is stored in _source and can still be retrieved, but none of its contents can be searched.
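A toy Python model of which leaf fields end up indexed. It assumes that a disabled object is simply skipped and that disabling the whole mapping indexes nothing; indexed_fields and the sample document are hypothetical, and _source (the original document) is unaffected in every case.

```python
def indexed_fields(doc, disabled=frozenset(), mapping_enabled=True, prefix=""):
    """Return {dotted_path: value} for the leaf fields that would be indexed.

    Paths listed in `disabled` are skipped entirely: their contents are neither
    parsed nor type-checked, mimicking "enabled": false on an object field.
    mapping_enabled=False mimics disabling the whole mapping.
    """
    if not mapping_enabled:
        return {}
    out = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if path in disabled:
            continue  # still present in _source, just not indexed
        if isinstance(value, dict):
            out.update(indexed_fields(value, disabled, True, path + "."))
        else:
            out[path] = value
    return out


doc = {
    "user_id": "u-1",
    "last_updated": "2015-12-06T11:04:05",
    "session_data": {"some_array": ["foo", "bar"]},
}
print(indexed_fields(doc, disabled={"session_data"}))
# {'user_id': 'u-1', 'last_updated': '2015-12-06T11:04:05'}
print(indexed_fields(doc, mapping_enabled=False))  # {}
```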

Note that the enabled setting of an existing field, or of the whole mapping, cannot be updated. With enabled set to false, Elasticsearch neither indexes the data nor checks that it is valid, so switching it to true afterwards could expose data that does not fit the mapping; Elasticsearch therefore rejects such a change with an error.

null_value

In Elasticsearch, a null value cannot be indexed or searched. A field set to null, an empty array, or an array containing only null values is treated as if the field had no value at all.

What if your application needs to search for null values? Elasticsearch provides the null_value parameter, which replaces explicit null values with a value you specify at index time, so that you can search for that value instead.

For example:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}

Here we set null_value to "NULL" for the status_code field. Note that null_value must be of the same data type as the field: if status_code were of type long, null_value could not be set to "NULL".
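The rules can be summarized in a short Python model of what a keyword field indexes for one document. It assumes the behaviour described in the Elasticsearch documentation: only explicit nulls are replaced (an empty array contains no explicit null, so it is not), and null_value changes only what is indexed, not the stored _source. terms_to_index is a hypothetical helper.

```python
def terms_to_index(value, null_value=None):
    """Terms a keyword field would index for one document value (toy model)."""
    values = value if isinstance(value, list) else [value]
    terms = []
    for v in values:
        if v is None:
            # explicit nulls are replaced only when null_value is configured
            if null_value is not None:
                terms.append(null_value)
        else:
            terms.append(v)
    return terms


print(terms_to_index(None))                     # []: the field has no value
print(terms_to_index(None, null_value="NULL"))  # ['NULL']
print(terms_to_index([None, "200"], "NULL"))    # ['NULL', '200']
print(terms_to_index([], "NULL"))               # []: empty array is not replaced
```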

dynamic

For fields that appear in a document but are not yet in the mapping:

  • If dynamic is true (the default), the new fields are added to the mapping automatically when the document is written.
  • If dynamic is false, the mapping is not updated and the new fields are not indexed, so they cannot be searched, but they still appear in _source.
  • If dynamic is strict, writing a document that contains new fields fails with an error.

Note that once a field has been mapped and data has been written to it, its definition can no longer be changed; to change a field's type you must create a new index and reindex the data.
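The three settings can be sketched as a Python function that decides what happens to fields missing from the mapping. This is a simplified model: apply_dynamic and StrictMappingError are hypothetical names, the dynamic values are passed as the strings "true", "false" and "strict", and the detected type is a placeholder rather than real type detection.

```python
class StrictMappingError(Exception):
    """Hypothetical error raised when dynamic is "strict" and a new field appears."""


def apply_dynamic(mapping, doc, dynamic="true"):
    """Return (updated mapping, list of indexed fields) for one document."""
    mapping = dict(mapping)  # work on a copy
    indexed = []
    for field in doc:
        if field in mapping:
            indexed.append(field)  # known field: indexed as mapped
        elif dynamic == "true":
            mapping[field] = "<detected type>"  # placeholder for real type detection
            indexed.append(field)
        elif dynamic == "false":
            continue  # not added, not indexed, but still kept in _source
        elif dynamic == "strict":
            raise StrictMappingError(f"new field [{field}] rejected")
        else:
            raise ValueError(f"unknown dynamic setting: {dynamic!r}")
    return mapping, indexed


base = {"user": "keyword"}
doc = {"user": "alice", "age": 30}
print(apply_dynamic(base, doc, "true"))   # ({'user': 'keyword', 'age': '<detected type>'}, ['user', 'age'])
print(apply_dynamic(base, doc, "false"))  # ({'user': 'keyword'}, ['user'])
```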

Dynamic Mapping

When you write a document without first defining a mapping for its fields, Elasticsearch automatically infers the type of each field and creates the mapping for you. This is called Dynamic Mapping. However, the inferred types are sometimes not what you want.

Elasticsearch infers field types from the JSON data types in the document. The rules are as follows (from the Elastic documentation), JSON data type first, Elasticsearch data type second:

  • null: no field is added
  • true or false: boolean field
  • floating-point number: float field
  • integer: long field
  • object: object field
  • array: depends on the first non-null value in the array
  • string: a date field (if the value passes date detection), a double or long field (if the value passes numeric detection), or otherwise a text field with a keyword sub-field
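These rules translate almost directly into code. Below is a Python sketch of the inference for parsed JSON values; numeric detection is left out because it is off by default, the date patterns are simplified stand-ins (assumptions) rather than the real dynamic_date_formats, and infer_type is a hypothetical helper.

```python
from datetime import datetime

# Simplified stand-ins for date detection; the real dynamic_date_formats differ.
DATE_PATTERNS = ("%Y-%m-%d", "%Y/%m/%d %H:%M:%S", "%Y/%m/%d")


def looks_like_date(s):
    for pattern in DATE_PATTERNS:
        try:
            datetime.strptime(s, pattern)
            return True
        except ValueError:
            pass
    return False


def infer_type(value, date_detection=True):
    """Guess the Elasticsearch field type for a parsed JSON value."""
    if value is None:
        return None  # no field is added
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # the first non-null element decides the type
        for item in value:
            if item is not None:
                return infer_type(item, date_detection)
        return None
    if isinstance(value, str):
        if date_detection and looks_like_date(value):
            return "date"
        return "text (+ keyword sub-field)"
    raise TypeError(f"not a JSON value: {value!r}")


print(infer_type([None, 7]))                           # long
print(infer_type("2015/01/01"))                        # date
print(infer_type("2015/01/01", date_detection=False))  # text (+ keyword sub-field)
```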

For the full list of field data types, see the Elasticsearch documentation.

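Date detection is enabled by default: a new string field whose value looks like a date is mapped as date. Only strings matching dynamic_date_formats are recognized; in Elasticsearch 7.x the default is ["strict_date_optional_time", "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"]. If you turn off date_detection, such strings are mapped as text (with a keyword sub-field) instead of date.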

PUT my-index-000001
{
  "mappings": {
    "date_detection": false
  }
}

Of course, you can specify the date formats you want to recognize on your own, using the dynamic_date_formats parameter.

PUT my-index-000001
{
  "mappings": {
    "dynamic_date_formats": ["MM/dd/yyyy"]
  }
}

Elasticsearch can also recognize numeric strings as numbers. This is controlled by the numeric_detection switch, which is disabled by default.

PUT my-index-000005
{
  "mappings": {
    "numeric_detection": true
  }
}

PUT my-index-000005/_doc/1
{
  "my_float": "1.0",
  "my_integer": "1"
}

In this example, my_float is mapped as float and my_integer as long.
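A minimal Python sketch of numeric detection for a new string field, assuming integer-looking strings become long and other numeric strings become float, as in the example above. Python's int() and float() accept some inputs (such as "1e3" or "inf") that Elasticsearch may treat differently, date detection is ignored here, and detect_numeric_string is a hypothetical helper.

```python
def detect_numeric_string(value, numeric_detection=False):
    """Type a dynamically added string field, optionally trying numeric detection."""
    if numeric_detection:
        try:
            int(value)
            return "long"
        except ValueError:
            pass
        try:
            float(value)
            return "float"
        except ValueError:
            pass
    return "text"  # date detection deliberately omitted in this sketch


print(detect_numeric_string("1.0", numeric_detection=True))  # float
print(detect_numeric_string("1", numeric_detection=True))    # long
print(detect_numeric_string("1"))                            # text
```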

Dynamic template

Dynamic templates let you define custom mappings that are applied to dynamically added fields, based on conditions the fields must match. The general shape of a dynamic template definition is:

  "dynamic_templates": [
    { "my_template_name": { ... match conditions ... "mapping": { ... } } },
    ...
  ]

my_template_name can be any string.

Match conditions include match_mapping_type, match, match_pattern, unmatch, path_match and path_unmatch.

mapping is the mapping applied to fields that match. Below we introduce several match conditions.

match_mapping_type

Let’s start with a simple example

PUT my-index-000001
{
  "mappings": {
    "dynamic_templates": [
      {
        "integers": {
          "match_mapping_type": "long",
          "mapping": { "type": "integer" }
        }
      },
      {
        "strings": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "text",
            "fields": { "raw": { "type": "keyword", "ignore_above": 256 } }
          }
        }
      }
    ]
  }
}

Here we have two templates: the first maps fields detected as long to integer instead, and the second maps strings to text with a keyword sub-field named raw (ignore_above: 256).

Match and unmatch

match specifies a pattern the field name must match, and unmatch specifies a pattern that excludes matching field names.

PUT my-index-000001
{
  "mappings": {
    "dynamic_templates": [{
      "longs_as_strings": {
        "match_mapping_type": "string",
        "match": "long_*",
        "unmatch": "*_text",
        "mapping": { "type": "long" }
      }
    }]
  }
}

This template matches string fields whose names start with long_ but do not end with _text, and maps them as long.
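How a template is chosen can be sketched in Python. This assumes the documented rule that templates are checked in order and the first match wins, uses fnmatchcase as a stand-in for Elasticsearch's * wildcard matching, treats an absent or * match_mapping_type as matching any type, and ignores match_pattern and the path conditions; pick_template is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Templates in order, each as (name, conditions, mapping)
TEMPLATES = [
    ("longs_as_strings",
     {"match_mapping_type": "string", "match": "long_*", "unmatch": "*_text"},
     {"type": "long"}),
]


def pick_template(field_name, detected_type, templates):
    """Return (name, mapping) of the first template whose conditions all hold."""
    for name, cond, mapping in templates:
        wanted_type = cond.get("match_mapping_type", "*")
        if wanted_type != "*" and wanted_type != detected_type:
            continue
        if "match" in cond and not fnmatchcase(field_name, cond["match"]):
            continue
        if "unmatch" in cond and fnmatchcase(field_name, cond["unmatch"]):
            continue
        return name, mapping
    return None  # no template applies: default dynamic mapping is used


print(pick_template("long_num", "string", TEMPLATES))   # ('longs_as_strings', {'type': 'long'})
print(pick_template("long_text", "string", TEMPLATES))  # None
```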

Besides these three conditions, setting match_pattern to regex makes match use regular expressions instead of simple wildcards, and path_match and path_unmatch work like match and unmatch but test the full dotted path of the field rather than just its name.

Dynamic templates also support two placeholders in the mapping, {name} and {dynamic_type}: {name} is replaced with the field name and {dynamic_type} with the detected field type.
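Placeholder substitution is plain string replacement over the template's values. A minimal Python sketch, assuming substitution applies to string values only (not to keys); render_mapping is a hypothetical helper. The example mirrors a common use: a field named after the analyzer it should use.

```python
def render_mapping(template, field_name, dynamic_type):
    """Recursively replace {name} and {dynamic_type} in a template's string values."""
    if isinstance(template, dict):
        return {k: render_mapping(v, field_name, dynamic_type) for k, v in template.items()}
    if isinstance(template, list):
        return [render_mapping(v, field_name, dynamic_type) for v in template]
    if isinstance(template, str):
        return template.replace("{name}", field_name).replace("{dynamic_type}", dynamic_type)
    return template  # numbers, booleans, None are left unchanged


mapping = {"type": "{dynamic_type}", "analyzer": "{name}"}
print(render_mapping(mapping, "english", "text"))
# {'type': 'text', 'analyzer': 'english'}
```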

conclusion

That's it for Elasticsearch mapping for now. Configuring a mapping well is largely a matter of experience: the more cases you handle, the better you get at it. Many mapping fields and parameters are not covered in this article; personally, I look most of them up in the documentation when I need them. I still recommend reading through the documentation, so that when you run into a problem you at least know where to look. That alone will put you ahead of many people around you.