Elasticsearch dynamic mapping and static mapping, as well as four field types

In this fourth part of the Elasticsearch series, we’ll talk about dynamic mapping, static mapping, and the four categories of field types in ES.

These are notes taken from a recorded video tutorial, so they are brief. For the full content, refer to the video. Download link: https://pan.baidu.com/s/1oKiV… Extraction code: P3SX

1. Elasticsearch mapping

A mapping defines how a document and the fields it contains are stored and indexed. In that sense, it is somewhat like a table definition in a relational database.

1.1 Mapping Classification

Dynamic mapping

As the name suggests, a dynamic mapping is one that is created automatically: ES analyzes the documents being stored and works out the field types and how each field should be stored.

For a simple example, create a new index and view the index information.

In the information for the newly created index, mappings is empty. This mappings entry is where the mapping information is held.

Now we add a document to the index as follows:

PUT blog/_doc/1
{
  "title":"1111",
  "date":"2020-11-11"
}

After the document is added successfully, the mappings are generated automatically.

As you can see, the date field is of type date, and the title field is mapped as both text and keyword.

By default, when a document contains new fields, they are added to the mappings automatically.

Sometimes you may prefer that an exception be thrown when a new field appears, to alert the developer. This can be configured with the dynamic attribute in the mappings.

The dynamic attribute has three values:

true, which is the default. New fields are added automatically.
false: new fields are ignored. They are not added to the mappings and are not searchable, although they remain in the document's _source.
strict: strict mode. An exception is thrown when a new field is found.

The mappings can be specified when the index is created (this is essentially a static mapping):

PUT blog
{
  "mappings": {
    "dynamic":"strict",
    "properties": {
      "title":{
        "type": "text"
      },
      "age":{
        "type":"long"
      }
    }
  }
}

Then add a document to the blog index:

PUT blog/_doc/2
{
  "title":"1111",
  "date":"2020-11-11",
  "age":99
}

The added document contains an extra date field that is not predefined, so the operation returns an error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "strict_dynamic_mapping_exception",
        "reason" : "mapping set to strict, dynamic introduction of [date] within [_doc] is not allowed"
      }
    ],
    "type" : "strict_dynamic_mapping_exception",
    "reason" : "mapping set to strict, dynamic introduction of [date] within [_doc] is not allowed"
  },
  "status" : 400
}
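
The effect of the three dynamic values can be illustrated with a small Python model. This is only a sketch of the rules described above, not Elasticsearch code: the function name apply_dynamic, the exception class, and the dict standing in for the mapping are all made up for the example, and new fields are simply recorded as "auto" instead of having a real type inferred.

```python
class StrictDynamicMappingError(Exception):
    """Hypothetical stand-in for strict_dynamic_mapping_exception."""


def apply_dynamic(properties, doc, dynamic="true"):
    """Return the field -> type mapping after indexing doc.

    properties: existing mapping as a dict of field name -> type
    dynamic: "true", "false", or "strict"
    """
    updated = dict(properties)
    for field in doc:
        if field in updated:
            continue  # already mapped, nothing to do
        if dynamic == "strict":
            raise StrictDynamicMappingError(
                f"dynamic introduction of [{field}] is not allowed"
            )
        if dynamic == "true":
            updated[field] = "auto"  # placeholder: real ES infers a type here
        # dynamic == "false": the field is ignored and the mapping is unchanged
    return updated


mapping = {"title": "text", "age": "long"}
doc = {"title": "1111", "date": "2020-11-11", "age": 99}

print(apply_dynamic(mapping, doc, "true"))   # date is added to the mapping
print(apply_dynamic(mapping, doc, "false"))  # mapping is unchanged
try:
    apply_dynamic(mapping, doc, "strict")
except StrictDynamicMappingError as e:
    print("rejected:", e)
```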

Dynamic mapping also has a pitfall: date detection.

For example, create a new index and add a document containing a date-like string:

PUT blog/_doc/1
{
  "remark":"2020-11-11"
}

On success, the remark field is inferred to be of type date.

From this point on, the remark field cannot store values of other types:

PUT blog/_doc/1
{
  "remark":"javaboy"
}

The error is as follows:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse field [remark] of type [date] in document with id '1'. Preview of field's value: 'javaboy'"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [remark] of type [date] in document with id '1'. Preview of field's value: 'javaboy'",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "failed to parse date field [javaboy] with format [strict_date_optional_time||epoch_millis]",
      "caused_by" : {
        "type" : "date_time_parse_exception",
        "reason" : "Failed to parse with all enclosed parsers"
      }
    }
  },
  "status" : 400
}

To solve this problem, you can use a static mapping and specify remark as text when the index is defined. Alternatively, you can turn off date detection:

PUT blog
{
  "mappings": {
    "date_detection": false
  }
}

Date-like strings are then treated as text.

Static mapping

Omitted here.

1.2 Type Inference

When mapping dynamically, ES infers types as follows:

JSON data	Automatically inferred data type
null	No field is added
true/false	boolean
Floating-point number	float
Integer	long
JSON object	object
Array	Determined by the first non-null value in the array
String	May be text/keyword, date, double, or long
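
The table can be read as a small decision procedure. Below is a simplified Python sketch of it, not the real ES logic: the function name infer_type is made up, only plain YYYY-MM-DD strings are treated as dates, and the string-to-double/long case (which depends on numeric detection being enabled) is not modelled.

```python
import re

# Simplification: only plain YYYY-MM-DD strings count as dates in this sketch.
_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_type(value):
    """Return the inferred type name for a parsed JSON value, or None if no field is added."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # the first non-null element decides the type
        for item in value:
            if item is not None:
                return infer_type(item)
        return None
    if isinstance(value, str):
        return "date" if _DATE_RE.match(value) else "text/keyword"
    raise TypeError(f"not a JSON value: {value!r}")


for sample in [None, True, 1.5, 99, {"a": 1}, [None, 3], "2020-11-11", "javaboy"]:
    print(repr(sample), "->", infer_type(sample))
```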

2. Elasticsearch field type

2.1 Core Types

2.1.1 String types

string: a deprecated string type. Before ES 5 it was used to describe strings, but it has since been replaced by text and keyword.
text: use text for fields that need full-text search, such as blog content, news content, or product descriptions. The content of a text field is analyzed: before the inverted index is built, the analyzer splits the string into terms. Fields of type text are not used for sorting and are rarely used for aggregation. Such a string is also known as an analyzed field.
keyword: use keyword for structured fields such as tags, email addresses, and phone numbers. Fields of this type can be used for filtering, sorting, and aggregation. Such a string is also called a not-analyzed field.

2.1.2 Number types

Type	Value range
long	-2^63 to 2^63-1
integer	-2^31 to 2^31-1
short	-2^15 to 2^15-1
byte	-2^7 to 2^7-1
double	64-bit double-precision IEEE 754 floating-point number
float	32-bit single-precision IEEE 754 floating-point number
half_float	16-bit half-precision IEEE 754 floating-point number
scaled_float	Floating-point number stored as a long, scaled by a fixed factor

As long as the requirements are met, prefer the type with the narrowest range. The shorter the field, the more efficient indexing and searching are.
For floating-point numbers, prefer scaled_float.

scaled_float example:

PUT product
{
  "mappings": {
    "properties": {
      "name":{
        "type": "text"
      },
      "price":{
        "type": "scaled_float",
        "scaling_factor": 100
      }
    }
  }
}
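
To see what scaling_factor does, here is a rough Python sketch of the idea: the value is multiplied by the factor and rounded to the nearest integer for storage, then divided by the factor when read back. The function names to_stored and from_stored are made up for this illustration, and the rounding shown is an approximation of what ES does internally.

```python
import math


def to_stored(value, scaling_factor):
    """Scale the value and round to the nearest integer, as a scaled_float field would store it."""
    return math.floor(value * scaling_factor + 0.5)


def from_stored(stored, scaling_factor):
    """Recover the (possibly less precise) floating-point value from the stored integer."""
    return stored / scaling_factor


stored = to_stored(19.99, 100)
print(stored, from_stored(stored, 100))

# Digits beyond the scaling factor's precision are lost:
print(from_stored(to_stored(19.994, 100), 100))
```

With a scaling_factor of 100, a price keeps two decimal places, which is usually enough for money amounts.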

2.1.3 Date type

Since JSON has no date type, dates in ES can take several forms:

2020-11-11 or 2020-11-11 11:11:11
The number of seconds or milliseconds since the epoch (1970-01-01 00:00:00 UTC).

Internally, ES converts dates to UTC and stores them as a long integer representing milliseconds since the epoch.

Custom date type:

PUT product
{
  "mappings": {
    "properties": {
      "date":{
        "type": "date"
      }
    }
  }
}

A field defined this way can parse quite a few date formats:

PUT product/_doc/1
{
  "date":"2020-11-11"
}

PUT product/_doc/2
{
  "date":"2020-11-11T11:11:11Z"
}


PUT product/_doc/3
{
  "date":"1604672099958"
}

The dates in all three documents can be parsed, and internally each one is stored as a long integer of milliseconds since the epoch.
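
As a rough illustration of "stored as milliseconds since the epoch", the following Python sketch converts the three values used above into epoch milliseconds. It is not how ES parses dates; the helper name to_epoch_millis is made up, and the sketch assumes that a value without a time zone is in UTC.

```python
from datetime import datetime, timezone


def to_epoch_millis(value):
    """Convert an ISO 8601 date string or an epoch-milliseconds string to epoch milliseconds."""
    if value.isdigit():
        return int(value)  # already epoch milliseconds
    # Older Python versions do not accept a trailing "Z" in fromisoformat
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: no zone means UTC
    return int(dt.timestamp() * 1000)


for raw in ["2020-11-11", "2020-11-11T11:11:11Z", "1604672099958"]:
    print(raw, "->", to_epoch_millis(raw))
```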

2.1.4 Boolean Types

In JSON, true, false, "true", and "false" are all accepted.

2.1.5 Binary type

Binary accepts base64 encoded strings, which are not stored or searchable by default.
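
For reference, a base64 string suitable for a binary field can be produced on the client side like this. It is a minimal Python sketch; the field name blob and the payload are made up for the example.

```python
import base64
import json

raw = b"\x00\x01 some binary payload"

# Encode the bytes as base64 text so they can travel inside a JSON document
encoded = base64.b64encode(raw).decode("ascii")
doc = json.dumps({"blob": encoded})  # "blob" is a hypothetical field name
print(doc)

# Decoding the stored string gives back the original bytes
restored = base64.b64decode(json.loads(doc)["blob"])
print(restored == raw)
```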

2.1.6 Range types

integer_range
float_range
long_range
double_range
date_range
ip_range

When defining the mapping, specify the range type:

PUT product
{
  "mappings": {
    "properties": {
      "date":{
        "type": "date"
      },
      "price":{
        "type":"float_range"
      }
    }
  }
}

When inserting a document, you need to specify the bounds of the range, for example (the bounds here are illustrative values):

PUT product/_doc/1
{
  "date":"2020-11-11",
  "price":{
    "gt":10,
    "lt":20
  }
}

When specifying a range, gt, gte, lt, and lte can be used.
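
The meaning of the four bounds can be sketched in Python as follows. This only models the comparison semantics; the function name in_range and the sample bounds are made up for the example.

```python
def in_range(value, bounds):
    """Return True if value satisfies every bound in a dict keyed by gt/gte/lt/lte."""
    checks = {
        "gt": lambda v, b: v > b,    # strictly greater than
        "gte": lambda v, b: v >= b,  # greater than or equal to
        "lt": lambda v, b: v < b,    # strictly less than
        "lte": lambda v, b: v <= b,  # less than or equal to
    }
    return all(checks[op](value, bound) for op, bound in bounds.items())


price = {"gt": 10, "lte": 20}  # hypothetical bounds: 10 excluded, 20 included
for candidate in (10, 15, 20, 21):
    print(candidate, in_range(candidate, price))
```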

2.2 Compound Type

2.2.1 Array types

There is no dedicated array type in ES. By default, any field can hold one or more values. Note that the elements of an array must all be of the same type.

When an array is added, its first element determines the type of the entire array.

2.2.2 Object Type

Because JSON is inherently hierarchical, a document can contain inner objects, and an inner object can in turn contain other inner objects.

PUT product/_doc/2
{
  "date":"2020-11-11T11:11:11Z",
  "ext_info":{
    "address":"China"
  }
}

2.2.3 Nested Types

nested is a special case of object.

Suppose you have the following document and user is mapped with the object type:

{
  "user":[
    {
      "first":"Zhang",
      "last":"san"
    },
    {
      "first":"Li",
      "last":"si"
    }
  ]
}

Since Lucene has no concept of an internal object, ES flattens the object hierarchy, turning an object into a simple list of field names and values. The final storage form of the above document is as follows:

{
  "user.first":["Zhang","Li"],
  "user.last":["san","si"]
}

After flattening, the association between each first name and last name is lost. As a result, a search for a user named Zhang si will match this document, even though no such user exists.

This problem can be solved with the nested type, which preserves the independence of each object in the array. The nested type indexes each object in the array as a separate hidden document, so that each nested object can be queried independently:

[
  {
    "user.first":"Zhang",
    "user.last":"san"
  },
  {
    "user.first":"Li",
    "user.last":"si"
  }
]
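
The difference between the two storage forms can be reproduced in a few lines of Python. This is only a model of the matching behaviour, not of how ES or Lucene work internally; the function names are made up for the example.

```python
doc = {
    "user": [
        {"first": "Zhang", "last": "san"},
        {"first": "Li", "last": "si"},
    ]
}


def flatten(doc):
    """Mimic the object type: each sub-field becomes one list of values."""
    return {
        "user.first": [u["first"] for u in doc["user"]],
        "user.last": [u["last"] for u in doc["user"]],
    }


def match_flattened(doc, first, last):
    """Match against the flattened form: the two conditions are checked independently."""
    flat = flatten(doc)
    return first in flat["user.first"] and last in flat["user.last"]


def match_nested(doc, first, last):
    """Match against the nested form: both conditions must hold within the same object."""
    return any(u["first"] == first and u["last"] == last for u in doc["user"])


print(match_flattened(doc, "Zhang", "si"))  # cross-object match on flattened data
print(match_nested(doc, "Zhang", "si"))     # same search on nested data
print(match_nested(doc, "Zhang", "san"))    # a user that really exists
```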

Advantages

The nested documents are stored together, so read performance is high.

Disadvantages

The whole document needs to be updated when a parent or child document is updated.

2.3 Geographical Types

Usage scenarios:

Finding geographic locations within a given range
Aggregating documents by geographic location or by distance from a central point
Factoring distance into a document's score
Sorting documents by distance

2.3.1 geo_point

geo_point is a coordinate point, defined as follows:

PUT people
{
  "mappings": {
    "properties": {
      "location":{
        "type": "geo_point"
      }
    }
  }
}

Specify the field type when creating the field. When storing a value, there are four ways to write it:

PUT people/_doc/1
{
  "location":{
    "lat": 34.27,
    "lon": 108.94
  }
}

PUT people/_doc/2
{
  "location":"34.27,108.94"
}

PUT people/_doc/3
{
  "location":"uzbrgzfxuzup"
}

PUT people/_doc/4
{
  "location":[108.94,34.27]
}

Note that in the array form, longitude comes first, followed by latitude.
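
Because the order of latitude and longitude differs between the forms, it is easy to mix them up. The following Python sketch normalises the object, string, and array forms to a (lat, lon) pair. The helper name to_lat_lon is made up, and the geohash form is not handled here.

```python
def to_lat_lon(location):
    """Normalise an object, "lat,lon" string, or [lon, lat] array to a (lat, lon) tuple."""
    if isinstance(location, dict):
        return (float(location["lat"]), float(location["lon"]))
    if isinstance(location, str):
        lat, lon = location.split(",")  # string form: latitude first
        return (float(lat), float(lon))
    if isinstance(location, (list, tuple)):
        lon, lat = location  # array form: longitude first
        return (float(lat), float(lon))
    raise TypeError(f"unsupported location: {location!r}")


print(to_lat_lon({"lat": 34.27, "lon": 108.94}))
print(to_lat_lon("34.27,108.94"))
print(to_lat_lon([108.94, 34.27]))
```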

Tool for converting a location to a geohash: http://www.csxgame.top/#/

2.3.2 geo_shape

GeoJSON	Elasticsearch	Note
Point	point	A point described by latitude and longitude
LineString	linestring	An arbitrary line made up of two or more points
Polygon	polygon	A closed polygon
MultiPoint	multipoint	A set of unconnected points
MultiLineString	multilinestring	Multiple unconnected lines
MultiPolygon	multipolygon	Multiple polygons
GeometryCollection	geometrycollection	A collection of geometric objects
	circle	A circle
	envelope	A rectangle defined by its upper-left and lower-right corner points

Specify the geo_shape type:

PUT people
{
  "mappings": {
    "properties": {
      "location":{
        "type": "geo_shape"
      }
    }
  }
}

When adding a document, you need to specify the concrete shape type:

PUT people/_doc/1
{
  "location":{
    "type":"point",
    "coordinates": [108.94,34.27]
  }
}

If it is a linestring, it looks like this:

PUT people/_doc/2
{
  "location":{
    "type":"linestring",
    "coordinates": [[108.94,34.27],[100]]
  }
}

2.4 Special types

2.4.1 IP

Use the ip type to store IP addresses:

PUT blog
{
  "mappings": {
    "properties": {
      "address":{
        "type": "ip"
      }
    }
  }
}

Add document:

PUT blog/_doc/1
{
  "address":"192.168.91.1"
}

Search documents:

GET blog/_search
{
  "query": {
    "term": {
      "address": "192.168.0.0/16"
    }
  }
}
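
The search above uses CIDR notation, so it matches every address in the 192.168.0.0/16 block. The same containment check can be sketched with Python's standard ipaddress module. The function name in_cidr is made up for the example.

```python
import ipaddress


def in_cidr(address, cidr):
    """Return True if the IP address falls inside the given CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


print(in_cidr("192.168.91.1", "192.168.0.0/16"))  # the document indexed above
print(in_cidr("10.0.0.1", "192.168.0.0/16"))      # an address outside the block
```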

2.4.2 token_count

Used to count the number of terms in a string after analysis.

PUT blog
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "fields": {
          "length":{
            "type":"token_count",
            "analyzer":"standard"
          }
        }
      }
    }
  }
}

This is equivalent to adding a title.length sub-field that counts the number of terms after analysis.
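
To get a feel for what the count represents, here is a very rough Python approximation. It simply counts runs of word characters, which is close to what the standard analyzer produces for plain English text but is not the same algorithm (for example, it differs for apostrophes and for Chinese text). The function name token_count is made up for the example.

```python
import re


def token_count(text):
    """Count word-like tokens; only a rough approximation of the standard analyzer."""
    return len(re.findall(r"\w+", text))


print(token_count("Elasticsearch dynamic mapping"))
print(token_count("1111"))
```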