Mapping in ElasticSearch

Mapping in ElasticSearch is the process of defining how a document and the fields it contains are stored and indexed. Keep in mind that mapping can be a dynamic process. Each document in an index has a type, and each type has its own mapping. A mapping defines the data type of each field in a document, together with the metadata associated with that type.

Mapping type

Each index has one or more mapping types, which are used to divide the data in an index into logical groups. Each mapping type consists of meta fields and fields.

1. Meta fields

Meta fields are used to customize how a document's metadata is processed. Common meta fields include _index, _type, _id, and _source. Every document has metadata associated with it, and meta fields are the built-in fields that keep the system running.

2. Fields (or properties)

Fields (or properties) describe the data a document contains and the data type of each piece. Each mapping type contains a list of fields or properties that belong to that type. Note that fields with the same name in different mapping types of the same index must have the same mapping.

Field data type

Field data types in ElasticSearch include basic types such as string, date, long, double, and boolean, as well as JSON objects, IP, geo-point, and geo-shape (more on the different types later). Mappings can be set explicitly or dynamically. A mapping does not have to be defined in advance: with dynamic mapping, new types and field names are added automatically when a new document is indexed. New fields can be added both to the top-level mapping type and to inner object and nested fields.

A field's mapping is unique within an index. That is, if fields in different types of the same index share the same name, they share the same mapping and therefore the same data type.

1. Core data types

String data type

The string data type can be divided into full text and keyword. The same field can be indexed both as full text and as a keyword. Full text is well suited for text-based relevance searches.

Full-text string values are analyzed: before indexing, an analyzer splits the string into a list of individual terms. This is how ElasticSearch can search for individual words within full text. Note that full-text string fields are not used for sorting and are rarely used for aggregation.

Keyword string data types are commonly used for filtering, sorting, and aggregation. They are not analyzed.
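The difference between the two string flavors can be illustrated with a small simulation. This is only a sketch: analyze_full_text is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, which is far simpler than a real ElasticSearch analyzer.

```python
import re


def analyze_full_text(value):
    """Hypothetical simplified analyzer: lowercase, then split on non-alphanumerics."""
    return [term for term in re.split(r"[^0-9a-z]+", value.lower()) if term]


def analyze_keyword(value):
    """Keyword values are not tokenized: the whole value is a single term."""
    return [value]


title = "Quick Brown Fox"
full_text_terms = analyze_full_text(title)
keyword_terms = analyze_keyword(title)

print(full_text_terms)
print(keyword_terms)
```

A search for a single word can match one of the full-text terms, while the keyword version only matches the exact original value, which is what filtering, sorting, and aggregation need.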

Numeric types
Date type: Note that JSON does not have a date type, so a date must be a formatted string, a long integer representing milliseconds since the epoch, or an integer representing seconds since the epoch. Multiple date formats can be specified by separating them with double bars (||).
Boolean type
Binary type: a Base64-encoded binary value. It is not stored by default and cannot be searched.
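As an illustration of combining date formats with double bars, the following sketch builds a date field definition as a plain Python dictionary. The field name created_at is hypothetical, and the particular formats chosen are just one plausible combination.

```python
import json

# Hypothetical field "created_at"; alternatives in "format" are joined with "||".
date_field_mapping = {
    "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
    }
}

# Splitting on "||" recovers the individual formats, tried in order.
accepted_formats = date_field_mapping["created_at"]["format"].split("||")

print(json.dumps(date_field_mapping, indent=2))
print(accepted_formats)
```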

2. Complex data types

Array type

ElasticSearch has no dedicated array type; by default, any field can contain one or more values of the same data type.

Object type

The object type takes advantage of the naturally hierarchical structure of JSON: a JSON document can contain inner objects.
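Inner objects are indexed internally as a flat list of key-value pairs with dotted field paths. The sketch below imitates that flattening in plain Python; the flatten helper and the sample document are illustrative assumptions, not ElasticSearch code.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted paths, e.g. manager.name.first."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


doc = {
    "region": "US",
    "manager": {"age": 30, "name": {"first": "John", "last": "Smith"}},
}
flat_doc = flatten(doc)
print(flat_doc)
```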

Nested data types

3. Geographical data types

The geographic data types here are based on latitude and longitude. Latitude and longitude make the following possible:

Look for data in a geographical area
Aggregate documents that are some distance from the center point
Order by distance from the center point
Integrate geographic data into document relevance scores

Geographical data types are divided into:

Geographical point
Geographic shape
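To make the distance-based operations listed above concrete, here is a client-side sketch that filters and orders points by their distance from a center point using the haversine formula. It only illustrates the idea; it is not how ElasticSearch computes distances internally, and the place names and coordinates are sample data.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


center = (48.8566, 2.3522)  # Paris
places = {
    "london": (51.5074, -0.1278),
    "berlin": (52.5200, 13.4050),
    "versailles": (48.8049, 2.1204),
}

# Order by distance from the center point.
by_distance = sorted(places, key=lambda name: haversine_km(*center, *places[name]))
# Keep only the places within a 500 km radius.
within_500km = [name for name in by_distance if haversine_km(*center, *places[name]) <= 500]

print(by_distance)
print(within_500km)
```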

4. Specialized data types

IP
Word counter: token_count

Mapping parameters

Mapping parameters configure how a field is mapped. ElasticSearch has a number of mapping parameters whose default values suit most common scenarios, but you need to understand them if you want to become proficient at tuning. A few typical mapping parameters are described here.

1.analyzer

String fields are analyzed into index terms both at index time and at query time. Configuring the analyzer properly can greatly improve the efficiency of indexing and querying.

2.boost

Assigns a weight to the field. The default value is 1; if a value is set, the field's relevance score is multiplied by that value. Note that it is best not to apply boosts when indexing data, because index-time boost values cannot be changed without re-indexing all the documents.

3.coerce

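Type coercion. When enabled, ElasticSearch tries to convert a value to the field's data type. If the value is false, a document containing a value of a non-matching type is rejected when it is indexed.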
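The effect of coerce can be modeled with a few lines of Python. This is a simplified sketch of the behavior for an integer field only: the index_integer function is hypothetical, and real coercion rules cover more cases than shown here.

```python
def index_integer(value, coerce=True):
    """Simplified model of indexing a value into an integer field."""
    if isinstance(value, bool):
        raise ValueError("booleans are not accepted as integers")
    if isinstance(value, int):
        return value
    if not coerce:
        # With coerce disabled, a non-matching value causes a rejection.
        raise ValueError(f"{value!r} is not an integer and coerce is false")
    if isinstance(value, str):
        return int(value)  # the string "5" becomes 5
    if isinstance(value, float):
        return int(value)  # the fraction is truncated: 5.9 becomes 5
    raise ValueError(f"cannot coerce {value!r} to an integer")


print(index_integer("5"))
print(index_integer(5.9))
```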

4.fields

The fields parameter enables multi-fields: ElasticSearch lets you index a single field in two different ways. Suppose you want to analyze the same field in two ways, one for searching and one for sorting, or have one version analyzed by a language-specific analyzer and the other split on whitespace only. Then you can use the multi-field feature. This feature lets you define the field once and attach a second definition to it. As follows:

"name" : {"type" : "string", "fields" : {"facet" : {"type" : "string", "index" : "not_analyzed"}}}

This defines two fields called name and name.facet, and ElasticSearch copies the value from the name field into the name.facet field.
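The multi-field idea described above can be sketched in Python. The example assumes that the multi-field syntax uses the fields parameter with a sub-field named facet; the field_paths helper is hypothetical and only lists the field names such a mapping would expose.

```python
name_mapping = {
    "name": {
        "type": "string",
        "fields": {
            # Assumed sub-field: the same value indexed without analysis.
            "facet": {"type": "string", "index": "not_analyzed"},
        },
    }
}


def field_paths(properties):
    """List every queryable field path, including multi-field sub-fields."""
    paths = []
    for field, config in properties.items():
        paths.append(field)
        for sub_field in config.get("fields", {}):
            paths.append(f"{field}.{sub_field}")
    return paths


searchable_fields = field_paths(name_mapping)
print(searchable_fields)
```

In this arrangement the main field would serve full-text search, while the non-analyzed sub-field would serve sorting or aggregation.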

5.copy_to

ElasticSearch can use this parameter to create custom _all fields. This means that the values of multiple fields can be copied into a single field, which can then be queried as a single field.
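A minimal sketch of how copy_to combines values follows. The field names first_name, last_name, and full_name are hypothetical, and apply_copy_to is a simplified model: it only gathers the values that would be indexed and leaves the original document untouched.

```python
properties = {
    "first_name": {"type": "string", "copy_to": "full_name"},
    "last_name": {"type": "string", "copy_to": "full_name"},
    "full_name": {"type": "string"},
}


def apply_copy_to(doc, field_properties):
    """Return the values indexed per field after applying copy_to."""
    indexed = {field: [value] for field, value in doc.items()}
    for field, value in doc.items():
        target = field_properties.get(field, {}).get("copy_to")
        if target:
            indexed.setdefault(target, []).append(value)
    return indexed


source_doc = {"first_name": "John", "last_name": "Smith"}
indexed_values = apply_copy_to(source_doc, properties)
print(indexed_values["full_name"])
```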

6.doc_values

A column-oriented storage structure that complements the inverted index. This structure is well suited to aggregation, sorting, and scripting.
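The contrast between the two structures can be shown with plain dictionaries. This is a conceptual sketch with made-up sample data, not the on-disk format: the inverted index maps a term to document ids, while the column-oriented structure maps a document id to its value.

```python
docs = {1: "banana", 2: "apple", 3: "banana"}

# Inverted index: term -> document ids. Good for "which documents contain X?".
inverted_index = {}
for doc_id, term in docs.items():
    inverted_index.setdefault(term, []).append(doc_id)

# Column-oriented view: document id -> value. Good for "what is the value of doc N?".
doc_values = dict(docs)

matching_ids = inverted_index["banana"]

# Sorting needs the value of each document, so it reads the column.
sorted_ids = sorted(doc_values, key=lambda doc_id: doc_values[doc_id])

# A terms-style aggregation also walks the column.
term_counts = {}
for value in doc_values.values():
    term_counts[value] = term_counts.get(value, 0) + 1

print(matching_ids, sorted_ids, term_counts)
```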

Dynamic mapping

When you insert a document directly into an index without defining the index structure first, the system automatically derives the mapping from the document. This greatly simplifies working with indices.

The rules for dynamic mapping can be customized in the following ways:

_default_ mapping: ElasticSearch has a default mapping named _default_, which acts as the base mapping for newly created mapping types.
Dynamic field mapping: the rules that govern dynamic field detection.
Dynamic template: custom rules that configure the mapping of dynamically added fields.

By default, when a previously unseen field is found in a document, ElasticSearch automatically adds the new field to the type mapping based on a simple set of rules. Some of these rules detect strings that look like dates, and some detect numbers. In essence, ElasticSearch guesses the structure and data types of a document from its JSON. For example, strings are enclosed in quotes, Boolean values use the literals true and false, and numeric values are written as bare numbers. These simple rules work well and are easy to understand.
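The guessing described above can be approximated in a few lines of Python. This is a rough sketch: guess_field_type and the date pattern are hypothetical simplifications, and the type names follow the ones used in this article.

```python
import json
import re

# Hypothetical simplification: treat strings starting with yyyy-MM-dd as dates.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}")


def guess_field_type(value):
    """Guess a field type from a parsed JSON value."""
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "date" if DATE_PATTERN.match(value) else "string"
    raise TypeError(f"unsupported value: {value!r}")


doc = json.loads(
    '{"title": "hello", "views": 10, "score": 1.5, "published": true, "created": "2015-01-01"}'
)
guessed = {field: guess_field_type(value) for field, value in doc.items()}
print(guessed)
```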

What about more flexible type detection? For example, can a number enclosed in quotes be detected as a numeric type? The answer is yes. ElasticSearch has a mechanism called numeric detection, controlled by the numeric_detection option. This option is off by default and can be turned on by sending a PUT request. If the option is turned on and the indexed value is a quoted floating-point number, then a GET on _mapping after indexing shows that the type of the field is double. If the indexed value is a quoted integer instead, the type shown in _mapping is long.
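A sketch of numeric detection follows. The request body shape and the type name my_type are assumptions for illustration, and detect_quoted_number is a hypothetical model that relies on Python's own number parsing, which is more lenient than the real detection.

```python
import json

# Assumed body of a PUT request that creates an index with numeric detection on.
create_index_body = {"mappings": {"my_type": {"numeric_detection": True}}}
print(json.dumps(create_index_body))


def detect_quoted_number(text, numeric_detection=False):
    """Model how a quoted value is typed with and without numeric detection."""
    if not numeric_detection:
        return "string"
    try:
        int(text)
        return "long"
    except ValueError:
        pass
    try:
        float(text)
        return "double"
    except ValueError:
        return "string"


print(detect_quoted_number("1.5", numeric_detection=True))
print(detect_quoted_number("10", numeric_detection=True))
```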

How the mapping is constructed

First, find the entry point for mappings. Since new mappings are submitted through the REST interface, the REST package is a safe place to start. The REST package does not contain many classes, and because mappings are attached to an index, the mapping-related classes are placed under the indices package. As shown in the figure below:

Next, find the entry point and analyze how the mapping is constructed.

First, an initial PutMappingRequest is constructed from the index field in the URL.
Then the parameters carried in the request are set on the PutMappingRequest:
- type: the mapping type
- source: the mapping definition sent in the request
- update_all_types: whether to update the mapping of same-named fields across all types
- timeout: the timeout
- master_timeout: the timeout to apply if the master is not found or the connection to the master is lost
- expand_wildcards: part of the index options; controls how wildcard index expressions are expanded
- ignore_unavailable: part of the index options; the snapshot, restore, and index settings operations use this setting as well
- allow_no_indices: part of the index options
Finally, the request is executed by calling putMapping on the NodeClient.
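From the client's point of view, the parameters listed above travel in the request URL and body. The sketch below assembles such a request path in Python; the helper build_put_mapping_path, the index and type names, and the parameter values are all hypothetical, and the path layout is an assumption based on the REST endpoint for putting a mapping on a type.

```python
from urllib.parse import urlencode


def build_put_mapping_path(index, mapping_type, **params):
    """Assemble the path and query string for a put-mapping request."""
    path = f"/{index}/_mapping/{mapping_type}"
    query = urlencode(sorted(params.items()))
    return f"{path}?{query}" if query else path


request_path = build_put_mapping_path(
    "my_index",
    "my_type",
    timeout="30s",
    master_timeout="10s",
    ignore_unavailable="true",
)

# The mapping definition itself goes in the request body (the "source").
request_source = {"properties": {"name": {"type": "string"}}}

print("PUT", request_path)
```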

The following figure shows where the mapping is actually performed: NodeClient’s execute method

The PutMappingRequest initialization parameters are shown in the figure below. You can see that the data we put is in the source:

The putMapping action is then looked up among the registered actions. The action is of type TransportAction. Its execute method is what actually runs when the transport operation is invoked, and it creates a new task associated with the request.

We can see that the method registers a task using the predefined actionName 'indices:admin/mapping/put' and the putMappingRequest constructed earlier. The success and failure callback methods are then set in the ActionListener.
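The register-then-callback pattern described above can be modeled in Python. This is a loose sketch and not the ElasticSearch Java code: the ActionListener class, the execute function, and the task dictionary here are hypothetical stand-ins that only mirror the flow of registering a task and reporting success or failure.

```python
class ActionListener:
    """Hypothetical listener holding a success and a failure callback."""

    def __init__(self, on_response, on_failure):
        self.on_response = on_response
        self.on_failure = on_failure


registered_tasks = []


def execute(action_name, request, action, listener):
    """Register a task for the request, run the action, and notify the listener."""
    task = {"id": len(registered_tasks) + 1, "action": action_name, "request": request}
    registered_tasks.append(task)
    try:
        listener.on_response(action(request))
    except Exception as error:  # any failure is reported through the listener
        listener.on_failure(error)
    return task


results = []
listener = ActionListener(results.append, lambda error: results.append(f"failed: {error}"))
task = execute(
    "indices:admin/mapping/put",
    {"index": "my_index"},
    lambda request: {"acknowledged": True},
    listener,
)
print(task, results)
```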

Because I am running ElasticSearch as a single node, the class that actually executes the task is TransportMasterNodeAction, the base class for operations that must be performed on the master node. In its doStart method, the task and its parameters pass through TransportPutMappingAction, and the request arrives at the method that does the real work, metaDataMappingService.putMapping, as shown in the figure below.

At this point the task scheduling that precedes the actual work is finished. What follows is the work of the PutMappingExecutor in metaDataMappingService.