Elasticsearch

A distributed full-text search engine

I. Use Cases

Information search
- E-commerce sites
- Job sites
- News sites
Log collection and analysis – the ELK stack
Data analysis – product sales, page visits, spending amounts

II. Core Concepts

Index – analogous to a database
Shard – a fragment (partition) of an index
- Each shard corresponds to one Lucene index
- Each shard has its own translog
Type (deprecated, being removed) – analogous to a table
Document – analogous to a data row
Field – analogous to a column
Mapping – analogous to a schema: field definitions and constraints

III. API

Appending ?explain to the request URL shows the cause of a statement error

1. Index

Create an index – PUT /{index}
Check whether an index exists – HEAD /{index}
View index properties
- Single – GET /{index}
- Multiple – GET /{index 1},{index 2},{index 3}
- All – GET _all
- All, as a table – GET /_cat/indices?v
Open an index – POST /{index}/_open
Close an index – POST /{index}/_close
Delete indexes – DELETE /{index 1},{index 2},{index 3}
Index migration – POST _reindex
- version_type
  - [Default] internal – migrates directly and overwrites any existing document
  - external – keeps the version information during migration; an existing document is updated only when the source version is newer
- op_type
  - create – only missing documents are created; an existing document causes a version conflict
- conflicts
  - proceed – a conflict does not abort the migration; the response only reports the number of conflicting documents
- query – filters the source data; sorting and a size limit can also be set
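
As a hedged illustration of the migration options above, the following Python sketch assembles a _reindex request body as a plain dictionary. The index names old-index and new-index and the status filter field are hypothetical, and the helper only builds JSON; it does not talk to a cluster.

```python
import json


def build_reindex_body(source_index, dest_index, query=None,
                       only_create=False, proceed_on_conflict=False):
    """Assemble a body for POST _reindex (sketch; nothing is sent)."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    if query is not None:
        # Only documents matching the query are copied.
        body["source"]["query"] = query
    if only_create:
        # op_type=create: a document that already exists causes a conflict.
        body["dest"]["op_type"] = "create"
    if proceed_on_conflict:
        # conflicts=proceed: count conflicts instead of aborting.
        body["conflicts"] = "proceed"
    return body


body = build_reindex_body(
    "old-index", "new-index",
    query={"term": {"status": "active"}},
    only_create=True, proceed_on_conflict=True,
)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of POST _reindex.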

2. Mapping

Create a mapping – PUT /{index}/_mapping

PUT /{index}/_mapping
{
    "properties": {
        "field name": {
            "type": "type",
            "index": true,
            "store": true,
            "analyzer": "analyzer name"
        }
    }
}

PUT /lagou-company-index/_mapping
{
    "properties": {
        "name": {
            "type": "text",
            "analyzer": "ik_max_word"
        },
        "job": {
            "type": "text",
            "analyzer": "ik_max_word"
        },
        "logo": {
            "type": "keyword",
            "index": false
        },
        "payment": {
            "type": "float"
        }
    }
}

View a mapping – GET /{index}/_mapping
View all mappings
- GET _mapping
- GET _all/_mapping
Modify a mapping – PUT /{index}/_mapping
- Fields can only be added; the mapping of an existing field cannot be changed
- To change an existing mapping, the index has to be deleted and re-created with the new mapping
Create an index together with its mapping – PUT /{index}

3. Document

Add a document
- With a specified ID – POST /{index}/_doc/{id}
- With an auto-generated ID – POST /{index}/_doc
View documents
- By ID – GET /{index}/_doc/{id}
- By condition – GET /{index}/_search
- Filter the returned fields – GET /{index}/_doc/{id}?_source=field 1,field 2
Update a document
- Full update (the old document is deleted and the new one added) – PUT /{index}/_doc/{id} – the document is created if the ID does not exist
- Partial update (modify individual fields) – POST /{index}/_update/{id}
Delete documents
- By ID – DELETE /{index}/_doc/{id}
- By condition – POST /{index}/_delete_by_query
Batch get
- GET /_mget
- GET /{index}/_mget
Batch add, delete, update – POST /_bulk, with an action line {"action": {"_index": "index name", "_id": "id"}} followed by a data line {data}
- create – adds a document
- index – adds a document or fully replaces an existing one – equivalent to PUT
- update – partially updates a document
- delete – deletes a document
A batch of 1,000 to 5,000 documents per request, with a total request size of 5 MB to 15 MB, is recommended
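
The _bulk body is newline-delimited JSON rather than a single JSON document, which is easy to get wrong by hand. The sketch below shows one way to assemble it in Python; the helper name build_bulk_payload and the sample documents are made up for illustration, and nothing is sent to a cluster.

```python
import json


def build_bulk_payload(index_name, operations):
    """Turn (action, doc_id, data) tuples into an NDJSON _bulk payload.

    action is one of "create", "index", "update", "delete".
    """
    lines = []
    for action, doc_id, data in operations:
        # Action line: which operation, on which index and document.
        lines.append(json.dumps({action: {"_index": index_name, "_id": doc_id}}))
        if action == "delete":
            continue  # delete has no data line
        if action == "update":
            data = {"doc": data}  # a partial update wraps the fields in "doc"
        lines.append(json.dumps(data))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


payload = build_bulk_payload("lagou-company-index", [
    ("index", "1", {"name": "Example Co", "payment": 20000.0}),
    ("update", "1", {"payment": 25000.0}),
    ("delete", "2", None),
])
print(payload)
```

Each operation contributes an action line and, except for delete, a data line, so the three operations above produce five lines.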

IV. Mapping Attributes

type – the field type
- String
  - text – analyzed (tokenized); cannot be aggregated
  - keyword – not analyzed; can be aggregated
- Numeric
  - byte
  - short
  - integer
  - long
  - double
  - float
  - half_float
  - scaled_float – high precision; a scaling_factor must be specified
- Date – [Suggestion] store the value as a long holding milliseconds
- Array
  - A document matches if any element of the array matches
  - When sorting, ascending order uses the smallest element of the array and descending order uses the largest
- Object
- geo_point – latitude and longitude
index – whether the field is indexed, that is, whether it can be searched – [default] true
store – whether the field value is stored independently, which speeds up retrieval but consumes space – [default] false
analyzer – the analyzer (tokenizer)
- Chinese
  - ik_max_word [commonly used] – finest-grained segmentation
  - ik_smart – coarsest-grained segmentation
dynamic – the dynamic mapping mode used when an unknown field is encountered
- true – maps the field automatically
- false – ignores the field
- strict – raises an error
date_detection – whether date detection is enabled – when set to false, a string always stays a string
dynamic_date_formats – the formats used to convert strings to dates
dynamic_templates – applies different mappings to different fields or data types
refresh_interval – index refresh interval – [default] 1 second
index.translog.durability – how the translog is flushed to disk – [default] request (synchronous); async flushes in the background
index.translog.sync_interval – translog flush interval – [default] 5 seconds

PUT /{index}
{
    "settings": {
        "number_of_shards": number of shards,
        "number_of_replicas": number of replicas,
        "refresh_interval": "index refresh interval",
        "index.translog.durability": "async",
        "index.translog.sync_interval": "5s"
    },
    "mappings": {
        "dynamic": "dynamic mapping mode",
        "date_detection": whether date detection is enabled,
        "dynamic_date_formats": ["MM/dd/yyyy"],
        "properties": {
            "field name": {
                "mapping attribute name": "mapping attribute value"
            }
        }
    }
}

V. Search Types

POST /{index}/_search
{
    "query": {
        "search type": {
            "field to search": "search value"
        }
    },
    "sort": [
        { "field to sort": { "order": "asc" } }
    ],
    "highlight": {
        "pre_tags": "<font color='pink'>",
        "post_tags": "</font>",
        "fields": [
            { "field to highlight": {} }
        ]
    },
    "from": offset of the first result, i.e. (page number - 1) × page size,
    "size": number of results per page
}

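A search request like the template above can also be assembled in code. The Python sketch below is one possible way to do it and only builds the JSON body; the helper name build_search_body and the query values are illustrative assumptions, while the job and payment fields come from the mapping example earlier. Note that "from" is an offset into the result list, so a page number has to be converted first.

```python
import json


def build_search_body(query, page, page_size, sort_field=None):
    """Assemble a _search body; "from" is an offset, not a page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    body = {
        "query": query,
        "from": (page - 1) * page_size,  # offset of the first hit on this page
        "size": page_size,
    }
    if sort_field is not None:
        body["sort"] = [{sort_field: {"order": "asc"}}]
    return body


# bool query: "job" must match; "payment" is filtered without affecting the score.
query = {
    "bool": {
        "must": [{"match": {"job": "java"}}],
        "filter": [{"range": {"payment": {"gte": 10000}}}],
    }
}
body = build_search_body(query, page=3, page_size=10, sort_field="payment")
print(json.dumps(body, indent=2))
```

Page 3 with 10 results per page yields an offset of 20.
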
match_all – matches all documents
match – analyzes the search text and searches for the resulting terms; terms are combined with OR by default, and combining them with AND requires the operator attribute
match_phrase – analyzes the search text; the target document must contain all the terms in the same order
multi_match – analyzes the search text and combines the terms with OR; the fields to search can be specified
- Fields can be described with * – *_name
- A field's weight can be boosted with ^ – subject^3
term – exact lookup; the search text is not analyzed
query_string – searches specified fields or the full text, and splits the string using operators such as AND, OR, and ~
range – range search, used for numbers and dates
exists – matches documents in which the field is not null
prefix – prefix match
wildcard – wildcard match
regexp – regular expression match
fuzzy – fuzzy match
bool – compound query
- must – the document must match
- filter – the document must match; does not affect the score, and the result is cached in memory, which speeds up repeated searches
- should – the document should match
- must_not – the document must not match; does not affect the score
dis_max – when several search fields match, only the highest score is taken as the document score – by default the scores of the search fields are added together
suggest – search suggestions
- completion – matches the search text as a prefix and makes suggestions
  - preserve_separators – whether separators in the search text are preserved
  - preserve_position_increments – whether a stop word is ignored when it is the first word of the suggestion
- phrase – analyzes the search text, judges how well it matches the original text, and makes suggestions
- term – analyzes the search text and makes suggestions for each term
  - missing – suggests only when a term cannot be found in the dictionary
  - always – suggests whether or not a term is found in the dictionary
  - popular – suggests terms with a higher frequency, whether or not the term is found in the dictionary
- context – similar to completion, with categories added for further filtering
Production recommendation:

completion → no match → phrase → no match → term
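
The fallback order above can be sketched in Python as follows. This is a minimal illustration, not a client implementation: the three suggester functions are stand-ins with hard-coded data in place of real completion, phrase, and term suggester requests.

```python
from typing import Callable, List, Sequence

Suggester = Callable[[str], List[str]]


def suggest_with_fallback(text: str, suggesters: Sequence[Suggester]) -> List[str]:
    """Try each suggester in order and return the first non-empty result."""
    for suggester in suggesters:
        results = suggester(text)
        if results:
            return results
    return []


# Stand-ins for real completion / phrase / term suggester requests.
def completion(text):
    return [w for w in ("elasticsearch", "elastic stack") if w.startswith(text)]


def phrase(text):
    return ["elasticsearch"] if text == "elasticsaerch" else []


def term(text):
    return ["search"] if text == "serch" else []


chain = [completion, phrase, term]
print(suggest_with_fallback("elas", chain))           # ['elasticsearch', 'elastic stack']
print(suggest_with_fallback("elasticsaerch", chain))  # ['elasticsearch']
print(suggest_with_fallback("serch", chain))          # ['search']
```

The cheaper prefix lookup is tried first, and the more expensive suggesters run only when the previous one returns nothing.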

VI. Aggregation Analysis

"aggregations": {
    "<aggregation_name>": {                            <!-- aggregation name -->
        "<aggregation_type>": {                        <!-- aggregation type -->
            <aggregation_body>                         <!-- aggregation body: which fields are aggregated -->
        }
        [,"meta": { [<meta_data_body>] } ]?            <!-- metadata -->
        [,"aggregations": { [<sub_aggregation>]+ } ]?  <!-- sub-aggregations -->
    }
    [,"<aggregation_name_2>": { ... } ]*               <!-- another aggregation name -->
}

1. Aggregation methods

Metric aggregation – computes statistics over the data
Bucket aggregation – groups the data into buckets before the statistics are computed

2. Metrics

Maximum – max
Minimum – min
Sum – sum
Average – avg
Count – count
Number of documents in which a field has a value – value_count
Distinct count – cardinality
Statistics – stats – includes max, min, sum, avg, and count
Extended statistics – extended_stats – also includes sum of squares, variance, and standard deviation
Percentiles – percentiles – the percentiles to compute can be specified
Percentile ranks – percentile_ranks – the percentage of values that fall within given bounds
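
To show how bucket and metric aggregations nest, the sketch below builds a request body in Python that first buckets documents and then computes metrics inside each bucket. The city field is a hypothetical keyword field, the aggregation names are arbitrary, and payment is the float field from the earlier mapping example; the body is only constructed, not sent.

```python
import json

# Bucket by city (terms), then compute metrics on payment inside each bucket.
aggs_body = {
    "size": 0,  # return only the aggregation results, no hits
    "aggregations": {
        "by_city": {
            "terms": {"field": "city"},
            "aggregations": {
                "avg_payment": {"avg": {"field": "payment"}},
                "payment_stats": {"stats": {"field": "payment"}},
            },
        },
        # A top-level metric: number of distinct cities.
        "distinct_cities": {"cardinality": {"field": "city"}},
    },
}
print(json.dumps(aggs_body, indent=2))
```

The body would be sent with POST /{index}/_search.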

VII. Distributed Cluster

1. Roles

Cluster – a group of nodes, all identified by a common cluster name
Node
- master – whether the node is eligible to be elected master – [default] true
- data – whether the node stores data – [default] true
Shard – a data partition of an index
- The number of primary shards cannot be changed unless the index is rebuilt
- By default, each primary shard has one replica shard, and the two are not placed on the same node

2. Characteristics

New nodes are discovered automatically
Nodes are peers – every node can receive a request and forward it to the nodes where the data is stored
When a node goes down, the missing data is recovered from replica shards
Search responses take on the order of a hundred milliseconds

3. Building and planning

Principles

With 30 GB of JVM memory, set the maximum shard size to 30 GB, then calculate the total number of shards from the data volume
The number of nodes is the total number of shards divided by 1.5 to 3
Set the number of replicas to 2 to ensure high availability
When search performance deteriorates, the number of replicas can be increased to improve concurrent search capacity
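
The sizing rule above amounts to simple arithmetic, sketched below in Python. The function name plan_cluster is made up, rounding up is an assumption, and the calculation counts shards exactly as the rule states it without deciding whether replicas are included.

```python
import math


def plan_cluster(data_gb, max_shard_gb=30):
    """Estimate the shard count and a node-count range from the data volume."""
    shards = max(1, math.ceil(data_gb / max_shard_gb))
    # Nodes = shards / 3 (lower bound) to shards / 1.5 (upper bound).
    min_nodes = max(1, math.ceil(shards / 3))
    max_nodes = max(1, math.ceil(shards / 1.5))
    return shards, min_nodes, max_nodes


shards, min_nodes, max_nodes = plan_cluster(data_gb=900)
print(shards, min_nodes, max_nodes)  # 30 10 20
```

For 900 GB of data this gives 30 shards of 30 GB each and between 10 and 20 nodes.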

Applications

Search – tens of millions to billions of documents – two to four nodes
Online analytical processing – ELK – billions of documents – dozens to hundreds of nodes

4. Consistency assurance

?wait_for_active_shards=number of active shard copies&timeout=timeout duration
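
As a small illustration, the query string above can be built in Python with the standard library. The helper name index_url and the values 2 and 10s are arbitrary examples; the sketch only formats the URL path.

```python
from urllib.parse import urlencode


def index_url(index_name, doc_id, active_shards, timeout):
    """Build the path and query string for a write that waits for shard copies."""
    params = urlencode({
        "wait_for_active_shards": active_shards,
        "timeout": timeout,
    })
    return f"/{index_name}/_doc/{doc_id}?{params}"


url = index_url("lagou-company-index", "1", active_shards=2, timeout="10s")
print(url)  # /lagou-company-index/_doc/1?wait_for_active_shards=2&timeout=10s
```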

VIII. Relationships

Application-side joins – the indexes are independent of each other and the application joins them – suitable for a small number of documents
Data denormalization, nested objects
- Uses field redundancy to trade indexing performance for search performance
- Redundant fields should rarely change
- Suitable for handling a small number of relationships
- Suitable for read-heavy, write-light scenarios
Parent/child relationships
- Trades search performance for indexing performance
- A search cannot return both parent and child documents
- Parent and child documents must be on the same shard
- Suitable for write-heavy, read-light scenarios

IX. Persistence

refresh
- Writes the in-memory buffer to a new segment, making the documents searchable
- [Default] Runs every 1 second
flush
- Flushes all segments to disk, clears the translog, and creates a commit point
- [Default] Runs every 30 minutes
When a node crashes and restarts, the translog is replayed from the last commit point to recover the data
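
The interplay of refresh, flush, and translog replay described above can be pictured with a toy model. The Python class below is purely illustrative and is not how Elasticsearch or Lucene store data; it assumes the translog survives a crash and uses plain lists in place of segments.

```python
class ToyShard:
    """Toy model of refresh / flush / translog replay."""

    def __init__(self):
        self.buffer = []      # in memory, not yet searchable
        self.segments = []    # searchable segments (a list of lists)
        self.translog = []    # operations since the last commit point
        self.committed = []   # documents durable at the last commit point

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)  # every write is also logged

    def refresh(self):
        # The buffer becomes a new searchable segment.
        if self.buffer:
            self.segments.append(self.buffer)
            self.buffer = []

    def flush(self):
        # Persist everything, record a commit point, clear the translog.
        self.refresh()
        self.committed = [d for seg in self.segments for d in seg]
        self.translog = []

    def search(self):
        return [d for seg in self.segments for d in seg]

    def crash_and_recover(self):
        # Memory is lost; rebuild from the commit point plus the translog.
        self.buffer = []
        self.segments = [list(self.committed)] if self.committed else []
        for doc in self.translog:
            self.buffer.append(doc)
        self.refresh()


shard = ToyShard()
shard.index("doc-1")
print(shard.search())   # [] because the buffer has not been refreshed yet
shard.refresh()
print(shard.search())   # ['doc-1']
shard.flush()
shard.index("doc-2")
shard.crash_and_recover()
print(shard.search())   # ['doc-1', 'doc-2']
```

In this model doc-2 was never flushed, yet it survives the crash because it is replayed from the translog.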

X. Concurrency Control

Built-in versioning – ?if_seq_no=sequence number&if_primary_term=primary term
Custom versioning – ?version=version number&version_type=external
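
The idea behind the built-in check is optimistic locking: a write is accepted only if the document has not changed since it was read. The Python sketch below imitates that with an in-memory store; ToyStore and VersionConflict are hypothetical names, and the real sequence-number bookkeeping in a cluster is more involved.

```python
class VersionConflict(Exception):
    pass


class ToyStore:
    """In-memory stand-in for an index, tracking a sequence number per document."""

    def __init__(self):
        self.docs = {}        # doc_id -> (source, seq_no, primary_term)
        self.next_seq_no = 0
        self.primary_term = 1

    def get(self, doc_id):
        return self.docs[doc_id]

    def put(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        if if_seq_no is not None:
            current = self.docs.get(doc_id)
            # Reject the write if the document changed since it was read.
            if current is None or (current[1], current[2]) != (if_seq_no, if_primary_term):
                raise VersionConflict(doc_id)
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (source, seq_no, self.primary_term)
        return seq_no


store = ToyStore()
store.put("1", {"payment": 100})
_, seq_no, term = store.get("1")

# The first writer succeeds because the document is unchanged since it was read.
store.put("1", {"payment": 200}, if_seq_no=seq_no, if_primary_term=term)

# The second writer still holds the old seq_no and is rejected.
try:
    store.put("1", {"payment": 300}, if_seq_no=seq_no, if_primary_term=term)
    conflict = False
except VersionConflict:
    conflict = True
print(conflict)  # True
```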

XI. Pagination

from + size – the common paging method; deep paging can cause performance problems
scroll – caches all matching search results – not suitable for real-time search, suitable for background batch processing
search_after – determines the next page from the last result of the previous page – cannot skip pages
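
A hedged sketch of the search_after approach in Python: each request repeats the same sort, and the sort values of the previous page's last hit mark where the next page starts. The helper name next_page_body, the id tiebreaker field, and the sample hit are assumptions for illustration; only the request body is built.

```python
def next_page_body(query, sort, page_size, last_hit=None):
    """Build a search_after request body from the last hit of the previous page."""
    body = {"query": query, "sort": sort, "size": page_size}
    if last_hit is not None:
        # The sort values of the last hit mark where the next page starts.
        body["search_after"] = last_hit["sort"]
    return body


# "id" is a hypothetical unique field used as a tiebreaker.
sort = [{"payment": "asc"}, {"id": "asc"}]
first = next_page_body({"match_all": {}}, sort, 10)

# Shape of a hit returned by a sorted search (the values are made up).
last_hit = {"_id": "42", "_source": {"payment": 15000.0, "id": 42}, "sort": [15000.0, 42]}
second = next_page_body({"match_all": {}}, sort, 10, last_hit)
print(second["search_after"])  # [15000.0, 42]
```

Because each page depends on the last hit of the one before it, pages can only be walked in order.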

XII. Performance Optimization

Set the number of replicas to 0 during the initial data load
Let Elasticsearch generate document IDs automatically, which avoids a disk read
Do not analyze or index unimportant fields
Adjust the index refresh interval – default 1 second
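
The replica and refresh tips above translate into two settings bodies, sketched below in Python. The 30s interval and the replica count restored afterwards are illustrative choices, not recommendations from the text; the bodies would be sent with PUT /{index}/_settings.

```python
import json


def bulk_load_settings():
    """Settings to apply before the initial load."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "30s"}}


def normal_settings(replicas=1):
    """Settings to restore once the load has finished."""
    return {"index": {"number_of_replicas": replicas, "refresh_interval": "1s"}}


print(json.dumps(bulk_load_settings()))
print(json.dumps(normal_settings()))
```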
