
Basic concepts

Nodes, clusters, shards, and replicas

Node

A node is an instance of Elasticsearch.

Once you start Elasticsearch on a server, you have a node. If you start Elasticsearch on another server, that is another node. You can even have multiple nodes on the same server by starting multiple Elasticsearch processes.

Cluster

A collection of multiple Elasticsearch nodes working together is called a cluster.

On a multi-node cluster, the same data can be spread across multiple servers. This helps performance. It also helps stability: if each shard has at least one replica shard, Elasticsearch can still serve requests and return all data when any single node goes down.
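The stability claim can be checked with a small simulation. This is only a sketch: the node names and the placement of shard copies are made up for illustration, and it assumes that a primary and its replica never sit on the same node.

```python
# Hypothetical layout: which node holds which shard copy.
# "P0" is primary shard 0, "R0" is the replica of shard 0, and so on.
layout = {
    "node-1": {"P0", "R1"},
    "node-2": {"P1", "R2"},
    "node-3": {"P2", "R0"},
}

def shards_available(layout, failed_node):
    """Return the shard numbers that still have at least one copy
    after the given node has gone down."""
    surviving = set()
    for node, copies in layout.items():
        if node == failed_node:
            continue
        for copy in copies:
            surviving.add(int(copy[1:]))  # "P0" / "R0" -> 0
    return surviving

all_shards = {0, 1, 2}
for node in layout:
    # True means every shard can still be served without this node
    print(node, "down ->", shards_available(layout, node) == all_shards)
```

With one replica per shard, losing any single node still leaves a copy of every shard. Without the replicas, losing a node would lose the shard it holds.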

But a cluster has its drawbacks too: you have to make sure that the nodes can communicate with each other quickly enough and that no split brain occurs (two parts of the cluster cannot communicate with each other, and each thinks the other is down).

Shard

An index may store a large amount of data that exceeds the hardware limits of a single node. For example, a single index of a billion documents that takes up 1TB of disk space might not fit on a single node’s disk, or a single node might be too slow to serve search requests on its own.

To solve this problem, Elasticsearch provides the ability to subdivide an index into multiple shards. When creating an index, you only need to define the number of shards you need. Each shard is itself a fully functional and independent “index” that can be hosted on any node in the cluster.

Sharding is important for two main reasons:

  • It allows you to split/scale your content volume horizontally
  • It allows you to distribute and parallelize operations across shards (possibly on multiple nodes), thereby improving performance/throughput

The mechanism for how shards are distributed and their documents aggregated back into the search request is completely managed by Elasticsearch and is transparent to you as the user.
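To make this more concrete, here is a minimal sketch of the idea: a document is routed to a shard by hashing its ID, and a search asks every shard and merges the hits. This is not Elasticsearch's actual implementation. `zlib.crc32` is a stand-in for its real hash function, and `pick_shard` and `scatter_gather` are hypothetical names used only for illustration.

```python
import zlib

def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Route a document to a shard: hash(id) modulo the number of primary shards.
    crc32 is only a stand-in for the hash function Elasticsearch really uses."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

def scatter_gather(shards, predicate):
    """Run the same query on every shard and merge the partial results."""
    hits = []
    for shard in shards:
        hits.extend(doc for doc in shard if predicate(doc))
    return hits

num_shards = 3
shards = [[] for _ in range(num_shards)]

# Index ten small documents; each one lands on exactly one shard.
docs = [{"id": str(i), "title": f"doc {i}"} for i in range(10)]
for doc in docs:
    shards[pick_shard(doc["id"], num_shards)].append(doc)

# Search across all shards for documents with an even ID.
hits = scatter_gather(shards, lambda d: int(d["id"]) % 2 == 0)
print(sorted(h["id"] for h in hits))
```

Because the shard is chosen with a modulo on the number of primary shards, changing that number later would send existing IDs to different shards. This is one way to see why the number of primary shards is fixed when the index is created.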

In a network/cloud environment where failures can occur at any time, it is useful and highly recommended to have a failover mechanism in case shards or nodes go offline or disappear for some reason. To this end, Elasticsearch allows you to make one or more copies of an index's shards, called replica shards (replicas for short).

Replica

Sharding allows users to push more data to the Elasticsearch cluster than a single machine can hold. Replicas solve the problem that a single machine cannot handle all requests when the access pressure is too high.

A shard can be either a primary shard or a replica shard, where a replica shard is a complete copy of its primary shard. Replica shards are used for searching, and one of them becomes the new primary shard if the original primary shard is lost.

Note: You can change the number of replica shards per primary shard at any time, because replica shards can always be created and removed. This does not apply to the number of primary shards an index is divided into, which must be decided when the index is created. Too few shards limit scalability, but too many shards hurt performance. The default of 5 primary shards (used by Elasticsearch versions before 7.0; later versions default to 1) is a good starting point.
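As a rough illustration, the two settings involved are `number_of_shards` and `number_of_replicas`. The sketch below only builds and prints the request bodies; the index name `my-index` and the numbers are assumptions chosen for the example.

```python
import json

# Body for creating an index: primary shard count is fixed from here on.
create_body = {"settings": {"number_of_shards": 3, "number_of_replicas": 1}}

# Body for a later settings update: only the replica count can be changed.
update_body = {"index": {"number_of_replicas": 2}}

def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary shard has `replicas` copies, so the cluster holds
    primaries * (1 + replicas) shards in total."""
    return primaries * (1 + replicas)

print("PUT /my-index")
print(json.dumps(create_body, indent=2))
print("PUT /my-index/_settings")
print(json.dumps(update_body, indent=2))
print(total_shard_copies(3, 1), "->", total_shard_copies(3, 2))
```

Raising the replica count from 1 to 2 takes this hypothetical index from 6 shards in total to 9, while the 3 primary shards stay the same.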

Documents, types, indexes, and mappings

Document

Elasticsearch is document-oriented, which means that the smallest unit of data it indexes and searches is a document.

Documents have several important properties in Elasticsearch.

  • It’s self-contained. A document contains both fields and their values.
  • It can be hierarchical. A document can contain other documents, and a field can contain other fields and values. For example, the “Location” field can contain both “City” and “Street”.
  • It has a flexible structure. Documents do not depend on predefined schemas. Not all documents need to have the same fields; they are not limited to the same schema.
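These properties are easy to see when documents are written out as JSON. In the sketch below, the field names `location`, `city` and `street` follow the example above; every other name and value is made up for illustration.

```python
import json

# A hierarchical document: the "location" field contains other fields.
doc_a = {
    "name": "Example group",          # hypothetical value
    "location": {
        "city": "Example City",       # hypothetical value
        "street": "1 Example Street", # hypothetical value
    },
}

# A second document with a different set of fields; no shared schema is required.
doc_b = {
    "name": "Another group",
    "tags": ["search", "lucene"],
}

for doc in (doc_a, doc_b):
    # Each document is self-contained: it carries its field names and values.
    print(json.dumps(doc, indent=2))

print(sorted(doc_a.keys()), sorted(doc_b.keys()))
```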

Type

A type is a logical container for documents, just as a table is a container for rows. It is best to put documents with different structures in different types. For example, you can use one type to define the groups at a gathering and another type to define the activities that people take part in.

Index

An index is a container for mapping types. An Elasticsearch index is an independent collection of a large number of documents. Each index is stored on disk as its own set of files, and it holds the fields of all its mapping types, as well as some settings.

Mapping

All documents are analyzed before being indexed. The user can set parameters that determine how to split the input text into terms, which terms should be filtered out, and which additional processing should be applied (such as removing HTML tags). This is the role of the mapping: to store all the information needed for the analysis chain.
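The analysis chain described here can be imitated in a few lines. This is a simplified sketch, not Elasticsearch's analyzer: the function names, the regular expressions and the stop word list are all assumptions made for the example.

```python
import re

# Illustrative stop word list; a real analyzer would use a configurable one.
STOPWORDS = {"the", "a", "an", "and", "is"}

def strip_html(text: str) -> str:
    """Extra processing step: replace HTML tags with spaces."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text: str) -> list:
    """Split the input text into terms on non-word characters."""
    return re.findall(r"\w+", text)

def analyze(text: str) -> list:
    """Run the chain: remove HTML, split into terms, lowercase, filter terms."""
    tokens = tokenize(strip_html(text))
    tokens = [t.lower() for t in tokens]
    return [t for t in tokens if t not in STOPWORDS]

print(analyze("<p>The Quick brown fox is <b>fast</b></p>"))
# prints ['quick', 'brown', 'fox', 'fast']
```

The terms that come out of the chain are what gets indexed, which is why the same settings have to be known at both index time and search time.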

Elasticsearch is a full-text search engine built on Lucene that also stores data, so its concepts can be compared with those of a relational database.

Comparison:

  • Index (indices) -> Database
  • Type -> Table
  • Document -> Row
  • Field -> Column
  • Mapping (mappings) -> Constraints (type, length) on each column

Details (concept: description):

  • Index (indices): “indices” is the plural of “index” and refers to multiple indexes.
  • Type: similar to a table in MySQL. An index can contain different types (in 6.x and later versions, an index can have only one type). Just as a database table has a structure that constrains each of its fields, a type has a mapping that defines the constraints on each of its fields.
  • Document: the raw data stored in an index. For example, each piece of product information is one document.
  • Field: a property of a document.
  • Mapping (mappings): the data type of each field, its attributes, whether it is indexed, whether it is stored, and so on.

Create an index

Syntax

Elasticsearch uses a REST-style API, so every API call is an HTTP request, and you can send it with any tool that can make HTTP requests.

Request format for index creation:

  • Request method: PUT
  • Request path: /index name
  • Request body: JSON format

{
    "settings": {
        "Attribute name": "Attribute value"
    }
}

settings: the index settings, where you can define the properties of the index.
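As a sketch of what such a request looks like outside of any particular tool, the snippet below builds the PUT request with Python's standard library and prints it instead of sending it. The server address `http://localhost:9200`, the index name `lagou` and the chosen settings are assumptions for the example.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local Elasticsearch node

def build_create_index_request(index_name: str, settings: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates an index."""
    body = json.dumps({"settings": settings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = build_create_index_request(
    "lagou", {"number_of_shards": 1, "number_of_replicas": 0}
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# With a server running at ES_URL, urllib.request.urlopen(req) would send it.
```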

Creating an index with Kibana

The Kibana console simplifies sending HTTP requests. For example:

Here the Elasticsearch server address can be left out.

It also offers syntax hints, which is very convenient.

View an index

Syntax

A GET request lets us view the information of an index. The format is:

GET /index name

Delete an index

Use a DELETE request to delete an index.

Syntax

DELETE /index name

Example

View the lagou index again:

Of course, we can also use the HEAD request to see if the index exists:
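The three requests on an index (GET, DELETE and HEAD) differ only in the HTTP method. The sketch below builds them with Python's standard library without sending anything; the server address is an assumption, and `build_index_request` and `index_exists` are hypothetical helper names.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local Elasticsearch node

def build_index_request(method: str, index_name: str) -> urllib.request.Request:
    """Build (but do not send) a request against /index name."""
    return urllib.request.Request(url=f"{ES_URL}/{index_name}", method=method)

def index_exists(status_code: int) -> bool:
    """Interpret the answer to HEAD /index name:
    200 means the index exists, 404 means it does not."""
    return status_code == 200

for method in ("GET", "HEAD", "DELETE"):
    req = build_index_request(method, "lagou")
    print(req.get_method(), req.full_url)

print(index_exists(200), index_exists(404))
```

A HEAD response has no body, so the status code is the whole answer. After the index has been deleted, the same HEAD request would be expected to return 404.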