Elasticsearch (12)

A node is an ES instance, not a server. A shard is a Lucene instance that can run on a single machine, and the master is a node with a special role.

Diagram the horizontal expansion process, how to exceed the expansion limits, and how to improve fault tolerance

  1. Primary and replica shards are load-balanced automatically. Example: 6 shards (3 primary, 3 replica).

    • ES tries to keep the number of shards on each node equal so that the load is balanced
  2. The fewer shards a node holds, the more I/O, CPU, and memory each shard gets, and the better each shard performs (important)

  3. With 6 shards (3 primary, 3 replica), the capacity expansion limit is 6 machines. Each shard then has all the resources of a single server to itself, which gives the best performance

  4. To go beyond that limit, change the number of replicas dynamically: with 9 shards (3 primary, 6 replica) the cluster can expand to 9 machines. The read throughput is 3 times that of 3 machines

  5. With only 3 machines, 9 shards (3 primary, 6 replica) leave fewer resources per shard but give better fault tolerance: the cluster can lose up to 2 machines, whereas with 6 shards it can lose only 1

  6. On the one hand, this covers the principle of capacity expansion: how to expand and how to raise the overall throughput of the system. On the other hand, fault tolerance matters too: how to let as many servers as possible go down without losing data
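The arithmetic in the list above can be sketched in a few lines of Python. This is only an illustration of the numbers in these notes, not ES code: the function names are made up here, and the fault-tolerance formula assumes that copies of the same shard are always placed on different nodes.

```python
def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Total shard copies in the index: every primary plus all of its replicas."""
    return primaries * (1 + replicas_per_primary)


def expansion_limit(primaries: int, replicas_per_primary: int) -> int:
    """Most machines that are useful: one shard copy per machine."""
    return total_shards(primaries, replicas_per_primary)


def tolerated_failures(nodes: int, replicas_per_primary: int) -> int:
    """Machines that can go down without losing data.

    Assumption: copies of one shard never share a node, so a shard has at
    most `nodes` placed copies and at least one of them must survive.
    """
    return min(replicas_per_primary, nodes - 1)


# 3 primaries with 1 replica each: 6 shards, limit of 6 machines.
print(total_shards(3, 1), expansion_limit(3, 1))
# 3 primaries with 2 replicas each: 9 shards, limit of 9 machines.
print(total_shards(3, 2), expansion_limit(3, 2))
# On 3 machines: 6 shards survive 1 failure, 9 shards survive 2.
print(tolerated_failures(3, 1), tolerated_failures(3, 2))
```

In Elasticsearch itself the replica count is the index setting `number_of_replicas`, which can be changed on a live index, while the number of primary shards is fixed when the index is created.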

The following picture contains an error. With 6 shards on 3 servers, what is the fault tolerance? In other words, how many servers at most can go down without data loss?

  • Capacity Expansion Process Analysis

To understand:

  • Suppose there are two nodes (machines) holding six shards: P0, P1, P2 on node_one and R0, R1, R2 on node_two. When a new node (node_three) joins the cluster, the shards are rebalanced automatically: P0, P1 on node_one; R0, R2 on node_two; P2, R1 on node_three.
  • Fault tolerance to correct
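The rebalancing idea can be sketched with a toy allocator. This is not the allocation algorithm ES actually uses: `allocate` is a hypothetical helper that deals shard copies out round-robin, which keeps the per-node counts even and, as long as there are at least `replicas + 1` nodes, never puts two copies of the same shard on one node. The layout it produces for three nodes is balanced but differs from the one described above.

```python
def allocate(primaries, replicas, nodes):
    """Deal shard copies out to nodes round-robin (toy model, not ES's allocator)."""
    if len(nodes) < replicas + 1:
        raise ValueError("need at least replicas + 1 nodes to separate copies")
    # Keep the copies of one shard adjacent so round-robin separates them.
    copies = []
    for shard in range(primaries):
        copies.append(f"P{shard}")
        copies.extend(f"R{shard}" for _ in range(replicas))
    layout = {node: [] for node in nodes}
    for k, name in enumerate(copies):
        layout[nodes[k % len(nodes)]].append(name)
    return layout


two_nodes = allocate(3, 1, ["node_one", "node_two"])
three_nodes = allocate(3, 1, ["node_one", "node_two", "node_three"])
print(two_nodes)    # all primaries on node_one, all replicas on node_two
print(three_nodes)  # two shards per node after node_three joins
```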

Elasticsearch

Fault tolerance mechanism of Elasticsearch: master election, replica fault tolerance, data recovery

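  1. Starting point: nine shards (3 primary, 6 replica) on three nodes
  2. The master node (which is itself just a node) goes down: a new master is elected automatically, and the cluster is briefly red because the primary shards on the lost node are unavailable
  3. Replica fault tolerance: the new master promotes a replica of each lost primary to primary shard, and the cluster turns yellow (all primaries are active, but some replicas are missing)
  4. Data recovery: the downed node is restarted and the master copies the replicas back to it. The node reuses its existing shard data and only synchronizes the changes made while it was down. The cluster returns to green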
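The red, yellow, green sequence above can be replayed with a small simulation. It is a sketch under simplifying assumptions, not ES internals: the layout is a plain dict of node name to `(shard_id, role)` pairs, and `health` and `promote_replicas` are hypothetical helpers written for this example.

```python
PRIMARIES, REPLICAS = 3, 2  # nine shard copies in total


def health(layout):
    """red: a primary is missing; yellow: a replica is missing; green: all present."""
    active_primaries = {sid for copies in layout.values()
                        for sid, role in copies if role == "P"}
    total_copies = sum(len(copies) for copies in layout.values())
    if len(active_primaries) < PRIMARIES:
        return "red"
    if total_copies < PRIMARIES * (1 + REPLICAS):
        return "yellow"
    return "green"


def promote_replicas(layout):
    """For every shard without a primary, promote one surviving replica."""
    for sid in range(PRIMARIES):
        located = [(node, i) for node, copies in layout.items()
                   for i, (s, _) in enumerate(copies) if s == sid]
        has_primary = any(layout[node][i][1] == "P" for node, i in located)
        if located and not has_primary:
            node, i = located[0]
            layout[node][i] = (sid, "P")


layout = {
    "node_one": [(0, "P"), (1, "R"), (2, "R")],   # assume node_one is the master
    "node_two": [(1, "P"), (0, "R"), (2, "R")],
    "node_three": [(2, "P"), (0, "R"), (1, "R")],
}
statuses = [health(layout)]                        # step 1: all nine shards present
del layout["node_one"]                             # step 2: the master goes down
statuses.append(health(layout))
promote_replicas(layout)                           # step 3: replica promoted to primary
statuses.append(health(layout))
layout["node_one"] = [(0, "R"), (1, "R"), (2, "R")]  # step 4: node returns with replicas
statuses.append(health(layout))
print(statuses)
```

The simulation skips master election itself and treats recovery as instantaneous. It only shows why the status moves through the three colors.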

Elasticsearch (14)

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "test_content": "test test"
  }
}

_index metadata

  1. Identifies the index in which a document is stored
  2. Similar data goes in one index, dissimilar data in different indexes: for example, a product index (all products), a sales index (all product sales data), and an inventory index (all inventory-related data). Putting products, sales, and human resources (employees) all in one big index, such as a company index, is not appropriate.
  3. An index contains many similar documents, whose fields are largely the same. If you have three documents whose fields are completely different, they are not similar and do not belong in the same index.
  4. Index names must be lowercase, cannot start with an underscore, and cannot contain commas. Examples: product, website, blog
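The naming rules in point 4 can be expressed as a small check. This sketch covers only the three rules listed here; `is_valid_index_name` is a hypothetical helper, and Elasticsearch itself enforces further restrictions that are not modeled.

```python
def is_valid_index_name(name: str) -> bool:
    """Check the three rules from these notes: lowercase, no leading '_', no commas."""
    return (
        bool(name)
        and name == name.lower()
        and not name.startswith("_")
        and "," not in name
    )


for candidate in ["product", "website", "Blog", "_sales", "a,b"]:
    print(candidate, is_valid_index_name(candidate))
```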

_type metadata

  1. Identifies the category a document belongs to within an index.
  2. An index is usually divided into multiple types that logically separate slightly different kinds of data. Documents in the same index share many fields but may still differ in a few. Goods, for example, might be divided into electronic goods, fresh goods, daily chemical goods, and so on.
  3. The type name can be uppercase or lowercase, but cannot start with an underscore or contain a comma

_id metadata

  1. The unique identifier of a document. Together with the index and type, it uniquely identifies and locates a document
  2. We can specify the ID of the document manually (PUT /index/type/id), or leave it out and let ES generate an ID for us automatically
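The two ways of indexing a document can be sketched as a helper that builds the HTTP method and path. `index_request` is a hypothetical function for illustration. It assumes the typed URL layout used in these notes, and that a request without an ID is sent as POST to /index/type, which is how that API version accepts documents for automatic ID generation.

```python
from typing import Optional, Tuple


def index_request(index: str, doc_type: str, doc_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP method, path) for indexing a document.

    With an ID the caller chooses where the document lives (PUT).
    Without one, ES is asked to generate the ID (POST, an assumption
    stated in the text above).
    """
    if doc_id is None:
        return "POST", f"/{index}/{doc_type}"
    return "PUT", f"/{index}/{doc_type}/{doc_id}"


print(index_request("test_index", "test_type", "1"))
print(index_request("test_index", "test_type"))
```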