Preface


Revolutionary comrades are bricks, moved wherever they are needed! I have been asked to give a talk on Elasticsearch (ES) within the team, so I am now starting to read up on ES. I did use ELK to build a log monitoring system long ago, but that was a long time back, so I have to start from scratch. Of course, the work at hand can't stop either. Enough small talk, let's start sharing. So what is ES?


[liuzhirichard] Notes on technology, development, and source code from work and study, plus the occasional thing seen and heard in daily life. Feedback is welcome!

What is ES

Elasticsearch is a distributed search and analysis engine.

Elasticsearch provides near real-time search and analysis for all types of data.

Common scenarios:

  1. Site search
  2. ELK log collection, storage and analysis
  3. Geographic information system analysis

For example, the design shown in the image below:

Features:

  1. ES is a distributed document store; data is serialized and stored as JSON documents.
  2. Data is stored in an inverted index, which is well suited to full-text search.
  3. It is built on the Apache Lucene search engine library and can store and retrieve documents and their metadata.
  4. It supports Query DSL, a JSON-style query language, as well as SQL-style queries.
  5. Cluster deployments are easy to scale. When a new node joins the cluster, ES automatically migrates shards to it to rebalance the cluster.
    1. Shards come in two types: primary shards and replica shards.
    2. A replica shard is a redundant copy of a primary shard. Replicas protect against data loss when a node fails and add capacity for search and retrieval.
    3. The number of primary shards is fixed when the index is created, while the number of replica shards can be changed at any time.
    4. Shards are configured per index. The more shards there are, the higher the index maintenance overhead. The larger each shard is, the longer it takes to move when ES rebalances the cluster after nodes are added or removed.
  6. Cluster recovery: cross-cluster replication (CCR) automatically synchronizes indices from a primary cluster to a secondary remote cluster, which serves as a hot backup.
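
To make the points about shards and Query DSL more concrete, here is a minimal sketch of the JSON request bodies involved, built as plain Python dictionaries. The index name `books` and the field `book_name` are assumptions chosen for illustration, and the shard counts are arbitrary. Sending the bodies to a cluster with an HTTP client is left out.

```python
import json

# Hypothetical index name, used only in the comments below.
INDEX_NAME = "books"

# Body for creating the index (PUT /books).
# number_of_shards is fixed once the index exists;
# number_of_replicas can still be updated afterwards.
create_index_body = {
    "settings": {
        "index": {
            "number_of_shards": 3,
            "number_of_replicas": 1,
        }
    },
    "mappings": {
        "properties": {
            # A "text" field is analyzed and stored in an inverted index.
            "book_name": {"type": "text"},
        }
    },
}

# Body for changing the replica count on an existing index
# (PUT /books/_settings).
update_replicas_body = {"index": {"number_of_replicas": 2}}

# Query DSL body for a full-text search (GET /books/_search).
search_body = {"query": {"match": {"book_name": "concurrent"}}}

if __name__ == "__main__":
    for label, body in [
        ("create index", create_index_body),
        ("update replicas", update_replicas_body),
        ("search", search_body),
    ]:
        print(f"--- {label} ({INDEX_NAME}) ---")
        print(json.dumps(body, indent=2))
```

Note that the replica count appears both at creation time and in a later settings update, while the primary shard count appears only at creation time.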

What is an inverted index?

An inverted index is also called a reverse index.

As developers, the data store we deal with most often is MySQL, so let's start there. Suppose you have a pile of technical books that have already been numbered:

  1. The Beauty of Concurrent Programming in Java
  2. Java Development Manual
  3. Deep Distributed Cache
  4. Java Concurrent Programming
  5. Algorithm
  6. Data Structures and Algorithms

  • Stored in MySQL, the table looks like this:

id  book_name
1   The Beauty of Concurrent Programming in Java
2   Java Development Manual
3   Deep Distributed Cache
4   Java Concurrent Programming
5   Algorithm
6   Data Structures and Algorithms

At this point I want to query all the books on concurrency.

select * from table_book where book_name like '%concurrent%';

MySQL then scans the table row by row, checking every book_name, and finds records 1 and 4.
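
The row-by-row scan can be sketched in Python as follows. This only illustrates the work a `like '%concurrent%'` match implies, not how MySQL is implemented. The function name `like_contains` is hypothetical, and the comparison assumes a case-insensitive collation.

```python
# The six numbered books, keyed by id.
BOOKS = {
    1: "The Beauty of Concurrent Programming in Java",
    2: "Java Development Manual",
    3: "Deep Distributed Cache",
    4: "Java Concurrent Programming",
    5: "Algorithm",
    6: "Data Structures and Algorithms",
}


def like_contains(rows, keyword):
    """Mimic `like '%keyword%'`: test every row for the substring."""
    keyword = keyword.lower()
    matches = []
    for book_id, name in rows.items():
        # Every row is visited, whether or not it matches.
        if keyword in name.lower():
            matches.append(book_id)
    return matches


if __name__ == "__main__":
    print(like_contains(BOOKS, "concurrent"))  # [1, 4]
```

The cost grows with the number of rows, because each one has to be inspected.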

  • With an inverted index, it works like this:

Each book name is first split into terms (tokenized). For example, "The Beauty of Concurrent Programming in Java" is split into terms such as "Java", "concurrent", "programming", and "beauty". Once tokenization is done, each term is associated with the numbers of the books that contain it.

term         ids
Java         1, 2, 4
concurrent   1, 4
programming  1, 4
algorithm    5, 6
distributed  3
...          ...

Looking up "concurrent" in the inverted index returns the matching book numbers (1 and 4) directly, so books about concurrency can be located without scanning every record.
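
The idea can be sketched in a few lines of Python. This is a toy illustration, not how Lucene or ES implement it: the `analyze` function is a hypothetical, deliberately crude analyzer (lowercasing, whitespace splitting, a small stop-word list, and stripping a trailing "s" as a stand-in for stemming).

```python
from collections import defaultdict

# The six numbered books, keyed by id.
BOOKS = {
    1: "The Beauty of Concurrent Programming in Java",
    2: "Java Development Manual",
    3: "Deep Distributed Cache",
    4: "Java Concurrent Programming",
    5: "Algorithm",
    6: "Data Structures and Algorithms",
}

# Assumed stop-word list, just enough for these titles.
STOP_WORDS = {"the", "of", "in", "and"}


def analyze(text):
    """Split text into normalized terms (a crude stand-in for an analyzer)."""
    terms = []
    for word in text.lower().split():
        if word in STOP_WORDS:
            continue
        # Naive plural handling so "algorithms" and "algorithm" match.
        if word.endswith("s") and len(word) > 3:
            word = word[:-1]
        terms.append(word)
    return terms


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(analyze(docs[doc_id])):
            index[term].append(doc_id)
    return dict(index)


def search(index, keyword):
    """Look up a single keyword: one dictionary access, no scan."""
    terms = analyze(keyword)
    if not terms:
        return []
    return index.get(terms[0], [])


if __name__ == "__main__":
    index = build_inverted_index(BOOKS)
    print(search(index, "concurrent"))  # [1, 4]
    print(search(index, "Java"))        # [1, 2, 4]
    print(search(index, "algorithm"))   # [5, 6]
```

The work of splitting the titles is done once, when the index is built. Each search afterwards is a lookup by term rather than a pass over all the titles.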

So what is Lucene?

Lucene can be understood as an open-source, high-performance, scalable information retrieval library. It is written in Java and encapsulates the APIs for inverted indexing and search. In other words, it is a component to build on rather than a complete system.

ES is built on top of Lucene and adds high availability, cluster deployment, failover, backup, disaster recovery, and so on.

Conclusion

That's it for now: a first look at what ES is. I'll keep reading and summarize more as I go.