What is Elasticsearch

Elasticsearch is a distributed, free, open source search engine that works with all types of data, including text, numeric, geospatial, structured, and unstructured data. It is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now Elastic). Known for its simple REST-style API, distributed design, speed, and scalability, Elasticsearch is the core component of the Elastic Stack, a set of free, open source tools for data ingestion, enrichment, storage, analysis, and visualization. Commonly referred to as the ELK Stack (for Elasticsearch, Logstash, and Kibana), the Elastic Stack now also includes Beats, a collection of lightweight data collection agents that send data to Elasticsearch.

  • Open source search engine based on Apache Lucene
  • Written in Java, with easy-to-use RESTful APIs
  • Easy horizontal scaling to support petabytes of structured or unstructured data processing

Typical uses of Elasticsearch: application search, website search, logging, and analytics.

Quick start

Installing Elasticsearch with Docker

Elasticsearch download page: www.elastic.co/cn/download… . Here we run Elasticsearch with Docker.

# Pull the Elasticsearch image
docker pull elasticsearch:7.4.2
# Pull the Kibana image
docker pull kibana:7.4.2

mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml
chmod -R 777 /mydata/elasticsearch

# Start the Elasticsearch container
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e ES_JAVA_OPTS="-Xms128m -Xmx512m" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch:7.4.2

# Start the Kibana container; replace {HOST} with the IP address of the Elasticsearch host
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://{HOST}:9200 -p 5601:5601 -d kibana:7.4.2

Access Elasticsearch at http://ip:9200. It returns:

{
  "name" : "e88a7d84048c",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "4JdkRlV4RYm_LLoYaUf37Q",
  "version" : {
    "number" : "7.4.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
    "build_date" : "2019-10-28T20:40:44.881551Z",
    "build_snapshot" : false,
    "lucene_version" : "8.2.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Access Kibana at http://ip:5601.

Basic concepts of Elasticsearch

Document

Elasticsearch is document-oriented: a document is the smallest unit of searchable data in Elasticsearch, equivalent to a row in a relational database. Each document has its own ID, which you can specify yourself or let Elasticsearch generate automatically.

Index

An index is the top-level unit of data management in Elasticsearch and the container for documents, comparable to a database in a relational database. Elasticsearch can store an index on one machine or spread it across multiple servers. Each index has one or more shards, and each shard can have multiple replicas.

Type

Before 6.0, an index could contain multiple types, each comparable to a table in a relational database. Types were deprecated starting in 6.x, and from 7.0 onward an index can have only one type, _doc.

Inverted index

Elasticsearch uses a structure called an inverted index, which is well suited to fast full-text search. An inverted index consists of a list of all the unique terms that appear in any document and, for each term, a list of the documents that contain it.
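
To make the idea concrete, here is a minimal Python sketch of an inverted index. It is only an illustration: the tokenizer is a crude lowercase-and-split assumption, and the function name build_inverted_index and the sample titles are made up for this example. Lucene's real data structures are far more elaborate.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Crude tokenizer (assumption): lowercase, then split on non-alphanumerics.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "High Performance MySQL",
    2: "MySQL Crash Course",
    3: "Understanding the Java Virtual Machine",
}
inverted = build_inverted_index(docs)
print(inverted["mysql"])  # [1, 2]
print(inverted["java"])   # [3]
```

Looking up a term is a single dictionary access, which is why searching by word is fast no matter how many documents are stored.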

Relational database    Elasticsearch
Table                  Index (before 7.0, Type)
Row                    Document
Column                 Field
Schema                 Mapping
SQL                    DSL

Elasticsearch CRUD

Create an empty index

PUT /techbook
{
  "settings": {
    "index": {
      "number_of_shards": "2",
      "number_of_replicas": "0"
    }
  }
}

Here number_of_shards is the number of primary shards and number_of_replicas is the number of replica copies. The response:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "techbook"
}

Insert data

POST /{index}/{type}/{id}

When no ID is specified, Elasticsearch generates one automatically. Every such request creates a new document, and result in the response is "created".

When an ID is specified and it does not yet exist in Elasticsearch, the document is added with that ID.

If the ID already exists, the existing document is modified. In the response, _version is incremented by 1 and result is "updated".

Using PUT

POST adds documents. If no ID is specified, one is generated automatically; specifying an existing ID modifies that document and increments its version number, _version.

PUT can add or modify. When the ID does not exist, the document is created; when it does exist, the document is updated.

PUT differs from POST in that PUT requires an ID. If no ID is specified, an error is returned:

{
  "error": "Incorrect HTTP method for uri [/book/_doc/?pretty] and method [PUT], allowed: [POST]",
  "status": 405
}
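
The rules above can be summarized in a small in-memory model. This is a toy sketch in Python, not Elasticsearch code: the class ToyIndex, its index method, and the UUID-based ID format are all hypothetical and exist only to mirror the created/updated and PUT-needs-an-ID behaviour described here.

```python
import uuid


class ToyIndex:
    """In-memory stand-in that mimics the created/updated behaviour described above."""

    def __init__(self):
        self.docs = {}  # id -> (version, source)

    def index(self, source, doc_id=None, method="POST"):
        if doc_id is None:
            if method == "PUT":
                # Mirrors the 405 error: PUT requires an explicit ID.
                raise ValueError("PUT requires an id; use POST to auto-generate one")
            doc_id = uuid.uuid4().hex  # hypothetical ID format, not Elasticsearch's
        exists = doc_id in self.docs
        version = self.docs[doc_id][0] + 1 if exists else 1
        self.docs[doc_id] = (version, source)
        return {"_id": doc_id, "_version": version,
                "result": "updated" if exists else "created"}


idx = ToyIndex()
first = idx.index({"title": "MySQL Crash Course"}, doc_id="1", method="PUT")
second = idx.index({"title": "MySQL Crash Course", "price": 59.0}, doc_id="1", method="PUT")
auto = idx.index({"title": "Algorithms"})  # POST without an ID
print(first["result"], first["_version"])    # created 1
print(second["result"], second["_version"])  # updated 2
print(auto["result"])                        # created
```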

Query data

GET /{index}/{type}/{id}

GET /techbook/_doc/1

The response:

{
  "_index": "techbook",
  "_type": "_doc",
  "_id": "1",
  "_version": 2,
  "_seq_no": 1,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "title": "Mysql Crash Course",
    "price": "59.00",
    "author": "Ben Forta",
    "tag": "Database; Mysql; SQL"
  }
}

Delete data

DELETE /{index}/{type}/{id}

DELETE /techbook/_doc/1

The response:

{
  "_index": "techbook",
  "_type": "_doc",
  "_id": "1",
  "_version": 3,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 4,
  "_primary_term": 1
}
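
Outside the Kibana console, the same create, read, and delete calls are plain HTTP requests. The sketch below builds them with Python's standard urllib without sending anything. BASE_URL and the helper name es_request are assumptions for illustration; point BASE_URL at your own node before sending.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: adjust to your Elasticsearch host


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


book = {"title": "Mysql Crash Course", "price": "59.00",
        "author": "Ben Forta", "tag": "Database; Mysql; SQL"}
put_req = es_request("PUT", "/techbook/_doc/1", book)
get_req = es_request("GET", "/techbook/_doc/1")
del_req = es_request("DELETE", "/techbook/_doc/1")

for req in (put_req, get_req, del_req):
    print(req.get_method(), req.full_url)

# To send one against a running node:
# with urllib.request.urlopen(get_req) as resp:
#     print(json.load(resp))
```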

Bulk API

Batch operations can reduce the number of network requests.

  1. Specify the index
POST /techbook/_doc/_bulk
{"index": {"_id": 1001}}
{"title": "Understanding Computer Systems", "price": "99.00", "author": "Randal E. Bryant", "tag": "system"}
{"index": {"_id": 1002}}
{"title": "Algorithms", "price": "99.00", "author": "Robert Sedgewick", "tag": "algorithm"}
{"index": {"_id": 1003}}
{"title": "High Performance MySQL", "price": "99.00", "author": "Schwartz", "tag": "MySQL"}
  2. Not specifying an index
POST /_bulk
{ action: { metadata }}
{ request body }
{ action: { metadata }}
{ request body }
POST /_bulk
{"create": {"_index": "users", "_type": "_doc", "_id": 1}}
{"name": "Zhang Sanfeng", "age": 99, "gender": "male"}
{"create": {"_index": "users", "_type": "_doc", "_id": 2}}
{"name": "Zhang Wuji", "age": 20, "gender": "male"}
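
A bulk body is newline-delimited JSON: one action line, optionally followed by one source line, with a newline after the last line. When generating it from code, a small helper avoids formatting mistakes. The sketch below is illustrative only; the function name build_bulk_body and the sample users are assumptions, not part of any client library.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) tuples into the newline-delimited bulk format."""
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}))
        if source is not None:  # delete actions carry no source line
            lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("create", {"_index": "users", "_id": 1},
     {"name": "Zhang Sanfeng", "age": 99, "gender": "male"}),
    ("create", {"_index": "users", "_id": 2},
     {"name": "Zhang Wuji", "age": 20, "gender": "male"}),
])
print(body, end="")
```

Each JSON object stays on a single line because json.dumps emits no line breaks by default, which is what the bulk format requires.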

Search API

Elasticsearch supports two ways to search: URI search, which passes the parameters in the request URL, and request body search, which puts the parameters in the request body.

Prepare the data

DELETE /techbook

POST /techbook/_doc/_bulk
{"index": {"_id": 1001}}
{"title": "Computer Systems", "price": 99.00, "author": "Randal E. Bryant", "tag": "system"}
{"index": {"_id": 1002}}
{"title": "Algorithms", "price": 99.00, "author": "Robert Sedgewick", "tag": "algorithm"}
{"index": {"_id": 1003}}
{"title": "High Performance MySQL", "price": 99.00, "author": "Schwartz", "tag": "MySQL"}
{"index": {"_id": 1004}}
{"title": "MySQL Crash Course", "price": 39.00, "author": "Ben Forta", "tag": "MySQL"}
{"index": {"_id": 1005}}
{"title": "Understanding the Java Virtual Machine", "price": 79.00, "author": "Zhou Zhiming", "tag": "Java"}

URI Search

GET /{index}/_search?q={search}&df={field}&sort={key}:asc&from=0&size=10

Parameters:

  • q: the query string
  • df: the default field to search; if df is not specified, all fields are searched
  • sort: the sort field and order
  • from, size: pagination

For example, query books whose title contains Java:

GET /techbook/_search?q=Java&df=title

Query all books, sorted by price in ascending order:

GET /techbook/_search?q=*&sort=price:asc
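
When these URLs are assembled in code, the query string should be percent-encoded rather than concatenated by hand. Here is a minimal Python sketch using the standard library; the helper name uri_search_path and its defaults are assumptions made for this example.

```python
from urllib.parse import urlencode


def uri_search_path(index, q, df=None, sort=None, from_=0, size=10):
    """Build the path and query string for a URI search request."""
    params = {"q": q}
    if df is not None:
        params["df"] = df
    if sort is not None:
        params["sort"] = sort
    params["from"] = from_
    params["size"] = size
    # urlencode percent-encodes reserved characters such as ':' and '*'.
    return f"/{index}/_search?{urlencode(params)}"


print(uri_search_path("techbook", "Java", df="title"))
print(uri_search_path("techbook", "*", sort="price:asc"))
```

The encoded form is equivalent to the hand-written URLs above; the server decodes the values before interpreting them.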

Query DSL

Elasticsearch provides a JSON-style DSL (Domain-Specific Language) for writing queries.

Term query

A term query is mainly used for exact matches on values such as numbers, dates, Booleans, or not_analyzed strings (string fields that are not analyzed, such as keyword fields).

Syntax:

GET /techbook/_search
{
  "query": {
    "term": {
      "FIELD": {
        "value": "VALUE"
      }
    }
  }
}

Range queries

A range query finds data within the specified range. The range operators are: gt (greater than), gte (greater than or equal to), lt (less than), and lte (less than or equal to).
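
As an illustration, the Python sketch below builds a range query body for the price field of the sample books. The helper range_query is a hypothetical convenience function; the resulting JSON is what would be sent as the body of GET /techbook/_search.

```python
import json


def range_query(field, **bounds):
    """Build a range query body; bounds may use gt, gte, lt, and lte."""
    allowed = {"gt", "gte", "lt", "lte"}
    unknown = set(bounds) - allowed
    if unknown:
        raise ValueError(f"unsupported range operators: {sorted(unknown)}")
    return {"query": {"range": {field: bounds}}}


# Books priced from 40 (inclusive) up to 100 (exclusive).
body = range_query("price", gte=40, lt=100)
print(json.dumps(body, indent=2))
```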

Match query

The match query is the standard query and the usual choice whether you need a full-text query or an exact-value query.

If you use match on a full-text field, it runs the query string through the field's analyzer before actually searching. If you use match with an exact value, such as a number, date, Boolean, or not_analyzed string, it searches for that value as given.
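
The practical difference between term and match on a full-text field can be shown with a toy model. This Python sketch assumes a very simple analyzer that lowercases and splits on non-alphanumeric characters; the functions analyze, term_hit, and match_hit are invented for illustration and only approximate what Elasticsearch does.

```python
import re


def analyze(text):
    """Toy analyzer (assumption): lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_hit(indexed_terms, value):
    # term: the value is compared as-is against the indexed terms.
    return value in indexed_terms


def match_hit(indexed_terms, query_text):
    # match: the query text is analyzed first; any shared term counts as a hit.
    return any(term in indexed_terms for term in analyze(query_text))


indexed = set(analyze("High Performance MySQL"))
print(term_hit(indexed, "MySQL"))   # False: the index holds "mysql"
print(term_hit(indexed, "mysql"))   # True
print(match_hit(indexed, "MySQL"))  # True
```

This is why a term query for a capitalized word can miss documents in an analyzed text field while a match query finds them.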

Boolean query

A Boolean query combines multiple query conditions with Boolean logic. It supports the following operators:

  • must: all of the conditions must match, equivalent to AND.
  • must_not: none of the conditions may match, equivalent to NOT.
  • should: at least one of the conditions should match, equivalent to OR.

For example: find books tagged MySQL that cost no more than 59.
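
One way to express that example is sketched below in Python: the tag condition goes under must and "costs more than 59" goes under must_not. This is one possible formulation, not the only one, and the local filter at the end uses a hand-copied subset of the sample data purely to show which document the query is expected to select.

```python
import json

# Request body for GET /techbook/_search
bool_query = {
    "query": {
        "bool": {
            "must": [{"match": {"tag": "MySQL"}}],
            "must_not": [{"range": {"price": {"gt": 59}}}],
        }
    }
}
print(json.dumps(bool_query, indent=2))

# Local check of the same logic against a subset of the sample books.
books = [
    {"id": 1003, "tag": "MySQL", "price": 99.00},
    {"id": 1004, "tag": "MySQL", "price": 39.00},
    {"id": 1005, "tag": "Java", "price": 79.00},
]
expected = [b["id"] for b in books if b["tag"] == "MySQL" and not b["price"] > 59]
print(expected)  # [1004]
```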

For more on the Query DSL, see www.elastic.co/guide/en/el…