Personal tech blog: www.zhenganwen.top

The environment

  • 64-bit Win10, 8G memory, JDK8
  • ES install package: Elasticsearch 6.2.1
  • ES Chinese word segmentation plug-in: IK-6.4.0
  • The official documentation

Installing ES

ES project structure

Unzipping the package gives the ==ES root directory==. Its subdirectories serve the following purposes:

  • bin: command scripts, such as the one that starts ES
  • config: the ES configuration files, read when ES starts
    • elasticsearch.yml: cluster information, exposed ports, memory locking, data directories, cross-origin access, and so on
    • jvm.options: ES is written in Java; this file sets JVM parameters such as the maximum and minimum heap size
    • log4j2.properties: ES uses Log4j 2 as its logging framework
  • data: where index data is stored
  • lib: libraries that ES depends on
  • logs: where log files are stored
  • modules: the functional modules of ES
  • plugins: the ES plugin directory. Plugins placed here, such as the IK Chinese word segmentation plugin, are loaded automatically when ES starts

Configuration properties

All of the properties in the default elasticsearch.yml are commented out, so we need to set the necessary ones ourselves. Add the following at the end of the file (cluster-related settings are explained later):

# Node name. One ES instance is one node (usually a host runs only one ES instance).
node.name: xc_node_1
# Port for RESTful (HTTP) access to ES
http.port: 9200
# Port used for communication between the nodes of the cluster
transport.tcp.port: 9300
# Whether this node is eligible to be elected master
node.master: true
# Whether this node stores data
node.data: true
# Transport addresses of the other nodes in the cluster
discovery.zen.ping.unicast.hosts: ["0.0.0.0:9300", "0.0.0.0:9301"]
# Minimum number of master-eligible nodes: (master_eligible_nodes / 2) + 1
discovery.zen.minimum_master_nodes: 1
# Whether this node can run ingest pipelines (true is the default). Separately, any node can
# act as the coordinating node: when an index has several shards on different nodes and the
# node that receives a query finds that the data lives on another node's shard, it forwards
# the request and responds with the combined result.
node.ingest: true
# When ES is idle, the operating system may swap the memory used by ES out to disk, and has to
# swap it back in when ES becomes active again. Set this to true if ES must stay responsive at
# all times: its memory is then never swapped to disk, which avoids the latency caused by
# swapping (false is the default).
bootstrap.memory_lock: false
# Maximum number of nodes on one machine that may share a data directory. Useful for testing
# clustering on a single development machine; in production set it to 1 and deploy only one
# ES instance per machine.
node.max_local_storage_nodes: 2
# Directory where ES stores index data
path.data: D:\software\es\elasticsearch-6.2.1\data
# Directory where ES stores logs
path.logs: D:\software\es\elasticsearch-6.2.1\logs
# Allow cross-origin access
http.cors.enabled: true
# Allow all domains to access this ES
http.cors.allow-origin: /.*/
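The (master_eligible_nodes / 2) + 1 rule from the configuration above can be checked with a few lines of Java. This is only a sketch of the arithmetic: the class and method names are made up for illustration, and the division is integer division.

```java
// Hypothetical helper, not part of ES: computes the quorum used for
// discovery.zen.minimum_master_nodes from the number of master-eligible nodes.
class MasterQuorumSketch {

    static int minimumMasterNodes(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("at least one master-eligible node is required");
        }
        // Integer division: 3 / 2 == 1, so three eligible nodes give a quorum of 2.
        return masterEligibleNodes / 2 + 1;
    }

    public static void main(String[] args) {
        for (int nodes = 1; nodes <= 5; nodes++) {
            System.out.println(nodes + " master-eligible node(s) -> minimum_master_nodes = "
                    + minimumMasterNodes(nodes));
        }
    }
}
```

With a single master-eligible node the result is 1, which matches the value used in the configuration above.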

JVM Parameter Settings

By default ES starts with a 1 GB heap. If your machine is short on memory, you can reduce it to 512 MB in jvm.options:

-Xms512m
-Xmx512m

Start the ES

Double-click the bin/elasticsearch.bat script to start ES. Closing the command-line window shuts ES down.

After startup, visit http://localhost:9200. If the following response is received, ES is successfully started:

{
    "name": "xc_node_1",
    "cluster_name": "xuecheng",
    "cluster_uuid": "93K4AFOVSD-kPF2DdHlcow",
    "version": {
        "number": "6.2.1",
        "build_hash": "7299dc3",
        "build_date": "2018-02-07T19:34:26.990113Z",
        "build_snapshot": false,
        "lucene_version": "7.2.1",
        "minimum_wire_compatibility_version": "5.6.0",
        "minimum_index_compatibility_version": "5.0.0"
    },
    "tagline": "You Know, for Search"
}

elasticsearch-head visualization plugin

ES is a production-grade search engine built on Lucene that encapsulates many internal details. The elasticsearch-head plugin lets us inspect its internal state visually through a web page.

This plugin does not need to be placed in the ES plugins directory, because it talks to ES from the browser through JS.

git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
npm install
npm run start

Open your browser: http://localhost:9100 and connect to ES via the HTTP port provided by ES:

ES Quick Start

First of all, we need to understand several concepts: index, document and field. We can understand them by analogy with relational databases:

ES          MySQL
index       database
type        table
document    row
field       column

However, the concept of type has been gradually phased out: an index created in ES 6.x can contain only one type, types are deprecated in 7.x, and they are removed in 8.0. So we can think of an index library as a table: it stores a collection of similarly structured data. Earlier versions allowed several types in one index library to simulate “multiple tables”, but this was discouraged because it could degrade indexing and search performance; create another index library instead. It is like a MySQL database that holds only one table.

Noun index & verb index

As a noun, index refers to an index library, which is stored as files on disk.

An index library is an inverted index table. Storing data in ES means splitting the data into terms and adding those terms to the inverted index table.

Take adding 中华人民共和国 (“People's Republic of China”) and 中华上下五千年 (“Five thousand years of China”) to the index library as an example. The logical structure of the inverted index table is as follows:

term                      doc_id
中华 (China)              1, 2
人民 (people)             1
共和国 (republic)         1
上下 (up and down)        2
五 (five)                 2
千年 (thousand years)     2

doc_id    doc
1         中华人民共和国 (People's Republic of China)
2         中华上下五千年 (Five thousand years of China)

This process of splitting data into terms and recording which documents each term appears in is called ==indexing== (verb).
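The indexing step described above can be sketched in plain Java. This is a toy model rather than how Lucene actually stores its index: the class name is made up, the documents are assumed to be tokenized already, and English stand-ins are used for the terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy inverted index: maps each term to the ids of the documents that contain it.
class InvertedIndexSketch {

    static Map<String, List<Integer>> build(Map<Integer, List<String>> tokenizedDocs) {
        Map<String, List<Integer>> index = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<String>> doc : tokenizedDocs.entrySet()) {
            for (String term : doc.getValue()) {
                List<Integer> postings = index.computeIfAbsent(term, t -> new ArrayList<>());
                // Record each document id at most once per term.
                if (!postings.contains(doc.getKey())) {
                    postings.add(doc.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> docs = new LinkedHashMap<>();
        // Hypothetical token lists standing in for the two example documents.
        docs.put(1, Arrays.asList("china", "people", "republic"));
        docs.put(2, Arrays.asList("china", "up-and-down", "five", "thousand-years"));
        build(docs).forEach((term, ids) -> System.out.println(term + " -> " + ids));
    }
}
```

A search for a term is then a lookup in this map, which is why only text that was produced as a term at indexing time can be found later.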

Postman

Postman is an HTTP client tool that makes it easy to send various forms of RESTful requests.

The following uses Postman to test the RESTful APIs of ES. The root URL of all requests is http://localhost:9200

Index library Management

Create an index library

Create an index library named “xc_course” for storing online course data:

PUT /xc_course

{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    }
}

number_of_shards is the number of primary shards and number_of_replicas is the number of replicas of each shard.

To check whether the index library was created, refresh the head page at localhost:9100:

Delete index library

DELETE /xc_course

Viewing Index Information

GET /xc_course

Mapping management

Create a mapping

A mapping is analogous to a MySQL table definition: it declares which fields exist and what type each field has.

The request format for creating a mapping is POST /index_name/type_name/_mapping.

Hasn't type been phased out? Why specify a type name here? Types are not removed until ES 8.0 and 6.x is still in the transition period, so we specify a placeholder type name such as “doc”:

POST /xc_course/doc/_mapping

{
    "properties": {
        "name": {
            "type": "text"
        },
        "description": {
            "type": "text"
        },
        "price": {
            "type": "double"
        }
    }
}

Viewing a mapping (analogous to viewing a table structure)

GET /xc_course/doc/_mapping

{
    "xc_course": {
        "mappings": {
            "doc": {
                "properties": {
                    "description": {
                        "type": "text"
                    },
                    "name": {
                        "type": "text"
                    },
                    "price": {
                        "type": "double"
                    }
                }
            }
        }
    }
}

It can also be viewed via the head plugin:

Document management

Add the document

PUT /index/type/id

If the ID is omitted, ES generates one for us. Note that this form must be sent with POST; PUT requires an explicit ID:

POST /xc_course/doc

{
    "name": "Bootstrap Development Framework",
    "description": "Bootstrap is a front-end page development framework released by Twitter and widely used in the industry. It contains a large amount of CSS and JS code that helps developers (especially those who are not good at page development) easily build attractive interfaces that work across browsers.",
    "price": 99.9
}

The response is as follows:

{
    "_index": "xc_course",
    "_type": "doc",
    "_id": "Hib0QmoB7xBOMrejqjF3",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}

Query documents by ID

GET /index/type/id

Use the ID that ES generated for the document we just added:

GET /xc_course/doc/Hib0QmoB7xBOMrejqjF3

{
    "_index": "xc_course",
    "_type": "doc",
    "_id": "Hib0QmoB7xBOMrejqjF3",
    "_version": 1,
    "found": true,
    "_source": {
        "name": "Bootstrap Development Framework",
        "description": "Bootstrap is a front-end page development framework released by Twitter and widely used in the industry. It contains a large amount of CSS and JS code that helps developers (especially those who are not good at page development) easily build attractive interfaces that work across browsers.",
        "price": 99.9
    }
}

Query all documents

GET /index/type/_search

The inner hits array holds the documents that match the query:

{
    ...
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "xc_course",
                "_type": "doc",
                "_id": "Hib0QmoB7xBOMrejqjF3",
                "_score": 1,
                "_source": {
                    "name": "Bootstrap Development Framework",
                    "description": "Bootstrap is a front-end page development framework released by Twitter and widely used in the industry. It contains a large amount of CSS and JS code that helps developers (especially those who are not good at page development) easily build attractive interfaces that work across browsers.",
                    "price": 99.9
                }
            }
        ]
    }
}

IK Chinese analyzer

ES does not support Chinese word segmentation by default: for Chinese data, ES treats each character as a separate term, which makes Chinese retrieval ineffective.

The default analyzer's result on Chinese text can be tested as follows:

POST /_analyze

{
    "text": "中华人民共和国"
}

You'll notice that ES's built-in API endpoints are prefixed with _, such as _mapping, _search, and _analyze.

The segmentation results are as follows:

{
    "tokens": [
        {"token": "中", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0},
        {"token": "华", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1},
        {"token": "人", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2},
        {"token": "民", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3},
        {"token": "共", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4},
        {"token": "和", "start_offset": 5, "end_offset": 6, "type": "<IDEOGRAPHIC>", "position": 5},
        {"token": "国", "start_offset": 6, "end_offset": 7, "type": "<IDEOGRAPHIC>", "position": 6}
    ]
}

Download IK-6.4.0 (the IK release must match your ES version), unzip it into the ES plugins/ directory, and rename the unzipped directory to ==ik==. ==Restart ES== and the plugin is loaded automatically.

After restarting ES, test the segmentation again:

POST http://localhost:9200/_analyze

{
    "text": "中华人民共和国",
    "analyzer": "ik_max_word"
}

The analyzer field selects an IK analyzer, either ik_max_word or ik_smart; without it the default analyzer is used.

The ik_max_word strategy splits the text into as many terms as possible, that is, fine-grained segmentation:

{
    "tokens": [
        {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0},
        {"token": "中华人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1},
        {"token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2},
        {"token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3},
        {"token": "人民共和国", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4},
        {"token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5},
        {"token": "共和国", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6},
        {"token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7},
        {"token": "国", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8}
    ]
}

ik_smart performs coarse-grained segmentation (set "analyzer": "ik_smart"):

{
    "tokens": [
        {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0}
    ]
}

Custom dictionary

The IK analyzer ships with a dictionary of commonly used Chinese words, but it cannot keep up with newly coined internet phrases. To improve segmentation accuracy we therefore sometimes need to extend the dictionary.

First, test how IK segments the internet slang 蓝瘦香菇 (lanshouxianggu, literally “blue thin mushroom”):

POST /_analyze

{
    "text": "蓝瘦香菇",
    "analyzer": "ik_smart"
}

The segmentation result is as follows:

{
    "tokens": [
        {"token": "蓝", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0},
        {"token": "瘦", "start_offset": 1, "end_offset": 2, "type": "CN_CHAR", "position": 1},
        {"token": "香菇", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 2}
    ]
}

Create the file my.dic in the plugins/ik/config directory of ES and add a line containing 蓝瘦香菇. Then reference the custom dictionary in the IK configuration file plugins/ik/config/IKAnalyzer.cfg.xml:

<!-- Users can configure their own extended dictionary here -->
<entry key="ext_dict">my.dic</entry>

==Restart ES== and the IK analyzer treats the new word as a single term:

{
    "tokens": [
        {"token": "蓝瘦香菇", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0}
    ]
}

Mapping

Adding a field

PUT /xc_course/doc/_mapping

{
    "properties": {
        "create_time": {
            "type": "date"
        }
    }
}

GET /xc_course/doc/_mapping

{
    "xc_course": {
        "mappings": {
            "doc": {
                "properties": {
                    "create_time": {
                        "type": "date"
                    },
                    "description": {
                        "type": "text"
                    },
                    "name": {
                        "type": "text"
                    },
                    "price": {
                        "type": "double"
                    }
                }
            }
        }
    }
}

You can add fields to an existing mapping, but you cannot change the definition of an existing field!

PUT /xc_course/doc/_mapping

{
    "properties": {
        "price": {
            "type": "integer"
        }
    }
}

This returns an error: the existing price field cannot be changed from double to integer:

{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "mapper [price] cannot be changed from type [double] to [integer]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "mapper [price] cannot be changed from type [double] to [integer]"
    },
    "status": 400
}

If you must change the definition of a field (its type, analyzer, whether it is indexed, and so on), you have to delete the index library, recreate it with the new mapping, and then migrate the data into it. The mapping should therefore be thought through when the index library is created, because fields can only be added, not redefined.

Common mapping parameter: type

The core data types of ES 6.2 are as follows:

keyword

Fields of this type are not analyzed: the field content is treated as one indivisible term. It suits values such as trademarks and brand names. For example, searching a keyword-typed brand field for “Huawei” does not find a document whose value is “Huawei Honor”, because only the exact value matches.

date

A field of type date can additionally specify a format, for example:

{
    "properties": {
        "create_time": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
        }
    }
}

With this mapping, the create_time value of a new document can be either a date with a time or a date only.
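The effect of listing two patterns separated by || can be imitated in plain Java by trying each pattern in turn. This is only an analogy built on java.time, not the date parser ES uses internally, and the class and method names are made up for illustration.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Mimics a date field with format "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd":
// the first pattern is tried first, the second one is the fallback.
class DateFormatSketch {

    private static final DateTimeFormatter DATE_TIME = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter DATE_ONLY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    static LocalDateTime parse(String value) {
        try {
            return LocalDateTime.parse(value, DATE_TIME);
        } catch (DateTimeParseException firstPatternFailed) {
            // A date without a time is interpreted as midnight of that day.
            // If this pattern fails too, the DateTimeParseException propagates.
            return LocalDate.parse(value, DATE_ONLY).atStartOfDay();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2019-04-01 13:16:00"));
        System.out.println(parse("2019-04-01"));
    }
}
```

A value that matches neither pattern, such as 2019/04/01, is rejected, much as ES rejects a document whose date value fits none of the configured formats.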

Numeric types

For floating-point values, prefer a scaling factor where possible. Take a price field whose unit is yuan: with a scaling factor of 100, ES stores the value in ==fen== (1/100 yuan). The mapping is as follows:

"price": {
    "type": "scaled_float",
    "scaling_factor": 100
}

Since the scaling factor is 100, an input price of 23.45 is stored as 2345 (23.45 × 100). If the input price is 23.456, ES multiplies it by 100 and rounds to the nearest integer, giving 2346.

The advantage of using scale factors is that integers are much more compressible than floating-point types, saving disk space.
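The scaling arithmetic can be sketched in a few lines of Java. The class and method names are hypothetical, and the sketch assumes round-to-nearest, which matches the 23.456 to 2346 example above.

```java
// Sketch of how a scaled_float value maps to the stored integer and back.
class ScaledFloatSketch {

    // value * scalingFactor, rounded to the nearest long.
    static long toStored(double value, double scalingFactor) {
        return Math.round(value * scalingFactor);
    }

    // Reverses the scaling; precision beyond 1 / scalingFactor is lost.
    static double fromStored(long stored, double scalingFactor) {
        return stored / scalingFactor;
    }

    public static void main(String[] args) {
        System.out.println(toStored(23.45, 100));
        System.out.println(toStored(23.456, 100));
        System.out.println(fromStored(toStored(23.456, 100), 100));
    }
}
```

The round trip shows the trade-off: 23.456 comes back as 23.46, so the scaling factor should be chosen to cover the precision the field really needs.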

Whether to index: index

index defaults to true, meaning the field is analyzed and an inverted index (the relation between terms and documents) is built from the resulting terms, which makes the field searchable. Some fields never need to be searched: the URL of a course picture, for example, is only used to display the image, so index can be set to false:

PUT /xc_course/doc/_mapping

{
    "properties": {
        "pic": {
            "type": "text",
            "index": false
        }
    }
}

Index analyzer & search analyzer

Index analyzer: analyzer

ik_max_word is recommended for analyzing data as it is added to the index library. Take “People's Republic of China”: with ik_smart, the whole phrase is stored in the inverted index table as a single term, so a search for “Republic” does not find this document (terms are matched exactly).

Search analyzer: search_analyzer

The search analyzer segments the search text entered by the user.

ik_smart is recommended here. For example, a search for “People's Republic of China” should not return content such as “Himalayan Republic”.
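Why the two analyzers are chosen differently can be seen in a small model where a document matches when at least one search term equals an indexed term exactly. The token lists below are hand-written English stand-ins for what ik_max_word and ik_smart would produce, and the class name is made up; real scoring in ES is more involved.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AnalyzerChoiceSketch {

    // True when any search term is present in the indexed terms (exact term match).
    static boolean matches(List<String> indexedTerms, List<String> searchTerms) {
        Set<String> index = new HashSet<>(indexedTerms);
        for (String term : searchTerms) {
            if (index.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed fine-grained terms (ik_max_word style) for one document.
        List<String> fineGrained = Arrays.asList(
                "people's republic of china", "china", "people", "republic");
        // Assumed coarse-grained terms (ik_smart style) for the same document.
        List<String> coarseGrained = Arrays.asList("people's republic of china");

        List<String> query = Arrays.asList("republic");
        System.out.println("indexed with fine-grained terms:   " + matches(fineGrained, query));
        System.out.println("indexed with coarse-grained terms: " + matches(coarseGrained, query));
    }
}
```

Fine-grained terms at indexing time make the document reachable from more queries, while coarse-grained terms at search time keep a long query from matching unrelated documents through one of its fragments.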

Whether to store separately: store

store controls whether a field is also stored on its own, outside _source. When a document is indexed, ES already keeps a copy of the original document in _source, so in general there is no need to set store to true.

Putting it into practice

Create the mapping for the course index:

  1. First, delete the existing index and its mapping

    DELETE /xc_course

  2. Create the index

    PUT /xc_course

  3. Create a mapping

    PUT /xc_course/doc/_mapping

    {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            },
            "description": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            },
            "price": {
                "type": "scaled_float",
                "scaling_factor": 100
            },
            "studypattern": {
                "type": "keyword"
            },
            "pic": {
                "type": "text",
                "index": false
            },
            "timestamp": {
                "type": "date",
                "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
            }
        }
    }
  4. Add the document

    POST /xc_course/doc

    {
        "name": "Java Core Technologies",
        "description": "Explains core Java technologies and the principles behind them in simple terms.",
        "price": 99.9,
        "studypattern": "20101",
        "pic": "http://xxx.xxx.xxx/skllsdfsdflsdfk.img",
        "timestamp": "2019-04-01 13:16:00"
    }
  5. Search for “java”

    GET http://localhost:9200/xc_course/doc/_search?q=name:java

  6. Search by study pattern

    GET http://localhost:9200/xc_course/doc/_search?q=studypattern:20101

Index management and Java clients

Starting with this section, we implement the matching Java code for each RESTful API of ES. Although a front end could access ES directly over HTTP, managing and customizing ES still needs a back end as the hub.

RestClient, the Java client provided by ES

RestClient is the officially recommended client and comes in two flavors: the Java Low Level REST Client and the Java High Level REST Client. The High Level client has been available since ES 6.0 and is the one officially recommended, but it is still being completed and some features are not yet available. Where a feature is missing, the Java Low Level REST Client can be used instead.

Dependencies are as follows:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>6.2.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>6.2.1</version>
</dependency>

Spring integration ES

Dependencies

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>6.2.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>6.2.1</version>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-io</artifactId>
</dependency>	

The configuration file

application.yml:

server:
  port: ${port:40100}
spring:
  application:
    name: xc-service-search
xuecheng:	# Custom properties
  elasticsearch:
    host-list: ${eshostlist:127.0.0.1:9200} # Separate multiple nodes with commas

Start the class

@SpringBootApplication
public class SearchApplication {
    public static void main(String[] args) {
        SpringApplication.run(SearchApplication.class, args);
    }
}

ES configuration class

package com.xuecheng.search.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * ElasticsearchConfig class
 *
 * @author : zaw
 * @date : 2019/4/22
 */
@Configuration
public class ElasticsearchConfig {

    @Value("${xuecheng.elasticsearch.host-list}")
    private String hostList;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(RestClient.builder(getHttpHostList(hostList)));
    }

    private HttpHost[] getHttpHostList(String hostList) {
        String[] hosts = hostList.split(",");
        HttpHost[] httpHostArr = new HttpHost[hosts.length];
        for (int i = 0; i < hosts.length; i++) {
            String[] items = hosts[i].split(":");
            httpHostArr[i] = new HttpHost(items[0], Integer.parseInt(items[1]), "http");
        }
        return httpHostArr;
    }

    // rest low level client
    @Bean
    public RestClient restClient() {
        return RestClient.builder(getHttpHostList(hostList)).build();
    }
}

The test class

package com.xuecheng.search;

import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.client.IndicesClient;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import java.io.IOException;

/**
 * TestES class
 *
 * @author : zaw
 * @date : 2019/4/22
 */
@SpringBootTest
@RunWith(SpringRunner.class)
public class TestESRestClient {

    @Autowired
    RestHighLevelClient restHighLevelClient;    //ES connection object

    @Autowired
    RestClient restClient;
}


ES client API

First, delete the index library we created earlier:

DELETE /xc_course

Then review the RESTful form of creating an index library:

PUT /xc_course

{
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        }
    }
}

Create an index library

@Test
public void testCreateIndex() throws IOException {
    CreateIndexRequest request = new CreateIndexRequest("xc_course");
    // Equivalent request body:
    // { "settings": { "index": { "number_of_shards": 1, "number_of_replicas": 0 } } }
    request.settings(Settings.builder().put("number_of_shards", 1).put("number_of_replicas", 0));
    IndicesClient indicesClient = restHighLevelClient.indices();    // Get the index management client from the ES connection object
    CreateIndexResponse response = indicesClient.create(request);
    System.out.println(response.isAcknowledged());  // Check whether the operation succeeded
}

Compared with the RESTful form, the request is represented by a CreateIndexRequest: its constructor takes the name of the index library to create (the URI /xc_course), and the settings call builds the request body (you'll notice that the settings method mirrors the JSON request format).

You need to use IndicesClient objects to operate index libraries.

Delete index library

// Needs DeleteIndexRequest and DeleteIndexResponse from org.elasticsearch.action.admin.indices.delete
@Test
public void testDeleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("xc_course");
    IndicesClient indicesClient = restHighLevelClient.indices();
    DeleteIndexResponse response = indicesClient.delete(request);
    System.out.println(response.isAcknowledged());
}

Specify the mapping when creating the index library

@Test
public void testCreateIndexWithMapping() throws IOException {
    CreateIndexRequest request = new CreateIndexRequest("xc_course");
    request.settings(Settings.builder().put("number_of_shards", 1).put("number_of_replicas", 0));
    request.mapping("doc", "{\n" +
                    " \"properties\": {\n" +
                    " \"name\": {\n" +
                    " \"type\": \"text\",\n" +
                    " \"analyzer\": \"ik_max_word\",\n" +
                    " \"search_analyzer\": \"ik_smart\"\n" +
                    " },\n" +
                    " \"price\": {\n" +
                    " \"type\": \"scaled_float\",\n" +
                    " \"scaling_factor\": 100\n" +
                    " },\n" +
                    " \"timestamp\": {\n" +
                    " \"type\": \"date\",\n" +
                    " \"format\": \"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd\"\n" +
                    " }\n" +
                    " }\n" +
                    "}", XContentType.JSON);
    IndicesClient indicesClient = restHighLevelClient.indices();
    CreateIndexResponse response = indicesClient.create(request);
    System.out.println(response.isAcknowledged());
}

Add the document

The process of adding documents is “indexing” (verb). IndexRequest objects are required for indexing operations.

// Needs java.text.SimpleDateFormat, java.util.Date, java.util.HashMap, java.util.Map,
// and IndexRequest / IndexResponse from org.elasticsearch.action.index
public static final SimpleDateFormat FORMAT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

@Test
public void testAddDocument() throws IOException {
    Map<String, Object> jsonMap = new HashMap<>();
    jsonMap.put("name", "Java Core Technologies");
    jsonMap.put("price", 66.6);
    jsonMap.put("timestamp", FORMAT.format(new Date(System.currentTimeMillis())));
    IndexRequest request = new IndexRequest("xc_course", "doc");
    request.source(jsonMap);
    IndexResponse response = restHighLevelClient.index(request);
    System.out.println(response);
}

The response contains the document ID that ES generated for us. In my test, the ID was fHh6RWoBduPBueXKl_tz.

Query documents by ID

@Test
public void testFindById() throws IOException {
    GetRequest request = new GetRequest("xc_course", "doc", "fHh6RWoBduPBueXKl_tz");
    GetResponse response = restHighLevelClient.get(request);
    System.out.println(response);
}

Update the document by ID

ES can update a document in two ways: full replacement and partial update.

Full replacement: ES looks up the old document by ID, deletes it, and inserts the new document under the same ID.

Partial update: only the submitted fields are updated.

Full replacement:

POST /index/type/id

Partial update:

POST /index/type/id/_update

The Java client's UpdateRequest performs a partial update: only the submitted fields are changed and the other fields keep their values.

@Test
public void testUpdateDoc() throws IOException {
    UpdateRequest request = new UpdateRequest("xc_course", "doc", "fHh6RWoBduPBueXKl_tz");
    Map<String, Object> docMap = new HashMap<>();
    docMap.put("name", "Spring Core Technologies");
    docMap.put("price", 99.8);
    docMap.put("timestamp", FORMAT.format(new Date(System.currentTimeMillis())));
    request.doc(docMap);
    UpdateResponse response = restHighLevelClient.update(request);
    System.out.println(response);
    testFindById();
}
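The difference between the two update modes can be modeled with plain maps. This is a simplified sketch with made-up names: it treats a document as a flat map and does not reproduce how ES merges nested objects or tracks versions.

```java
import java.util.HashMap;
import java.util.Map;

class UpdateModesSketch {

    // Full replacement: the stored document is discarded; the incoming one takes over its ID.
    static Map<String, Object> fullReplace(Map<String, Object> stored, Map<String, Object> incoming) {
        return new HashMap<>(incoming);
    }

    // Partial update: submitted fields overwrite stored ones, all other fields are kept.
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> changes) {
        Map<String, Object> merged = new HashMap<>(stored);
        merged.putAll(changes);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("name", "Java Core Technologies");
        stored.put("price", 66.6);

        Map<String, Object> changes = new HashMap<>();
        changes.put("price", 99.8);

        System.out.println("full replacement: " + fullReplace(stored, changes));
        System.out.println("partial update:   " + partialUpdate(stored, changes));
    }
}
```

After a full replacement the name field is gone because it was not submitted, whereas the partial update keeps it and only changes price.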

Delete documents by ID

@Test
public void testDeleteDoc() throws IOException {
    DeleteRequest request = new DeleteRequest("xc_course", "doc", "fHh6RWoBduPBueXKl_tz");
    DeleteResponse response = restHighLevelClient.delete(request);
    System.out.println(response);
}

Search management

Prepare the environment

To have data to search, we recreate the mapping and add some test data.

Create a mapping

DELETE /xc_course

PUT /xc_course

{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    }
}

PUT /xc_course/doc/_mapping

{
    "properties": {
        "name": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
        },
        "description": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
        },
        "studymodel": {
            "type": "keyword"
        },
        "pic": {
            "type": "text",
            "index": false
        },
        "price": {
            "type": "float"
        },
        "timestamp": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
        }
    }
}

studymodel is the teaching mode; its value is a data dictionary code.

Adding test Data

PUT /xc_course/doc/1

{
    "name": "Bootstrap Development",
    "description": "Bootstrap is a popular front-end page framework released by Twitter that integrates a variety of page effects. It contains a large amount of CSS and JS code that helps developers (especially those who are not good at page development) easily build attractive interfaces that work across browsers.",
    "studymodel": "201002",
    "price": 38.6,
    "pic": "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
    "timestamp": "2018-04-25 19:11:35"
}

PUT /xc_course/doc/2

{
    "name": "Java Programming Basics"."description": "The Java language is the number one programming language in the world and the most used in software development."."studymodel": "201001"."price": 68.6."pic": "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg"."timestamp": "The 2018-03-25 19:11:35"
}
Copy the code

PUT /xc_course/doc/3

{
    "name": "Spring Development Basics"."description": "Spring is very popular in the Java world and Java programmers are using it."."studymodel": "201001"."price": 88.6."pic": "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg"."timestamp": "The 2018-02-24 19:11:35"
}
Copy the code

Simple search

  • Searches all documents in the specified index library

    GET /xc_course/_search

  • Searches for all documents in the specified type

    GET /xc_course/doc/_search

DSL search

Domain Specific Language (DSL) is the JSON-based search method provided by ES. A search passes in JSON data in a specific format to express different search requirements.

DSL is more powerful than URI search, so it is recommended to use DSL for searches in a project.

DSL searches are submitted with POST to a URI ending in _search (scoped to an index or a type), and the search criteria are defined in the JSON request body.

Query all documents — matchAllQuery

POST /xc_course/doc/_search

{
    "query": {
        "match_all": {}
    },
    "_source": ["name", "studymodel"]
}

query defines the search criteria, and _source specifies which fields to include in the returned results. This is useful when documents are large but only a few fields are needed: it filters out unnecessary fields and reduces the amount of data transferred.

Results:

  • took, the time the operation took, in milliseconds
  • timed_out, whether the request timed out (for example, ES unavailable or a network failure)
  • _shards, the shards searched in this operation
  • hits, the hits
  • hits.total, the number of matching documents
  • hits.hits, the set of matching documents
  • hits.max_score, the highest score among the documents in hits.hits
  • _source, the document source data

{
    "took": 57, "timed_out": false,
    "_shards": {
        "total": 1, "successful": 1, "skipped": 0, "failed": 0
    },
    "hits": {
        "total": 3, "max_score": 1,
        "hits": [
            {"_index": "xc_course", "_type": "doc", "_id": "1", "_score": 1,
             "_source": {"studymodel": "201002", "name": "Bootstrap development"}},
            {"_index": "xc_course", "_type": "doc", "_id": "2", "_score": 1,
             "_source": {"studymodel": "201001", "name": "Java Programming Basics"}},
            {"_index": "xc_course", "_type": "doc", "_id": "3", "_score": 1,
             "_source": {"studymodel": "201001", "name": "Spring Development Basics"}}
        ]
    }
}

Java code implementation:

@Test
public void testMatchAll() throws IOException {
    // POST /xc_course/doc/_search
    SearchRequest request = new SearchRequest("xc_course");    // the DSL search request object
    request.types("doc");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();    // builds the DSL request body
    // Equivalent request body:
    // { "query": { "match_all": {} }, "_source": ["name", "studymodel"] }
    searchSourceBuilder.query(QueryBuilders.matchAllQuery());
    // Parameter 1: fields to return; parameter 2: fields not to return. Usually only one of the two is specified.
    searchSourceBuilder.fetchSource(new String[]{"name", "studymodel"}, null);
    // Set the request body on the request object
    request.source(searchSourceBuilder);
    // Send the DSL request
    SearchResponse response = restHighLevelClient.search(request);
    System.out.println(response);
}

DSL core API

  • new SearchRequest(index), which specifies the index library to search
  • SearchRequest.types(type), which specifies the type to search
  • SearchSourceBuilder, which builds the DSL request body
  • SearchSourceBuilder.query(queryBuilder), which builds the "query": {} part of the request body
  • QueryBuilders, a static factory class that makes it easy to construct query builders; for example, searchSourceBuilder.query(QueryBuilders.matchAllQuery()) is equivalent to constructing "query": {"match_all": {}}
  • SearchRequest.source(), which sets the constructed request body on the request object

Paging query

POST http://localhost:9200/xc_course/doc/_search

{
    "from": 0,
    "size": 1,
    "query": {
        "match_all": {}
    },
    "_source": ["name", "studymodel"]
}

Here from is the offset into the result set, and size is the number of results to return starting at that offset.

{
    "took": 80, "timed_out": false,
    "_shards": {
        "total": 1, "successful": 1, "skipped": 0, "failed": 0
    },
    "hits": {
        "total": 3, "max_score": 1,
        "hits": [
            {"_index": "xc_course", "_type": "doc", "_id": "1", "_score": 1,
             "_source": {"studymodel": "201002", "name": "Bootstrap development"}}
        ]
    }
}

hits.total is 3, but only the first document is returned. To implement paging, compute from = (page - 1) * size.
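The offset formula can be checked in isolation. The following is a minimal sketch, assuming 1-based page numbers; the class and method names (PagingSketch, fromOffset) are made up for illustration and are not part of the ES client.

```java
final class PagingSketch {
    /** Converts a 1-based page number and a page size into the "from" offset. */
    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(1, 1));   // first page, one result per page
        System.out.println(fromOffset(3, 10));  // third page, ten results per page
    }
}
```

The returned value is what would be passed to searchSourceBuilder.from(...), with size passed to searchSourceBuilder.size(...).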

Java code implementation

@Test
public void testPaginating() throws IOException {
    SearchRequest request = new SearchRequest("xc_course");
    request.types("doc");

    int page = 1, size = 1;
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from((page - 1) * size);
    searchSourceBuilder.size(size);
    searchSourceBuilder.query(QueryBuilders.matchAllQuery());
    searchSourceBuilder.fetchSource(new String[]{"name", "studymodel"}, null);

    request.source(searchSourceBuilder);
    SearchResponse response = restHighLevelClient.search(request);
    System.out.println(response);
}

Extract the documents in the result set

SearchResponse response = restHighLevelClient.search(request);
SearchHits hits = response.getHits();                // hits
if (hits != null) {
    SearchHit[] results = hits.getHits();            // hits.hits
    for (SearchHit result : results) {
        System.out.println(result.getSourceAsMap()); // hits.hits._source
    }
}

Term matching — termQuery

A term query is an ==exact match==: the set of documents associated with a term is returned only if the term we specify exists in the inverted index.

For example, search for documents whose course name contains the term java:

{
    "from": 0,
    "size": 1,
    "query": {
        "term": {"name": "java"}
    },
    "_source": ["name", "studymodel"]
}

The results are as follows:

"hits": {
    "total": 1,
    "max_score": 0.9331132,
    "hits": [
        {"_index": "xc_course", "_type": "doc", "_id": "2", "_score": 0.9331132,
         "_source": {"studymodel": "201001", "name": "Java Programming Basics"}}
    ]
}

But if you specify "term" as {"name": "java programming"}, the search returns nothing:

"hits": {
    "total": 0,
    "max_score": null,
    "hits": []
}

No term in the inverted index equals "java programming", because "Java Programming Basics" was added to the inverted index as the separate terms "java", "programming", and "basics".

A term query is an exact match: the value of term.name is not split by the search_analyzer, but is matched as a whole against the terms in the inverted index.
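To make the behaviour concrete, here is a minimal sketch of an inverted index in plain Java. It assumes a much simpler analyzer than IK (lower-casing and splitting on whitespace), and the class and method names are hypothetical; it only illustrates why an unanalyzed lookup for "java programming" finds nothing.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TermLookupSketch {
    // term -> ids of the documents whose name contains that term
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Indexes a name by lower-casing it and splitting on whitespace (a stand-in for the real analyzer). */
    void add(String docId, String name) {
        for (String term : name.toLowerCase(Locale.ROOT).split("\\s+")) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Term query: the value is looked up as-is, without being analyzed. */
    Set<String> term(String value) {
        return index.getOrDefault(value, Collections.emptySet());
    }

    /** Builds an index over the three sample course names. */
    static TermLookupSketch sample() {
        TermLookupSketch sketch = new TermLookupSketch();
        sketch.add("1", "Bootstrap development");
        sketch.add("2", "Java Programming Basics");
        sketch.add("3", "Spring Development Basics");
        return sketch;
    }

    public static void main(String[] args) {
        TermLookupSketch sketch = sample();
        System.out.println(sketch.term("java"));             // the document containing the term "java"
        System.out.println(sketch.term("java programming")); // no single term equals this value
    }
}
```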

Exact match by ID — termsQuery

Query documents 1 and 3

POST http://localhost:9200/xc_course/doc/_search

{
    "query": {
        "ids": {"values": ["1", "3"]}
    }
}

Java implementation

@Test
public void testQueryByIds() {
    SearchRequest request = new SearchRequest("xc_course");
    request.types("doc");

    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    List<String> ids = Arrays.asList("1", "3");
    sourceBuilder.query(QueryBuilders.termsQuery("_id", ids));

    printResult(request, sourceBuilder);
}

private void printResult(SearchRequest request, SearchSourceBuilder sourceBuilder) {
    request.source(sourceBuilder);
    SearchResponse response;
    try {
        response = restHighLevelClient.search(request);
    } catch (IOException e) {
        e.printStackTrace();
        return; // nothing to print if the request failed
    }
    SearchHits hits = response.getHits();
    if (hits != null) {
        for (SearchHit result : hits.getHits()) {
            System.out.println(result.getSourceAsMap());
        }
    }
}

==Pitfall==

An exact match by ID is also a term query, but the API to call is termsQuery("_id", ids): termsQuery, not termQuery.

Full-text search — matchQuery

The keyword is first split by the analyzer specified in search_analyzer. Each resulting term is then looked up in the inverted index, and the document set associated with each term is returned. For example, searching for "Bootstrap Basics" also finds "Java Programming Basics":

POST

{
    "query": {
        "match": {"name": "Bootstrap Basics"}
    }
}

Because "Bootstrap Basics" is split into the two terms "bootstrap" and "basics", and the term "basics" is associated with the document "Java Programming Basics".

@Test
public void testMatchQuery() {

    SearchRequest request = new SearchRequest("xc_course");
    request.types("doc");

    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(QueryBuilders.matchQuery("name", "Bootstrap Basics"));

    printResult(request, sourceBuilder);
}

operator

The above query is equivalent to:

{
    "query": {
        "match": {
            "name": {
                "query": "Bootstrap Basics",
                "operator": "or"
            }
        }
    }
}

That is, the result is the union of the results for each term obtained by analyzing the keyword.

The value of operator can be or or and, corresponding to union and intersection respectively.

The following query returns only one result, "Java Programming Basics", because only that course name contains both "java" and "basics":

{
    "query": {
        "match": {
            "name": {
                "query": "Java Basics",
                "operator": "and"
            }
        }
    }
}

Java code

@Test
public void testMatchQuery2() {
    SearchRequest request = new SearchRequest("xc_course");
    request.types("doc");

    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(QueryBuilders.matchQuery("name", "Java Basics").operator(Operator.AND));

    printResult(request, sourceBuilder);
}
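The difference between the two operator values can be illustrated without Elasticsearch. The sketch below is an assumption-laden model: it uses a whitespace and lower-case analyzer instead of IK, checks each document directly rather than merging posting lists (which gives the same union or intersection), and all names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class MatchOperatorSketch {
    /** Splits text into lower-cased terms on whitespace (a stand-in for the real analyzer). */
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    /** Returns the ids of matching names; requireAll=true models "and", false models "or". */
    static Set<String> match(Map<String, String> namesById, String query, boolean requireAll) {
        List<String> queryTerms = analyze(query);
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> entry : namesById.entrySet()) {
            List<String> docTerms = analyze(entry.getValue());
            long matched = queryTerms.stream().filter(docTerms::contains).count();
            boolean ok = requireAll ? matched == queryTerms.size() : matched > 0;
            if (ok) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** The three sample course names keyed by document id. */
    static Map<String, String> sampleNames() {
        Map<String, String> names = new LinkedHashMap<>();
        names.put("1", "Bootstrap development");
        names.put("2", "Java Programming Basics");
        names.put("3", "Spring Development Basics");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(match(sampleNames(), "Java Basics", false)); // "or": union
        System.out.println(match(sampleNames(), "Java Basics", true));  // "and": intersection
    }
}
```

With the sample names, the "or" call returns documents 2 and 3 (both contain "basics"), while the "and" call returns only document 2.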

minimum_should_match

With operator set to or, a document matches as long as any one of the terms matches. To require that at least a certain proportion of the terms match, use minimum_should_match, which specifies the percentage of terms a document must match. For example:

{
    "query": {
        "match": {
            "name": {
                "query": "Spring Development Framework",
                "minimum_should_match": "80%"
            }
        }
    }
}

"Spring Development Framework" is split into three terms: spring, development, and framework.

Setting "minimum_should_match": "80%" means that 80% of the three terms must match: 3 * 0.8 = 2.4, which is rounded down to 2, so at least two of the terms must match in a document.

@Test
public void testMatchQuery3() {
    SearchRequest request = new SearchRequest("xc_course");
    request.types("doc");

    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(QueryBuilders.matchQuery("name", "Spring Development Framework").minimumShouldMatch("70%")); // 3 * 0.7 = 2.1 -> 2

    printResult(request, sourceBuilder);
}
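The rounding can be checked with a few lines of arithmetic. This is a minimal sketch for non-negative percentages only; requiredTerms is a hypothetical helper, not an ES API, and it simply rounds the product down as described above.

```java
final class MinimumShouldMatchSketch {
    /** Number of terms that must match for a non-negative percentage, rounded down. */
    static int requiredTerms(int termCount, int percent) {
        if (termCount < 0 || percent < 0 || percent > 100) {
            throw new IllegalArgumentException("termCount must be >= 0 and percent within 0..100");
        }
        return termCount * percent / 100; // integer division rounds down
    }

    public static void main(String[] args) {
        System.out.println(requiredTerms(3, 80)); // 2.4 rounded down
        System.out.println(requiredTerms(3, 70)); // 2.1 rounded down
        System.out.println(requiredTerms(2, 50)); // exactly 1
    }
}
```

Both 80% and 70% of three terms therefore require two matching terms, which is why the JSON and Java examples above behave the same.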

Multi-field search — multiMatchQuery

termQuery and matchQuery can only match one field at a time. In this section we look at multiMatchQuery, which can match several fields at once.

For example, retrieve documents with "spring" or "css" in the course name or the course description:

{
    "query": {
        "multi_match": {
            "query": "spring css",
            "minimum_should_match": "50%",
            "fields": ["name", "description"]
        }
    },
    "_source": ["name", "description"]
}

Java:

@Test
public void testMultiMatchQuery() {
    SearchRequest request = new SearchRequest("xc_course");
    request.types("doc");

    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(QueryBuilders
                        .multiMatchQuery("spring css", "name", "description")
                        .minimumShouldMatch("50%"));

    printResult(request, sourceBuilder);
}

Boost the weight

Look at the document scores returned above:

"hits": [
    {
        "_index": "xc_course", "_type": "doc", "_id": "3", "_score": 1.3339276,
        "_source": {
            "name": "Spring Development Basics",
            "description": "Spring is very popular in the Java world and Java programmers are using it."
        }
    },
    {
        "_index": "xc_course", "_type": "doc", "_id": "1", "_score": 0.69607234,
        "_source": {
            "name": "Bootstrap development",
            "description": "Bootstrap is a popular framework for developing foreground pages by Twitter that integrates various effects. This development framework contains a large number of CSS, JS program code, can help developers (especially not good at page development program personnel) easily achieve a beautiful interface effect not limited by the browser."
        }
    }
]

In document 3 the term spring makes up a higher proportion of the document's terms, so the document gets a higher _score. We wondered whether adding more occurrences of css to the course description of document 1 would raise its _score.

So let's update document 1:

@Test
public void testUpdateDoc() throws IOException {
    UpdateRequest request = new UpdateRequest("xc_course", "doc", "1");
    Map<String, Object> docMap = new HashMap<>();
    docMap.put("description", "Bootstrap is a popular CSS development framework that integrates various CSS effects. This development framework contains a large number of CSS, JS program code, can help CSS developers (especially not good at CSS page development program personnel) easily achieve a beautiful interface effect not limited by the browser.");
    request.doc(docMap);
    UpdateResponse response = restHighLevelClient.update(request);
    System.out.println(response);
    testFindById();
}

Running the query again shows that the score of document 1 is indeed higher:

"hits": [
    {
        "_index": "xc_course", "_type": "doc", "_id": "1", "_score": 1.575484,
        "_source": {
            "name": "Bootstrap development",
            "description": "Bootstrap is a popular CSS development framework that integrates various CSS effects. This development framework contains a large number of CSS, JS program code, can help CSS developers (especially not good at CSS page development program personnel) easily achieve a beautiful interface effect not limited by the browser."
        }
    },
    {
        "_index": "xc_course", "_type": "doc", "_id": "3", "_score": 1.346281,
        "_source": {
            "name": "Spring Development Basics",
            "description": "Spring is very popular in the Java world and Java programmers are using it."
        }
    }
]

Now suppose the business requirement is that a course whose name contains spring or css should rank as more relevant than a course that only mentions them in its description. To raise the score weight of matches in the course name, append ^ and a weight to the name field (the default weight is 1):

{
    "query": {
        "multi_match": {
            "query": "spring css",
            "minimum_should_match": "50%",
            "fields": ["name^10", "description"]
        }
    },
    "_source": ["name", "description"]
}

Java:

@Test
public void testMultiMatchQuery2() {
    SearchRequest request = new SearchRequest("xc_course");
    request.types("doc");

    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("spring css", "name", "description").minimumShouldMatch("50%");
    multiMatchQueryBuilder.field("name", 10);
    sourceBuilder.query(multiMatchQueryBuilder);

    printResult(request, sourceBuilder);
}

Boolean query — boolQuery

Boolean queries correspond to Lucene's BooleanQuery and are used to ==combine multiple queries==.

Three parameters

  • must: documents must match the query conditions included in must, equivalent to "AND"
  • should: documents should match one or more of the query conditions included in should, equivalent to "OR"
  • must_not: documents must not match the query conditions included in must_not, equivalent to "NOT"

For example, find courses whose name contains "spring" ==and== whose name or description is related to "development framework":

{
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "spring"}},
                {"multi_match": {"query": "Development Framework", "fields": ["name", "description"]}}
            ]
        }
    },
    "_source": ["name"]
}

Java

@Test
public void testBoolQuery() {
    SearchRequest request = new SearchRequest("xc_course");
    request.types("doc");

    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();  // query
    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();     // query.bool

    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "spring");
    MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("Development Framework", "name", "description");

    boolQueryBuilder.must(termQueryBuilder);        // query.bool.must
    boolQueryBuilder.must(multiMatchQueryBuilder);
    sourceBuilder.query(boolQueryBuilder);

    printResult(request, sourceBuilder);
}

Conditions that must be met go in must (boolQueryBuilder.must(condition)), conditions that must be excluded go in must_not, and conditions of which at least one should be met go in should. Note that when a bool query also contains must or filter clauses, the should clauses are optional by default and only influence the score; set minimum_should_match on the bool query to make them required.
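The AND / OR / NOT reading of the three parameters can be sketched with plain Java predicates. This is only a model under stated assumptions: it ignores scoring, it treats should as "at least one must match" (which in Elasticsearch corresponds to setting minimum_should_match to 1), and every name in it is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

final class BoolQuerySketch {
    /** Combines clauses: all of must, none of mustNot, and at least one of should (if any are given). */
    static <T> Predicate<T> bool(List<Predicate<T>> must,
                                 List<Predicate<T>> should,
                                 List<Predicate<T>> mustNot) {
        return doc -> must.stream().allMatch(p -> p.test(doc))
                && mustNot.stream().noneMatch(p -> p.test(doc))
                && (should.isEmpty() || should.stream().anyMatch(p -> p.test(doc)));
    }

    /** A term-like clause: true if the lower-cased, whitespace-split name contains the term. */
    static Predicate<String> containsTerm(String term) {
        return name -> Arrays.asList(name.toLowerCase(Locale.ROOT).split("\\s+")).contains(term);
    }

    /** must: development; must_not: java; should: spring or bootstrap. */
    static boolean sampleMatches(String name) {
        Predicate<String> query = bool(
                Collections.singletonList(containsTerm("development")),
                Arrays.asList(containsTerm("spring"), containsTerm("bootstrap")),
                Collections.singletonList(containsTerm("java")));
        return query.test(name);
    }

    public static void main(String[] args) {
        System.out.println(sampleMatches("Spring Development Basics")); // satisfies all three parts
        System.out.println(sampleMatches("Java Programming Basics"));   // fails must and must_not
    }
}
```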

Query courses whose name must contain "development", must not contain "java", and should contain "spring" or "bootstrap":

{
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "development"}}
            ],
            "must_not": [
                {"term": {"name": "java"}}
            ],
            "should": [
                {"term": {"name": "bootstrap"}},
                {"term": {"name": "spring"}}
            ]
        }
    },
    "_source": ["name"]
}

A real project would not set conditions like this; the example is only a demonstration. termQuery is used here for simplicity, but any of the queries covered earlier can be used in its place.

Java

@Test
public void testBoolQuery2() {
    SearchRequest request = new SearchRequest("xc_course");
    request.types("doc");

    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();

    boolQueryBuilder.must(QueryBuilders.termQuery("name", "development"));
    boolQueryBuilder.mustNot(QueryBuilders.termQuery("name", "java"));
    boolQueryBuilder.should(QueryBuilders.termQuery("name", "spring"));
    boolQueryBuilder.should(QueryBuilders.termQuery("name", "bootstrap"));

    sourceBuilder.query(boolQueryBuilder);

    printResult(request, sourceBuilder);
}

Filter

A filter narrows down the search results. ==A filter only decides whether a document matches; it does not compute a relevance score for it==. As a result, ==filters perform better than queries and are easy to cache==.

In the examples here, filter is used as a clause of a Boolean query.

Full-text search for "Spring Framework", keeping only documents whose study mode code is "201001" and whose price is between 10 and 100:

{
    "query": {
        "bool": {
            "must": [
                {"multi_match": {"query": "Spring Framework", "fields": ["name", "description"]}}
            ],
            "filter": [
                {"term": {"studymodel": "201001"}},
                {"range": {"price": {"gte": "10", "lte": "100"}}}
            ]
        }
    }
}

Java

@Test
public void testBoolQuery3() {
    SearchRequest request = new SearchRequest("xc_course");
    request.types("doc");

    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();

    boolQueryBuilder.must(QueryBuilders.multiMatchQuery("Spring Framework", "name", "description"));
    boolQueryBuilder.filter(QueryBuilders.termQuery("studymodel", "201001"));
    boolQueryBuilder.filter(QueryBuilders.rangeQuery("price").gte(10).lte(100));

    sourceBuilder.query(boolQueryBuilder);

    printResult(request, sourceBuilder);
}

Sorting

Query courses whose price is between 10 and 100, sorted by price in ascending order and, when prices are equal, by timestamp in descending order:

{
    "query": {
        "bool": {
            "filter": [
                {"range": {"price": {"gte": "10", "lte": "100"}}}
            ]
        }
    },
    "sort": [
        {"price": "asc"},
        {"timestamp": "desc"}
    ],
    "_source": ["name", "price", "timestamp"]
}

Java

@Test
public void testSort() {
    SearchRequest request = new SearchRequest("xc_course");
    request.types("doc");

    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
    boolQueryBuilder.filter(QueryBuilders.rangeQuery("price").gte(10).lte(100));

    sourceBuilder.sort("price", SortOrder.ASC);
    sourceBuilder.sort("timestamp", SortOrder.DESC);

    sourceBuilder.query(boolQueryBuilder);
    printResult(request, sourceBuilder);
}

Highlighting

{
    "query": {
        "bool": {
            "filter": [
                {"multi_match": {"query": "bootstrap", "fields": ["name", "description"]}}
            ]
        }
    },
    "highlight": {
        "pre_tags": ["<tag>"],
        "post_tags": ["</tag>"],
        "fields": {"name": {}, "description": {}}
    }
}

Results:

"hits": [
    {
        "_index": "xc_course", "_type": "doc", "_id": "1", "_score": 0,
        "_source": {
            "name": "Bootstrap development",
            "description": "Bootstrap is a popular CSS development framework that integrates various CSS effects. This development framework contains a large number of CSS, JS program code, can help CSS developers (especially not good at CSS page development program personnel) easily achieve a beautiful interface effect not limited by the browser.",
            "studymodel": "201002",
            "price": 38.6,
            "pic": "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
            "timestamp": "2018-04-25 19:11:35"
        },
        "highlight": {
            "name": ["<tag>Bootstrap</tag> development"],
            "description": ["<tag>Bootstrap</tag> is a popular CSS development framework, which integrates various CSS page effects."]
        }
    }
]

Each hit in hits.hits carries, in addition to the source document "_source", the corresponding highlighted result "highlight".

Java

@Test
public void testHighlight() throws IOException {
    SearchRequest request = new SearchRequest("xc_course");
    request.types("doc");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
    boolQueryBuilder.filter(QueryBuilders.multiMatchQuery("bootstrap", "name", "description"));

    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.preTags("<tag>");
    highlightBuilder.postTags("</tag>");
    highlightBuilder.field("name").field("description");

    sourceBuilder.query(boolQueryBuilder);
    sourceBuilder.highlighter(highlightBuilder);
    request.source(sourceBuilder);

    SearchResponse response = restHighLevelClient.search(request);
    SearchHits hits = response.getHits();       // hits
    if (hits != null) {
        SearchHit[] results = hits.getHits();   // hits.hits
        if (results != null) {
            for (SearchHit result : results) {
                Map<String, Object> source = result.getSourceAsMap();   // _source
                String name = (String) source.get("name");
                Map<String, HighlightField> highlightFields = result.getHighlightFields();  // highlight
                HighlightField highlightField = highlightFields.get("name");
                if (highlightField != null) {
                    // Use the highlighted fragments instead of the raw name
                    Text[] fragments = highlightField.getFragments();
                    StringBuilder stringBuilder = new StringBuilder();
                    for (Text text : fragments) {
                        stringBuilder.append(text.string());
                    }
                    name = stringBuilder.toString();
                }
                System.out.println(name);

                String description = (String) source.get("description");
                HighlightField highlightField2 = highlightFields.get("description");
                if (highlightField2 != null) {
                    // Use the highlighted fragments instead of the raw description
                    Text[] fragments = highlightField2.getFragments();
                    StringBuilder stringBuilder = new StringBuilder();
                    for (Text text : fragments) {
                        stringBuilder.append(text.string());
                    }
                    description = stringBuilder.toString();
                }
                System.out.println(description);
            }
        }
    }
}

The APIs that are harder to understand here are getHighlightFields() and HighlightField.getFragments(); it helps to map them onto the structure of the response JSON.

highlightFields.get() retrieves highlight.name and highlight.description. But HighlightField.getFragments() returns a Text[] rather than a single Text. Our guess was that ES breaks the field into sections by sentence and only highlights and returns the sections in which the keyword appears, so we searched for css to check.

Therefore, be aware that the returned highlight may not contain all of the original field content.

Cluster management

ES usually works in cluster mode. Clustering not only increases the search capacity of ES and its ability to handle searches over large data sets, but also improves the fault tolerance and availability of the system. ES can search PB-scale data.

Below is a schematic diagram of the ES cluster structure:

Cluster-related concepts

node

An ES cluster consists of multiple servers, each of which is a node (a server with a single ES process deployed on it).

shard

When there are a large number of documents, memory and disk limits come into play. To cope with them, and to improve the processing capacity, fault tolerance, and availability of ES, an index is divided into several shards (analogous to partitioning in MySQL, where one table is split across multiple files). Each shard can be placed on a different server, so that multiple servers provide indexing and search services.

When a search request comes in, each shard is queried separately, and the results are then merged and returned to the user.
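The query-each-shard-then-merge step can be sketched in a few lines. This is a simplified model, not how ES is implemented internally: it assumes each shard has already returned its own scored hits, and the Hit class, the ids, and the scores are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ScatterGatherSketch {
    /** A hit as returned by one shard: document id plus relevance score. */
    static final class Hit {
        final String id;
        final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Merges per-shard hit lists and keeps the ids of the `size` best-scoring hits overall. */
    static List<String> merge(List<List<Hit>> shardResults, int size) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shard : shardResults) {
            all.addAll(shard);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<String> ids = new ArrayList<>();
        for (Hit hit : all.subList(0, Math.min(size, all.size()))) {
            ids.add(hit.id);
        }
        return ids;
    }

    /** Two shards with two hits each; the top three overall are requested. */
    static List<String> sample() {
        List<Hit> shard0 = Arrays.asList(new Hit("1", 0.7), new Hit("4", 0.2));
        List<Hit> shard1 = Arrays.asList(new Hit("3", 1.3), new Hit("2", 0.5));
        return merge(Arrays.asList(shard0, shard1), 3);
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```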

replica

To improve the availability and search throughput of ES, one or more copies (replicas) of each shard are stored on other servers, so that even if the current server fails, a server holding a replica can still provide service.

master node

A cluster has one or more master-eligible nodes, one of which acts as the master. The master node handles cluster management, such as adding or removing nodes. If the master node fails, ES elects another one.

node forwarding

Each node knows about the other nodes. We can send a request to any node, and the node that receives it forwards the request to the other nodes as needed.

Three roles of a node

Master node

The master node is used for cluster management and index management, such as adding nodes, allocating shards, and adding and deleting indexes.

Data node

Data shards are held on data nodes, which are responsible for indexing and search operations.

Client node

The client node serves only as a request client and a load balancer. It does not store data and only forwards requests to other nodes.

configuration

Node roles are configured in /config/elasticsearch.yml:

  • node.master: whether the node is eligible to be a master node
  • node.data: whether the node stores data, that is, acts as a data node
  • node.ingest: whether the node can run ingest pipelines, which pre-process documents before they are indexed. Forwarding requests when the data is not on the current ES instance (the coordinating role) is something every node does, regardless of this setting.

Four combinations:

  • master=true, data=true: both a master node and a data node
  • master=false, data=true: data node only
  • master=true, data=false: master node only; stores no data
  • master=false, data=false: neither a master node nor a data node. Such a node acts as a client node that only forwards requests.
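The four combinations amount to a small decision table. A minimal sketch follows; the describe method and the role labels are illustrative names only, not something ES exposes.

```java
final class NodeRoleSketch {
    /** Maps the node.master / node.data flags to the role description used above. */
    static String describe(boolean master, boolean data) {
        if (master && data) {
            return "master node and data node";
        }
        if (master) {
            return "master node only";
        }
        if (data) {
            return "data node only";
        }
        return "client node (forwards requests only)";
    }

    public static void main(String[] args) {
        System.out.println(describe(true, true));   // both roles, as in the two-node setup below
        System.out.println(describe(false, false)); // neither role
    }
}
```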

Set up the cluster

Next we create a 2-node cluster, with the index split into 2 shards and 1 replica per shard.

Unzip elasticsearch-6.2.1.zip into two directories, es-1 and es-2.

Configuration file elasticsearch.yml

Node 1:

cluster.name: xuecheng
node.name: xc_node_1
network.host: 0.0.0.0
http.port: 9200
transport.tcp.port: 9300
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["0.0.0.0:9300", "0.0.0.0:9301"]
discovery.zen.minimum_master_nodes: 1
node.ingest: true
bootstrap.memory_lock: false
node.max_local_storage_nodes: 2

path.data: D:\software\es\cluster\es-1\data
path.logs: D:\software\es\cluster\es-1\logs

http.cors.enabled: true
http.cors.allow-origin: /.*/

Node 2:

cluster.name: xuecheng
node.name: xc_node_2
network.host: 0.0.0.0
http.port: 9201
transport.tcp.port: 9301
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["0.0.0.0:9300", "0.0.0.0:9301"]
discovery.zen.minimum_master_nodes: 1
node.ingest: true
bootstrap.memory_lock: false
node.max_local_storage_nodes: 2

path.data: D:\software\es\cluster\es-2\data
path.logs: D:\software\es\cluster\es-2\logs

http.cors.enabled: true
http.cors.allow-origin: /.*/

Test the shard

Create an index with 2 shards and 1 replica per shard:

PUT http://localhost:9200/xc_course

{
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1
    }
}

Check index status with the head plugin:

Test primary/secondary replication

Write data

POST http://localhost:9200/xc_course/doc

{
	"name":"Java Programming Basics"
}

Both nodes have data:

Cluster health

You can view the Elasticsearch cluster health status by accessing GET /_cluster/health.

The health status is reported as one of three colors: green, yellow, or red.

  • Green: all primary and replica shards are working properly.
  • Yellow: all primary shards are working properly, but some replica shards are not.
  • Red: at least one primary shard is not working properly.
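The three colors follow a simple precedence, which the sketch below spells out. It is only an illustration of the rules listed above: the status method and its two counters are hypothetical, whereas the real value comes from the status field of the GET /_cluster/health response.

```java
final class ClusterHealthSketch {
    /** Derives the health color from the number of primary and replica shards that are not working. */
    static String status(int failedPrimaries, int failedReplicas) {
        if (failedPrimaries > 0) {
            return "red";      // some data is unavailable
        }
        if (failedReplicas > 0) {
            return "yellow";   // all data is available, but redundancy is reduced
        }
        return "green";
    }

    public static void main(String[] args) {
        System.out.println(status(0, 0)); // both nodes up, all shards assigned
        System.out.println(status(0, 2)); // primaries fine, two replicas not assigned
        System.out.println(status(1, 1)); // a primary is down
    }
}
```

For the index created above (2 shards, 1 replica each), running only one of the two nodes would leave both replicas unassigned, which corresponds to the yellow case.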