Integrating ElasticSearch IK word segmentation and pinyin search into Spring Boot with Docker

Installing ElasticSearch itself with Docker was covered previously. This article assumes an ElasticSearch 7.8.0 container named elasticsearch is already running.

Preparatory work

Install the IK analyzer

We install the IK analyzer plugin directly with the elasticsearch-plugin tool inside the container:

docker exec -it elasticsearch /bin/bash
# inside the container: install the IK analyzer plugin
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.8.0/elasticsearch-analysis-ik-7.8.0.zip

The installation process is as follows:

Install the pinyin plugin

The pinyin plugin is installed with elasticsearch-plugin in the same way as the IK analyzer:

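./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.8.0/elasticsearch-analysis-pinyin-7.8.0.zip

After both plugins are installed, leave the container (exit) and restart it with docker restart elasticsearch so that ElasticSearch loads them.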
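Both plugin URLs follow the same pattern, and a plugin must be built for exactly the ElasticSearch version that is running (7.8.0 here); ElasticSearch refuses a plugin built for another version. As a small sketch, the hypothetical helper below (PluginInstallCommands is not part of either plugin) derives both install commands from a version string, so upgrading ElasticSearch only means changing one value.

```java
import java.util.List;

/**
 * Hypothetical helper: builds the elasticsearch-plugin install commands for a given
 * ElasticSearch version, following the release URL pattern of the IK and pinyin plugins.
 */
final class PluginInstallCommands {

    // %1$s is reused for both the release tag and the zip file name
    private static final String IK_URL =
        "https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v%1$s/elasticsearch-analysis-ik-%1$s.zip";
    private static final String PINYIN_URL =
        "https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v%1$s/elasticsearch-analysis-pinyin-%1$s.zip";

    private PluginInstallCommands() {}

    /** Returns the commands to run inside the container: IK first, then pinyin. */
    static List<String> forVersion(String esVersion) {
        // The plugin version must match the full ElasticSearch version, e.g. 7.8.0
        if (!esVersion.matches("\\d+\\.\\d+\\.\\d+")) {
            throw new IllegalArgumentException("Expected a version like 7.8.0, got: " + esVersion);
        }
        return List.of(
            "./bin/elasticsearch-plugin install " + String.format(IK_URL, esVersion),
            "./bin/elasticsearch-plugin install " + String.format(PINYIN_URL, esVersion));
    }
}
```

For example, PluginInstallCommands.forVersion("7.8.0") reproduces the two commands shown above.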

Word segmentation and pinyin search in Spring Boot

The previous article already connected Spring Boot to the Docker-hosted ElasticSearch; here we extend that project to support pinyin search and word segmentation.

Create the settings and mapping files

Both files go under the resources directory. I created an elasticsearch subdirectory for them to keep them apart from the other resources.

elasticsearch_mapping.json

{
  "properties": {
    "userName": {
      "type": "text",
      "analyzer": "pinyin_analyzer",
      "search_analyzer": "pinyin_analyzer",
      "fields": {
        "pinyin": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "sex": {
      "type": "keyword",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "age": {
      "type": "keyword",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}

elasticsearch_setting.json

{
  "index": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin"
        }
      },
      "tokenizer": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "remove_duplicated_term": true
        }
      }
    }
  }
}
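To see roughly what the my_pinyin options do, here is a toy model in Java. It is only an approximation, not the plugin's code: the caller supplies the pinyin syllables (the real tokenizer derives them from the Chinese text itself), positions and offsets are ignored, token order is arbitrary, and keep_separate_first_letter (false in the settings) is simply not modeled. The class PinyinTokenSketch and its Options defaults, which mirror elasticsearch_setting.json, are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;

/** Toy model of the my_pinyin tokenizer options; not the plugin's implementation. */
final class PinyinTokenSketch {

    /** Defaults mirror the values in elasticsearch_setting.json. */
    static final class Options {
        boolean keepFirstLetter = true;
        boolean keepFullPinyin = true;
        boolean keepOriginal = true;
        int limitFirstLetterLength = 16;
        boolean lowercase = true;
        boolean removeDuplicatedTerm = true;
    }

    private PinyinTokenSketch() {}

    static List<String> tokens(String original, List<String> syllables, Options o) {
        List<String> out = new ArrayList<>();
        if (o.keepFullPinyin) {
            out.addAll(syllables);                      // one token per syllable, e.g. wang, xiao, ming
        }
        if (o.keepFirstLetter) {
            StringBuilder letters = new StringBuilder();
            for (String s : syllables) {
                if (!s.isEmpty()) {
                    letters.append(s.charAt(0));
                }
            }
            if (letters.length() > 0) {
                // joined initials, cut to limit_first_letter_length, e.g. wxm
                out.add(letters.substring(0, Math.min(letters.length(), o.limitFirstLetterLength)));
            }
        }
        if (o.keepOriginal) {
            out.add(original);                          // the untouched input text
        }
        if (o.lowercase) {
            out.replaceAll(t -> t.toLowerCase(Locale.ROOT));
        }
        if (o.removeDuplicatedTerm) {
            out = new ArrayList<>(new LinkedHashSet<>(out)); // drop repeats, keep first occurrence
        }
        return out;
    }
}
```

With these defaults, the name 王小明 (syllables wang, xiao, ming) yields wang, xiao, ming, wxm and 王小明. Having the initials of a whole name as one token is presumably what lets a query such as wpt in the test below match a three-character name.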

Apply the settings and mapping to UserEntity

If the user index already exists, delete it first (or use a new index name) so that it is recreated with the analyzer and mapping above; an existing index does not pick up the new settings automatically. Once recreated, the index supports word segmentation and pinyin search.

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Mapping;
import org.springframework.data.elasticsearch.annotations.Setting;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@AllArgsConstructor
@Builder
@Setting(settingPath = "classpath:elasticsearch/elasticsearch_setting.json") // defines pinyin_analyzer
@Mapping(mappingPath = "classpath:elasticsearch/elasticsearch_mapping.json") // field mappings
@Document(indexName = "user")
public class UserEntity {

    @Id
    private Long userId;

    // Use the custom analyzer from elasticsearch_setting.json for both indexing and searching
    @Field(type = FieldType.Text, analyzer = "pinyin_analyzer", searchAnalyzer = "pinyin_analyzer")
    private String userName;

    @Field(type = FieldType.Keyword)
    private Integer age;

    @Field(type = FieldType.Keyword)
    private Integer sex;

}

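Deleting the old index and checking that the recreated one really uses pinyin_analyzer can also be done through the REST API. The sketch below only builds the two requests with the JDK's java.net.http client: DELETE /user drops the index, and POST /user/_analyze returns the tokens the analyzer produces for a piece of text. It assumes ElasticSearch is reachable at a base URL such as http://localhost:9200 without authentication; the class UserIndexRequests is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch: raw REST requests for the "user" index (assumes no authentication). */
final class UserIndexRequests {

    private UserIndexRequests() {}

    /** DELETE /user drops the index so it can be recreated with the new settings and mapping. */
    static HttpRequest deleteIndex(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/user"))
                .DELETE()
                .build();
    }

    /** POST /user/_analyze returns the tokens pinyin_analyzer produces for the given text. */
    static HttpRequest analyze(String baseUrl, String text) {
        String body = "{\"analyzer\":\"pinyin_analyzer\",\"text\":\"" + escapeJson(text) + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/user/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The requests can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Building them separately from sending keeps the sketch testable without a running cluster.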

Write the test class

The test class builds on the one from the previous article. First run save to batch-insert some test data, then run searchByUserName to check the results.

import com.example.elasticsearch.entity.UserEntity;
import com.example.elasticsearch.service.ElasticSearchService;
import com.example.elasticsearch.util.StringUtils;

import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

import javax.annotation.Resource;

@SpringBootTest
class ElasticsearchApplicationTests {

    @Resource
    private ElasticSearchService elasticSearchService;

    /**
     * @Author David
     * @Description Batch-insert test data
     */
    @Test
    void save() {
        List<String> randomName = StringUtils.getRandomName(100);
        List<UserEntity> list = new ArrayList<>(100);
        for (int i = 0; i < randomName.size(); i++) {
            UserEntity userEntity = UserEntity.builder()
                    .userId(i + 1L)
                    .userName(randomName.get(i))
                    .age(ThreadLocalRandom.current().nextInt(50))
                    .sex(ThreadLocalRandom.current().nextInt(2))
                    .build();
            list.add(userEntity);
        }
        elasticSearchService.saveUser(list);
    }

    /**
     * @Author David
     * @Description Find a user by ID
     */
    @Test
    void findById() {
        UserEntity byId = elasticSearchService.findById(1L);
        System.out.println(byId);
    }

    /**
     * @Author David
     * @Description Search by user name
     */
    @Test
    void searchByUserName() {
        // Search by pinyin initials
        List<UserEntity> da = elasticSearchService.searchByUserName("wpt");
        System.out.println(da);
        // Search by the name itself
        List<UserEntity> da1 = elasticSearchService.searchByUserName("Smell flat");
        System.out.println(da1);
    }
}

Output of searchByUserName:

[UserEntity(userId=7, userName=smear, age=44, sex=1)]
[UserEntity(userId=7, userName=smear, age=44, sex=1)]
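The save test depends on StringUtils.getRandomName, a helper from the author's project that is not shown here. If you are following along without it, a hypothetical stand-in could look like the sketch below: it combines a random surname with one or two random given-name characters. The character lists are arbitrary examples and are written as Unicode escapes so the file compiles regardless of the source encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical stand-in for StringUtils.getRandomName(int); not the author's implementation. */
final class RandomNames {

    // A few common surnames (Wang, Li, Zhang, Liu, Chen)
    private static final String[] SURNAMES = {"\u738b", "\u674e", "\u5f20", "\u5218", "\u9648"};
    // A few common given-name characters (Xiao, Ming, Hua, Ping, Wei, Fang)
    private static final String[] GIVEN = {"\u5c0f", "\u660e", "\u534e", "\u5e73", "\u4f1f", "\u82b3"};

    private RandomNames() {}

    /** Returns count random two- or three-character names. */
    static List<String> getRandomName(int count) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        List<String> names = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            StringBuilder name = new StringBuilder(SURNAMES[rnd.nextInt(SURNAMES.length)]);
            int givenLength = 1 + rnd.nextInt(2);   // one or two given-name characters
            for (int j = 0; j < givenLength; j++) {
                name.append(GIVEN[rnd.nextInt(GIVEN.length)]);
            }
            names.add(name.toString());
        }
        return names;
    }
}
```

Because every character comes from a short list, many generated names share initials, which makes it easy to try initial-based queries against the inserted data.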

Conclusion

Besides word segmentation and pinyin search, ElasticSearch supports many other scenarios, such as traditional full-text search and highlighting of matched terms.
