Project repository: https://github.com/medcl/elasticsearch-analysis-ik

elasticsearch-analysis-ik download page: https://github.com/medcl/elasticsearch-analysis-ik/releases


1. Install elasticsearch-analysis-ik

Download the plugin release that matches the version of Elasticsearch you have installed.

Create a folder named ik in the plugins directory of the Elasticsearch installation directory and extract the downloaded archive into it:

[esuser@localhost ik]$ pwd
/usr/local/elasticsearch-7.10.2/plugins/ik
[esuser@localhost ik]$ ll
… root root 263965 2月 3 15:07 commons-codec-1.9.jar
-rw-r--r-- 1 root root 61829 2月 3 15:07 commons-logging-1.2.jar
drwxr-xr-x. 2 root root 4096 2月 3 15:07 config
-rw-r--r-- 1 root root 54626 2月 3 15:07 …
… 1 root root 736658 2月 3 15:07 httpclient-4.5.2.jar
-rw-r--r-- 1 root root 326724 2月 3 15:07 httpcore-4.4.4.jar
-rw-r--r--. 1 root root 1807 2月 3 15:07 plugin-descriptor.properties
-rw-r--r--. 1 root root 125 2月 3 15:07 plugin-security.policy
[esuser@localhost ik]$
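The manual steps above (create plugins/ik, then unpack the release archive into it) can also be sketched in Python. This is only a minimal sketch, not an official installer: `install_ik` and `plugin_es_version` are hypothetical helper names, the archive path and Elasticsearch home directory are placeholders you supply, and the `elasticsearch.version` key is assumed to be present in plugin-descriptor.properties, as is usual for Elasticsearch plugins.

```python
import zipfile
from pathlib import Path


def install_ik(zip_path: str, es_home: str) -> Path:
    """Unpack an IK release archive into <es_home>/plugins/ik and return that directory."""
    target = Path(es_home) / "plugins" / "ik"
    target.mkdir(parents=True, exist_ok=True)  # same as creating the ik folder by hand
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def plugin_es_version(plugin_dir: Path) -> str:
    """Return the elasticsearch.version value from plugin-descriptor.properties ('' if absent)."""
    descriptor = Path(plugin_dir) / "plugin-descriptor.properties"
    for line in descriptor.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "elasticsearch.version":
            return value.strip()
    return ""
```

If the value returned by `plugin_es_version` differs from the installed Elasticsearch version (7.10.2 in this walkthrough), the wrong release was probably downloaded.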

Restart ES and test

Send a POST request to http://ip:9200/_analyze

JSON request body

{"analyzer": "ik_max_word", "text": "work 996, sick ICU"}

Sample request

POST /_analyze HTTP/1.1
Host: 172.16.59.129:9200
Content-Type: application/json
Content-Length: 60

{"analyzer": "ik_max_word", "text": "work 996, sick ICU"}

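Response data (the request text was originally Chinese, so the offsets count the original Chinese characters rather than the translated words)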

{
  "tokens": [
    {"token": "work", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0},
    {"token": "996", "start_offset": 2, "end_offset": 5, "type": "LETTER", "position": 1},
    {"token": "sick", "start_offset": 6, "end_offset": 8, "type": "CN_WORD", "position": 2},
    {"token": "icu", "start_offset": 8, "end_offset": 11, "type": "ENGLISH", "position": 3}
  ]
}
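The same test request can be sent from a script. The following is a minimal sketch using only the Python standard library; `build_analyze_request` and `analyze` are hypothetical helper names, and the base URL is whatever address your node listens on.

```python
import json
import urllib.request


def build_analyze_request(base_url: str, analyzer: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(base_url: str, analyzer: str, text: str) -> list:
    """Send the request and return the list of token dicts from the response."""
    request = build_analyze_request(base_url, analyzer, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["tokens"]
```

Calling `analyze("http://172.16.59.129:9200", "ik_max_word", text)` against a running node should return token entries shaped like the ones in the response above. The body is encoded as UTF-8 so that Chinese text is sent unchanged.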

What is the difference between ik_max_word and ik_smart?

  • ik_max_word: splits the text at the finest granularity. For example, “中华人民共和国国歌” (“National Anthem of the People’s Republic of China”) is split into “中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌”, exhausting all possible combinations. It is suitable for Term Query;
  • ik_smart: splits the text at the coarsest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国, 国歌”. It is suitable for Phrase queries.
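One common way to use the two analyzers together, shown in the plugin's README, is to index a field with ik_max_word and analyze search input with ik_smart. The sketch below only builds such a mapping as a Python dict; `ik_text_mapping` is a hypothetical helper, and the field name is a placeholder.

```python
import json


def ik_text_mapping(field: str) -> dict:
    """Index body that indexes `field` with ik_max_word and analyzes queries with ik_smart."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": "ik_max_word",      # fine-grained tokens at index time
                    "search_analyzer": "ik_smart",  # coarse-grained tokens at query time
                }
            }
        }
    }


def mapping_as_json(field: str) -> str:
    """Serialize the mapping so it can be used as the body of an index-creation request."""
    return json.dumps(ik_text_mapping(field), indent=2)
```

The resulting JSON could be sent as the body of a PUT request that creates an index on Elasticsearch 7.x. Whether this pairing suits your data is a design choice to validate against your own queries.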

2. Custom Chinese dictionary

  • 1. Stop Elasticsearch

  • 2. Switch back to the root user

  • 3. In the /elasticsearch-7.10.2/plugins/ik/config directory, edit the IKAnalyzer.cfg.xml file

Modify this entry to add custom.dic:

<entry key="ext_dict">custom.dic</entry>

Complete XML

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Users can configure their own extended dictionary here -->
    <entry key="ext_dict">custom.dic</entry>
    <!-- Users can configure their own extended stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- Users can configure a remote extension dictionary here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- Users can configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
  • 4. Create a new custom.dic file in the same directory

Add the words that should be recognized as single tokens, one word per line. For example, I added two words: “SAO years” and “working people”.

[root@localhost config]$ cat custom.dic
[root@localhost config]$
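The dictionary file can also be generated by a script. This is a minimal sketch, assuming the dictionary format is one word per line in UTF-8; `write_dictionary` is a hypothetical helper, and the path and words passed to it are placeholders.

```python
from pathlib import Path


def write_dictionary(path: str, words: list) -> int:
    """Write one word per line as UTF-8 and return the number of words written."""
    cleaned = []
    for word in words:
        word = word.strip()
        if word and word not in cleaned:  # skip blanks and duplicates, keep input order
            cleaned.append(word)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)
```

Writing the file with an explicit UTF-8 encoding avoids depending on the system locale, which matters when the entries are Chinese words.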
  • 5. Start ES
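After the restart, one way to confirm that the custom dictionary was loaded is to run the _analyze test again on text that contains a custom word and check that the word comes back as a single token. The helpers below are a hedged sketch that operates on an already parsed _analyze response; `token_texts` and `has_token` are hypothetical names.

```python
def token_texts(analyze_response: dict) -> list:
    """Return the token strings from a parsed _analyze response, in order."""
    return [entry.get("token") for entry in analyze_response.get("tokens", [])]


def has_token(analyze_response: dict, word: str) -> bool:
    """True if `word` appears as one whole token in the response."""
    return word in token_texts(analyze_response)
```

If `has_token` returns False for a word listed in custom.dic, check that the ext_dict entry names the file correctly and that the node was restarted after the change.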