This is the 24th day of my participation in the August More Text Challenge

If ❤️ this article helps you, please like and follow; that is the biggest encouragement for me to keep writing technical articles. You can find more of my previous articles in my personal column.

Elasticsearch IK analyzer installation & analysis process

Install the Elasticsearch IK analyzer

Install the IK analyzer plugin

On GitHub, open the elasticsearch-analysis-ik repository and select the zip package that matches your Elasticsearch version (this cluster runs Elasticsearch 7.8.0, so download elasticsearch-analysis-ik-7.8.0.zip).
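A minimal download sketch, assuming the medcl/elasticsearch-analysis-ik repository on GitHub and Elasticsearch 7.8.0 (adjust the version to match your cluster):

```bash
# Download the IK release that matches the Elasticsearch version (7.8.0 assumed here)
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.8.0/elasticsearch-analysis-ik-7.8.0.zip
```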

Unzip the elasticsearch-analysis-ik file to install it

Create a new folder named ik inside Elasticsearch’s plugins directory. After downloading, unzip the IK analyzer zip file into this newly created ik folder.

Extract the elasticsearch-analysis-ik file:

```bash
[root@elk-rac-node1 plugins]# pwd
/usr/share/elasticsearch/plugins
[root@elk-rac-node1 plugins]# unzip elasticsearch-analysis-ik-7.8.0.zip -d ik
```

Delete the source zip file:

```bash
rm -rf elasticsearch-analysis-ik-7.8.0.zip
```

Alternatively, the plugin can be installed with Elasticsearch’s plugin installation command

```bash
./bin/elasticsearch-plugin install
```
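A sketch of the full command, assuming the medcl/elasticsearch-analysis-ik release URL for version 7.8.0 (the argument is the URL of the plugin zip; adjust the version to match your Elasticsearch):

```bash
./bin/elasticsearch-plugin install \
  https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.8.0/elasticsearch-analysis-ik-7.8.0.zip
```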

Elasticsearch IK word segmentation (analysis) process

English analysis: character filtering (strip special symbols, quantifiers, stopwords such as “the”, etc.) → tokenization → token filtering (case conversion, stemming). Chinese analysis follows the same pipeline: character filtering (strip special symbols, quantifiers, etc.) → tokenization (dictionary-based word segmentation, which is what IK provides) → token filtering.
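A minimal sketch of how these three stages map onto a custom analyzer definition; the index name, char filter, and token filters here are illustrative and assume the IK plugin is installed (ik_max_word is also registered as a tokenizer):

```json
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ik_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "ik_max_word",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

Here char_filter is the character-filtering stage, tokenizer is the word-segmentation stage, and filter is the token-filtering stage.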

After installing the analyzer

You need to restart Elasticsearch so that the plugin and its dictionaries are reloaded
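A sketch of the restart, assuming a package-based, systemd-managed installation (matching the /usr/share/elasticsearch layout above):

```bash
systemctl restart elasticsearch
```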

```
GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "National anthem of the People's Republic of China, March of the Volunteers"
}

GET _analyze?pretty
{
  "analyzer": "ik_max_word",
  "text": "National anthem of the People's Republic of China, March of the Volunteers"
}

# ik_max_word is the greedy (exhaustive) mode
# analyzer: the analyzer used at index (construction) time
# search_analyzer: the analyzer used at search time; if it is not specified, the index-time analyzer is used
# ik_smart favors precision and is typically used for queries; ik_max_word favors recall and is typically used when indexing data
# if ik_smart does not produce the terms you need, use ik_max_word, or fall back to the standard analyzer

GET _analyze?pretty
{
  "analyzer": "standard",
  "text": "Anthem of the People's Republic of China"
}
```
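The analyzer / search_analyzer split mentioned above is configured in the index mapping. A sketch, with an illustrative index and field name:

```json
PUT /news
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
```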

In most cases, a search misses because the dictionary is not up to date, i.e. new words are not yet in the thesaurus.

To fix this, edit the IK dictionary files under elasticsearch/config/analysis-ik (or plugins/ik/config, depending on how the plugin was installed).

stopword.dic – words to filter out; main.dic – the words that should be recognized (segmented out) as terms.
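A sketch of the IK configuration file IKAnalyzer.cfg.xml in that directory, assuming custom dictionary files named my_main.dic and my_stopword.dic (the file names are illustrative; put one word per line in each file and restart Elasticsearch afterwards):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extra main dictionary: words that should be segmented out as terms -->
    <entry key="ext_dict">my_main.dic</entry>
    <!-- extra stopword dictionary: words to filter out -->
    <entry key="ext_stopwords">my_stopword.dic</entry>
</properties>
```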