Question: In Python, when building a knowledge graph with neo4j, one step is to load a custom dictionary into the jieba ("stutter") segmentation library so that word segmentation is precise, which improves efficiency under Python multi-threading. However, when the dictionary is loaded with jieba.load_userdict(""), a large dictionary can take too long to load, causing errors during program execution.
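For context, a minimal sketch of the pattern described above: build a user dictionary file in the format jieba.load_userdict expects (one entry per line: word, optional frequency, optional part-of-speech tag). The entries and the file name user_dict.txt are illustrative examples, not from the original post.

```python
# Sketch: write a user dictionary file in the jieba.load_userdict format.
# One entry per line: word, frequency, part-of-speech tag.
# The words and the file name "user_dict.txt" are examples.
entries = [
    ("知识图谱", 3, "n"),  # "knowledge graph"
    ("分词", 3, "n"),      # "word segmentation"
]

with open("user_dict.txt", "w", encoding="utf-8") as f:
    for word, freq, pos in entries:
        f.write(f"{word} {freq} {pos}\n")

# With a large file, the call below re-parses every entry at startup;
# that re-parsing is the slow step the question describes:
# import jieba
# jieba.load_userdict("user_dict.txt")
```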

Solution: Make the custom dictionary part of jieba's built-in dictionary: find dict.txt inside the jieba package and append the custom entries to it. The steps are as follows: 1. Locate jieba's default dictionary and load the default dictionary model once; jieba then writes a local cache, and on every subsequent run the dictionary is loaded from that cache file, jieba.cache. 2. Append your custom entries to dict.txt, one per line with three columns: column 1 is the word, column 2 the word frequency, and column 3 the part of speech. If you do not know what frequency and part of speech to use, you can set them to 3 and n. 3. After saving dict.txt, find and delete the old cache file on the jieba.cache path, e.g. C:\Users\20655\AppData\Local\Temp\jieba.cache, so that it is regenerated from the updated dictionary. Note: Every time the built-in dictionary is updated, the cache file jieba.cache must be regenerated.
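Steps 2 and 3 above can be sketched as below. The locations of jieba's bundled dict.txt and of the jieba.cache file are assumptions that must be replaced with the real paths on your machine (for instance, the path shown in jieba's own "Loading model from cache" log line); the demonstration therefore runs against throwaway files rather than a real jieba installation.

```python
import os
import tempfile

# Hypothetical custom entries: word, frequency, part-of-speech tag.
custom_entries = [
    ("知识图谱", 3, "n"),
    ("实体抽取", 3, "n"),
]

def append_to_dict(dict_txt_path, entries):
    """Step 2: append custom entries to jieba's built-in dict.txt."""
    with open(dict_txt_path, "a", encoding="utf-8") as f:
        for word, freq, pos in entries:
            f.write(f"{word} {freq} {pos}\n")

def drop_stale_cache(cache_path):
    """Step 3: delete the old jieba.cache so it is regenerated."""
    if os.path.exists(cache_path):
        os.remove(cache_path)

# Demonstration on temporary files standing in for the real
# <site-packages>/jieba/dict.txt and .../Temp/jieba.cache paths:
tmp = tempfile.mkdtemp()
dict_txt = os.path.join(tmp, "dict.txt")
cache = os.path.join(tmp, "jieba.cache")
open(dict_txt, "w", encoding="utf-8").close()  # stand-in dict.txt
open(cache, "w").close()                       # stand-in stale cache

append_to_dict(dict_txt, custom_entries)
drop_stale_cache(cache)
```

On the next run after these two steps, jieba rebuilds jieba.cache from the updated dict.txt, and the custom words load at built-in-dictionary speed.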

Note: In a side-by-side comparison, setting the custom dictionary as the default jieba dictionary is more efficient than loading the dictionary from an external file.

Reference: Resolving the problem of slow loading of a custom dictionary via jieba's load_userdict – Jianshu.com