1: Elasticsearch Chinese Word Segmenter IK Analysis (STAR: 2471)

IK Chinese word segmentation for Elasticsearch. The native IK segmenter reads its dictionaries from the file system, whereas ES-IK can be extended to read dictionaries from other sources. Reading from a SQLite3 database is currently supported. 1. In elasticsearch.yml, set the location of your SQLite3 dictionary: ik_analysis_db_path: /opt/ik/dictionary.db. I provide a default dictionary: https:/…
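To make the idea of a database-backed dictionary concrete, here is a minimal Python sketch of loading segmentation terms from SQLite3 into memory. It is not ES-IK's code: the table name `words` and column name `word` are assumptions made for illustration, and the plugin's real schema may differ.

```python
import sqlite3


def create_demo_dictionary(conn, words):
    """Create a hypothetical one-column dictionary table and fill it.

    The schema (table `words`, column `word`) is an assumption for this
    sketch, not the layout ES-IK actually uses.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO words (word) VALUES (?)",
        [(w,) for w in words],
    )
    conn.commit()


def load_dictionary(conn):
    """Read every non-empty term from the table into a set for fast lookup."""
    rows = conn.execute("SELECT word FROM words").fetchall()
    return {row[0] for row in rows if row[0]}


if __name__ == "__main__":
    # An in-memory database stands in for a file such as /opt/ik/dictionary.db.
    connection = sqlite3.connect(":memory:")
    create_demo_dictionary(connection, ["中文", "分词", "词典"])
    print(sorted(load_dictionary(connection)))
    connection.close()
```

Keeping the terms in a set means the segmenter only touches the database at load time, which is one plausible reason to separate dictionary storage from lookup.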

2: Open source Java Chinese word segmentation library IKAnalyzer (STAR: 343)

IK Analyzer is an open source, lightweight Chinese word segmentation toolkit written in Java. Since version 1.0 was released in December 2006, IKAnalyzer has gone through four major versions. It began as a Chinese word segmentation component built on the open source project Lucene, combining dictionary-based segmentation with a grammar analysis algorithm. Starting with version 3.0, IK evolved into a general-purpose word segmentation component for Java, independent of the Lucene project, while providing the default for Lucene…

3: ANSJ (STAR: 3019)

ANSJ is a Java implementation of the ICTCLAS Chinese word segmenter. Essentially all of the data structures and algorithms were rewritten. The dictionary comes from the open source version of ICTCLAS, with some manual optimization. In-memory segmentation runs at about 1 million words per second (faster than ICTCLAS), segmentation while reading from a file runs at about 300,000 words per second, and accuracy can exceed 96%. Currently implemented: Chinese word segmentation, Chinese name recognition, and user-defined dictionaries. It can be applied…

4: Elasticsearch (STAR: 188)

Elasticsearch officially provides only SmartCN for Chinese word segmentation, and its results are not very good. Fortunately, there are two Chinese word segmentation plugins written by medcl: one is IK, the other is MMSEG.

5: Java Distributed Chinese Word Segmentation Component, word (STAR: 672)

word is a Java implementation of distributed Chinese word segmentation. It provides several dictionary-based segmentation algorithms and uses an n-gram model to resolve ambiguity. It accurately identifies English words, numbers, and quantity expressions such as dates and times, and it recognizes unregistered (out-of-vocabulary) words such as person names, place names, and organization names.
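As a rough illustration of n-gram disambiguation, the Python sketch below enumerates every dictionary-compatible split of a sentence and keeps the one with the highest bigram probability (add-one smoothed). It is not the word library's implementation; the function names and the tiny demo counts are invented for this example.

```python
import math

# Hypothetical demo data: a small dictionary plus made-up corpus counts.
DEMO_DICTIONARY = {"研究", "研究生", "生命", "起源"}
DEMO_UNIGRAMS = {"<s>": 10, "研究": 5, "生命": 4, "的": 8, "起源": 3, "研究生": 2}
DEMO_BIGRAMS = {
    ("<s>", "研究"): 4,
    ("研究", "生命"): 3,
    ("生命", "的"): 3,
    ("的", "起源"): 2,
    ("<s>", "研究生"): 1,
}


def all_segmentations(text, dictionary):
    """Yield every split of text into dictionary words or single characters."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        head = text[:end]
        # Single characters are always allowed so a split always exists.
        if end == 1 or head in dictionary:
            for rest in all_segmentations(text[end:], dictionary):
                yield [head] + rest


def bigram_score(tokens, bigrams, unigrams):
    """Log-probability of a token sequence under an add-one smoothed bigram model."""
    vocab_size = len(unigrams) + 1  # +1 leaves room for unseen tokens
    score = 0.0
    prev = "<s>"  # sentence-start marker
    for tok in tokens:
        numerator = bigrams.get((prev, tok), 0) + 1
        denominator = unigrams.get(prev, 0) + vocab_size
        score += math.log(numerator / denominator)
        prev = tok
    return score


def best_segmentation(text, dictionary, bigrams, unigrams):
    """Pick the candidate split with the highest bigram score."""
    return max(
        all_segmentations(text, dictionary),
        key=lambda toks: bigram_score(toks, bigrams, unigrams),
    )


if __name__ == "__main__":
    print(best_segmentation("研究生命的起源", DEMO_DICTIONARY, DEMO_BIGRAMS, DEMO_UNIGRAMS))
```

With these demo counts, the split 研究 / 生命 / 的 / 起源 scores higher than 研究生 / 命 / 的 / 起源, even though both are valid against the dictionary. Exhaustive enumeration is exponential in sentence length, so a practical segmenter would use dynamic programming over a word lattice instead.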

6: JCSEG (STAR: 400)

What is JCSEG? JCSEG is a lightweight open source Chinese word segmenter based on the MMSEG algorithm. It integrates keyword extraction, key phrase extraction, key sentence extraction, and automatic article summarization. It also provides word segmentation interfaces for the latest versions of Lucene, Solr, and Elasticsearch. JCSEG comes with a jcseg.properties file…
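For readers unfamiliar with MMSEG, its simplest building block is forward maximum matching: at each position, take the longest dictionary word that matches. The Python sketch below shows only that simple variant, not the full MMSEG rule set and not JCSEG's code; the function name and demo dictionary are made up for illustration.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy forward maximum matching against a set of known words.

    At each position the longest dictionary word wins; if nothing matches,
    a single character is emitted so the scan always advances.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    max_len = max(1, max_len)  # guard against a non-positive window

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    demo_dictionary = {"研究", "研究生", "生命", "起源"}  # hypothetical entries
    print(forward_max_match("研究生命的起源", demo_dictionary))
```

On this input the greedy scan picks 研究生 first and so produces 研究生 / 命 / 的 / 起源, which is the kind of ambiguity that the fuller MMSEG rules (and statistical models) are meant to reduce.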

7: Chinese Word Segmentation Library Paoding

Paoding is a Chinese word segmentation library developed in Java. It can be integrated into Lucene applications and is intended for Chinese search engines on the Internet and on corporate intranets. Paoding filled the gap in open source Chinese word segmentation components in China, and it aims to become the preferred open source Chinese word segmentation component for Internet websites. Paoding pursues high efficiency and a good user experience. Paoding…

8: Chinese Word Segmenter MMSEG4J

1. mmseg4j uses Chih-Hao Tsai's MMSeg algorithm (http://technology.chtsai.org/)… to implement a Chinese word segmenter, and it implements Lucene's Analyzer and Solr's TokenizerFactory so that it can be used in Lucene and Solr…

9: ANSJ (STAR: 3015)

ANSJ is a Java implementation of the ICTCLAS Chinese word segmenter. Essentially all of the data structures and algorithms were rewritten. The dictionary comes from the open source version of ICTCLAS, with some manual optimization. In-memory segmentation runs at about 1 million words per second (faster than ICTCLAS), segmentation while reading from a file runs at about 300,000 words per second, and accuracy can exceed 96%. Currently implemented…

10: Lucene Chinese Word Segmentation Library ICTCLAS4J

ICTCLAS4J is an open source Java Chinese word segmentation project completed by Sinboy on the basis of FreeICTCLAS, which was developed by Zhang Huaping and Liu Qun of the Chinese Academy of Sciences. It simplifies the original segmentation program and aims to give Chinese word segmentation enthusiasts a better learning opportunity.

Reprinted from: http://www.cnblogs.com/zsuxio…