Introduction

SnowNLP is a Python library that makes it easy to process Chinese text. It was inspired by TextBlob. Since most natural language processing libraries are built for English, I wrote one that handles Chinese conveniently. Unlike TextBlob, it does not use NLTK: all the algorithms are implemented from scratch, and it ships with some trained dictionaries. Note that the library works with Unicode throughout, so decode your input to Unicode yourself before passing it in.
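As a minimal sketch of the decoding step mentioned above, assuming Python 3 and input that arrives as UTF-8 or GBK bytes (the helper name `to_unicode` and the fallback order are illustrative assumptions, not part of SnowNLP):

```python
def to_unicode(raw, encodings=("utf-8", "gbk")):
    """Return `raw` as a Unicode str, decoding bytes if necessary.

    Hypothetical helper: the name and the default encoding list are
    assumptions for illustration; adjust them to match your data.
    """
    if isinstance(raw, str):
        return raw  # already Unicode text in Python 3
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError("could not decode input as any of: " + ", ".join(encodings))


text = to_unicode(u'这个东西真心很赞'.encode("gbk"))
```

The resulting `text` is an ordinary Unicode string and can then be handed to the library.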

    from snownlp import SnowNLP

    s = SnowNLP(u'这个东西真心很赞')  # "this thing is really great"

    s.words         # [u'这个', u'东西', u'真心',
                    #  u'很', u'赞']

    s.tags          # [(u'这个', u'r'), (u'东西', u'n'),
                    #  (u'真心', u'd'), (u'很', u'd'),
                    #  (u'赞', u'Vg')]

    s.sentiments    # 0.9830157237610916, the probability of being positive

    s.pinyin        # [u'zhe', u'ge', u'dong', u'xi',
                    #  u'zhen', u'xin', u'hen', u'zan']

    # Traditional input: "The terms 'traditional characters' and
    # 'Traditional Chinese' are also very common in Taiwan."
    s = SnowNLP(u'「繁體字」「繁體中文」的叫法在臺灣亦很常見。')

    s.han           # u'「繁体字」「繁体中文」的叫法
                    #  在台湾亦很常见。'

    # A short passage describing natural language processing
    text = u'''
    自然语言处理是计算机科学领域与人工智能领域中的一个重要方向。
    它研究能实现人与计算机之间用自然语言进行有效通信的各种理论和方法。
    自然语言处理是一门融语言学、计算机科学、数学于一体的科学。
    因此，这一领域的研究将涉及自然语言，即人们日常使用的语言，
    所以它与语言学的研究有着密切的联系，但又有重要的区别。
    自然语言处理并不是一般地研究自然语言，
    而在于研制能有效地实现自然语言通信的计算机系统，
    特别是其中的软件系统。因而它是计算机科学的一部分。
    '''

    s = SnowNLP(text)

    s.keywords(3)   # [u'语言', u'自然', u'计算机']
                    # ("language", "natural", "computer")

    s.summary(3)    # [u'自然语言处理是一门融语言学、计算机科学、
                    #    数学于一体的科学',
                    #  u'自然语言处理是计算机科学领域与人工智能
                    #    领域中的一个重要方向',
                    #  u'而在于研制能有效地实现自然语言通信的计算
                    #    机系统']

    # Pre-tokenized documents: "this article", "that paper", "this"
    s = SnowNLP([[u'这篇', u'文章'],
                 [u'那篇', u'论文'],
                 [u'这个']])
    s.tf
    s.idf
    s.sim([u'文章'])  # [0.3756070762985226, 0, 0]
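The similarity scores in the last call come from BM25 (see Features). The following is a minimal, self-contained sketch of BM25 scoring over pre-tokenized documents, assuming Python 3. The function name `bm25_scores` and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for illustration; this is not SnowNLP's own code and is not guaranteed to reproduce its exact numbers.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper; `docs` must be a non-empty list of token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    term_counts = [Counter(doc) for doc in docs]

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())

    scores = []
    for doc, counts in zip(docs, term_counts):
        score = 0.0
        for term in query:
            tf = counts.get(term, 0)
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5))
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [[u'这篇', u'文章'], [u'那篇', u'论文'], [u'这个']]
scores = bm25_scores([u'文章'], docs)
```

Only the first document contains the query term, so only its score is non-zero, which matches the shape of the result shown for `s.sim`.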

Features

  • Chinese word segmentation (character-based generative model)
  • Part-of-speech tagging (TnT, 3-gram hidden Markov model)
  • Sentiment analysis (the training data currently consists mainly of product reviews from online shopping, so results on other kinds of text may be poor; to be addressed)
  • Text classification (Naive Bayes)
  • Conversion to pinyin
  • Conversion from traditional to simplified Chinese
  • Extract text keywords (TextRank algorithm)
  • Extract text summaries (TextRank algorithm)
  • Term frequency and inverse document frequency (tf, idf)
  • Tokenization (splitting text into sentences)
  • Text Similarity (BM25)
  • Support for Python 3 (thanks to Erning)
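
To illustrate the idea behind the TextRank-based keyword extraction listed above, here is a simplified sketch: words that co-occur within a small window are linked in a graph, and a PageRank-style iteration ranks them. The function name `textrank_keywords`, the window size, the damping factor, and the iteration count are illustrative assumptions, not SnowNLP's actual implementation or defaults. Real input would first need word segmentation and stop-word filtering.

```python
from collections import defaultdict


def textrank_keywords(words, top_k=3, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score on a co-occurrence graph.

    Hypothetical helper; `words` is an already segmented list of tokens.
    """
    # Link each word to the distinct words that follow it within `window` positions.
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)

    # Start from uniform scores and repeatedly redistribute them along edges.
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[neighbor] / len(graph[neighbor]) for neighbor in graph[word]
            )
            for word in graph
        }

    # Highest score first; ties are broken by code point for a stable result.
    return sorted(scores, key=lambda word: (-scores[word], word))[:top_k]


tokens = [u'语言', u'自然', u'语言', u'计算机', u'语言', u'科学']
keywords = textrank_keywords(tokens, top_k=2, window=1)
```

In this toy token list, 语言 ("language") neighbors every other word, so it ranks first.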

Installation

$ pip install snownlp

More information can be found on the project homepage: SnowNLP


Edit: SegmentFault