The study of human language is the domain of linguistics; those who apply computation to the study of language processing are properly called computational linguists. Essentially, a computational linguist is a computer scientist with deep knowledge of language, who uses computational skills to model different aspects of it. Computational linguists deal with the theoretical aspects of language; natural language processing (NLP) is an application of computational linguistics.

NLP is more about using computers to deal with the nuances of different languages and to build real-world applications. In practice, NLP is similar to teaching language to a child. Some of the most common tasks (such as understanding words and sentences and forming grammatically and structurally correct sentences) come naturally to humans. In NLP, these tasks translate into tokenization, chunking, part-of-speech tagging, parsing, machine translation, and speech recognition, and many of them remain among the toughest challenges computers face.
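As a toy illustration of the first of these tasks, tokenization can be sketched in a few lines of standard-library Python (the regular expression, function name, and sample sentence are mine, not from the text; real tokenizers such as NLTK's handle contractions, abbreviations, and much more):

```python
import re

def tokenize(text):
    """Split raw text into word and punctuation tokens (a naive sketch)."""
    # \w+ matches runs of letters/digits; [^\w\s] matches single punctuation marks
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(tokenize("Computers still find language hard!"))
```

Even this crude version shows why tokenization is a real task: punctuation must be separated from words before any tagging or parsing can begin.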

 

Why study NLP

This section begins with Gartner's hype cycle (technology maturity curve), which shows NLP near the peak of the curve. NLP is currently one of the scarcest skills in industry. With the arrival of big data, the main challenge is the need for technical personnel who are proficient not only with structured data but also with semi-structured and unstructured data. We generate petabytes of web blogs, tweets, Facebook feeds, chats, emails, and comments. Companies collect all these different kinds of data to better target customers and gain meaningful insights from them. To process these unstructured data sources, we need technical people who know NLP.

We live in the information age; we can't even imagine life without Google. We use Siri for basic voice tasks, spam filters to keep junk mail out of our inboxes, and spell checkers in Word documents. Real-world examples of NLP applications are all around us.

 

(Image from Gartner)

Here are some examples of amazing applications built on top of NLP that you may use without realizing it.

  • Spelling correction (Microsoft Word/ any other editor)
  • Search engines (Google, Bing, Yahoo, and WolframAlpha)
  • Voice engines (Siri and Google Voice)
  • Spam sorting (all email services)
  • News feed (Google, Yahoo, etc.)
  • Machine Translation (Google Translate, etc.)
  • IBM’s Watson

Building these applications requires a very specific skill set: you need to know the language well and have the tools to deal with it effectively. What makes NLP one of the most attractive areas is not hype, but the range of applications that can be built with it, which makes NLP a uniquely necessary skill.

Many open source tools are available for implementing the applications above, as well as for other basic NLP preprocessing. Some were developed by organizations building their own NLP applications and later released as open source. Here is a list of available NLP tools.

  • GATE
  • Mallet
  • OpenNLP
  • UIMA
  • Stanford Toolkit
  • Gensim
  • Natural Language Toolkit (NLTK)

 

Natural Language Processing book list

 

1. Natural Language Processing with Python

[US] Steven Bird, Ewan Klein, Edward Loper, Translated by Chen Tao, Zhang Xu, Cui Yang, Liu Haiping

 

This is a book about natural language processing. "Natural languages" are the languages people use in daily communication, such as English, Hindi, and Portuguese. In contrast to artificial languages such as programming languages and mathematical notation, natural languages evolve from generation to generation and are difficult to define with explicit rules. In a broad sense, natural language processing (NLP) encompasses all the operations a computer performs on natural language, from the simplest, such as comparing writing styles by counting word frequencies, to the most complex, such as fully "understanding" what a person is saying, or at least responding effectively to human speech.

This book provides an introductory guide to the field of natural language processing. It can be used for self-study, as a textbook for courses in natural language processing or computational linguistics, or as a supplement to courses in artificial intelligence, text mining, or corpus linguistics. This practical book includes hundreds of examples and graded exercises.

The book is based on the Python programming language and an open source library called the Natural Language Toolkit (NLTK). NLTK contains a wealth of software, data, and documentation, all of which can be downloaded for free from http://www.nltk.org/. Distributions of NLTK support the Windows, Macintosh, and UNIX platforms. We strongly encourage you to download Python and NLTK and try out the examples and exercises in the book.

 

2. Natural Language Processing in Action: using Python to understand, analyze, and generate text

Hobson Lane, Cole Howard, Hannes Max Hapke, Translated by Shi Liang, Lu Xiao, Tang Kexin, Wang Bin

 

This book is an introduction to natural language processing (NLP) and deep learning. NLP has become a core application area of deep learning, and deep learning is a necessary tool for NLP research and applications. The book is divided into three parts. The first part introduces the basics of NLP, including tokenization, TF-IDF vectorization, and the transformation from word-frequency vectors to semantic vectors. The second part covers deep learning, including neural networks, word vectors, convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM) networks, sequence-to-sequence modeling, attention mechanisms, and other basic deep learning models and methods. The third part covers practical material: information extraction, question answering, human-machine dialogue, and other real-world systems, together with model building, performance challenges, and solutions.
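The TF-IDF vectorization mentioned in the first part can be sketched with the standard library alone (the toy corpus, function name, and weighting formula variant are my own illustrative choices, not the book's):

```python
import math
from collections import Counter

# A toy corpus of three tiny pre-tokenized documents
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

def tf_idf(term, doc, corpus):
    """Classic unsmoothed TF-IDF: term frequency times inverse document frequency."""
    tf = Counter(doc)[term] / len(doc)            # how common the term is in this doc
    df = sum(1 for d in corpus if term in d)       # how many docs contain it
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# "cat" is distinctive of the first document, while "the" appears in two of
# three documents, so "cat" receives a higher TF-IDF weight
print(tf_idf("cat", docs[0], docs))
print(tf_idf("the", docs[0], docs))
```

Libraries such as scikit-learn or Gensim add smoothing and normalization on top of this basic idea, but the intuition is the same: frequent-in-document, rare-in-corpus terms get the highest weights.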

This book is aimed at intermediate-to-advanced Python developers. Combining basic theory with practical programming, it is a practical reference for modern practitioners in the NLP field.

 

3. Natural Language Processing and Computational linguistics

By Bhargav Srinivasa-Desikan. Translated by He Wei

 

This book describes how to apply natural language processing and computational linguistics algorithms to reason over existing data and produce interesting analysis results. These algorithms are based on current mainstream statistical machine learning and artificial intelligence techniques, and implementation tools are readily available from the Python community, such as Gensim and spaCy.

The book starts with data cleansing and basic computational linguistics algorithms, then explores more advanced topics in NLP and deep learning using real language and textual data in Python. You will also learn to use open source tools to tag, parse, and model text, gain hands-on knowledge of excellent frameworks, learn how to choose a tool such as Gensim for topic modeling, and do deep learning with Keras.

This book covers theoretical knowledge and examples to help readers apply natural language processing and computational linguistics algorithms to their own scenarios. You will discover a rich ecosystem of Python tools for NLP, which will lead you into the exciting world of modern text analysis.

 

4. Natural Language Processing: Python and NLTK

[India] Nitin Hardeniya, Jacob Perkins, Deepti Chopra, Nisheeth Joshi, et al., Translated by Lin

 

Module 1 discusses all the preprocessing steps required in text mining/NLP tasks. It covers in detail tokenization, stemming, stop-word removal, and other text cleansing processes, and how they can easily be implemented in NLTK.
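The preprocessing pipeline Module 1 describes can be sketched in plain Python (the stop-word set, the crude suffix-stripping rule, and all names here are my own illustrative stand-ins; NLTK's `word_tokenize`, `stopwords` corpus, and `PorterStemmer` are the real implementations):

```python
# A small illustrative stop-word list; NLTK ships a much larger one per language
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "in", "to"}

def naive_stem(word):
    """Strip a few common English suffixes (far cruder than Porter's algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = text.lower().split()                        # tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]   # stop-word removal
    return [naive_stem(t) for t in tokens]               # stemming

print(preprocess("The cats are chasing the dogs in the garden"))
```

Note that a crude stemmer happily produces non-words like "chas" for "chasing"; that is acceptable for indexing and frequency counting, which is exactly the use case these preprocessing steps serve.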

Module 2 explains how to use a corpus reader and create a custom corpus, and describes how to use some of the corpora that come with NLTK. It covers chunking (also known as partial parsing), which identifies phrases and named entities in sentences, and explains how to train your own custom chunker and create specific named entity recognizers.

Module 3 discusses how to compute word frequencies and implement various language modeling techniques. It also discusses the concepts and applications of shallow semantic analysis (named entity recognition) and word sense disambiguation (WSD) using WordNet.
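The word-frequency and language-modeling techniques Module 3 covers can be sketched as a bigram model built from raw counts, using only the standard library (the sample text and function name are mine, not the book's):

```python
from collections import Counter

text = "the cat sat on the mat and the cat slept"
tokens = text.split()

# Unigram and bigram counts are the raw material of n-gram language models
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

# "the" is followed by "cat" in 2 of its 3 occurrences in this sample
print(bigram_prob("the", "cat"))
```

Real language models add smoothing so that unseen bigrams do not get probability zero, but the counting step shown here is the foundation.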

Module 3 helps you understand and apply the concepts of information retrieval and text summarization.

 

5. Mastering Natural Language Processing with Python

Deepti Chopra, Nisheeth Joshi, Iti Mathur, Translated by Wang Wei

 

This book details how to use Python to perform various natural language processing (NLP) tasks and helps readers master the practice of designing and building NLP-based applications in Python. It guides the reader through applying machine learning tools to develop a wide variety of models, and it clearly introduces the creation of training data and the implementation of major NLP applications such as named entity recognition, question answering, discourse analysis, word sense disambiguation, information retrieval, sentiment analysis, text summarization, and coreference resolution. It helps readers create NLP projects using NLTK and become experts in the field. By reading this book, you will be able to:

  • Implement string-matching algorithms and normalization techniques;
  • Apply statistical language modeling techniques;
  • Gain a deep understanding of how to develop stemmers, lemmatizers, morphological analyzers, and morphological generators;
  • Develop a search engine and implement POS tagging, statistical modeling (including the n-gram method), and other related concepts;
  • Become familiar with treebank construction, CFG construction, and the CYK and Earley chart-parsing algorithms;
  • Develop NER-based systems and understand and apply sentiment analysis concepts;
  • Understand and implement concepts such as information retrieval and text summarization;
  • Develop a discourse analysis system and a coreference resolution system.