This article continues from “HanLP: the principle of person-name recognition based on HMM-Viterbi” and introduces the principle of the cascaded hidden Markov model (HMM). First, a recap of the person-name recognition comparison from last time:

1. Names identified only by Jieba

Accuracy is very low. Most of the hits are actually parts of place names, complex place names, or complex institution names. Examples are as follows:

[1] Qiang Zhi can be bought and sold freely in war-torn Afghanistan, and the price of AK47 is about 500 RMB

“Afghanistan” was identified as a person’s name.

[2] How to plan a road trip from Anqing to Guilin?

“Guilin” was identified as a person’s name.

[3] 2018 Tianjin Peace Branch recruitment of community drug addiction, community rehabilitation staff performance query entry

“Rehab” is identified as a person’s name.

2. Names identified only by HanLP

Apart from names with the most common surnames, which are identified correctly, the results are mostly wrong. Examples are as follows:

[1] Li Ming, deputy Head of Naxi District, led a team to “Huatian Wine” scenic spot to check the safety work before the festival

“Huatian Wine” was identified as a person’s name.

[2] Xiuying “online and offline” work together to help poor households “micro-interaction” to expand the sale of agricultural products

“Work together” was identified as a person’s name.

[3] Urgent notice: Qin Baorong Media Group Zu Mountain one-day tour daily registration fee adjustment!

“Qin Bao” was identified as a person’s name.

3. Names identified by both HanLP and Jieba

1. Person names that are mostly wrong, recognized because of a surname character.

[1] The real estate executive’s salary is at the bottom, with Yu Liang of Vanke receiving an annual salary of 11.899 million yuan, only ranking the second

[2] A poster for the 10th anniversary of the Wenchuan earthquake was released to call for a moment of silence before the game

[3] Why can’t Iran have nuclear weapons while the United States does?

2. Errors that mostly come from how the characters of the name itself combine into a word.

[1] A village in Zhoukou is on fire. What is the best way to deal with catkins?

[2] The first line: The Three Kingdoms, Wei, Shu and Wu, how to deal with the second line?

[3] First: brilliant lights wan Jiale. Strives for the bottom allied?

How to fix these bad cases depends on how much time you have. With enough time, you can adjust the emission probability file, which is the nr.txt file. Without enough time, as is my situation now, keep only the names with common surnames plus the names that require special attention.
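The quick workaround described above can be sketched as a post-filter over the recognizer's output. This is only a minimal illustration: the surname set, the watch list, and the function names are all hypothetical, and the rule assumes single-character surnames, so compound surnames would need extra handling.

```python
# Hypothetical data: a real list of common surnames would be much longer.
COMMON_SURNAMES = {"李", "王", "张", "刘", "陈"}
# Hypothetical watch list: names to keep even if the surname is uncommon.
WATCHLIST = {"郁亮"}


def keep_person_name(name, surnames=COMMON_SURNAMES, watchlist=WATCHLIST):
    """Keep a candidate if it is on the watch list or starts with a common surname."""
    if name in watchlist:
        return True
    # Assumption: a name is a single-character surname plus at least one more character.
    return len(name) >= 2 and name[0] in surnames


def filter_person_names(candidates):
    """Drop candidates that fail the surname / watch-list check, preserving order."""
    return [name for name in candidates if keep_person_name(name)]


# Candidates as a recognizer might return them, including two place names.
kept = filter_person_names(["李明", "阿富汗", "桂林", "郁亮"])
```

A filter like this trades recall for precision: it removes place names such as the ones in the Jieba examples, but it cannot catch a false positive that happens to start with a common surname.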

That is all for the previous article. Now to the topic of this one: “Named Entity Recognition Based on the Hierarchical Hidden Markov Model”. I mainly read the paper “Chinese named entity Recognition based on hierarchical hidden Markov model”. “Cascaded” means that several models are chained one after another, so the structure of the system is as follows:



As shown in the figure, the cascaded approach trains three hidden Markov models (HMMs). Each model labels one type of entity, and the three models are connected in a cascade, so the output of one becomes the input of the next.
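The wiring of such a cascade can be sketched as follows. This is not HanLP's implementation: each stage here is a tiny rule-based stand-in for a trained HMM, the stage order (person, then place, then organization) is an assumption, and the tag names `nr`, `ns`, `nt`, the word lists, and all function names are illustrative.

```python
# Each token is (text, tag); "w" marks a plain word that no stage has labeled yet.
SURNAMES = {"李"}             # hypothetical surname list
PLACE_SUFFIXES = {"区", "市"}  # hypothetical place-name suffixes
ORG_SUFFIXES = {"政府", "局"}  # hypothetical organization suffixes


def merge_pairs(tokens, can_merge, tag):
    """Merge each token into the previous one when can_merge(prev, tok) holds."""
    out = []
    for tok in tokens:
        if out and can_merge(out[-1], tok):
            prev = out.pop()
            out.append((prev[0] + tok[0], tag))
        else:
            out.append(tok)
    return out


def person_stage(tokens):
    # Stand-in for the person-name model: surname followed by a plain word.
    return merge_pairs(
        tokens, lambda a, b: a[1] == "w" and a[0] in SURNAMES and b[1] == "w", "nr")


def place_stage(tokens):
    # Stand-in for the place-name model: plain word followed by a place suffix.
    return merge_pairs(
        tokens, lambda a, b: a[1] == "w" and b[0] in PLACE_SUFFIXES, "ns")


def org_stage(tokens):
    # Stand-in for the organization model: it builds on place names found earlier.
    return merge_pairs(
        tokens, lambda a, b: a[1] == "ns" and b[0] in ORG_SUFFIXES, "nt")


def cascade(tokens, stages=(person_stage, place_stage, org_stage)):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        tokens = stage(tokens)
    return tokens


tokens = [(t, "w") for t in ["李", "明", "到", "纳溪", "区", "政府"]]
result = cascade(tokens)
# result == [("李明", "nr"), ("到", "w"), ("纳溪区政府", "nt")]
```

The point of the sketch is the data flow: the organization stage can only recognize “纳溪区政府” because the place stage has already merged “纳溪” and “区” into a single place-name token.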

Different entity types have different roles and features. Choosing them really takes some knowledge of linguistics, which in practice comes from reading a lot and summarizing what you see. For example, common surnames can serve as a feature for person names (Zhang, Wang, Li and Zhao), common place-name suffixes can serve as a feature for place names (province, city, district and county), and common organization-name suffixes can serve as a feature for organization names (bureau, office, hospital). Here is a brief list of the role labeling of place names: