NetEase Yizhi, an artificial intelligence technology and service brand owned by NetEase, topped the leaderboard of the “Thousand Words Dataset: Text Similarity” industry evaluation jointly held by CCF and Baidu, beating several competitors.

Text similarity, the task of judging whether two texts are semantically similar, is an important research direction in natural language processing (NLP). It is already in large-scale commercial use in intelligent customer service, information retrieval, news recommendation and other fields. For example, NetEase Qiyu, an intelligent customer service platform that has served more than 400,000 corporate clients, is backed by this technology.

“NetEase Hangzhou Research Institute” on the leaderboard is the NetEase Yizhi team

Accumulated knowledge and technology put NetEase Yizhi first in text similarity

The “Thousand Words Dataset” evaluation series is a large-scale competition in Chinese natural language processing. Its text similarity open-source project brings together LCQMC and BQ Corpus from Harbin Institute of Technology and PAWS-X (Chinese) from Google, both to evaluate text similarity models comprehensively and to promote the application and development of text similarity in natural language processing.

It is understood that these open datasets, each backed by a published paper, allow a fairly comprehensive evaluation of existing open text similarity models. The evaluation is regarded as highly authoritative and as reflecting the top level of text similarity research.

Task example from the LCQMC dataset, Harbin Institute of Technology (Shenzhen)

In this text similarity evaluation, NetEase Yizhi achieved excellent results by combining years of accumulated technical experience, large-scale pre-trained language models, and optimization targeted at the competition tasks.

According to the NetEase Yizhi team, the competition posed two main difficulties. The first is that BQ Corpus consists of financial-domain data and involves a great deal of financial industry knowledge, which general-purpose pre-trained language models struggle to capture. To address this, the team used semi-supervised learning and other methods to mine broad financial-domain knowledge from multiple business scenarios inside NetEase, and from it obtained a pre-trained language model for the financial domain. In the end, the team finished far ahead of the other teams on this task.

The second difficulty is the quality of the PAWS-X dataset. The data is translated from English, and the translated content differs from natural Chinese. What interferes with algorithms most is the translation of entity words such as person and place names: the same name may appear in its original English form in one sentence and be transliterated into Chinese in the other. In view of this, NetEase Yizhi used its self-developed NER (Named Entity Recognition) service to identify and normalize entity words, and its self-developed Chinese text correction service to fix typos and language errors, before training the model. It ultimately won first place on this task.

NetEase Yizhi helps the Qiyu robot accurately understand customer needs

Building on text similarity and a series of other NLP technologies, NetEase Yizhi has built an intelligent dialogue system that serves multiple businesses within the Group, such as Yanxuan customer service and IT consulting. Together with the Qiyu business, it has also created intelligent customer service robot products that serve the Group's external customers.

Take Joyoung Co., Ltd as an example. One of its core demands is to safeguard users' shopping experience through efficient, accurate and personable consulting services, for instance when users ask about the functions, operation, prices, promotions, and maintenance and care of small home appliances.

To this end, Joyoung has connected to the NetEase Qiyu online robot, which achieves a question matching rate above 90% and on that basis provides an intelligent service experience that understands users better. Using NetEase Yizhi's text similarity algorithm, the Qiyu online robot performs core semantic matching, which in turn powers BOT, FAQ and other functions. Semantic matching also enables the Qiyu online robot to mine and generate knowledge base content intelligently. With these capabilities, the Qiyu online robot can answer customer questions efficiently and accurately in different scenarios.
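The idea of FAQ answering through semantic matching can be sketched as follows. This is only a toy stand-in: it scores similarity with character-bigram cosine similarity, whereas the production system described in this article relies on pre-trained language models. The function names, the sample FAQ entries and the 0.3 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter

def char_bigrams(text):
    """Count overlapping character bigrams, ignoring case and whitespace."""
    chars = "".join(text.lower().split())
    return Counter(chars[i:i + 2] for i in range(len(chars) - 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def match_faq(query, faq, threshold=0.3):
    """Return (answer, score) for the most similar FAQ question.

    The answer is None when no question reaches the threshold, which is
    where a real robot might hand over to a human agent.
    """
    query_vec = char_bigrams(query)
    best_question, best_score = None, 0.0
    for question in faq:
        score = cosine(query_vec, char_bigrams(question))
        if score > best_score:
            best_question, best_score = question, score
    if best_question is None or best_score < threshold:
        return None, best_score
    return faq[best_question], best_score

# Illustrative knowledge base; the entries are invented for this sketch.
FAQ = {
    "How do I clean the soy milk maker?": "Rinse the cup and run the self-cleaning program.",
    "How long is the warranty period?": "The warranty lasts one year from purchase.",
}
answer, score = match_faq("how long is the warranty", FAQ)
```

Swapping `cosine` over bigram counts for a learned sentence-pair model changes the quality of the score but not the overall shape of the lookup: score the query against each known question, keep the best, and fall back when the score is too low.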

In express delivery, STO Express has also adopted Qiyu intelligent customer service to handle delivery inquiries, a field completely different from the finance and small home appliance scenarios above. Even so, built on the same NetEase Yizhi technology, the intelligent customer service quickly achieved similar results.

NetEase Yizhi NLP promotes digital business innovation

The commercial value of text similarity technology is not limited to intelligent customer service. According to the head of NetEase Yizhi, text similarity belongs to the broader category of text matching. Beyond dialogue engines, the technology has been put into practice in more applications inside NetEase, such as intelligent mining of lyrics in NetEase Cloud Music comments, live stream and short video matching, and similarity detection for video topic selection in Knowledge Highway, one of NetEase's innovation businesses.

Across the technology field as a whole, NLP, the technology that enables machines to understand human language, is known as “the pearl on the crown of artificial intelligence”. It is a frontier subject that is hard to crack, and it is also of great significance to digital business innovation. Beyond text similarity, NetEase Yizhi has been exploring the common ground between NLP technology and business innovation, and has achieved some initial results.

For example, applying semantic parsing technology to software testing significantly raises the level of automation, cutting costs and improving efficiency, which helps guarantee the quality of digital software. Text error correction technology is widely used in proofreading scenarios such as NetEase News, where it detects and corrects spelling and grammar errors in a timely manner, greatly improving users' reading experience and reducing the workload of content production.

In the future, NetEase Yizhi will also work with several teams under NetEase Datafan to explore applications of NLP in big data systems, such as supporting natural language interaction between business personnel and analysis systems, so that enterprises can better realize the value of big data.