Intelligent question answering systems have been hyped for the past two years. But strip away the fancy PowerPoint presentations and the arcane techniques in the papers, and the question we most want answered is: how do you actually put one into production?

Baidu recently open-sourced AnyQ (ANswer Your Questions), a question answering system framework aimed mainly at FAQ collections. What does this framework look like? Let's take a look.

Abstract

AnyQ is short for ANswer Your Questions, a name that goes straight to the core business of a question answering system: answering your questions. The project actually contains two parts: AnyQ, a question answering framework for FAQ collections, and SimNet, a text semantic matching framework.

AnyQ adopts a configurable, plug-in based design: every function is added as a plug-in, and more than 20 plug-ins are currently available. Developers can use AnyQ to quickly build and customize an FAQ system suited to a specific business scenario, and to speed up iteration and upgrades.

AnyQ framework

The AnyQ framework consists mainly of four stages: Question Analysis, Retrieval, Matching, and Re-Rank. Every function in the framework is added as a plug-in, for example Chinese word segmentation in Analysis, the inverted index and semantic index in Retrieval, and the Jaccard feature and SimNet semantic matching feature in Matching. More than 20 plug-ins are currently available. This configurable, plug-in based design helps developers quickly build and customize an FAQ system suited to a specific business scenario, and speeds up iteration and upgrades. The structure of the AnyQ framework is shown in the following figure:

[Figure: AnyQ framework structure]
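To make the four stages concrete, here is a minimal Python sketch of how they chain together. It is not AnyQ's code: the function names, the `Candidate` class, and the whitespace tokenizer are all assumptions made for illustration, and each stage is a toy stand-in for the corresponding group of AnyQ plug-ins.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """One FAQ entry under consideration (hypothetical structure)."""
    question: str
    answer: str
    score: float = 0.0


def analyze(query: str) -> List[str]:
    """Question analysis stand-in: lowercase and split on whitespace.
    (A real system for Chinese would run word segmentation here.)"""
    return query.lower().split()


def retrieve(tokens: List[str], faq: List[Tuple[str, str]]) -> List[Candidate]:
    """Retrieval stand-in: keep FAQ entries sharing at least one token."""
    wanted = set(tokens)
    return [Candidate(q, a) for q, a in faq if wanted & set(analyze(q))]


def match(tokens: List[str], candidates: List[Candidate]) -> List[Candidate]:
    """Matching stand-in: fraction of query tokens found in the candidate."""
    wanted = set(tokens)
    for cand in candidates:
        shared = wanted & set(analyze(cand.question))
        cand.score = len(shared) / max(len(wanted), 1)
    return candidates


def rerank(candidates: List[Candidate], top_k: int = 1) -> List[Candidate]:
    """Re-rank stand-in: sort by score, highest first, and keep top_k."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:top_k]


def answer_query(query: str, faq: List[Tuple[str, str]], top_k: int = 1) -> List[Candidate]:
    """Run the four stages in order: analysis, retrieval, matching, re-rank."""
    tokens = analyze(query)
    return rerank(match(tokens, retrieve(tokens, faq)), top_k)


faq = [
    ("how do i reset my password", "Use the reset link."),
    ("how do i delete my account", "Contact support."),
]
best = answer_query("reset password", faq)
print(best[0].answer if best else "no answer found")
```

Because each stage only consumes the previous stage's output, any one of them can be swapped for a stronger implementation without touching the others, which is the property the plug-in design relies on.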

Configurable

AnyQ integrates many retrieval and matching plug-ins, which take effect through configuration. Taking the retrieval methods and the text matching similarity computations as examples:

  • Retrieval
      • Inverted index: based on the open source inverted index Solr, with Baidu's open source word segmentation added
      • Semantic retrieval: ANN retrieval using ANNOY, based on SimNet semantic representations
      • Manual intervention: controls the output by supplying exact answers
  • Matching
      • Literal matching similarity: literal matching features computed after the Chinese question has been processed by word segmentation
          • Cosine similarity
          • Jaccard similarity
          • BM25
      • Semantic matching similarity
          • SimNet semantic matching: a model trained with the SimNet architecture measures the similarity of questions at the semantic level
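The three literal matching features named in the list can be sketched in a few lines of Python. This is an illustration over pre-segmented token lists, not AnyQ's implementation: the function names are hypothetical, and the BM25 variant shown (parameters `k1` and `b`, with an IDF that is shifted by 1 so it never goes negative) is one common formulation chosen here as an assumption.

```python
import math
from collections import Counter
from typing import List


def jaccard(a: List[str], b: List[str]) -> float:
    """Size of the token-set intersection divided by the size of the union."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a: List[str], b: List[str]) -> float:
    """Cosine of the angle between the two term-count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bm25(query: List[str], doc: List[str], corpus: List[List[str]],
         k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 score of `doc` for `query`, with statistics taken from `corpus`."""
    n = len(corpus)
    total_len = sum(len(d) for d in corpus)
    if n == 0 or total_len == 0:
        return 0.0
    avgdl = total_len / n
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[term] * (k1 + 1) / denom
    return score


corpus = [["reset", "password"], ["delete", "account"]]
print(jaccard(["a", "b"], ["b", "c"]))
print(cosine(["reset", "password"], corpus[0]))
print(bm25(["reset"], corpus[0], corpus))
```

Jaccard and cosine compare two questions directly, while BM25 also needs corpus-wide statistics, so it rewards matches on terms that are rare across the FAQ collection.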

Pluggable

Apart from the framework itself, every function in AnyQ is added as a plug-in. A user-defined plug-in only has to implement the corresponding interface to be added to AnyQ, whether it is a custom dictionary loader, a question analysis method, a retrieval method, a matching similarity, or a ranking method, so the system is genuinely customizable and pluggable.
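The general pattern of registering plug-ins under a name and then selecting them through configuration can be sketched as follows. Everything here is hypothetical: the `register` decorator, the plug-in names, and the list-of-dicts configuration are assumptions for illustration and do not reproduce AnyQ's actual interfaces or configuration format.

```python
from typing import Callable, Dict, List

# A matching plug-in takes two token lists and returns a similarity score.
SimilarityFn = Callable[[List[str], List[str]], float]

_REGISTRY: Dict[str, SimilarityFn] = {}


def register(name: str) -> Callable[[SimilarityFn], SimilarityFn]:
    """Decorator that makes a similarity function available by name."""
    def decorator(fn: SimilarityFn) -> SimilarityFn:
        _REGISTRY[name] = fn
        return fn
    return decorator


@register("exact_match")
def exact_match(a: List[str], b: List[str]) -> float:
    """1.0 when both token sequences are identical, otherwise 0.0."""
    return 1.0 if a == b else 0.0


@register("token_overlap")
def token_overlap(a: List[str], b: List[str]) -> float:
    """Fraction of distinct tokens in `a` that also occur in `b`."""
    sa = set(a)
    return len(sa & set(b)) / len(sa) if sa else 0.0


def build_matcher(config: List[dict]) -> SimilarityFn:
    """Assemble a weighted sum of the plug-ins named in `config`.
    An unknown plug-in name raises KeyError."""
    plugins = [(_REGISTRY[item["name"]], item.get("weight", 1.0)) for item in config]

    def matcher(a: List[str], b: List[str]) -> float:
        return sum(weight * fn(a, b) for fn, weight in plugins)

    return matcher


# Hypothetical configuration: which plug-ins to enable and how to weight them.
config = [
    {"name": "exact_match", "weight": 2.0},
    {"name": "token_overlap"},
]
matcher = build_matcher(config)
print(matcher(["a", "b"], ["a", "c"]))
```

Adding a new similarity then means writing one decorated function and naming it in the configuration; the code that builds and calls the matcher stays unchanged.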

Text semantic matching framework SimNet

SimNet is a semantic matching framework developed in-house by Baidu's Natural Language Processing Department in 2013. It is widely used across Baidu products and mainly covers core network structures such as BOW, CNN, RNN, and MM-DNN. Built on this framework, it also integrates semantic matching models that are mainstream in academia, such as MatchPyramid, MV-LSTM, and K-NRM. SimNet is implemented with PaddleFluid and TensorFlow, which makes the models easy to extend. A model built with SimNet can be added to AnyQ with little effort to strengthen AnyQ's semantic matching ability.

According to the structure of the text semantic matching network, the models implemented in SimNet fall into the following two categories:

Representation-based Models

Examples: BOW, CNN, RNN (LSTM, GRNN)

Features: The two input texts of the matching task are each encoded into a representation separately, and the two representations are then combined to compute the similarity.
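A minimal sketch of the representation-based idea, in the spirit of a BOW model: each text is encoded on its own by summing word vectors, and only the two resulting vectors are compared. The tiny hand-written embedding table and the function names are assumptions for illustration; a trained SimNet model learns its embeddings and typically adds further layers.

```python
import math
from typing import Dict, List

DIM = 3

# Toy word vectors, invented for this example (not trained values).
EMBEDDINGS: Dict[str, List[float]] = {
    "reset": [1.0, 0.0, 0.0],
    "change": [0.9, 0.1, 0.0],
    "password": [0.0, 1.0, 0.0],
    "delete": [0.0, 0.0, 1.0],
    "account": [0.0, 0.2, 0.9],
}


def encode(tokens: List[str]) -> List[float]:
    """Bag-of-words encoder: sum the word vectors, ignoring word order.
    Unknown tokens contribute a zero vector."""
    vec = [0.0] * DIM
    for token in tokens:
        for i, value in enumerate(EMBEDDINGS.get(token, [0.0] * DIM)):
            vec[i] += value
    return vec


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def bow_similarity(a: List[str], b: List[str]) -> float:
    """Encode both sides independently, then compare the two representations."""
    return cosine(encode(a), encode(b))


print(bow_similarity(["reset", "password"], ["change", "password"]))
print(bow_similarity(["reset", "password"], ["delete", "account"]))
```

Since the two sides never interact before the final comparison, the representations of all FAQ questions can be computed ahead of time and indexed, which is what makes this family suitable for semantic retrieval.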

Interaction-based Models

Examples: MatchPyramid, MV-LSTM, K-NRM, MM-DNN

Features: After the word-level sequence representation of each text is obtained, a similarity matching matrix is computed from the two sequence representations, and the final similarity score is produced by aggregating the matching information at every position.

For more details, visit the SimNet open source project.
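The interaction-based idea can be sketched as follows: build a word-by-word matching matrix first, then aggregate it into one score. The aggregation used here (best match per row, then the mean) is a deliberately simple stand-in chosen for illustration; it is not MatchPyramid, MV-LSTM, or K-NRM, which learn the aggregation with neural layers. The embedding table and function names are likewise assumptions.

```python
import math
from typing import Dict, List

# Toy 2-dimensional word vectors, invented for this example.
EMB: Dict[str, List[float]] = {
    "reset": [1.0, 0.0],
    "password": [0.0, 1.0],
    "change": [0.8, 0.6],
}
ZERO = [0.0, 0.0]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two word vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def matching_matrix(a: List[str], b: List[str]) -> List[List[float]]:
    """Entry [i][j] is the similarity between word i of `a` and word j of `b`."""
    return [[cosine(EMB.get(x, ZERO), EMB.get(y, ZERO)) for y in b] for x in a]


def interaction_score(a: List[str], b: List[str]) -> float:
    """Aggregate the matrix: take each row's best match, then average."""
    matrix = matching_matrix(a, b)
    if not matrix or not matrix[0]:
        return 0.0
    return sum(max(row) for row in matrix) / len(matrix)


for row in matching_matrix(["reset", "password"], ["password", "reset"]):
    print(row)
print(interaction_score(["reset", "password"], ["password", "reset"]))
```

Unlike the representation-based approach, the score depends on both texts at the word level, so it cannot be precomputed per question; in exchange it can capture fine-grained matches such as the same words appearing in a different order.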

Semantic model based on massive search data

Using Baidu's massive search data, an official SimNet-BOW semantic matching model has been trained. In some real FAQ scenarios, this model's AUC is more than 5% higher than that of literal similarity methods.
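For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen matching pair receives a higher score than a randomly chosen non-matching pair. The sketch below computes it directly from that definition. The labels and scores are toy values made up for the example and have no connection to Baidu's evaluation data.

```python
from typing import List


def auc(labels: List[int], scores: List[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly,
    counting ties as half a win."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = the two questions really match, 0 = they do not.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print(auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of matching and non-matching pairs.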
