1/What is a knowledge graph

On May 17, 2012, Google officially introduced the Knowledge Graph, originally designed to improve the results returned by its search engine and to enhance the quality of the user's search experience. Suppose we want to know "Who is Wang Jianlin's son?" or "How tall is Yao Ming?". Both Baidu and Google will return the precise answers: Wang Sicong, and 226 cm. The search engine understood the user's intent; it knew we were looking for "Wang Sicong", rather than simply returning pages that match the keywords "Wang Jianlin's son".

At present, with the continuous development of intelligent information services, knowledge graphs are widely used in intelligent search, intelligent question answering, personalized recommendation, and other fields.

2/Knowledge graph definition

A knowledge graph is, in essence, a semantic network that reveals the relationships between entities. It is useful to distinguish information from knowledge. Information refers to external objective facts, for example: "here is a bottle of water, and it is now 7 degrees Celsius". Knowledge is the induction and summary of external objective laws, for example: "water freezes at zero degrees Celsius". "The induction and summary of objective laws" may sound difficult to pin down; there is also a classic answer on Quora that distinguishes "information" from "knowledge".

With such a reference it is easy to understand: by establishing links between entities on top of information, we obtain "knowledge" (though "facts" may be the more appropriate word). In other words, a knowledge graph is made up of pieces of knowledge, each represented as an SPO triple (Subject-Predicate-Object). For example: Guo Degang's son is Guo Qilin; Yao Ming is 226 centimeters tall.
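The SPO representation above can be illustrated with a minimal sketch, storing each piece of knowledge as a plain tuple (the `query` helper and the storage format are illustrative assumptions, not a real knowledge-graph API):

```python
# Knowledge as SPO (Subject, Predicate, Object) triples,
# using the two example facts from the text.
triples = [
    ("Guo Degang", "son", "Guo Qilin"),
    ("Yao Ming", "height_cm", "226"),
]

def query(subject, predicate):
    """Return all objects linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

Asking "Who is Guo Degang's son?" then becomes a lookup of the (subject, predicate) pair rather than a keyword search.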

This is how a knowledge graph is organized. It used to be popular to build knowledge graphs in a top-down way: the ontology and data schema are defined first, and entities are then added to the knowledge base. This approach requires an existing structured knowledge base as its foundation; the Freebase project, for example, derived most of its data from Wikipedia. Currently, however, most knowledge graphs are built bottom-up: entities are extracted from open linked data (that is, "information"), those with high confidence are added to the knowledge base, and relationships between entities are then built. Confidence matters because the web contains many conflicting versions of the same information, so we must choose which version to trust.
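The bottom-up filtering step described above can be sketched as follows. The confidence scores and the 0.8 threshold are made-up illustrations of the idea, not values from any real system:

```python
# Candidate facts extracted from open data, each with a confidence
# score; conflicting versions of the same fact may coexist.
candidates = [
    {"triple": ("Yao Ming", "height_cm", "226"), "confidence": 0.95},
    {"triple": ("Yao Ming", "height_cm", "229"), "confidence": 0.30},
]

def build_kb(candidates, threshold=0.8):
    """Admit only high-confidence facts; keep the most confident
    value for each (subject, predicate) pair."""
    kb = {}
    for c in sorted(candidates, key=lambda c: -c["confidence"]):
        s, p, o = c["triple"]
        if c["confidence"] >= threshold and (s, p) not in kb:
            kb[(s, p)] = o
    return kb
```

The low-confidence "229" version is rejected, so the knowledge base keeps a single, trusted value.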

3/Architecture of the knowledge graph

The construction and application of a large-scale knowledge base require the support of various intelligent information-processing technologies. Knowledge extraction pulls knowledge elements such as entities, relationships, and attributes out of open semi-structured and unstructured data. Knowledge fusion eliminates ambiguity between entities, relations, attributes, and fact objects, forming a high-quality knowledge base. Knowledge inference further mines implicit knowledge on top of the existing knowledge base, enriching and expanding it. The dense vectors produced by distributed knowledge representation are of great significance to the construction, reasoning, fusion, and application of the knowledge base.
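Knowledge inference, mentioned above, can be illustrated with a toy transitivity rule: if A is located in B and B is located in C, then A is located in C. The entities and the single rule are illustrative assumptions:

```python
# Explicit facts already in the knowledge base.
facts = {
    ("Pudong", "located_in", "Shanghai"),
    ("Shanghai", "located_in", "China"),
}

def infer_transitive(facts, predicate="located_in"):
    """Repeatedly apply (A, p, B) and (B, p, C) => (A, p, C)
    until no new facts are derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for a, p1, b in list(derived):
            for b2, p2, c in list(derived):
                if p1 == p2 == predicate and b == b2:
                    new = (a, predicate, c)
                    if new not in derived:
                        derived.add(new)
                        changed = True
    return derived
```

The implicit fact that Pudong is located in China never appears in the source data; it is derived, which is exactly how inference enriches the knowledge base.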

4/Knowledge extraction

Knowledge extraction is mainly oriented toward open linked data and uses automated techniques to extract usable knowledge units. A knowledge unit mainly includes entities (the extension of concepts), relationships, and attributes, i.e., the SPO triple. On this basis, a series of high-quality factual expressions is formed, laying the foundation for the model layer above. Knowledge extraction has three main tasks:

<1> Entity extraction: technically called NER (named entity recognition), this is the automatic recognition of named entities from the original corpus. Since entities are the most basic element in a knowledge graph, the completeness, accuracy, and recall of their extraction directly affect the quality of the knowledge base. Entity extraction is therefore the most basic and critical step in knowledge extraction.

<2> Relationship extraction: the goal is to establish the semantic links between entities. Early relationship extraction mainly identified entity relations through manually constructed semantic rules and templates. Learned relation models between entities have since gradually replaced these manually predefined rules.

<3> Attribute extraction: attribute extraction targets entities; a complete characterization of an entity is formed through its attributes. Since an entity's attribute can be regarded as a naming relation between the entity and an attribute value, attribute extraction can be transformed into a relation-extraction problem.
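The early, template-based style of relationship extraction described in <2> can be sketched with a single hand-written pattern. The pattern and the example sentences are illustrative only; real systems use many such templates or learned models:

```python
import re

# One hand-crafted template: "X's son is Y" -> (X, "son", Y).
PATTERN = re.compile(r"(?P<s>.+?)'s son is (?P<o>.+)")

def extract_son_relation(sentence):
    """Return an SPO triple if the template matches, else None."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if m:
        return (m.group("s"), "son", m.group("o"))
    return None
```

A sentence that does not fit the template simply yields no triple, which is why template-based extraction has high precision but poor coverage.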

5/Knowledge representation

In recent years, representation learning, exemplified by deep learning, has made important progress: the semantic information of an entity can be expressed as a dense, low-dimensional, real-valued vector, and the complex semantic relations between entities and relations can then be computed efficiently in that low-dimensional space. This is of vital significance to the construction, reasoning, fusion, and application of knowledge bases. Readers who have been following our public account will recall the previous post: graph embedding is a kind of representation learning.
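One well-known instance of this idea is TransE, where a true triple (h, r, t) should satisfy h + r ≈ t in the embedding space. The tiny hand-picked 2-dimensional vectors below are an illustration of the scoring function, not trained embeddings:

```python
import numpy as np

# Illustrative, hand-chosen embeddings (real ones are learned).
emb = {
    "Guo Degang": np.array([1.0, 0.0]),
    "son":        np.array([0.0, 1.0]),
    "Guo Qilin":  np.array([1.0, 1.0]),
    "Yao Ming":   np.array([3.0, 2.0]),
}

def transe_score(h, r, t):
    """L2 distance of h + r - t; lower means more plausible."""
    return float(np.linalg.norm(emb[h] + emb[r] - emb[t]))
```

The true triple scores 0 here, while the false one ("Yao Ming's son is Guo Qilin") scores high, which is how link plausibility is ranked in this low-dimensional space.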

6/Knowledge fusion

Because a knowledge graph draws on a wide range of sources, the quality of knowledge is uneven, knowledge from different data sources is duplicated, and the correlations between pieces of knowledge are not clear enough; knowledge fusion is therefore necessary. Knowledge fusion is a high-level form of knowledge organization: it brings knowledge from different sources under the same framework specification for heterogeneous data integration, disambiguation, processing, reasoning and verification, updating, and other steps, achieving the integration of data, information, methods, experience, and ideas, and forming a high-quality knowledge base. Knowledge updating is an important part of this. Human cognitive abilities, knowledge reserves, and business needs all grow over time, so the content of a knowledge graph also needs to keep pace with the times. Both general knowledge graphs and industry knowledge graphs need to be updated iteratively to expand existing knowledge and add new knowledge.
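One small piece of the fusion pipeline, merging duplicate entity mentions from different sources, can be sketched like this. The normalization rule (lowercase, keep only alphanumerics) is a deliberately simple stand-in for real entity-alignment techniques:

```python
def normalize(name):
    """Canonical key for an entity mention."""
    return "".join(ch.lower() for ch in name if ch.isalnum())

def merge_entities(mentions):
    """Group raw mentions that normalize to the same key."""
    merged = {}
    for m in mentions:
        merged.setdefault(normalize(m), []).append(m)
    return merged
```

Three differently formatted mentions of the same person collapse into one group, while a distinct entity stays separate; real systems replace `normalize` with richer similarity measures and attribute comparison.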

7/Knowledge graph applications

The knowledge graph provides a more effective way to express, organize, manage, and utilize the large-scale, heterogeneous, dynamic data of the Internet, making the network more intelligent and closer to human cognition.

Intelligent search. As in the example at the beginning, after the user enters a query, the search engine does not just look for keywords but first tries to understand the semantics. For example, after the query is segmented, its description is normalized so that it can be matched against the knowledge base. The result returned is a complete knowledge system, given by the search engine after retrieving the corresponding entities in the knowledge base.

Deep question answering. A question answering system is an advanced form of information retrieval that can accurately and concisely answer a user's natural-language question. Most question answering systems decompose a given question into several smaller questions, extract matching answers from the knowledge base one by one, automatically check their consistency in time and space, and finally combine the answers and present them to the user in an intuitive way. Siri, Apple's intelligent voice assistant, can provide answers and introductions to users thanks to its use of a knowledge graph, which makes the interaction between machines and people feel more intelligent.

Social networks. Facebook launched Graph Search in 2013. Its core technology is a knowledge graph that connects people, places, things, and more, and supports precise natural-language queries such as "my friends' favorite restaurants" or "friends who live in New York and love basketball and Chinese movies". The knowledge graph helps users find the most relevant people, photos, places, and interests across the vast social network. Graph Search offers services close to individuals' lives and meets users' need to discover knowledge and find the most relevant people.

Vertical domains. Knowledge graphs are usually divided into general knowledge graphs and domain-specific knowledge graphs. In finance, healthcare, e-commerce, and many other verticals, knowledge graphs are bringing better domain knowledge, lower financial risk, and a better shopping experience. Other fields, such as education and research, libraries, the securities industry, biomedicine, and any industry that needs big-data analysis, urgently need integrated, correlated resources. A knowledge graph can provide them with more accurate, standardized industry data and richer expression, helping users obtain industry knowledge more easily.
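The intelligent-search flow described above, understand the query, match it to the knowledge base, return the linked fact, can be sketched minimally. The tiny knowledge base and the naive substring-based "understanding" are illustrative assumptions; real engines use segmentation, entity linking, and ranking:

```python
# A two-fact knowledge base, using the examples from the opening.
kb = {
    ("Wang Jianlin", "son"): "Wang Sicong",
    ("Yao Ming", "height"): "226 cm",
}

def answer(query):
    """Match the query against known entities and relations,
    returning the linked object instead of keyword hits."""
    q = query.lower()
    for (entity, relation), obj in kb.items():
        if entity.lower() in q and relation in q:
            return obj
    return None
```

The query "Who is Wang Jianlin's son?" returns the entity "Wang Sicong" directly, rather than a page of documents containing the keywords.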

8/Summary

Technically, the hard part of knowledge graphs is NLP, because we need machines to understand vast amounts of text. But in engineering we face even more problems, from knowledge acquisition to knowledge fusion. Search keeps getting better because millions of users are effectively optimizing search results as they search; this is also why Baidu cannot beat Google in English search, since it does not have that many English-speaking users. The same applies to knowledge graphs: if user behavior is fed back into updating the knowledge graph, it can go further. The knowledge graph is certainly not the final answer to artificial intelligence, but as an application direction that integrates many computer technologies, it is surely one of the future forms of artificial intelligence.

What steps are needed to bring new entities into the core set? Disambiguation and fusion. Disambiguation deals with entity references and is divided into association, normalization, and edge completion: association decides whether a candidate entity is the same as an existing core-set entity; normalization decides whether candidate entities are duplicates of one another; and fusion then selects the preferred attributes and edges for the merged entity.