Abstract: If you are new to graph databases, the name may mislead you. A graph database does not store pictures or images; it is a database that stores data as a graph structure. So what is a graph?

This article is shared from the Huawei Cloud community post “What on Earth Is a Graph Database”. Original author: Hello _TT.

In recent years, one kind of database has been widely discussed and used in big data processing: the graph database. So what exactly is a graph database?

For newcomers, the name “graph database” can be misleading. It does not mean a database that stores pictures or images, but a database that stores data organized as a graph data structure. So what is a graph?

What is a graph

Let’s look at the following example.

At the end of the Eastern Han Dynasty, the allied forces of Sun Quan and Liu Bei destroyed Cao Cao’s army by attacking its ships at Red Cliffs.

If we abstract the relationships between the camps, using the camps as points and the relationships between the camps as edges, we can visualize the relationships as follows:

So that’s what’s called a graph.

A graph is composed of points and edges. A point represents an entity, such as a camp in the example above. An edge, with or without direction, represents the relationship between two entities, such as the alliance between Liu Bei and Sun Quan. This general structure can model a wide variety of real-world scenarios, from transportation systems and organizational structures to process design and social networks.
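As a minimal sketch of this definition, the Red Cliffs example can be written down as a set of points and a list of edges. The relationship names ("ally", "enemy") and the helper `relations_of` are assumptions made for illustration; they are not taken from the article's figure.

```python
# Points: the three camps from the Red Cliffs example.
points = {"Sun Quan", "Liu Bei", "Cao Cao"}

# Edges: (point, point, relationship). The relationship names are
# illustrative labels chosen for this sketch.
edges = [
    ("Sun Quan", "Liu Bei", "ally"),
    ("Sun Quan", "Cao Cao", "enemy"),
    ("Liu Bei", "Cao Cao", "enemy"),
]


def relations_of(point):
    """Return (other point, relationship) for every edge touching `point`."""
    result = []
    for a, b, relation in edges:
        if a == point:
            result.append((b, relation))
        elif b == point:
            result.append((a, relation))
    return result


print(relations_of("Liu Bei"))
```

Here the edges are treated as undirected: an edge is reported no matter which end the queried point sits on.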

What is a graph database

Knowing the concept of graphs, you can understand what a graph database is. Simply put, graph databases are tools for dealing with graph data structures.

Unlike a traditional relational database, which stores data in two-dimensional tables, a graph database is conventionally classified as a NoSQL (Not Only SQL) database; that is, it is a non-relational database.

A general-purpose graph database provides at least three functions: graph storage, graph query and graph analysis.

Why a graph database

So why do we use graph databases? Let’s use an example from the end of the Eastern Han Dynasty to illustrate the advantages of graph databases over relational databases.

Suppose a relational database contains three tables: a table of figures of the late Eastern Han Dynasty, a table of battles of the late Eastern Han Dynasty, and a table recording which figures took part in which battles.

When we want to know “who was the defender in the Battle of Fancheng”, the query is generally fast, because the answer can be read directly from table 2. When we want to know “which wars did the Liu Bei group launch”, we can also find the answer in table 2, but we may need to scan the entire table, so query efficiency drops sharply. For a query such as “in which wars launched by the Liu Bei group did Guan Yu fight”, let’s look at how a relational database executes it:

A. First, look up Guan Yu’s figure ID in the table of figures of the late Eastern Han Dynasty.

B. Then use the table of figures who took part in battles to find the battles he participated in.

C. Finally, use the table of battles of the late Eastern Han Dynasty to check which of those battles were launched by the Liu Bei group.

As we can see, this query is rather tedious.
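The three steps above can be sketched with Python's built-in `sqlite3` module. The table names, column names and sample rows below are hypothetical and invented for illustration; they do not reproduce the article's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE figure (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE battle (id INTEGER PRIMARY KEY, name TEXT, attacker TEXT);
CREATE TABLE participation (figure_id INTEGER, battle_id INTEGER);

-- Toy rows for illustration only.
INSERT INTO figure VALUES (1, 'Guan Yu'), (2, 'Cao Ren');
INSERT INTO battle VALUES
    (1, 'Battle of Fancheng', 'Liu Bei group'),
    (2, 'Battle of Red Cliffs', 'Cao Cao group');
INSERT INTO participation VALUES (1, 1), (1, 2), (2, 1);
""")

# Step A: figure -> id; step B: id -> battles; step C: filter by attacker.
# Each step costs one JOIN across tables.
query = """
SELECT b.name
FROM figure AS f
JOIN participation AS p ON p.figure_id = f.id
JOIN battle AS b ON b.id = p.battle_id
WHERE f.name = ? AND b.attacker = ?
ORDER BY b.name
"""
rows = conn.execute(query, ("Guan Yu", "Liu Bei group")).fetchall()
battles = [name for (name,) in rows]
print(battles)
```

Even in this tiny sketch, one question about relationships needs two JOINs; every extra hop in the relationship chain adds another.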

If we convert the tables above into a graph, the relationships become clear at a glance.

In case you have not yet appreciated the power of graph databases, let’s look at a query performance comparison on a classic workload: a social network.

In the book Neo4j in Action, the authors ran a test: in a social network of 1 million people, each with about 50 friends, find friends of friends up to a maximum depth of 5. The experimental results are as follows:

The results show little difference between the two databases at depth 2. At depth 3, the relational database needs about half a minute to complete the query, while the graph database still finishes within 1 second. At depth 4, the relational database takes nearly half an hour to return results, while the graph database takes less than 2 seconds. At depth 5, the relational database no longer responds in a reasonable time, while the graph database still answers in seconds, showing very good performance.

Based on this, we can understand the reasons for using graph databases from the following aspects:

  • Relational databases are not good at handling relationships between data, while graph databases handle them flexibly and with high performance

Undeniably, relational databases have been the mainstream of database development since the 1980s. Today, with the rapid development of social networking, the Internet of Things, finance, e-commerce and other fields, the volume of data is growing exponentially, and traditional relational databases perform poorly when handling complex relationships between data. This is because relational databases implement references between tables through foreign key constraints. Querying the relationships between entities requires JOIN operations, which are often time-consuming.

The original motivation for designing graph databases was to better describe the relationships between entities. The biggest difference between graph databases and relational databases is index-free adjacency: each node in the graph data model maintains direct references to its neighboring nodes. As a result, the cost of a traversal depends not on the overall size of the graph but only on the number of neighbors of the nodes it visits. This lets a graph database maintain good performance when handling a large number of complex relationships.

In addition, the graph structure makes the model easy to extend. It is not necessary to consider every detail at the start of model design, because new nodes, new relationships, new attributes and even new labels can easily be added later without breaking existing queries and application functionality.
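The idea of index-free adjacency can be sketched in a few lines of Python. This is an illustrative in-memory model, not how any particular graph database is implemented; the `Person` class, `befriend` and `friends_within` are hypothetical names. Each object holds direct references to its neighbors, so a depth-limited traversal only touches the nodes it actually reaches.

```python
from collections import deque


class Person:
    def __init__(self, name):
        self.name = name
        self.friends = []  # direct references to neighbouring Person objects


def befriend(a, b):
    """Add an undirected friendship edge between two people."""
    a.friends.append(b)
    b.friends.append(a)


def friends_within(start, max_depth):
    """Breadth-first traversal: names reachable in 1..max_depth hops."""
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in person.friends:
            if friend not in seen:
                seen.add(friend)
                found.append(friend.name)
                queue.append((friend, depth + 1))
    return found


# A chain a - b - c - d - e, plus an unrelated pair y - z.
people = {name: Person(name) for name in "abcdeyz"}
for left, right in zip("abcd", "bcde"):
    befriend(people[left], people[right])
befriend(people["y"], people["z"])

print(friends_within(people["a"], 2))
```

The unrelated pair y and z is never visited, however many such nodes are added: the work done depends on the neighborhood of the start node, not on the total size of the graph.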

  • The relationship between data is increasingly important

When we ask why graph databases are so important, we are really asking why the relationships between data are so important. Just as we all know the value of interpersonal relationships, the value of data also lies in the connections within it.

Let me give an example. Livestreaming has recently become very popular. If a streamer has millions of followers on Weibo, that data is of little value if it is not used. However, if the streamer connects those followers with the customers who might shop in his livestream room, the data immediately shows great commercial value.

  • Using graphs to express many things in the real world is more direct, intuitive and easy to understand

All kinds of relationships exist in the real world. A relational database can only flatten them into rows of tables, whereas a graph database uses a graph model to represent these relationships directly, which is more intuitive.

In addition, most graph databases now provide visual graph presentation, making query and analysis very intuitive.

  • Professional graph analysis algorithms provide solutions for practical scenarios

Graph databases originated from graph theory. With the help of professional graph analysis algorithms, they can provide suitable solutions for practical scenarios.

How a graph database stores, queries and analyzes graphs

  • Graph storage

How a graph database stores graphs is critical to query and analysis efficiency. Graph databases use graph models to manipulate graph data. Graph model refers to the way graph database describes and organizes graph data.

At present, the graph model chosen by mainstream graph databases is the property graph (sometimes translated as attribute graph). A property graph consists of points, edges, labels and attributes. Let’s look at a specific example.

The property graph above helps illustrate several related concepts:

1) Labels can be set for points, such as Person and War. Points with the same label are considered to belong to one group, so Liu Bei and Cao Cao belong to the same group;

2) Labels can also be set for edges, such as Relation;

3) A point can have many attributes, such as style name (courtesy name) and year, and attribute values are expressed as key-value pairs. For example, Liu Bei’s style name is Xuande;

4) Edges can also have attributes, such as army;

5) Edges are allowed to have directions. For example, the edge between Liu Bei and the Battle of Hanzhong points from Liu Bei to the Battle of Hanzhong;

6) Metadata describes the attribute information of points and edges. Metadata consists of several labels, and each label consists of several attributes.
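The concepts in this list can be sketched as two small Python data classes. This is a simplified in-memory illustration, not the storage format of any graph database; the class names, the `style_name` key and the value given to the `army` attribute are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    label: str                                       # e.g. Person, War
    properties: dict = field(default_factory=dict)   # key-value attributes


@dataclass
class Edge:
    source: str                                      # edges are directed:
    target: str                                      # source -> target
    label: str                                       # e.g. Relation
    properties: dict = field(default_factory=dict)


vertices = [
    Vertex("Liu Bei", "Person", {"style_name": "Xuande"}),
    Vertex("Cao Cao", "Person"),
    Vertex("Battle of Hanzhong", "War"),
]
edges = [
    # The value of "army" is a placeholder; the article only names the key.
    Edge("Liu Bei", "Battle of Hanzhong", "Relation", {"army": "placeholder"}),
]


def vertices_with_label(label):
    """Points sharing a label form one group."""
    return [v.name for v in vertices if v.label == label]


print(vertices_with_label("Person"))
```

Grouping by label is what puts Liu Bei and Cao Cao in the same group, while the Battle of Hanzhong falls under a different label.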

  • Graph query

Questions such as where Liu Bei was born, what the relationship between Liu Bei and Cao Cao is, and who launched the Hanzhong war all belong to the category of graph query.

As we know, SQL is the query language for relational databases, but the query languages of graph databases do not reuse SQL. This is because a graph database essentially deals with high-dimensional data, while SQL is designed for two-dimensional data structures and is not well suited to querying and operating on relationships. A specialized graph query language is more efficient for this purpose than SQL.

At present, the mainstream graph query languages include Gremlin and Cypher.
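To make the three example questions concrete, here is a plain-Python sketch that answers them over a tiny in-memory graph. It is not Gremlin or Cypher and does not imitate their syntax; the function names, the "rival" and "launched" edge labels and the birthplace value are placeholders chosen for illustration.

```python
# Point attributes as key-value pairs (values are illustrative).
vertex_properties = {
    "Liu Bei": {"birthplace": "Zhuo County"},
    "Cao Cao": {},
    "Battle of Hanzhong": {},
}

# Directed edges: (source, label, target).
edges = [
    ("Liu Bei", "rival", "Cao Cao"),
    ("Liu Bei", "launched", "Battle of Hanzhong"),
]


def property_of(vertex, key):
    """Attribute lookup on a single point."""
    return vertex_properties[vertex].get(key)


def relations_between(a, b):
    """Labels of all edges connecting a and b, in either direction."""
    return [label for s, label, t in edges if {s, t} == {a, b}]


def sources_of(label, target):
    """Points with an outgoing edge of the given label into `target`."""
    return [s for s, lbl, t in edges if lbl == label and t == target]


print(property_of("Liu Bei", "birthplace"))
print(relations_between("Cao Cao", "Liu Bei"))
print(sources_of("launched", "Battle of Hanzhong"))
```

A graph query language lets users state such lookups declaratively instead of writing the loops by hand.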

  • Graph analysis

Graph analysis is a technique of mining graph information through various graph algorithms.

The core graph algorithms can be divided into three categories: path search, centrality analysis and community discovery.

Path search explores the direct or indirect relationships between nodes in a graph by following edges. For example, in the figure below, path search finds the following path: Sun Ce -[husband and wife]- Da Qiao -[sister]- Xiao Qiao -[husband and wife]- Zhou Yu, from which we learn that Sun Ce and Zhou Yu are brothers-in-law. Path search algorithms are widely used in logistics and distribution, social relationship analysis and other scenarios.
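One common way to implement path search is breadth-first search, which finds a path with the fewest edges. The sketch below applies it to the relationships named in this paragraph; the extra Sun Ce and Sun Quan edge and the function name `shortest_path` are additions for illustration.

```python
from collections import deque

edges = [
    # Da Qiao is the elder ("Big") Qiao sister.
    ("Sun Ce", "Da Qiao", "husband and wife"),
    ("Da Qiao", "Xiao Qiao", "sister"),
    ("Xiao Qiao", "Zhou Yu", "husband and wife"),
    ("Sun Ce", "Sun Quan", "brother"),  # extra edge, off the target path
]


def shortest_path(start, goal):
    """Breadth-first search over undirected edges; returns a readable path."""
    adjacency = {}
    for a, b, relation in edges:
        adjacency.setdefault(a, []).append((b, relation))
        adjacency.setdefault(b, []).append((a, relation))

    previous = {start: None}  # node -> (predecessor, relation used)
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            break
        for neighbor, relation in adjacency.get(current, []):
            if neighbor not in previous:
                previous[neighbor] = (current, relation)
                queue.append(neighbor)

    if goal not in previous:
        return None  # no path exists

    # Walk back from the goal to the start, then reverse.
    steps = [goal]
    node = goal
    while previous[node] is not None:
        node, relation = previous[node]
        steps.append(f"-[{relation}]-")
        steps.append(node)
    return " ".join(reversed(steps))


print(shortest_path("Sun Ce", "Zhou Yu"))
```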

Centrality analysis measures the importance and influence of a particular node in a graph. In the figure above, for example, Sun Quan is intuitively an important figure because he has the most edges directly connected to him. Centrality analysis algorithms are generally used in web page ranking, opinion leader mining, influenza transmission analysis and other scenarios.

Community discovery aims to find more cohesive group structures in the graph. If more figures and relationships of the Three Kingdoms are added to the figure above and a community detection algorithm such as Louvain is applied, we can easily find that these figures belong to three camps, as shown in the figure below.

Community discovery algorithms can be used to uncover criminal gangs and in other scenarios.
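The simplest centrality measure matches the intuition in this paragraph: count the edges attached to each node (degree centrality). The edge list below is a toy example invented for illustration and does not reproduce the article's figure.

```python
from collections import Counter

# Undirected toy edges (illustrative only).
edges = [
    ("Sun Quan", "Sun Ce"),
    ("Sun Quan", "Zhou Yu"),
    ("Sun Quan", "Liu Bei"),
    ("Sun Ce", "Zhou Yu"),
    ("Liu Bei", "Zhuge Liang"),
]


def degree_centrality(edge_list):
    """Count how many edges touch each node."""
    degree = Counter()
    for a, b in edge_list:
        degree[a] += 1
        degree[b] += 1
    return degree


degree = degree_centrality(edges)
most_central, count = degree.most_common(1)[0]
print(most_central, count)
```

Real systems often use richer measures such as PageRank, which also weigh how important a node's neighbors are.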
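Louvain works by searching for a partition of the nodes that maximizes a score called modularity. The sketch below does not implement Louvain itself; it only computes the modularity of a given partition of an unweighted, undirected graph, which is the quantity Louvain tries to improve. The toy edges and the function name `modularity` are assumptions for illustration.

```python
# Two tightly knit triangles joined by a single edge (toy data).
edges = [
    ("Liu Bei", "Guan Yu"), ("Guan Yu", "Zhang Fei"), ("Liu Bei", "Zhang Fei"),
    ("Cao Cao", "Cao Ren"), ("Cao Ren", "Xiahou Dun"), ("Cao Cao", "Xiahou Dun"),
    ("Liu Bei", "Cao Cao"),
]


def modularity(edge_list, communities):
    """Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2."""
    m = len(edge_list)
    community_of = {
        member: index
        for index, group in enumerate(communities)
        for member in group
    }
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for a, b in edge_list:
        degree_sum[community_of[a]] += 1
        degree_sum[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    return sum(
        internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )


by_camp = [{"Liu Bei", "Guan Yu", "Zhang Fei"}, {"Cao Cao", "Cao Ren", "Xiahou Dun"}]
everyone = [by_camp[0] | by_camp[1]]

print(modularity(edges, by_camp))
print(modularity(edges, everyone))
```

Splitting the nodes by camp scores higher than lumping everyone into one group, which is why a modularity-maximizing algorithm would prefer the split.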

What are graph databases for

After introducing the main functions of a graph database, let’s look at its application scenarios. Application areas where graph databases excel include:

  • Social: Facebook and Twitter use graph databases for social relationship management and friend recommendation

Friend recommendation is familiar to all of us: a system can recommend friends of friends.

Xu Shu and Sima Hui recommended Zhuge Liang to Liu Bei, which can be vividly shown in the picture below
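Friend-of-friend recommendation can be sketched as a two-hop traversal that counts mutual friends. The friendship data below is a toy example built around the story in this passage (Pang Tong is added for contrast), and `recommend_friends` is a hypothetical helper name.

```python
from collections import Counter

# Undirected friendships stored as adjacency sets (toy data).
friends = {
    "Liu Bei": {"Xu Shu", "Sima Hui"},
    "Xu Shu": {"Liu Bei", "Zhuge Liang"},
    "Sima Hui": {"Liu Bei", "Zhuge Liang", "Pang Tong"},
    "Zhuge Liang": {"Xu Shu", "Sima Hui"},
    "Pang Tong": {"Sima Hui"},
}


def recommend_friends(person):
    """Rank friends of friends by how many mutual friends they share."""
    mutual = Counter()
    for friend in friends[person]:
        for candidate in friends[friend]:
            if candidate != person and candidate not in friends[person]:
                mutual[candidate] += 1
    return mutual.most_common()


print(recommend_friends("Liu Bei"))
```

Zhuge Liang ranks first for Liu Bei because two of Liu Bei's existing friends know him.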

  • E-commerce: Huawei Mall uses it for real-time product recommendation

By analyzing the preferences of the target user and of other users, the system finds users with similar tastes and recommends the products they purchased to the target user.

  • Financial sector: Industrial and Commercial Bank of China and Morgan Stanley use it for risk control management

At present, the financial field is in urgent need of graph databases. For example, a graph database can play a huge role throughout the whole loan cycle.

  • Public security: public security agencies use it to examine the relationships of suspects and to uncover criminal gangs

At the end of the Eastern Han Dynasty, Cao Cao attempted to assassinate Dong Zhuo, Diao Chan set Dong Zhuo and his adopted son Lu Bu against each other, and Lu Bu killed Dong Zhuo. Yet Dong Zhuo never knew that one of the masterminds behind these events was Wang Yun, as shown in the picture below. In reality, too, the real culprit may be connected to a case only indirectly rather than directly.
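A minimal sketch of this idea measures user similarity with the Jaccard index of purchase sets and recommends what similar users bought. The purchase data, the 0.5 similarity threshold and the function names are assumptions for illustration; this is not a description of how Huawei Mall implements recommendation.

```python
# user -> set of purchased products (toy data).
purchases = {
    "target": {"phone", "earbuds"},
    "user_a": {"phone", "earbuds", "watch"},
    "user_b": {"laptop"},
}


def jaccard(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    return len(a & b) / len(a | b)


def recommend_products(user, min_similarity=0.5):
    """Products bought by sufficiently similar users but not yet by `user`."""
    mine = purchases[user]
    recommended = set()
    for other, theirs in purchases.items():
        if other != user and jaccard(mine, theirs) >= min_similarity:
            recommended |= theirs - mine
    return sorted(recommended)


print(recommend_products("target"))
```

In graph terms, users and products are points and purchases are edges, so similar users are those who share many neighboring product points.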

What scenarios are appropriate for graph databases

You can determine whether your problem requires a graph database based on the following:

If many-to-many relationships occur frequently in your problem, a graph database is recommended. If the relationship between data in your problem is important, a graph database is recommended. Graph databases are recommended if you need to deal with relationships between large data sets.

Graph database products

Graph database products are now flourishing, with many competing offerings. Neo4j, the representative of the established graph databases, still has many fans, but because of its own limitations it faces a growing number of challengers. Among them, Huawei Cloud’s Graph Engine Service (GES), a leading domestic graph database, is emerging as a front-runner.

GES interface
