The author | RedisGraph development team
The translator | GaiLei
Edit | Emily
AI Front Line introduction:RedisGraph implements a high-performance in-memory graph database on Redis. The project is provided as an open source Redis module, supporting openCyper query language, which can complete the operation of graph creation, query, condition matching and so on. To support efficient graph search operations, the underlying RedisGraph implements a triplet storage structure called Hexastore. This article introduces some of the internal design and features of RedisGraph and shows what it currently has.






Please pay attention to the wechat public account “AI Front”, (ID: AI-front)
The introduction

Nowadays, data based on graph structure (hereafter referred to as “graph data”) is everywhere. Companies such as Facebook, Twitter and Pinterest have seen and maximised the power of this interconnected data. This has led directly to an increase in interest in graph data solutions, and the number of solutions is increasing.

With the introduction of the Redis loadable module system, we realized the great potential of introducing graph data structures into the Redis toolset. Redis is implemented in the native C language and focuses on providing high performance features. We have now introduced new graph database capabilities to Redis by developing it. RedisGraph is available as an open source project on GitHub.

This article looks at some of the internal design and features of RedisGraph and shows what it currently has.

RedisGraph overview

RedisGraph is a graph database based on the new design of Redis. It uses the new Redis Modules API, extends Redis and provides some new commands and features. Key features include simple and fast indexing and querying, in-memory storage of data, use of custom memory-efficient data structures, disk-based data persistence, result sets in the form of tables, support for simple and widely used graph query language Cypher, and data filtering, aggregation, and sorting.

Test: RedisGraph

To introduce some of the key features of RedisGraph, let’s give an example running on the Redis-CLI tool.

Building a graph

Entities are usually represented as nodes in a diagram. This example creates a small-scale diagram where the entities in the diagram are actors and movies, and the “actors” relationship connects the actors and movies in the movie. Use graph.query to publish the CREATE statement in the following format, which implements adding new entities and relationships to a GRAPH.

The following command creates a graph:


Figure of the query

RedisGraph implements a subset of the openCypher graph query language. Although only a few features of the language are supported, they are sufficient to extract useful information from diagrams. The format of the command to execute a QUERY using graph. QUERY is as follows:

GRAPH.QUERY <graph_id> <query>Copy the code

Let’s perform a series of queries on the movie map data we created.



The first line is the header information for the result set, and the columns are named according to the RETURN statement. The second line contains the results of the query.

The second query is to find out how many films each actor has appeared in.


The idea behind RedisGraph

Different graph databases may use different structures to represent graphs. Some use an adjacency list, others use an adjacency matrix. Each representation structure has its own pros and cons. For RedisGraph, the key is to provide a data structure that allows quick searches of graphs. We use a structure called “Hexastore” to preserve all the relationships in the diagram.

Hexastore representation of a graph

The structure of Hexastore consists of a series of triples. Each triplet consists of the following three parts:

  • Subject;

  • Predicate;

  • Object.

The subject represents the source node, the predicate represents the relationship, and the target represents the destination node. For each relationship in the diagram, Hexastore stores all six possible permutations between the source node, the relationship edge, and the destination node. Take this relationship as an example:

(Aldis_Hodge)-[act]->(Straight_Outta_Compton)Copy the code

Aldis_Hodge was the source node, ACT was the relationship, and Straight_Outta_Compton was the target node.

The six possible permutations of the relationship are as follows:

SPO:Aldis_Hodge:act:Straight_Outta_Compton
SOP:Aldis_Hodge:Straight_Outta_Compton:act
POS:act:Straight_Outta_Compton:Aldis_Hodge
PSO:act:Aldis_Hodge:Straight_Outta_Compton
OPS:Straight_Outta_Compton:act:Aldis_Hodge
OSP:Straight_Outta_Compton:Aldis_Hodge:actCopy the code

Once Hexastore is built, graph search can be easily implemented. For example, if you wanted to find actors from the “Straight Outta Compton” movie, this could be implemented by searching for all strings in Hexastore that contained the prefix OPS:Straight_Outta_Compton:act:*.

If you want to find all Aldis Hodge movies, you can implement this by finding all strings that contain the prefix SPO:Aldis_Hodge:act:*.

Although Hexastore consumes a lot of memory because each relation is represented as six triples, such a triplet data structure not only supports fast searches, but also uses memory efficiently because it does not repeatedly generate existing string prefixes.

OpenCypher query language

There are already some graph query languages out there, and we didn’t want to reinvent the wheel and implement our own. Therefore, we decided to implement a subset of openCyper, the most widely used graph query language. Although the language parser creation method provided by the Open-CYPher project was easy to use, we decided to create our own parser. The parser uses Lex as a word splitter and Lemon to generate a C target parser.

As mentioned above, we currently only support a subset of the language. We will continue to add new capabilities and extend the language.

Query execution at runtime

Now, let’s walk through the steps the module in RedisGraph takes in executing the query. Take the following query, which finds all actors over the age of 30 who have appeared in films with Aldis Hodge:


RedisGraph parses queries, builds abstract syntax trees, builds execution plans (consisting of tag scan operations, Filter operations, Expand operations, Expand into operations, and so on), execution plans, and adds matching entity attributes to the result set.

Query resolver

For a given valid query, the parser will generate an abstract syntax tree. The following six types of statements generate corresponding primary nodes in the abstract syntax tree:

  1. MATCH

  2. CREATE

  3. DELETE

  4. WHERE

  5. RETURN

  6. ORDER

Generating abstract syntax trees is a common way to describe and structure a language.

Filter the tree

Queries filter out entities by creating predicates. Take the query above for example, where actors younger than 30 need to be filtered. In a query, you can use a combination of predicates such as OR AND AND to construct granularity conditions. The filter tree is built at run time based on the WHERE statement. Each node in the filter tree can represent either A condition (for example, A>B) OR an operation (AND OR OR). Candidate entities are evaluated through the filter tree.

Query processing

The MATCH statement describes the relationship between the queried entities (nodes). Nodes can have aliases to support the execution of references (WHERE, RETURN statements) late in the query lifecycle. Eventually, however, you must assign an ID to all the nodes. The process of assigning an ID to a node is called the search phase.

The search phase queries Hexastore according to the MATCH statement to find all ids. Take the above query for example. The search phase starts with finding Aldis Hodge movies. Expand the search for each movie to find the actors in the current movie.

Such a search process can be regarded as an iterative process of traversing the graph. The iterative process finds a new ID at each step. Once the node has specified an ID, you can confirm that the current entity meets the criteria for the filtering statement. You can then extract the request attributes specified in the RETURN statement and add the new record to the final result set.

The benchmark

The complexity of inserting a new relationship is O(1), and RedisGraph can create 100,000 new relationships in one second. The test results may differ on different underlying hardware.

The performance of retrieving data depends entirely on the size of the graph and the type of query being executed. For a small graph, such as about 1,000 entities with 2,500 edges, RedisGraph can perform about 65,000 friend of a friend (FOAF) queries per second.

It should be noted that, except for Hexastore, we do not index entities. In the future we will implement entity indexing, which will greatly reduce query execution time.

Licensing conditions

Redis-graph is distributed under the AGPL-3.0 license.

conclusion

Although the RedisGraph project is new, it is already available as an alternative to graph databases. RedisGraph supports a subset of operations that analyze and explore graph data. As a Redis module, all Redis customers can use this module functionality without making any adjustments. We are considering continuing to improve and further expand RedisGraph with the help of the open source community.

英文原文 :

http://redisgraph.io/design/


Please pay attention to the wechat public account “AI Front”, (ID: AI-front)