Developer Li Lei maintains his own graph database. How can he query the graph database for his friends who are older than 30 and follow each other?

Here are some graph native query languages:

1.gremlin

G.V (" li lei "). OutE (' friend '.) from the (' age, gt (30)). The otherV () where (out (' friend '.) (hasId (' li lei))). The limit (100).Copy the code

2.cypher

Match (a)-[r1:friend]->(b)-[R2 :friend]->(c) where a.id =' r1.age 'and a=c return ID (b) limit 100Copy the code

The two approaches are equivalent, but the graph query languages are different. The former is written using Gremlin (Graph Traversal Language developed by TinkerPop under Apache Software Foundation), while the latter is openCypher, an open source version of graph query language released by Neo4j in 2015.

Query Methods overview

GES supports a variety of query methods. This article mainly discusses complex queries, such as multi-hop filtering, simple loop query, pattern matching and other complex query classes. GES currently supports query requirements for a variety of scenarios through Gremlin, CYPher, and some native apis.

Advantages of | | | | show disadvantage: – : | : – : | : | : — | — – | : – Gremlin | functional language | 1. Power of expression (turing-complete) 2. Support the groovy script | 1. Poor performance (an exponential runtime tree) 2. Complex queries | dysgraphia is suitable for the experience, demo, or not to pursue performance scene of Cypher | the Pattern match style, the declarative | 1. SQL writing class 2. GES in depth integration, performance is better than gremlin | 1. Presentation power is poor for diagrams (SQL Complete only) 2. Streaming support co., LTD. | is applicable to general scenario The native API | json parameter | 1. Performance is very good. 2. A fool parameter form | 1. Poor flexibility (2) limited usage scenario 3. Poor expression ability | provide common query API, applicable to the performance, high concurrency, low latency scenarios

performance

In this section, the friend scenario of Li Lei described above shows the performance test of a query k=2 loop, to help you more intuitively understand the performance gap between various queries:

Among them,

  • Filterv2-ges native API, which has the best performance, TPS and latency performance.
  • Cypher – GES Cypher query.
  • Neo4j-neo4j Community 4.2.3 Version CYPher.
  • Gremlin of Gremlin-ges is not reflected in the figure and has the worst actual performance, so no comparison is made.

Gremlin user manual

Gremlin includes both OLTP and OLAP syntax. Its OLTP syntax is flexible and conforms to the graph native expression mode, which is widely integrated in various graph database manufacturers.

GES queries are excellent, but we only recommend using Gremlin in demo/ simple queries. Its performance is much lower than that of kernel native apis and CYPher, due to the way Gremlin is integrated. Although GES is optimized for common syntax, it does not completely cover all Gremlin queries. We recommend that you use a more integrated CYPher to refer to the subsequent CYPHER section for regular queries, and use native apis for scenarios that require more performance.

See the following Resources for more information:

  1. Tinkerpop Documentation – Tinkerpop documentation The content is detailed, each step has case introduction.
  2. Gremlin Practice Cases Gremlin practice cases

Common grammar

G.V () // Get all the points in the graph. Note This is a high-risk operation on a large scale. Use it with caution. G.V ().limit(10) // take 10 points in the figure. Not random. G.V (' xiaoxia ',' Xiaoxia '). Values ('age') // Get the value of the attribute age of xiaoxia and xiaoxia in the figure. > [20, 22] g.V (' little wisdom.) out (' friends'.) out (' friends') / / little wisdom for friends of friends > [xiao Ming] g.V (' little wisdom). OutE (' friends'). InV () outE (' friends'). InV () / / on which the meaning of sentence and the article completely consistent. G.V (' little wisdom.) out (' friends') from the (' age, gt (30)) / / to get little wisdom age greater than 30 friends g.V (' little wisdom.) out (' friends') values (" age "). The sum () / / statistical little wisdom all friends age combinedCopy the code

GES Gremlin special syntax/optimization

GES integrates OLTP function in Gremlin, and makes some function enhancement and strategy optimization to some extent.

Enhanced Text Predicate

g.V().has('name', Text.textSubString('xx'))...
Copy the code
Predicate describe
textSubString The substring
textCISubString Ignore case substring
textFuzzy Fuzzy matching
textPrefix The prefix queries
textRegex Regular match

Matters needing attention

When specifying a schema, it is best not to name attributes id, label, Property, or properties.

There are many steps that convert results to map results when performing gremlin operations. Insert multiple keys into a map, the value of each key will be overwritten or the operation will be cancelled.

So if we name the property ID, label, property, properties, in many operations, if the ID is returned with the ID in the property, the result will be incomplete.

Cypher User manual

Cypher is more syntactically similar to SQL and slightly less expressive (SQL complete), but it can handle most scenarios.

GES integration of CYPher is closer to the kernel side, making full use of the performance advantages of the kernel.

See the following Resources for more information:

  1. Cypher RefCard Cypher Grammar Reference card.
  2. GES CYPHER GES CYPher Public Cloud API documentation, including compatibility notes, data type support, expressions, functions, etc.

Common grammar

Match (n) return n // Get all points in the graph. Note that this behavior should be used with caution on large images. Match (n) return n limit 10 Not random. Match (n) where n.movieid IN [' x ',' x '] return n.age Match (n) - > (m1: friends) - > (s1) - > (m2: friends) - > (n) (s2) where id = 'little wisdom' return s2 / / little wisdom for friends of friends match (n) - > (m1: friends) -- - > the where (s) Id (n)=' smarty 'and s.age>30 return s // Get smarty friends older than 30Copy the code

Compatibility Note

GES will continue to optimize and strengthen CYPher. Currently not supported due to feature/development schedule:

  1. Currently, operations such as union, merge, foreach, and optional are not supported. Adding or deleting indexes using Cypher statements is not supported.
  2. As the metadata of GES is not Schema Free and the dot edge label attribute is strictly limited, Remove operation is not supported.
  3. The Order by clause does not support sorting of the List type, and the sorting result is unknown when the Cardinality of the attribute value is not single.

The latest support is available in the CYPher documentation.

GES Cypher specialized treatment

Preset conditions

To speed up queries and optimize query plans, GES Cypher queries are compiled using label-based point-edge indexes. You can add related indexes on the INTERFACE /API. Create an index API.

POST to http:// {SERVER_URL} / ges/v1.0 / {project_id} / graphs / / indices {{graph_name} "indexName" : "cypher_vertex_index", "indexType": "GlobalCompositeVertexIndex", "hasLabel": "true", "indexProperty": [] }Copy the code

Native API manual

The kernel native API is a high-performance version specifically for common scenarios. The input and output are in JSON format.

GES provides a variety of native API interfaces, including management API(mainly graph operation, import and export backup, etc.) and business API. The business side API includes basic graph database functions such as point-edge CRUD, index management, advanced path Query, and advanced algorithm libraries.

See the following Resources for more information:

  1. Management side API Overview
  2. For an overview of the business API, see this blog post for how to invoke the business API.
  3. Execution Algorithm Description

Common APIS

A. The Path of the Query

The interface supports multi-hop filtering and cyclic execution of traversal queries to speed up.

For example, the following gremlin statement:

g.V('node1').repeat(out('label_1')).times(3).emit()
Copy the code

The above gremlin is a three-hop query, starting from node1, to query the neighbor of the neighbor whose out-side relationship is Label1. And returns all the points involved in the process. The script organizes the query by repeat, controls the number of loops by times, and even then terminates the traversal with the keyword until.

The above gremlin can be used with the path query call:

POST/ges/v1.0 / {projectId} / graphs / {graphName} / action? action_id=path-query { "repeat": [ { "operator": "outV", "edge_filter": { "property_filter": { "leftvalue": { "label_name": "labelName" }, "predicate": "=", "rightvalue": { "value": "label_1" } } } } ], "emit": true, "times": 3, "vertices": [ "node1" ] }Copy the code

1. by mode

For example, for a two-hop neighbor, we can filter the ID, label, and property by:

G.V (" a "). Repeat (out ()). The Times (2) by (id ()) / / all jump jump two outputs a neighbor's id. G.V (" a "). Repeat (out ()). The Times (2) by (label ()) / / all jump jump two outputs a neighbor's label.Copy the code

Using Path Query, we can easily convert the above Gremlin into a native API for better performance, with the same keyword meaning. In BY, you can specify the form of the output, such as the output ID:

{
  "repeat": [
    {
      "operator": "outV"
    }
  ],
  "times": 2,
  "vertices": [
    "a"
  ],
  "by": [{"id": true}]
}
Copy the code

2. select+by mode

This mode can optionally select N levels traversed on traverse paths.

As shown in the figure above, we want to show the association between Li Lei and his second hop friend:

Gremlin writing:

G.V (' li lei.) as (' where v0). Repeat (out ()). The Times (2) as (" v2 "). The select (' where v0 ', 'v2) by () (id) by (id ()). The dedup () {/ / return sample "results" : [{" where v0 ":" li lei ", "v2" : "little wisdom"}, {" where v0 ":" li lei ", "v2" : "misty"}]}Copy the code

Cypher writing:

Match (v0)<--(v1)<--(v2) WHERE ID (v0)='1' return DISTINCT ID (v0), ID (v2) // Returns json in the following redundant form: {"data": [{"row": / "li lei", "little wisdom", "meta", null, null}, {" row ": [" li lei", "misty"], "meta" : [null null]}]}Copy the code

Using the path query:

{" repeat ": [{" operator" : "outV}]," times ": 2," are: "[]" li lei ", "by" : [{" id ": true}, {" id" : true}], "select" : [" where v0 ", "v2"]} / / return json as follows, direct return to select the parameter: {" select ": [[" li lei", "little wisdom"], [" li lei ", "misty"]]}Copy the code

Algorithm API

The algorithm API provides a rich library of native algorithms that can be used through an interface or API.

Running the algorithm directly on the interface allows you to visually view the results of the algorithm. For details about each algorithm, you can also query the official algorithm document.

parameter Whether the choice instructions type Value range The default value
alpha no Weight coefficient (also known as damping coefficient) Double 0 to 1, not including 0 and 1 0.85
convergence no The convergence accuracy Double 0 to 1, not including 0 and 1 0.00001
max_iterations no Maximum iteration Integer 1~2000 1000
directed no Whether to consider the direction of the edges Boolean True or false true

The pagerank algorithm parameter list above lists the parameter description, value range and default values.

Alternatively, we can call algorithms directly using RESTful apis, most of which interact asynchronously: after calling the algorithm we get a jobId. Using jobId, we can query the running status and results of the algorithm. Results can also be paginated and quantified.

GET http:// {SERVER_URL} / ges/v1.0 / {project_id} / graphs / {graph_name} / jobs / {job_id} / status? offset=0&limit=2Copy the code

The shortest path shortest_path

The shortest path belongs to a traditional graph theory problem, which contains various forms. For example, the single source is the shortest, and the multi source is the shortest. Commonly used algorithms such as Dijkstra, Bellman-Ford, Floyd-Warshall, Johnson and so on.

GES provides shortest paths for a variety of scenarios, such as shortest_Path, ALL_SHORtest_Paths, shortest_PATH_of_vertex_sets, etc. Both are provided as native apis or graphical interfaces

This API is an algorithmic API that is called to view documents.

POST/ges/v1.0 / {project_id} / graphs / {graph_name} / action? action_id=execute-algorithm { "executionMode": "sync", "algorithmName": "shortest_path", "parameters": { "source": "Alexander Payne", "target": "Harrison Ford", "weight":"score" } }Copy the code

Detailed description of different query modes

Case 1- Friend recommendation

We recommend friends to Li Lei, the idea is: recommend his friend’s friend to him, but the recommended friends should not include Li Lei’s own friends, for example, Han Meimei in the picture is li Lei’s one-hop friend and two-hop friend at the same time. At this time we should not recommend Han Meimei to Li Lei, because she is already li Lei’s good friend.

The following shows how to use gremlin, CYPher, and Path query to return recommended paths: Li Lei -> Li Lei friends -> Recommended friends.

gremlin

Gremlin > g.V (" li lei "). Repeat (out (" friend "). SimplePath () where (without (' 1 hop ')). The store (' 1 hop ')). Times (2). The path () by (" name "). The limit (100) gremlin > [li lei, xiao Ming, little wisdom] [li lei, han meimei, little wisdom] [li lei, han meimei, little chardonnay]Copy the code

cypher

Match (a) - [: friend] - > (a) (d) where id = 'li lei' with a, collect(d) as neighbor match (a)-[:friend]-(b)-[:friend]-(c) where not (c in neighbor) return a.name, b.name, C.n ame / / return sample [{" row ": [" li lei", "Ming", "little wisdom"], "meta" : [null, null, null]}, {" row ": [" li lei", "han meimei", "little wisdom"], "meta" : Null, null, null}, {" row ": [" li lei", "han meimei", "little chardonnay]", "meta" : [null, null, null]}]Copy the code

path query

{ "repeat": [ { "operator": "outV", "edge_filter": { "property_filter": { "leftvalue": { "label_name": "LabelName"}, "predicate" : "=", "rightvalue" : {" value ":" friend "}}}}], "times" : 2, "are:" [] "li lei", "by" : [{"id": true},{"id": true},{"id": true}], "select": ["v0", "v1", "v2"] }Copy the code
{" select ": [[" li lei", "Ming", "little wisdom"], [" li lei ", "han meimei", "little wisdom"], [" li lei ", "han meimei", "misty"]]}Copy the code

Case 2-(k=2 loop)

Developer Li Lei maintains his own graph database. How can he query the graph database for his friends who are older than 30 and follow each other?

Back to the beginning of the article, how can we help Li Lei complete his query? It really boils down to a loop problem.

We need to get Li Lei’s two-way friends. That is, the loop starting from Li Lei whose k=2 and edge filter condition is age>30,friend.

The following shows queries using Gremlin, CYPher, Path Query and loop algorithm respectively, all of which can help Li Lei achieve his goals.

gremlin

Gremlin > g.V (" li lei "). OutE () from the (' age, gt (30)). The otherV () where (out (' friend '.) (hasId (' li lei))). The limit (100).Copy the code

cypher

Match (a)-[r1]->(b)-[R2 :friend]->(c) where a.id =' r1.age>30 and a=c return ID (b) limit 100 PATH query POST Ges/v1.0 / {projectId} / graphs / {graphName} / action? action_id=path-query { "repeat": [ { "operator": "outV", "edge_filter": { "property_filter": { "leftvalue": { "property_name": "age" }, "predicate": ">", "rightvalue": { "value": "30" } } } }, { "operator": "outV", "edge_filter": { "property_filter": { "leftvalue": { "label_name": "label_name" }, "predicate": "=", "rightvalue" : {" value ":" friend "}}}}], "limit" : 100, the "times" : 2, "are:" [] "li lei", "by" : [{" id ": true}], "select": ["v1"] }Copy the code

The loop algorithm

{
    "algorithmName": "filtered_circle_detection",
    "parameters": {
        "sources": "李雷",
        "n": 100
    },
    "filters": [
        {},
        {
            "operator": "out",
            "edge_filter": {
                "property_filter": {
                    "leftvalue": {
                        "property_name": "age"
                    },
                    "predicate": ">",
                    "rightvalue": {
                        "value": "30"
                    }
                }
            }
        },
        {
            "operator": "out",
            "edge_filter": {
                "property_filter": {
                    "leftvalue": {
                        "label_name": "label_name"
                    },
                    "predicate": "=",
                    "rightvalue": {
                        "value": "friends"
                    }
                }
            }
        }
    ]
}
Copy the code

This document is published by Huawei Cloud.