  • Original article: Powering PHP With JanusGraph
  • By Don Omondi
  • Translation from: The Gold Project
  • Permalink to this article: github.com/xitu/gold-m…
  • Translator: GanymedeNil
  • Proofreader: allenlongbaobao

Powering PHP With JanusGraph

As JanusGraph's popularity grows, developers are inevitably building tools around it. In this Compose Write Stuff article, Don Omondi, founder and CTO of Campus Discounts, talks about developing his new PHP library for JanusGraph and shows how to use it.

In the world of programming languages, PHP needs little introduction. Version 1.0 was released in 1995, and PHP is now the backbone of a number of unicorns, most notably Facebook and, more recently, Slack. As of September 2017, W3Techs reports that 82.8% of all websites whose server-side programming language is known use PHP!

JanusGraph is a newcomer to the database world, but it has a deep technical heritage because it builds on Titan, the former leader among open source graph databases. For some background on graph databases, see Graph Databases Introduction. While JanusGraph is still young, it is already being used by a well-known unicorn: Uber.

So the big question is, how do you build a unicorn using PHP and JanusGraph? Believe me, I wish I knew the answer! But what if the question is how to power PHP with JanusGraph? For that, I know more than one way.

Introducing the Gremlin-OGM PHP library

The Gremlin-OGM PHP library is an object graph mapper for TinkerPop 3+ compliant graph databases (JanusGraph, Neo4j, etc.) that lets you persist data and run Gremlin queries.

The library is already hosted on Packagist, so you can easily install it using Composer.

composer require the-don-himself/gremlin-ogm  

Using the library is also easy because it is heavily documented with PHP comments. But before we get started, let's look at some of the problems you might run into when using a graph database like JanusGraph and how the library helps you avoid them.

Things to watch out for

First, all properties with the same name must have the same data type. If you already have data in a different database, such as MySQL or MongoDB, you may well run into this.

A good example is the field called id found in almost every entity class or document. Some IDs may be integers (1, 2, 3, etc.), others may be strings (for example, the IDs EN_1, ES_1, fr_1 in an FAQ library), and others may be MongoDB-style identifiers (for example, 59be8696540bbb198c0065C4). Using the same property name for these different data types raises an exception. The Gremlin-OGM library detects such a conflict and refuses to execute. As a solution, I suggest combining the label with the word id; for example, the users identifier becomes users_id. The library comes with a serializer that lets you map fields to virtual properties to avoid this conflict.
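To make this first gotcha concrete, here is a minimal sketch of how such a conflict could be detected. It is not the library's actual implementation: the function name findPropertyTypeConflicts and the array layout (label => [property => type]) are assumptions made purely for illustration.

```php
<?php

/**
 * Hypothetical helper (not part of Gremlin-OGM): finds property names that
 * are declared with more than one data type across element definitions.
 *
 * @param array<string, array<string, string>> $elements label => [property => type]
 * @return array<string, string[]> property name => list of conflicting types
 */
function findPropertyTypeConflicts(array $elements): array
{
    $typesByProperty = [];
    foreach ($elements as $properties) {
        foreach ($properties as $name => $type) {
            // Record every distinct type seen for this property name.
            $typesByProperty[$name][$type] = true;
        }
    }

    $conflicts = [];
    foreach ($typesByProperty as $name => $types) {
        if (count($types) > 1) {
            $conflicts[$name] = array_keys($types);
        }
    }

    return $conflicts;
}

// "id" is a Long on users but a String on faqs, so it is reported.
$conflicts = findPropertyTypeConflicts([
    'users' => ['id' => 'Long', 'name' => 'String'],
    'faqs'  => ['id' => 'String', 'question' => 'String'],
]);
print_r($conflicts);
```

Renaming the two fields to users_id and faqs_id, as suggested above, makes the same check come back empty.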

Second, property names, edge labels, and vertex labels must all be unique across the graph. For example, you might label a vertex tweets to refer to the object, then label an edge tweets to refer to a user action, or create a property tweets on the users vertex to hold the number of tweets a user has sent. The library detects this conflict as well and refuses to execute.
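A check of this kind can be sketched in a few lines. The helper below is hypothetical (findNameCollisions is not a Gremlin-OGM function) and simply assumes the three kinds of names are available as plain arrays.

```php
<?php

/**
 * Hypothetical helper (not part of Gremlin-OGM): reports names that are used
 * in more than one role (vertex label, edge label, property key).
 *
 * @return array<string, string[]> name => roles in which it is used
 */
function findNameCollisions(array $vertexLabels, array $edgeLabels, array $propertyKeys): array
{
    $roles = [];
    $groups = ['vertex' => $vertexLabels, 'edge' => $edgeLabels, 'property' => $propertyKeys];
    foreach ($groups as $role => $names) {
        foreach (array_unique($names) as $name) {
            $roles[$name][] = $role;
        }
    }

    // Keep only the names that appear in two or more roles.
    return array_filter($roles, fn (array $usedAs): bool => count($usedAs) > 1);
}

// "tweets" is used both as a vertex label and as an edge label.
$collisions = findNameCollisions(
    ['users', 'tweets'],
    ['follows', 'tweets'],
    ['users_id', 'tweets_id']
);
print_r($collisions);
```

Renaming the edge to tweeted, as the Twitter example later in this article does, removes the collision.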

Third, for performance and schema validity, I recommend making sure that each element, or at least each vertex, has a unique property on which a unique composite index (also known as a key index) is created. This guarantees that all elements are unique and improves performance, because adding an edge between two vertices first requires a query to check that both exist. The library lets you mark a property with the @Id annotation for this purpose.

Finally, indexes. This topic deserves a book or two. In JanusGraph you essentially index properties (it is a property graph, after all), but the same property name can be used on different vertices and edges. Do this with great care, and remember the first gotcha above. By default, an index on the property total_comments spans all vertices and edges. A query for vertices where total_comments is greater than 5 therefore returns a mix of users with total_comments > 5, blog posts with total_comments > 5, and any other vertices that satisfy the query. Worse, if you later add a total_comments property to your recipes vertex, your existing queries will start returning wrong results.

To prevent these problems, JanusGraph lets you set a label constraint when you create an index to limit its scope. I recommend doing this to keep indexes smaller and faster, but it means that you must give each index a unique name. The Gremlin-OGM library looks for conflicting index names and refuses to execute if it finds any.
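As a rough illustration of what a label-constrained index looks like, the sketch below assembles the JanusGraph management command for a composite index. The helper buildCompositeIndexCommand is hypothetical and is not how the library builds its commands; only the JanusGraph calls it emits (buildIndex, addKey, unique, indexOnly, buildCompositeIndex) are real.

```php
<?php

/**
 * Hypothetical helper: builds the JanusGraph management command for a
 * composite vertex index, optionally restricted to a single vertex label.
 *
 * @param string[] $keys names of property key variables defined earlier in the script
 */
function buildCompositeIndexCommand(string $name, array $keys, ?string $label = null, bool $unique = false): string
{
    $command = "mgmt.buildIndex('" . $name . "', Vertex.class)";
    foreach ($keys as $key) {
        $command .= '.addKey(' . $key . ')';
    }
    if ($unique) {
        $command .= '.unique()';
    }
    if ($label !== null) {
        // indexOnly() limits the index to elements carrying this label.
        $command .= '.indexOnly(' . $label . ')';
    }

    return $command . '.buildCompositeIndex()';
}

// Index scoped to the tweets label only.
echo buildCompositeIndexCommand('byTweetsIdComposite', ['tweets_id'], 'tweets', true), PHP_EOL;
// Unscoped index: spans every vertex that has total_comments.
echo buildCompositeIndexCommand('byTotalComments', ['total_comments']), PHP_EOL;
```

The first call yields an index that only ever covers tweets vertices, so adding the same property to another label later does not change what existing queries return.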

How to use Gremlin-OGM

To start using Gremlin-OGM, we first need to create a directory called Graph in our source folder, such as src/Graph. Within this directory we need to create two more directories: one called Vertices and one called Edges. These two directories will contain the PHP classes that define the elements of our graph.

Each class in the Vertices folder mainly uses annotations to describe the vertex label, its associated indexes, and its properties. For more advanced use cases, such as using MongoDB with a class that holds embedded documents (for example, a comments collection), you can also define the appropriate embedded edges.

Each class in the Edges folder also uses annotations to describe the edge label, its associated indexes, and its properties. Two properties in each edge class can also be annotated, one describing the vertex the edge links from and the other describing the vertex it links to. It is really simple to use, but let's walk through an example.

A practical example: Twitter

Twitter and graph databases are really made for each other. Objects like users and tweets can form vertices, while actions like follows, likes, tweeted, and retweets can form edges. Note that the edge tweeted is named this way to avoid a conflict with the vertex tweets. A graphical representation of this simple model can be seen in the figure below.

Let's create the corresponding classes in the Graph/Vertices folder and the Graph/Edges folder. The Tweets class might look like this:

<?php

namespace TheDonHimself\GremlinOGM\TwitterGraph\Graph\Vertices;

use JMS\Serializer\Annotation as Serializer;
use TheDonHimself\GremlinOGM\Annotation as Graph;

/**
* @Serializer\ExclusionPolicy("all")
* @Graph\Vertex(
* label="tweets",
* indexes={
* @Graph\Index(
* name="byTweetsIdComposite",
* type="Composite",
* unique=true,
* label_constraint=true,
* keys={
* "tweets_id"
* }
* ),
* @Graph\Index(
* name="tweetsMixed",
* type="Mixed",
* label_constraint=true,
* keys={
* "tweets_id" : "DEFAULT",
* "text" : "TEXT",
* "retweet_count" : "DEFAULT",
* "created_at" : "DEFAULT",
* "favorited" : "DEFAULT",
* "retweeted" : "DEFAULT",
* "source" : "STRING"
* }
* )
* }
* )
*/
class Tweets  
{
 /**
 * @Serializer\Type("integer")
 * @Serializer\Expose
 * @Serializer\Groups({"Default"})
 */
 public $id;
 /**
 * @Serializer\VirtualProperty
 * @Serializer\Expose
 * @Serializer\Type("integer")
 * @Serializer\Groups({"Graph"})
 * @Serializer\SerializedName("tweets_id")
 * @Graph\Id
 * @Graph\PropertyName("tweets_id")
 * @Graph\PropertyType("Long")
 * @Graph\PropertyCardinality("SINGLE")
 */
 public function getVirtualId()
 {
 return $this->getId();
 }
 /**
 * @Serializer\Type("string")
 * @Serializer\Expose
 * @Serializer\Groups({"Default", "Graph"})
 * @Graph\PropertyName("text")
 * @Graph\PropertyType("String")
 * @Graph\PropertyCardinality("SINGLE")
 */
 public $text;
 /**
 * @Serializer\Type("integer")
 * @Serializer\Expose
 * @Serializer\Groups({"Default", "Graph"})
 * @Graph\PropertyName("retweet_count")
 * @Graph\PropertyType("Integer")
 * @Graph\PropertyCardinality("SINGLE")
 */
 public $retweet_count;
 /**
 * @Serializer\Type("boolean")
 * @Serializer\Expose
 * @Serializer\Groups({"Default", "Graph"})
 * @Graph\PropertyName("favorited")
 * @Graph\PropertyType("Boolean")
 * @Graph\PropertyCardinality("SINGLE")
 */
 public $favorited;
 /**
 * @Serializer\Type("boolean")
 * @Serializer\Expose
 * @Serializer\Groups({"Default", "Graph"})
 * @Graph\PropertyName("retweeted")
 * @Graph\PropertyType("Boolean")
 * @Graph\PropertyCardinality("SINGLE")
 */
 public $retweeted;
 /**
 * @Serializer\Type("DateTime<'', '', 'D M d H:i:s P Y'>")
 * @Serializer\Expose
 * @Serializer\Groups({"Default", "Graph"})
 * @Graph\PropertyName("created_at")
 * @Graph\PropertyType("Date")
 * @Graph\PropertyCardinality("SINGLE")
 */
 public $created_at;
 /**
 * @Serializer\Type("string")
 * @Serializer\Expose
 * @Serializer\Groups({"Default", "Graph"})
 * @Graph\PropertyName("source")
 * @Graph\PropertyType("String")
 * @Graph\PropertyCardinality("SINGLE")
 */
 public $source;
 /**
 * @Serializer\Type("TheDonHimself\GremlinOGM\TwitterGraph\Graph\Vertices\Users")
 * @Serializer\Expose
 * @Serializer\Groups({"Default"})
 */
 public $user;
 /**
 * @Serializer\Type("TheDonHimself\GremlinOGM\TwitterGraph\Graph\Vertices\Tweets")
 * @Serializer\Expose
 * @Serializer\Groups({"Default"})
 */
 public $retweeted_status;
 /**
 * Get id.
 *
 * @return int
 */
 public function getId()
 {
 return $this->id;
 }
}

The Twitter API is very expressive and actually returns far more data than the vertex class holds. For this example, however, we are only interested in a few properties. The annotations above tell the serializer to populate only these fields when deserializing Twitter API data into vertex class objects.

Create a similar class for the Users vertex. The complete sample code is in the TwitterGraph folder in the library.

In the Graph/Edges folder we can create an example Follows edge class that looks like this:

<?php

namespace TheDonHimself\GremlinOGM\TwitterGraph\Graph\Edges;

use JMS\Serializer\Annotation as Serializer;
use TheDonHimself\GremlinOGM\Annotation as Graph;

/**
* @Serializer\ExclusionPolicy("all")
* @Graph\Edge(
* label="follows",
* multiplicity="MULTI"
* )
*/
class Follows  
{
 /**
 * @Graph\AddEdgeFromVertex(
 * targetVertex="users",
 * uniquePropertyKey="users_id",
 * methodsForKeyValue={"getUserVertex1Id"}
 * )
 */
 protected $userVertex1Id;
 /**
 * @Graph\AddEdgeToVertex(
 * targetVertex="users",
 * uniquePropertyKey="users_id",
 * methodsForKeyValue={"getUserVertex2Id"}
 * )
 */
 protected $userVertex2Id;
 public function __construct($user1_vertex_id, $user2_vertex_id)
 {
 $this->userVertex1Id = $user1_vertex_id;
 $this->userVertex2Id = $user2_vertex_id;
 }
 /**
 * Get User 1 Vertex ID.
 *
 *
 * @return int
 */
 public function getUserVertex1Id()
 {
 return $this->userVertex1Id;
 }
 /**
 * Get User 2 Vertex ID.
 *
 *
 * @return int
 */
 public function getUserVertex2Id()
 {
 return $this->userVertex2Id;
 }
}

Create similar classes for the likes, tweeted, and retweets edges. Once finished, we can check the validity of the model by running the following command:

php bin/graph twittergraph:schema:check  

If any exceptions are thrown, we need to resolve them first; otherwise, our model is ready, and all that remains is to tell JanusGraph about it.

Connecting to JanusGraph

The TheDonHimself\GremlinOGM\GraphConnection class is responsible for initializing the graph connection. You do this by creating a new instance and passing the connection options as an array.

$options = [
 'host' => '127.0.0.1',
 'port' => 8182,
 'username' => null,
 'password' => null,
 'ssl' => [
 'ssl_verify_peer' => false,
 'ssl_verify_peer_name' => false,
 ],
 'graph' => 'graph',
 'timeout' => 10,
 'emptySet' => true,
 'retryAttempts' => 3,
 'vendor' => [
 'name' => '_self',
 'database' => 'janusgraph',
 'version' => '0.2',
 'twitter' => [
 'consumer_key' => 'LnUQzlkWlNT4oNUh7a2rwFtwe',
 'consumer_secret' => 'WCIu0YhaOUBPq11lj8psxZYobCjXpYXHxXA6rVcqbuNDYXEoP0',
 'access_token' => '622225192-upvfXMpeb9a3FMhuid6oBiCRsiAokpNFgbVeeRxl',
 'access_token_secret' => '9M5MnJOns2AFeZbdTeSk3R81ZVjltJCXKtxUav1MgsN7Z',
 ],
 ],
];

The vendor array can specify vendor-specific information such as the Gremlin-compatible database, its version, the name of the hosting service (or _self if self-hosted), and the name of the graph.

To finally create the model, we will run this command.

php bin/graph twittergraph:schema:create  

This command takes an optional configPath parameter, the location of a YAML configuration file containing the options array used when establishing the connection. The library ships with three sample configurations in its root folder: janusgraph.yaml, janusgraphcompose.yaml, and azure-cosmosdb.yaml.

The above command recursively walks our TwitterGraph/Graph directory and collects all @Graph annotations to build the schema definition. If it finds any problems, it throws an exception; otherwise, it starts a graph transaction that commits all the properties, edges, and vertices at once, or rolls back on failure.

The same command also asks whether you want to perform a dry run. If you do, the commands are not sent to the Gremlin server but are dumped to a command.groovy file that you can inspect. For the Twitter example, these are the 26 lines of commands that are sent or dumped, depending on your configuration (here, self-hosted JanusGraph).

mgmt = graph.openManagement()  
text = mgmt.makePropertyKey('text').dataType(String.class).cardinality(Cardinality.SINGLE).make()  
retweet_count = mgmt.makePropertyKey('retweet_count').dataType(Integer.class).cardinality(Cardinality.SINGLE).make()  
retweeted = mgmt.makePropertyKey('retweeted').dataType(Boolean.class).cardinality(Cardinality.SINGLE).make()  
created_at = mgmt.makePropertyKey('created_at').dataType(Date.class).cardinality(Cardinality.SINGLE).make()  
source = mgmt.makePropertyKey('source').dataType(String.class).cardinality(Cardinality.SINGLE).make()  
tweets_id = mgmt.makePropertyKey('tweets_id').dataType(Long.class).cardinality(Cardinality.SINGLE).make()  
name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()  
screen_name = mgmt.makePropertyKey('screen_name').dataType(String.class).cardinality(Cardinality.SINGLE).make()  
description = mgmt.makePropertyKey('description').dataType(String.class).cardinality(Cardinality.SINGLE).make()  
followers_count = mgmt.makePropertyKey('followers_count').dataType(Integer.class).cardinality(Cardinality.SINGLE).make()  
verified = mgmt.makePropertyKey('verified').dataType(Boolean.class).cardinality(Cardinality.SINGLE).make()  
lang = mgmt.makePropertyKey('lang').dataType(String.class).cardinality(Cardinality.SINGLE).make()  
users_id = mgmt.makePropertyKey('users_id').dataType(Long.class).cardinality(Cardinality.SINGLE).make()  
tweets = mgmt.makeVertexLabel('tweets').make()  
users = mgmt.makeVertexLabel('users').make()  
follows = mgmt.makeEdgeLabel('follows').multiplicity(MULTI).make()  
likes = mgmt.makeEdgeLabel('likes').multiplicity(MULTI).make()  
retweets = mgmt.makeEdgeLabel('retweets').multiplicity(MULTI).make()  
tweeted = mgmt.makeEdgeLabel('tweeted').multiplicity(ONE2MANY).make()  
mgmt.buildIndex('byTweetsIdComposite', Vertex.class).addKey(tweets_id).unique().indexOnly(tweets).buildCompositeIndex()  
mgmt.buildIndex('tweetsMixed',Vertex.class).addKey(tweets_id).addKey(text,Mapping.TEXT.asParameter()).addKey(retweet_count).addKey(created_at).addKey(retweeted).addKey(source,Mapping.STRING.asParameter()).indexOnly(tweets).buildMixedIndex("search")  
mgmt.buildIndex('byUsersIdComposite',Vertex.class).addKey(users_id).unique().indexOnly(users).buildCompositeIndex()  
mgmt.buildIndex('byScreenNameComposite',Vertex.class).addKey(screen_name).unique().indexOnly(users).buildCompositeIndex()  
mgmt.buildIndex('usersMixed',Vertex.class).addKey(users_id).addKey(name,Mapping.TEXTSTRING.asParameter()).addKey(screen_name,Mapping.STRING.asParameter()).addKey(description,Mapping.TEXT.asParameter()).addKey(followers_count).addKey(created_at).addKey(verified).addKey(lang,Mapping.STRING.asParameter()).indexOnly(users).buildMixedIndex("search")  
mgmt.commit()  

Now that we have a working model set up, all we need is data. The Twitter API documentation explains how to request it. The Gremlin-OGM library ships with the TwitterOAuth package (abraham/twitteroauth) and a ready-to-use Twitter app so you can test the library and get started.

After fetching data from the API, saving vertices is simple. First, deserialize the JSON into the corresponding vertex class object. For example, the data fetched for @TwitterDev from the Twitter API endpoint users/show is deserialized as shown in this var_dump() output.

object(TheDonHimself\GremlinOGM\TwitterGraph\Graph\Vertices\Users)#432 (8) {
 ["id"]=>
 int(2244994945)
 ["name"]=>
 string(10) "TwitterDev"
 ["screen_name"]=>
 string(10) "TwitterDev"
 ["description"]=>
 string(136) "Developer and Platform Relations @Twitter. We are developer advocates. We can't answer all your questions, but we listen to all of them!"
 ["followers_count"]=>
 int(429831)
 ["created_at"]=>
 object(DateTime)#445 (3) {
 ["date"]=>
 string(26) "2013-12-14 04:35:55.000000"
 ["timezone_type"]=>
 int(1)
 ["timezone"]=>
 string(6) "+00:00"
 }
 ["verified"]=>
 bool(true)
 ["lang"]=>
 string(2) "en"
}

The deserialized PHP objects now take shape as their respective vertices and edges. However, we can only send Gremlin commands as strings, so we still need to serialize each object into a command string. We use the conveniently named GraphSerializer class to do this. The deserialized object is passed to an instance of GraphSerializer, which handles the tricky parts of serialization, such as stripping newlines, adding slashes, and converting PHP DateTime objects to the format JanusGraph expects. The GraphSerializer also handles Geopoint and Geoshape serialization gracefully.

// Get the default serializer
$serializer = SerializerBuilder::create()->build();
// Get the Twitter user
$decoded_user = $connection->get(
 'users/show',
 array(
 'screen_name' => $twitter_handle,
 'include_entities' => false,
 )
);
if (404 == $connection->getLastHttpCode()) {
 $output->writeln('Twitter User @'.$twitter_handle.' Does Not Exist');
 return;
}
// Use the default serializer to convert the array from the Twitter API into a Users class object,
// handling complex deserialization such as DateTime
$user = $serializer->fromArray($decoded_user, Users::class);
// Create the special graph serializer
$graph_serializer = new GraphSerializer();
// Use the graph serializer to convert the Users class object into an array,
// handling complex serialization such as Geoshape
$user_array = $graph_serializer->toArray($user);
// Use the graph serializer to convert the array into a Gremlin command string, ready to be sent
$command = $graph_serializer->toVertex($user_array);

The GraphSerializer outputs the Gremlin command as a string, ready to be sent to the JanusGraph server. In the example above, it becomes:

"g.addV(label, 'users', 'users_id', 2244994945, 'name', 'TwitterDev', 'screen_name', 'TwitterDev', 'description', 'Developer and Platform Relations @Twitter. We are developer advocates. We can\'t answer all your questions, but we listen to all of them!', 'followers_count', 429831, 'created_at', 1386995755000, 'verified', true, 'lang', 'en')"
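To give a feel for the work involved in producing such a string, here is a minimal sketch that turns an array of properties into an addV command. It is not the library's GraphSerializer: the functions toGremlinLiteral and toAddVertexCommand are hypothetical, and the sketch only covers strings, numbers, booleans, and dates.

```php
<?php

/**
 * Hypothetical sketch (not the library's GraphSerializer): formats one PHP
 * value as a Gremlin-Groovy literal.
 */
function toGremlinLiteral($value): string
{
    if ($value instanceof DateTimeInterface) {
        // Dates are sent as milliseconds since the Unix epoch.
        return $value->getTimestamp() . '000';
    }
    if (is_bool($value)) {
        return $value ? 'true' : 'false';
    }
    if (is_int($value) || is_float($value)) {
        return (string) $value;
    }
    // Strings: replace line breaks with spaces, then escape quotes and backslashes.
    $clean = str_replace(["\r\n", "\r", "\n"], ' ', (string) $value);

    return "'" . addslashes($clean) . "'";
}

/**
 * Hypothetical sketch: builds an addV command from a label and its properties.
 */
function toAddVertexCommand(string $label, array $properties): string
{
    $parts = ['label', toGremlinLiteral($label)];
    foreach ($properties as $key => $value) {
        $parts[] = toGremlinLiteral((string) $key);
        $parts[] = toGremlinLiteral($value);
    }

    return 'g.addV(' . implode(', ', $parts) . ')';
}

echo toAddVertexCommand('users', [
    'users_id' => 2244994945,
    'name' => "Twitter\nDev",
    'created_at' => new DateTimeImmutable('2013-12-14 04:35:55', new DateTimeZone('UTC')),
    'verified' => true,
]), PHP_EOL;
```

The date in this example is the created_at value from the var_dump() output earlier, and it serializes to the same 1386995755000 that appears in the command string.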

Saving edges is a little more straightforward because it assumes that the vertices already exist. The library therefore needs to know the property key-value pairs with which to find them. In addition, edges in a graph database have direction and multiplicity, so it is important that each edge is added from and to the correct vertex.

This is the purpose of the @Graph\AddEdgeFromVertex and @Graph\AddEdgeToVertex property annotations in the edge class. Both extend the @Graph\AddEdge annotation and indicate the target vertex class, the property key, and the array of methods needed to get the value.

Suppose we have fetched a tweet from the Twitter API; it contains an embedded field called user that holds the tweeter's data. If users_id:5 created tweets_id:7, then the serialized Gremlin command looks like this:

if (g.V().hasLabel('users').has('users_id',5).hasNext() == true  
   && g.V().hasLabel('tweets').has('tweets_id',7).hasNext() == true) 
     { 
       g.V().hasLabel('users').has('users_id',5).next().addEdge('tweeted', 
         g.V().hasLabel('tweets').has('tweets_id',7).next()) 
     }

So the command runs two vertex lookups in one transaction and then creates an edge between the user and the tweet. Note that because a user can tweet many times but each tweet can have only one owner, the multiplicity is ONE2MANY.

If the edge classes have properties like tweeted_on or tweeted_from, the library serializes them appropriately just like vertices.
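The guarded command shown above can be assembled from the two lookups. The sketch below is a hypothetical illustration, not the library's code: toAddEdgeCommand and its argument layout are assumptions, and it only handles integer key values.

```php
<?php

/**
 * Hypothetical sketch: builds a guarded addEdge command that only runs when
 * both vertices can be found by their unique property.
 * Key values are assumed to be integers; strings would need quoting and escaping.
 *
 * @param array $from [vertexLabel, uniquePropertyKey, value] of the outgoing vertex
 * @param array $to   [vertexLabel, uniquePropertyKey, value] of the incoming vertex
 */
function toAddEdgeCommand(string $edgeLabel, array $from, array $to): string
{
    $lookup = static function (array $vertex): string {
        [$label, $key, $value] = $vertex;

        return sprintf("g.V().hasLabel('%s').has('%s', %d)", $label, $key, $value);
    };

    $fromVertex = $lookup($from);
    $toVertex = $lookup($to);

    // The edge direction is fixed by which vertex calls addEdge().
    return sprintf(
        "if (%s.hasNext() && %s.hasNext()) { %s.next().addEdge('%s', %s.next()) }",
        $fromVertex,
        $toVertex,
        $fromVertex,
        $edgeLabel,
        $toVertex
    );
}

echo toAddEdgeCommand('tweeted', ['users', 'users_id', 5], ['tweets', 'tweets_id', 7]), PHP_EOL;
```

Swapping the two vertex arguments would reverse the direction of the edge, which is why the from and to annotations have to be kept distinct.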

Querying JanusGraph

We have now covered fetching and saving data. Querying data is also done with the help of the library. The TheDonHimself\Traversal\TraversalBuilder class provides a native API that matches Gremlin almost perfectly. For example, fetching a user in the TwitterGraph can be done as follows.

$user_id = 12345;
$traversalBuilder = new TraversalBuilder();
$command = $traversalBuilder
 ->g()
 ->V()
 ->hasLabel("'users'")
 ->has("'users_id'", "$user_id")
 ->getTraversal();

A slightly more complex example, such as fetching a user's timeline, can be written as follows.

$command = $traversalBuilder
 ->g()
 ->V()
 ->hasLabel("'users'")
 ->has("'screen_name'", "'$screen_name'")
 ->union(
 (new TraversalBuilder())->out("'tweeted'")->getTraversal(),
 (new TraversalBuilder())->out("'follows'")->out("'tweeted'")->getTraversal()
 )
 ->order()
 ->by("'created_at'", 'decr')
 ->limit(10)
 ->getTraversal();

The full list of available steps can be found in the \TheDonHimself\Traversal\Step class.
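To show how a fluent builder of this kind can map method calls onto a Gremlin string, here is a deliberately tiny sketch. MiniTraversal is a hypothetical class written for this illustration and is not the library's TraversalBuilder; in particular, passing arguments through verbatim is an assumption suggested by the pre-quoted strings in the examples above.

```php
<?php

/**
 * Hypothetical miniature of a fluent traversal builder (not the library's
 * TraversalBuilder): each method call appends one Gremlin step.
 */
final class MiniTraversal
{
    /** @var string[] */
    private array $steps = [];

    public function __call(string $step, array $arguments): self
    {
        // Arguments are inserted verbatim, so string literals must arrive
        // already quoted, e.g. "'users'".
        $this->steps[] = $step === 'g'
            ? 'g' // the traversal source is a variable, not a step
            : $step . '(' . implode(', ', $arguments) . ')';

        return $this;
    }

    public function getTraversal(): string
    {
        return implode('.', $this->steps);
    }
}

$userId = 12345;
$command = (new MiniTraversal())
    ->g()
    ->V()
    ->hasLabel("'users'")
    ->has("'users_id'", $userId)
    ->getTraversal();

echo $command, PHP_EOL;
```

Because every call is forwarded through __call, the sketch accepts any step name; a real builder would presumably validate steps against a known list such as the Step class mentioned above.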

GraphQL to Gremlin

There has been a separate attempt to create a standard for translating GraphQL into Gremlin commands. It is at an early stage and supports only queries, not mutations. Since I also wrote it, the Gremlin-OGM library naturally supports this standard, and hopefully it will improve over time.

Visualizing JanusGraph

Sadly, there are not as many GUIs for graph databases as there are for relational, document, and key-value databases. One of them, Gephi, can visualize JanusGraph data and queries through its streaming plugin. In the meantime, Compose has a data browser for JanusGraph, which we can use to display some queries on the TwitterGraph.

Visualize 5 users I follow

def graph = ConfiguredGraphFactory.open("twitter");  
def g = graph.traversal();  
g.V().hasLabel('users').has('screen_name',  
   textRegex('(?i)the_don_himself')).outE('follows').limit(5).inV().path()

Visualize 5 users who follow me

def graph = ConfiguredGraphFactory.open("twitter");  
def g = graph.traversal();  
g.V().hasLabel('users').has('screen_name',  
    textRegex('(?i)the_don_himself')).inE('follows').limit(5).outV().path()

Visualize 5 tweets I like

def graph = ConfiguredGraphFactory.open("twitter");  
def g = graph.traversal();  
g.V().hasLabel('users').has('screen_name',  
    textRegex('(?i)the_don_himself')).outE('likes').limit(5).inV().path()

Visualize any 5 retweets and the original tweets

def graph = ConfiguredGraphFactory.open("twitter");  
def g = graph.traversal();  
g.V().hasLabel('tweets').outE('retweets').inV().limit(5).path()  

There you have it: a powerful, thoughtful, and easy-to-use library that will help you get started with JanusGraph in PHP in minutes. If you use the amazing Symfony framework, you are in even better luck. An upcoming bundle, Gremlin-OGM-Bundle, will help you copy data from an RDBMS or MongoDB into a TinkerPop 3+ compatible graph database. Enjoy!

