This article was first published by NebulaGraph. Follow NebulaGraphCommunity to see more technical practice from the community.

In the previous article, we walked through the evolution of Nebula Graph’s integration testing. This article describes the process of adding a use case to the test suite and successfully running all the test cases.

Environment preparation

When we first built the 2.0 test framework, we customized several utility classes to help the framework quickly start and stop a single-node Nebula service, with the ability to check for port conflicts, modify configuration options, and more. The original execution process was as follows:

  1. Launch Nebula's services through Python scripts;
  2. Call pytest.main to execute all test cases concurrently;
  3. Stop Nebula's services.

The inconvenience was that any parameter options for pytest had to be passed through to pytest.main, and running a single test case each time required a script generated by cmake, which was not very convenient. We wanted to be able to "test right where the test cases are".

Service startup

During the refactoring of the test framework, we changed the program entry point while reusing most of the original packaged logic. Given Nebula's current set of use cases, a single-process run was no longer fast enough for rapid iteration, so after trying other parallel plug-ins and weighing compatibility, we chose the pytest-xdist plug-in to speed up the testing process.

However, pytest provides only four fixture scopes: session, module, class, and function, while we wanted a global-level fixture to start and initialize the Nebula services. Currently, the highest scope, session, is still executed once per runner, so with eight runners, eight Nebula services would be launched, which is not what we want.

According to the pytest-xdist documentation, coordination between different runners requires a file lock. To keep the control logic simple, we separated the start/stop and preparation logic from the process of executing the tests, using separate steps to control Nebula's startup. When some tests have problems, we can connect to the tested service separately through Nebula Console for further validation and debugging.
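The file-lock idea from the pytest-xdist documentation is that only the first worker to grab the lock performs the shared setup, and every other worker reuses its result. The helper below is a minimal stdlib sketch of that pattern; the function name and the cached fields are illustrative, not the framework's real code:

```python
import fcntl
import json
import os

def once_per_machine(lock_path, cache_path, setup):
    """Run setup() exactly once across concurrent pytest-xdist workers.

    The first worker to acquire the file lock runs setup() and caches its
    result as JSON; every later worker just reads the cached result.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # serialize the workers
        try:
            if os.path.exists(cache_path):
                with open(cache_path) as f:
                    return json.load(f)        # another worker already set up
            result = setup()                   # e.g. start nebula, return ports
            with open(cache_path, "w") as f:
                json.dump(result, f)
            return result
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

In a session-scope fixture, each worker would call a helper like this, so the Nebula service is launched once no matter how many runners there are.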

Data import

Prior to this, Nebula's data import process directly executed a concatenated nGQL INSERT statement, which has the following problems:

  1. When the test data set is large, the INSERT statement becomes lengthy and the client execution times out;
  2. New test data sets are hard to develop, because existing CSV data files need to be converted into corresponding nGQL statement files;
  3. The same data set cannot be reused. For example, if you want the same CSV to be imported into spaces with different VID types for testing, you need to construct different INSERT statements.

Referring to the nebula-importer implementation, we re-implemented a Python version of the import module, separating the import logic from the data set. The limitations are that only CSV data files can be imported, and each CSV file can store only one tag or edge type.
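To see why separating the data from the statements helps, consider reusing one CSV for spaces with different VID types. The function below is an illustrative, simplified sketch of the idea, not the framework's actual import module (for simplicity it treats every property as a string):

```python
import csv
import io

def insert_vertices(tag, csv_text, vid_type="FIXED_STRING(32)"):
    """Build one nGQL INSERT VERTEX statement from a CSV chunk.

    The same CSV can target spaces with different VID types: string
    VIDs are quoted while integer VIDs are emitted as-is.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]  # first column is the VID, the rest are properties

    def fmt_vid(v):
        return f'"{v}"' if vid_type.startswith("FIXED_STRING") else v

    values = ", ".join(
        fmt_vid(r[0]) + ":(" + ", ".join(f'"{c}"' for c in r[1:]) + ")"
        for r in rows[1:]
    )
    return f'INSERT VERTEX {tag}({", ".join(header[1:])}) VALUES {values};'
```

With vid_type="INT64", the same CSV file yields VALUES 100:(...) instead of VALUES "player100":(...), which is exactly the reuse that concatenated statement files could not offer.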

After refactoring the import logic, the current Nebula test data sets become clear:

nebula-graph/tests/data
├── basketballplayer
│   ├── bachelor.csv
│   ├── config.yaml
│   ├── like.csv
│   ├── player.csv
│   ├── serve.csv
│   ├── team.csv
│   └── teammate.csv
├── basketballplayer_int_vid
│   └── config.yaml
└── student
    ├── config.yaml
    ├── is_colleagues.csv
    ├── is_friend.csv
    ├── is_schoolmate.csv
    ├── is_teacher.csv
    ├── person.csv
    ├── student.csv
    └── teacher.csv

3 directories, 16 files

Each directory contains all the CSV data files for one space, and the config.yaml in the directory describes each file as well as the details of the space. With this configuration, the basketballplayer and basketballplayer_int_vid spaces can share the same data. To add a new test data set in the future, simply add a data directory like basketballplayer. See the repo for the details of config.yaml.

Install dependencies

In addition to the usual pytest and nebula-python libraries, the current test framework also uses plug-ins such as pytest-bdd and pytest-xdist. In addition, to better unify the format of the test case feature files, we introduced the community's reformat-gherkin tool and made some format adjustments on top of it to keep the format consistent with the openCypher TCK feature files.

nebula-python and reformat-gherkin are installed directly from source, and Makefiles are available under nebula-graph/tests to simplify the process for users. All the environment preparation needed to run the tests is a single command:

$ cd nebula-graph/tests && make init-all

We have also integrated the above format checking into the GitHub Actions CI process. If the test file format is not as expected, users can run the make fmt command to format locally.

Write a use case

As described in the previous article, Nebula's integration tests are now "black box" tests: users no longer have to worry about how their statements are called or which functions need to be invoked to match their expectations. They just describe their use case in a feature file in a way that approximates natural language, following the agreed conventions. Here is an example of a test case:

Feature: Variable length pattern match (m to n)
  Scenario: both direction expand with properties
    Given a graph with space named "basketballplayer"
    When executing query:
      """
      MATCH (:player{name:"Tim Duncan"})-[e:like*2..3{likeness: 90}]-(v)
      RETURN e, v
      """
    Then the result should be, in any order, with relax comparison:
      | e                                                                                  | v                  |
      | [[:like "Tim Duncan"<-"Manu Ginobili"], [:like "Manu Ginobili"<-"Tiago Splitter"]] | ("Tiago Splitter") |

Given provides the initial conditions for the test; here it initializes a space named "basketballplayer". When describes the test's input, namely the nGQL statement. Then gives the expected result and the way the comparison is made; here the results in the table are compared in an unordered, loose way.
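The "relax comparison" can be understood as a subset check: only the fields the test author wrote down are verified, and anything else returned by the query is ignored. The function below is a simplified illustration of that idea, not the framework's actual comparator:

```python
def relax_match(expected, actual):
    """Loosely compare results: every field in `expected` must appear in
    `actual`, but extra fields in `actual` are ignored."""
    if isinstance(expected, dict):
        return all(
            key in actual and relax_match(value, actual[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        return len(expected) == len(actual) and all(
            relax_match(e, a) for e, a in zip(expected, actual)
        )
    return expected == actual
```

So an expected row that lists only a vertex's name still matches a returned vertex that also carries age and other properties.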

Feature file structure

A feature file is a file format described by the Gherkin language, mainly composed of the following parts:

  • Feature: gives the "title" of the current file or describes its content in detail;
  • Background: steps shared by the subsequent Scenarios;
  • Scenario: describes the scenario of each test case step by step;
  • Examples: further separates the test scenarios from the test data to simplify writing the Scenarios in the current feature file.

Each Scenario is divided into different steps, and each step has a special meaning:

  • Given: sets up the initial conditions of the current test scenario; the Background above can only contain Given-type steps;
  • When: gives the input for the current test scenario;
  • Then: describes the expected result after completing the When step;
  • And: can follow any of Given/When/Then to further supplement that step's action;
  • Examples: similar to the Examples above, but its scope is limited to a single Scenario, so it does not affect other Scenarios in the same feature file.
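Putting these parts together, a hypothetical feature file using Background, a Scenario Outline and Examples might look like this (the steps and query shown are illustrative):

```gherkin
Feature: Variable length expand examples

  Background:
    Given a graph with space named "basketballplayer"

  Scenario Outline: expand between m and n steps
    When executing query:
      """
      MATCH (:player{name:"Tim Duncan"})-[e:like*<m>..<n>]-(v)
      RETURN count(*)
      """
    Then the execution should be successful

    Examples:
      | m | n |
      | 1 | 2 |
      | 2 | 3 |
```

The Background step runs before each row of the Examples table, so every data row gets the same initialized space.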

Steps

A Scenario is composed of a series of steps. Nebula adapts the openCypher TCK steps to make test case writing easier:

  1. Given a graph with space named "basketballplayer": uses a space with the pre-imported basketballplayer data;
  2. creating a new space with following options: creates a new space with the given options, such as name, partition_num, replica_factor, vid_type, charset and collate;
  3. load "basketballplayer" csv data to a new space: imports the "basketballplayer" data set into the new space;
  4. profiling query: executes the query with PROFILE and returns the execution plan;
  5. wait 3 seconds: waits 3 seconds; schema-related operations need some time for data synchronization, so this step is useful then;
  6. define some list variables: defines variables to represent List values with many elements, so that the corresponding List can be written in the expected result;
  7. the result should be, in any order, with relax comparison: compares the execution results in an unordered, loose way, meaning only what the user wrote is compared, and parts not written are ignored even if they are returned;
  8. the result should contain: the returned result must contain the expected result;
  9. the execution plan should be: compares the execution plan in the returned result.
In addition to the above steps, additional steps can be defined as needed to speed up test case development.
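How a step sentence finds its implementation can be illustrated with a tiny registry. The sketch below is not the pytest-bdd API, just a simplified illustration of the mechanism behind defining additional steps, where {name} placeholders in the step text capture arguments:

```python
import re

STEPS = []  # (compiled pattern, handler) pairs

def step(pattern):
    """Register a step implementation; {name} placeholders become captures."""
    regex = re.compile("^" + re.sub(r"\{(\w+)\}", r"(?P<\1>.+?)", pattern) + "$")

    def decorator(func):
        STEPS.append((regex, func))
        return func

    return decorator

@step("wait {seconds} seconds")
def wait_step(seconds):
    # A real step would call time.sleep(int(seconds)) to let the schema sync.
    return f"waited {int(seconds)}s"

def run_step(text):
    """Dispatch a step sentence to the first matching handler."""
    for regex, func in STEPS:
        match = regex.match(text)
        if match:
            return func(**match.groupdict())
    raise LookupError(f"no step matches: {text!r}")
```

Adding a new kind of step is then just registering another pattern, which is why extra steps can be defined cheaply as test development needs them.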

Parser

In the TCK, openCypher defines a set of graph-semantic representations to express the desired return results. The vertex and edge patterns follow those used in MATCH queries, so if you are familiar with openCypher queries, the results in the TCK test scenarios are easy to understand. For example, the format of some graph elements is as follows:

  1. Vertex description: (:L {p:1, q:"string"});
  2. Edge description: [:T {p:0, q:"string"}];
  3. Path description: <(:L)-[:T]->(:L2)>.

There are some differences between the Nebula Graph model and Neo4j's, however. In Nebula Graph, each tag can have its own properties, so the existing representation cannot describe a vertex with multiple tags that each carry properties. There are also differences in the representation of edges: Nebula Graph's edge key is a quadruple <src, type, rank, dst>, and the existing representation cannot describe the src, dst and rank values of an edge. Therefore, after considering these differences, we extended the existing TCK expected-result representation:

  1. Vertex descriptions support multiple tags with properties: ("VID" :T1{p:0} :T2{q:"string"});
  2. Edges can be described with src, dst and rank: [:type "src"->"dst"@rank {p:0, q:"string"}];
  3. Paths are just combinations of the vertex and edge representations above, the same as in the TCK.

The expansion of the vertex and edge descriptions above is compatible with the existing TCK use cases and with the Nebula Graph design. Having solved the representation problem, the next problem is how to efficiently and reliably transform the representation into a concrete data structure so that it can be compared with the actual query results. After considering options such as regular-expression matching and a parser, we chose to build a parser to handle these strings with specific syntax rules. The advantages are as follows:

  1. Based on the specific syntax rules, the parsed AST can match the data structures returned by the query, so the comparison becomes a validation of concrete fields in concrete structures;
  2. It avoids dealing with complex regular expressions over strings and reduces parsing errors;
  3. It supports other string-parsing needs, such as regular expressions, lists, sets, and so on.

With the ply.yacc and ply.lex libraries, we implemented this complex requirement with a small amount of code; see the nbv.py file for details.
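As an illustration of the approach, though not the real ply-based implementation in nbv.py, a hand-rolled tokenizer and parser for the extended vertex notation might look like this:

```python
import re

# Token patterns for the sketch; the real framework declares these with ply.lex.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<STRING>"[^"]*")'
    r"|(?P<INT>-?\d+)"
    r"|(?P<NAME>[A-Za-z_]\w*)"
    r"|(?P<PUNCT>[(){}:,]))"
)

def tokenize(text):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at {pos}: {text[pos:]!r}")
        pos = match.end()
        kind = match.lastgroup
        value = match.group(kind)
        if kind == "STRING":
            value = value[1:-1]   # strip the quotes
        elif kind == "INT":
            value = int(value)
        tokens.append((kind, value))
    return tokens

def parse_vertex(text):
    """Parse ("vid" :Tag1{p:0} :Tag2) into {"vid": ..., "tags": {...}}."""
    toks = tokenize(text)
    i = 1  # skip the opening '('
    vid = toks[i][1]
    i += 1
    tags = {}
    while toks[i] == ("PUNCT", ":"):
        tag = toks[i + 1][1]
        i += 2
        props = {}
        if toks[i] == ("PUNCT", "{"):
            i += 1
            while toks[i] != ("PUNCT", "}"):
                key = toks[i][1]        # property name
                value = toks[i + 2][1]  # skip the ':' between key and value
                props[key] = value
                i += 3
                if toks[i] == ("PUNCT", ","):
                    i += 1
            i += 1  # skip '}'
        tags[tag] = props
    return {"vid": vid, "tags": tags}
```

The parsed dict can then be compared field by field against the vertex structure the query actually returned, which is exactly the benefit listed above.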

The test process

The current testing process is as follows:

1) Write Feature files

All of Nebula Graph's feature cases are currently located in the tests/tck/features directory of the github.com/vesoft-inc/nebula-graph repo.

2) Start the Nebula Graph service

$ cd /path/to/nebula-graph/tests
$ make up

3) Perform tests locally

$ make fmt  # format the feature files
$ make tck  # run the tck test cases

4) Discontinue the Nebula Graph service

$ make down

Debugging

When a use case you write needs debugging, you can use pytest's features for further investigation, for example, running only the use cases whose names match a keyword:

$ pytest -k "match" tck/

You can also mark a specific scenario in the feature file and only run the marked use case, for example:

# in the feature file
@testmark
Scenario: both direction expand with properties
  Given a graph with space named "basketballplayer"
  ...

# in the nebula-graph/tests directory
$ pytest -m "testmark" tck/

Conclusion

Standing on the shoulders of those who came before made Nebula's testing a better fit. Thanks to all the open source tools and projects mentioned in this article.

In the process of adopting pytest-bdd, we found some imperfections, such as compatibility problems with pytest-xdist and other plug-ins (gherkin-reporter), and the fact that pytest does not natively provide a global-scope fixture. But ultimately, the benefits to Nebula Graph far outweigh these difficulties.

As mentioned in the previous part, not requiring users to program is not just an idea. Once the above pattern is fixed, a set of scaffolding can be developed for adding test cases, so that users can "fill in the blanks" on a page and automatically generate the corresponding feature test file, making things even easier for them. Interested community members are welcome to try it out.

Want to discuss graph database technology? To join the Nebula exchange group, please fill out your Nebula card first, and the Nebula assistant will bring you into the group.

Want to share graph database practice with other companies? NUC 2021 is waiting for you: NUC 2021 registration portal.