This is a hands-on post written by community user Fan Fan on Nebula performance testing and performance tuning for data import. The word "I" below refers to user Fan Fan.

0. Summary

I have been researching Nebula and performance-testing it for production use. During this period I consulted the Nebula official team many times; I would like to thank them for their hard work!

I have written up my own testing process here, and I hope it gives you a little inspiration. If you have better ideas, please do not hesitate to share them!

1. Deploy the Nebula cluster

First of all, we prepared 4 physical machines, numbered 1 to 4, each with the same configuration: CPU: 96 cores, memory: 512 GB, disk: SSD. Machine allocation:

  • 1: meta, storage
  • 2: storage
  • 3: storage
  • 4: graphd

I will not detail the installation process; it was done with RPM packages. Other tools: nebula-importer 2.0 and nebula-bench 2.0. Download their source code, compile them, and install them on node 4.

2. Import data

The data imported this time includes 7 vertex (tag) types and 15 edge types. The data volume is not large and the structure is very simple: about 34 million (3400W) records in total, but the data needs to be pre-processed into that many separate vertex and edge tables in advance.

Create the space with a VID length of 100, replica_factor = 3, and partition_num = 100.
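With those settings, the space creation statement would look roughly like this. This is a sketch: the space name `test` is a placeholder, and in Nebula 2.0 the VID length is declared with `fixed_string`:

```ngql
CREATE SPACE test (
    partition_num = 100,
    replica_factor = 3,
    vid_type = fixed_string(100)
);
```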

Optimizing data import with Nebula-Importer

I used Nebula-Importer for the import. Running the importer directly with default settings, the speed was only about 30,000 rows/s (3W/s), which is too slow. The only tunable parameters are concurrency, channelBufferSize, and batchSize.

I first adjusted them by trial and error, but changing them blindly had no obvious effect, so I posted on the forum to consult the experts: Nebula-Importer 2.0 imports too slowly, so change the YAML parameters first:

concurrency: 96          # number of CPU cores
channelBufferSize: 20000
batchSize: 2500

The speed rose to about 70,000–80,000 rows/s (7-8W/s). Well, that is indeed a lot faster. But if you push the values higher, graphd crashes.
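For reference, these three knobs live in the nebula-importer YAML config. A minimal sketch, where the space name, address, credentials, and file path are all placeholders:

```yaml
version: v2
clientSettings:
  concurrency: 96          # number of client goroutines; set to CPU cores
  channelBufferSize: 20000 # per-channel buffer size
  space: test              # placeholder space name
  connection:
    user: root             # placeholder credentials
    password: nebula
    address: 127.0.0.1:9669  # placeholder graphd address
files:
  - path: ./person.csv     # placeholder data file
    type: csv
    batchSize: 2500        # rows batched into one INSERT statement
```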

Then I checked the disk and network again and found we were actually on mechanical disks and a gigabit network… After switching to SSDs and a 10 GbE network, the speed more than doubled, to about 170,000 rows/s (17W/s). It seems hardware really matters.

Then I wondered whether it had anything to do with the data itself, and I focused on the VID and partition_num. The VID was long and I wanted to shorten it, but I could not, because the business data is just that long. As for partition_num, the official recommendation is 2–10 times the number of disks, so I changed it to 15, and it did make a difference: the speed reached about 250,000 rows/s (25W/s). At this point I was fairly satisfied; further tuning might bring more improvement, but since the requirement was met, I stopped here.


  • Set concurrency to the number of CPU cores; make channelBufferSize and batchSize as large as possible, but do not exceed what the cluster can handle.
  • For hardware, use SSDs and a 10 GbE network.
  • Choose a reasonable partition_num for the space; not too many.
  • I suspect the VID length, the number of properties, and the number of graphd nodes also have an effect, but I have not tried those yet.

3. Stress tests

Based on the metrics used in our business, one statement was selected for testing. The statement is as follows:

match (v:email)-[:emailid]->(mid:id)<-[:phoneid]-(phone:phone)-[:phoneid]->(ids:id)
where id(v) == "replace"
with v, count(distinct phone) as pnum, count(distinct mid) as midnum,
     count(distinct ids) as idsnum, sum(ids.isblack) as black
where pnum > 2 and midnum > 5 and midnum < 100 and idsnum > 5 and idsnum < 300 and black > 0
return v.value1, true as result

This statement is a three-hop expansion plus condition filtering, and the number of vertices involved per seed in the dataset is between 200 and 400.

The official nebula-bench needs a few changes: open the go_step.jmx JMeter configuration file, change threadgroup.num_threads to the CPU core count, and set the other parameters, such as the loop count and the nGQL statement, to match your actual situation. The variable inside the nGQL statement is replaced with replace.
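I have not reproduced the exact contents of go_step.jmx, but the thread count sits in a standard JMeter ThreadGroup element, so the edit looks roughly like this (96 assumed to match the CPU core count, and the loop count is a placeholder):

```xml
<ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Thread Group" enabled="true">
  <!-- number of concurrent test threads; set to the CPU core count -->
  <stringProp name="ThreadGroup.num_threads">96</stringProp>
  <elementProp name="ThreadGroup.main_controller" elementType="LoopController">
    <!-- placeholder loop count; set according to your test volume -->
    <stringProp name="LoopController.loops">1000</stringProp>
  </elementProp>
</ThreadGroup>
```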

Since the test data is relatively concentrated, the result for this portion was about 700 queries/s; when the seeds were spread across all vertices, it reached 6000+/s. Concurrency looks acceptable, and query latency is fine, at most 300 ms.

Since graphd was running on a single node, I wanted to add one more graphd to test whether concurrency improved. I simply started another graphd process, but the test results showed no improvement.

Then I saw that 2.0.1 had been released, so I rebuilt the cluster and re-imported the data. With 3 graphd machines, throughput directly tripled: the concentrated data reached 2100+/s, and across all vertices it reached nearly 20,000/s (2W/s). If you hit the same oddity, see the forum post about Nebula-Bench 2.0 throughput not improving after adding graph nodes.

My speculation is that no balance or compact was performed after the graphd processes were added; you can try that yourself.
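If you want to try that, the relevant console commands in Nebula 2.x are roughly as follows (the space name is a placeholder):

```ngql
USE test;            # placeholder space name
BALANCE DATA;        # redistribute partitions across the storage nodes
SUBMIT JOB COMPACT;  # trigger a full RocksDB compaction for the space
```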

In addition, because no monitoring components were deployed and only Linux commands were used for observation, we did not get very precise machine state information.


  • Before testing, make sure the cluster is load-balanced and run a compaction.
  • Adjust the storage configuration appropriately to increase the number of available threads and the size of the cache memory.
  • Concurrency depends heavily on the data; the raw number alone is meaningless and must be interpreted against your own data distribution.

4. Configuration

Below I will directly paste the parameters I modified. meta and graphd use the default configuration, with nothing special to change, so I will only paste the storage configuration and explain it.

    rocksdb_block_cache = 102400       # RocksDB block cache size, in MB
    num_io_threads = 48                # number of available I/O threads
    min_vertices_per_bucket = 100      # minimum number of vertices per bucket
    min_vertices_bucket_exp = 8        # total number of buckets is 2^8
    wal_buffer_size = 16777216         # 16 MB
    write_buffer_size = 268435456      # 256 MB

The parameters here come from browsing various posts and searching the official code; they are not necessarily accurate and were arrived at by trial and error. Other parameters were not specially modified. Many parameters are not exposed, and it is not recommended to modify them arbitrarily; if you need to understand them, go to the source code on GitHub.

Final words

Overall, this test was not particularly rigorous, but Nebula showed very good results for our specific business scenarios. I have not thoroughly studied the tuning of individual parameters and will need to study them further in actual use. Please feel free to voice your opinions if you have good ideas about tuning.

Want to discuss graph database technology? Sign up for Nebula Exchange; the NUC·2021 sign-up portal is open. We will be waiting for you in Beijing~~