1. Overview

DolphinDB is a high-performance, columnar, relational time-series database written in C++, with a built-in parallel and distributed computing framework for processing real-time data and large volumes of historical data.

DolphinDB provides its own scripting language as well as APIs for programming languages such as C++, Java, C#, Python, and R, making it easy to use DolphinDB in a variety of environments.

This article tests the performance of the DolphinDB APIs (C++, Java, C#, Python, and R) in the following scenarios; a minimal connection sketch using the Python API follows the list:

  • A single user uploads data to a memory table
  • Multiple users upload data concurrently to a distributed (DFS) database
  • Multiple users concurrently download data from DolphinDB to the client
  • Multiple users concurrently submit computations (calculating the minute-level K-lines for a given stock on a given day) to DolphinDB and retrieve the results
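
For reference, a minimal sketch of connecting to a DolphinDB data node and running a script with the Python API is shown below; the host name, port, and the version() call are illustrative placeholders rather than the test configuration described later.

```python
# Minimal sketch of connecting to DolphinDB with the Python API (placeholder host/port).
import dolphindb as ddb

s = ddb.session()              # create an API session
s.connect("SERVER1", 8848)     # connect to a data node
print(s.run("version()"))      # execute a DolphinDB script statement and fetch the result
```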

2. Test environment

2.1 Hardware Configuration

This test uses three servers with the same configuration (SERVER1, SERVER2, and SERVER3). The configuration of each server is as follows:

Host: PowerEdge R730xd

CPU: E5-2650, 24 cores, 48 threads

Memory: 512 GB

Hard disk: 12 x 1.8 TB HDD

Network: 10 Gigabit Ethernet

OS: CentOS Linux Release 7.6.1810

2.2 Software Configuration

C++: GCC 4.8.5

JRE: 1.8.0

C#: .NET Core 2.2.105

Python: 3.7.0

R: 3.5.2

DolphinDB: 0.94.2

2.3 Test Framework

The DolphinDB cluster is deployed on SERVER1. The API programs run on SERVER2 and SERVER3 and connect over the network to the DolphinDB data nodes on SERVER1 for testing.

The DolphinDB cluster configuration is as follows:

The cluster contains one controller node and six data nodes.

Memory: 32 GB/node x 6 nodes = 192 GB

Threads: 8 threads/node x 6 nodes = 48 threads

Hard disk: Each node is configured with an independent HDD, 1.8 TB/node x 6 nodes = 10.8 TB

3. Single-user data upload performance test

This section tests a single user uploading data to the DolphinDB server through the API. An in-memory table is created on SERVER1, and the API program runs on SERVER2 to write data to that table.

The table has 45 columns spanning types such as STRING, INT, LONG, SYMBOL, DOUBLE, DATE, and TIME, with 336 bytes per row. A total of 1 million rows (about 336 MB) are uploaded. We test throughput and latency with batch sizes from 10 to 100,000 rows per upload.

Since this scenario involves a single user and no disk operations, it mainly tests the performance of converting the API data format into the DolphinDB data format; CPU and network performance have a significant impact on the results. The test results for each API are as follows:

Table 1. C++ API single-user upload to memory table: test results

Table 2. Java API single-user upload to memory table: test results

Table 3. C# API single-user upload to memory table: test results

Table 4. Python API single-user upload to memory table: test results

Table 5. R API single-user upload to memory table: test results

Table 6. Comparison of single-user upload throughput to memory table across APIs (unit: MB/s)

Figure 1. Comparison of API upload performance to memory table

According to the single-user memory-table write results, performance improves significantly as the batch size increases. With a fixed total data volume, a larger batch size means fewer upload calls and fewer network round trips.

C++ has the best performance, while C# has the worst. The underlying implementations of both the Python and R APIs are written in C++, so their performance trends are similar. When the batch size is small, the C++ modules are invoked more frequently, which incurs more overhead and lowers performance. When the batch size exceeds 1,000 rows, performance improves significantly. We therefore recommend using as large a batch size as possible when uploading data with the Python and R APIs.
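
To make the procedure concrete, the sketch below uploads rows to a shared in-memory table in batches using the Python API. It is a simplified illustration under assumptions, not the benchmark program itself: the schema is reduced to three columns, and the table name tglobal, the host and port, and the batch size are placeholders.

```python
# Simplified sketch of the single-user upload test (assumed names and parameters).
import random
import numpy as np
import pandas as pd
import dolphindb as ddb

s = ddb.session()
s.connect("SERVER1", 8848)   # placeholder host/port of a data node

# create a shared in-memory table on the server (3 columns instead of 45)
s.run("share table(1000:0, `sym`price`qty, [SYMBOL, DOUBLE, LONG]) as tglobal")

total_rows, batch_size = 1_000_000, 10_000   # batch size was varied from 10 to 100,000 in the test
symbols = ["AAPL", "MSFT", "IBM"]
for _ in range(total_rows // batch_size):
    df = pd.DataFrame({
        "sym":   [random.choice(symbols) for _ in range(batch_size)],
        "price": np.random.rand(batch_size) * 100,
        "qty":   np.random.randint(1, 1000, batch_size),
    })
    # tableInsert{tglobal} is a partial application; each call is one upload batch
    s.run("tableInsert{tglobal}", df)
```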

4. Multi-user concurrent data upload performance test

This section tests multiple users concurrently uploading data through the APIs to a distributed (DFS) table on SERVER1. The users run on SERVER2 and SERVER3 and initiate write operations simultaneously over the network.

Each user writes a total of 5 million rows, 25,000 rows per write with 336 bytes per row, for a total of about 840 MB per user. We test the latency and throughput of concurrent writes with 1 to 128 concurrent users.
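
The sketch below illustrates this concurrent write pattern with the Python API. It is only an illustration under assumptions: the DFS database path dfs://demodb, the reduced three-column schema, the default admin credentials, and the thread count are placeholders, and the simulated users all run in a single process rather than being spread across SERVER2 and SERVER3. Each simulated user writes to its own partition so that concurrent transactions do not conflict on the same partition.

```python
# Simplified sketch of concurrent writes to a DFS table (assumed names and parameters).
import random
import threading
import numpy as np
import pandas as pd
import dolphindb as ddb

setup = ddb.session()
setup.connect("SERVER1", 8848, "admin", "123456")   # placeholder credentials
setup.run("""
    if(existsDatabase("dfs://demodb")) dropDatabase("dfs://demodb")
    db = database("dfs://demodb", VALUE, 0..127)    // one partition per simulated user
    t  = table(1000:0, `uid`sym`price, [INT, SYMBOL, DOUBLE])
    db.createPartitionedTable(t, "quotes", "uid")
""")

def writer(uid, total_rows=5_000_000, batch=25_000):
    # each simulated user holds its own connection, as in the test
    s = ddb.session()
    s.connect("SERVER1", 8848, "admin", "123456")
    symbols = ["AAPL", "MSFT", "IBM"]
    for _ in range(total_rows // batch):
        df = pd.DataFrame({
            "uid":   np.full(batch, uid, dtype=np.int32),
            "sym":   [random.choice(symbols) for _ in range(batch)],
            "price": np.random.rand(batch) * 100,
        })
        # each user writes only to its own partition, so writes do not conflict
        s.run("tableInsert{loadTable('dfs://demodb', 'quotes')}", df)

threads = [threading.Thread(target=writer, args=(i,)) for i in range(8)]   # 8 concurrent users
for t in threads: t.start()
for t in threads: t.join()
```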

We split the users evenly between SERVER2 and SERVER3; for example, when testing 16 users, each server runs 8 client programs. This test involves concurrent writes: data is sent over the network to SERVER1 and persisted to disk, so it measures how well DolphinDB can exploit the server's CPU, disk, and network resources. The test results for each API are as follows:

Table 7. C++ API multi-user concurrent upload to DFS table: test results

Table 8. Java API multi-user concurrent upload to DFS table: test results

Table 9. C# API multi-user concurrent upload to DFS table: test results

Table 10. Python API multi-user concurrent upload to DFS table: test results

Table 11. R API multi-user concurrent upload to DFS table: test results

Table 12. Comparison of concurrent upload throughput to DFS table across APIs (unit: MB/s)

Figure 2. Comparison of API upload performance to DFS table

The test results show that with fewer than 16 users, C++ and Java clearly perform best, Python and C# are slightly slower, and throughput grows roughly linearly with the number of users. With more than 16 users, network transmission reaches its limit and becomes the bottleneck, and throughput stays roughly at the network limit. The network is 10 Gigabit Ethernet; because the transmitted data is compressed, the maximum effective throughput reaches about 1.8 GB/s.

5. Multi-user concurrent data download performance test

This section tests how fast multiple users can concurrently download data from DolphinDB through the APIs. The database is deployed on SERVER1, and multiple users download data simultaneously from SERVER2 and SERVER3, with each user randomly choosing a data node to connect to. Each user downloads a total of 5 million rows (45 bytes per row, about 225 MB), 25,000 rows per request. We test concurrent performance with 1 to 128 concurrent users.

We tested the performance of concurrent client data downloads in the following two scenarios (a simplified query sketch follows the list):

  • 5-year data set: the date and symbol of each request are chosen at random from 5 years of data, covering about 12 TB in total. Since this far exceeds system memory, every download must load data from disk;
  • 1-week data set: the symbol of each request is chosen at random from the most recent week of data, about 60 GB in total. The DolphinDB nodes are allocated enough memory to cache all 60 GB, so downloads do not need to load data from disk.
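
A simplified sketch of one client request in each scenario, assuming the Python API and an illustrative DFS table dfs://tickdb partitioned by date and symbol (the table name, column names, dates, and symbols are placeholders, not the benchmark code):

```python
# Simplified sketch of one download request in each scenario (assumed names).
import dolphindb as ddb

s = ddb.session()
s.connect("SERVER1", 8848)   # each user randomly picks a data node to connect to

# 5-year scenario: a random date and symbol; the partition is usually read from disk
df_cold = s.run("""
    select * from loadTable("dfs://tickdb", "quotes")
    where date = 2017.06.15, sym = `AAPL
""")

# 1-week scenario: a symbol from the most recent week; the data is served from cache
df_hot = s.run("""
    select * from loadTable("dfs://tickdb", "quotes")
    where date >= 2019.06.24, date <= 2019.06.28, sym = `MSFT
""")
```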

The performance test results of each API are as follows:

Table 13. C++ API data download test results

Table 14. Java API data download test results

Table 15. C# API data download test results

Table 16. Python API data download test results

Table 17. R API data download test results

Table 18. Comparison of 5-year data download throughput across APIs (unit: MB/s)

Figure 3. Comparison of 5-year concurrent data download throughput across APIs

In the 5-year scenario, the 12 TB data set far exceeds memory, so DolphinDB cannot cache all of the data and must load it from disk for each request, making disk I/O the bottleneck.

DolphinDB loads data by partition: when a user downloads data for a particular stock on a given day, the whole partition is loaded into memory and the requested data is then returned to the user. When a large number of concurrent users issue download requests at the same time and the data volume is too large to cache, the data must be read from disk. With 128 concurrent users reading from disk, I/O contention intensifies and overall throughput drops.

We therefore recommend configuring multiple independent data volumes for each node to improve I/O concurrency.

Table 19. Comparison of 1-week data download throughput across APIs (unit: MB/s)

Figure 4. Comparison of 1-week concurrent data download throughput across APIs

In the 1-week scenario, all of the data fits in DolphinDB's memory, so nothing is loaded from disk. The maximum throughput is about 1.4 GB/s of actual business data, close to the network limit.

6. Concurrent computation performance test

In this section, we use the APIs to concurrently submit computations of the minute-level K-lines for a particular stock on a given day to DolphinDB and retrieve the results. Each computation involves about 100 million rows in total.
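
A minimal sketch of such a request with the Python API is shown below, reusing the illustrative dfs://tickdb table from the previous section (the column names, date, and symbol are assumptions). The aggregation runs entirely on the server, and only the minute-level bars are returned to the client.

```python
# Simplified sketch of one minute-level K-line computation request (assumed names).
import dolphindb as ddb

s = ddb.session()
s.connect("SERVER1", 8848)

kline = s.run("""
    select first(price) as open, max(price) as high,
           min(price) as low,   last(price) as close
    from loadTable("dfs://tickdb", "quotes")
    where date = 2019.06.25, sym = `AAPL
    group by minute(time) as bar
""")
print(kline.head())   # one row per minute of the trading day
```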

We test the computation performance with 1 to 128 concurrent users in two scenarios: the 5-year data set and the 1-week data set.

  • The 5-year data set is 12 TB and cannot be fully cached in memory, so data must be loaded from disk for almost every computation. This is an I/O-intensive scenario, and disk is expected to be the performance bottleneck.
  • The 1-week data set (about 60 GB) can be fully cached by the DolphinDB data nodes, so this is a compute-intensive scenario. With many concurrent users, CPU performance is expected to be the bottleneck.

The test results of each API are as follows:

Table 20. C++ API minute K-line computation test results

Table 21. Java API minute K-line computation test results

Table 22. C# API minute K-line computation test results

Table 23. Python API minute K-line computation test results

Table 24. R API minute K-line computation test results

Table 25. Comparison of 5-year data computation throughput across APIs (unit: MB/s)

Figure 5. Comparison of 5-year concurrent computation throughput across APIs

As the figure above shows, with fewer than 16 users the throughput of each API grows roughly linearly, and it peaks at 64 users. When the number of concurrent users increases to 128, throughput decreases for two reasons. First, with that many concurrent requests, DolphinDB's memory can no longer hold all of the required data, causing a large amount of swapping between memory and disk and degrading performance. Second, too many concurrent users generate too many computation tasks, and the overhead of task scheduling and dispatching grows, reducing throughput.

Table 26. Comparison of 1-week data computation throughput across APIs (unit: MB/s)

Figure 6. Comparison of 1-week concurrent computation throughput across APIs

The test results show that with fewer than 64 users the throughput grows steadily and the APIs perform similarly. Performance peaks at 64 concurrent users, with computation data throughput approaching 7 GB/s. When the number of users reaches 128, there are too many tasks in the system, far exceeding the number of hardware threads (the physical machine hosting the cluster has 48 threads), which causes frequent thread switching and increases task scheduling time in the cluster, reducing throughput.

7. Summary

We tested the data upload, download, and computation performance of the DolphinDB C++, Java, C#, Python, and R APIs with different numbers of concurrent users. The results are as follows:

**Single-user data upload to a memory table.** C++ has the best performance, with throughput up to 265 MB/s; Java, Python, and R reach 160-200 MB/s; C# performs somewhat worse, at around 60 MB/s. Throughput increases significantly as the batch size grows, especially for Python and R, so the batch size should be made as large as latency and memory allow when writing.

**Multi-user concurrent writes to a distributed DFS table.** As the number of users increases, throughput grows steadily until the network limit is reached, with C++ and Java showing a clear performance advantage overall. At around 32 concurrent users the network becomes the bottleneck and the APIs perform roughly the same; because the data is compressed, the system reaches a maximum throughput of about 1.8 GB/s.

**Multi-user concurrent data download.** In the 5-year, 12 TB data set scenario, the maximum throughput is about 380 MB/s, reached at 64 users. In this scenario all data must be loaded from disk, so disk reads are the bottleneck; at 128 users, each node serves a large number of downloads and disk I/O contention is fierce, reducing overall throughput. In the 1-week, roughly 60 GB scenario, the 6 data nodes can cache all of the data, so nothing needs to be loaded from disk and the network limit can be reached; thanks to data compression, cluster throughput reaches 1.8 GB/s and grows steadily as the number of concurrent users increases.

**Multi-user concurrent computation.** Each API concurrently submits tasks computing the minute-level K-lines of a particular stock on a given day to DolphinDB and retrieves the results. The amount of data transferred over the network is small and most of the computation happens on the server side, so the APIs perform similarly. For both the 5-year and 1-week data sets the throughput trend is essentially the same, peaking at 64 users: about 1.3 GB/s with the 5-year data set, and about 7 GB/s with the 1-week data set because all data is in memory. At 128 users, throughput drops, mainly because the number of tasks in the system far exceeds the number of hardware threads (the physical machine hosting the cluster has 48 threads), causing frequent thread switching and increasing task scheduling and dispatch time in the cluster.

Overall, the DolphinDB APIs deliver strong performance for retrieving, uploading, and computing on data; performance scales well as concurrency increases and meets the requirements of most business scenarios.