At NetEase's Digital+ conference, which ended last week,

NetEase Shufan announced

that it is open-sourcing a high-performance distributed storage system called Curve,

with performance reaching up to 1.84 times that of Ceph!

Wang Yuan, Vice President of NetEase, Executive President of NetEase Hangzhou Research Institute, and General Manager of NetEase Shufan, said:

Basic software capability is critical to digital transformation. The storage field currently needs a distributed storage system with higher performance, higher availability and reliability, and stronger autonomy. Open-sourcing Curve not only reflects NetEase Shufan's commitment to the basic software field, but also adds fuel to the growth of the software-defined infrastructure ecosystem.


With an advanced architecture design, single-volume performance reaches 1.84 times that of Ceph

Curve provides a storage base with high performance and low latency. On top of this base, enterprises can build storage systems suited to different application scenarios, such as block storage, object storage, and cloud-native databases. So far, NetEase Shufan has implemented a high-performance block storage system on it.

According to Wang, Curve has three main design features: high performance, high availability, and autonomy.

High performance comes from an advanced architecture. Curve draws on existing storage systems in the industry, builds on efficient open source technology, and uses a newly designed architecture to deliver its core capability of high performance and low latency. It adopts a high-performance RPC framework to keep network data flows fast, and it relies on the Raft protocol to keep multiple replicas consistent at low latency, with further optimizations for Raft snapshots. For disk I/O, Curve reduces I/O conflicts and increases I/O concurrency by hashing the address space at a fine granularity, and it uses ChunkFilepool to reduce I/O amplification so that the hardware can be used to its full potential.
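The idea of hashing the address space at a fine granularity can be illustrated with a small sketch. This is not Curve's actual placement logic: the chunk size, the number of groups, and the function names below are all assumptions chosen for illustration. It only shows how splitting a volume into fixed-size chunks and hashing each chunk to a different replication group spreads I/O so that requests are less likely to collide.

```python
import hashlib
from collections import Counter
from typing import Iterable

CHUNK_SIZE = 16 * 1024 * 1024  # assumed chunk size in bytes (illustrative only)
NUM_GROUPS = 8                 # hypothetical number of replication groups


def chunk_index(offset: int) -> int:
    """Return the index of the fixed-size chunk that contains a byte offset."""
    return offset // CHUNK_SIZE


def group_for_chunk(volume_id: str, index: int) -> int:
    """Hash (volume, chunk index) to one of NUM_GROUPS replication groups."""
    key = f"{volume_id}:{index}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % NUM_GROUPS


def spread(volume_id: str, offsets: Iterable[int]) -> Counter:
    """Count how many of the given offsets land on each replication group."""
    return Counter(group_for_chunk(volume_id, chunk_index(o)) for o in offsets)


if __name__ == "__main__":
    # One request per chunk across the first 1024 chunks of a hypothetical volume.
    offsets = [i * CHUNK_SIZE for i in range(1024)]
    for group, count in sorted(spread("vol-demo", offsets).items()):
        print(f"group {group}: {count} requests")
```

Because neighbouring chunks hash to unrelated groups, a burst of I/O against one region of the volume is served by several groups in parallel instead of queuing on one.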

Wang Yuan presented test data comparing Curve with Ceph. In the single-volume scenario, Curve's 4K random read and write IOPS are 1.84 times and 1.58 times those of Ceph, respectively, and its latency is 48.39% and 37.50% lower than Ceph's.

In an interview, he revealed that some performance optimizations for Curve are still in progress, such as fine-grained hashing and io_uring, and that he expects a further 30% performance improvement in the next version. If that holds, Curve would deliver more than twice the single-volume performance of Ceph (1.84 × 1.3 ≈ 2.39 and 1.58 × 1.3 ≈ 2.05). The performance difference comes from different architectural choices. For distributed consistency, Curve adopts a quorum mechanism, while Ceph implements strong consistency. The former has lower latency than the latter and can recover quickly from failures without much impact on I/O performance.
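To see why a quorum can lower latency, consider a toy model. It is a sketch under stated assumptions, not a description of either system's code: a quorum write completes once a majority of replicas have acknowledged, and it is compared against a hypothetical scheme that waits for every replica. The latency distribution, the slow-replica probability, and all function names are made up for illustration.

```python
import random


def quorum_commit_latency(replica_latencies, quorum):
    """Time at which the `quorum`-th fastest replica has acknowledged."""
    return sorted(replica_latencies)[quorum - 1]


def all_replica_commit_latency(replica_latencies):
    """Time at which the slowest replica has acknowledged."""
    return max(replica_latencies)


def simulate(trials=10_000, replicas=3, slow_prob=0.05, seed=0):
    """Return (mean quorum latency, mean wait-for-all latency) in ms.

    Each replica normally answers in 0.2 to 0.4 ms; with probability
    `slow_prob` it stalls for an extra 5 ms (assumed numbers).
    """
    rng = random.Random(seed)
    majority = replicas // 2 + 1
    quorum_total = 0.0
    all_total = 0.0
    for _ in range(trials):
        latencies = [
            rng.uniform(0.2, 0.4) + (5.0 if rng.random() < slow_prob else 0.0)
            for _ in range(replicas)
        ]
        quorum_total += quorum_commit_latency(latencies, majority)
        all_total += all_replica_commit_latency(latencies)
    return quorum_total / trials, all_total / trials


if __name__ == "__main__":
    quorum_ms, all_ms = simulate()
    print(f"majority quorum: {quorum_ms:.3f} ms, wait for all: {all_ms:.3f} ms")
```

In this model a single stalled replica delays every wait-for-all write, while the quorum write is only delayed when a majority of replicas stall at once, which also hints at why a failed replica has little effect on ongoing I/O.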

Curve and Ceph performance test comparison

In terms of high availability, Curve's core components are designed to tolerate the failure of some instances without affecting the availability of the whole cluster. Whether a single storage device fails or the system is being expanded, Curve client I/O is essentially unaffected. The impact of common exceptions such as disk insertion or removal, service process interruption, and I/O jitter is also minimal, Wang said. The fault recovery process likewise does not significantly affect upper-layer I/O.

In terms of autonomy, Curve supports one-click deployment and one-click upgrade, so operation and maintenance require little manual intervention. It also provides a comprehensive metrics and alerting system built on open source technologies such as Prometheus and Grafana.


Giving back to open source to shore up a weak spot in software-defined infrastructure

The high-performance block storage system based on Curve already serves some of NetEase's core services. It supports snapshots, cloning, and recovery, and can be mounted by QEMU virtual machines and by physical machines (via NBD).

The system has been online for more than 400 days. In that time there has been no data inconsistency, no data loss, and no major failure; data reliability has reached 100%, and service availability has exceeded four nines (99.99%). Exception drills in the online environment also confirmed that exceptions affect the business only as described above. For NetEase, though, this result is just the beginning for Curve.
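As a rough check on what "four nines" means in practice, the short calculation below converts an availability figure into a downtime budget. The function name is hypothetical, and the numbers are simple arithmetic on the 99.99% and 400-day figures quoted above, not measurements from NetEase.

```python
def downtime_budget_minutes(availability: float, days: float) -> float:
    """Maximum total downtime, in minutes, allowed over `days` at the given availability."""
    return (1.0 - availability) * days * 24 * 60


if __name__ == "__main__":
    # 99.99% availability over the 400 days mentioned in the article.
    print(round(downtime_budget_minutes(0.9999, 400), 1))  # 57.6
    # The same availability over a 365-day year.
    print(round(downtime_budget_minutes(0.9999, 365), 1))  # 52.6
```

In other words, staying above four nines over 400 days leaves less than an hour of total unavailability.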

Because the open source field lacks a distributed storage system with both high performance and low latency, Wang Yuan announced that NetEase Shufan would open source Curve to give back to the community and encourage the industry to use it. He also said he hoped everyone would take part in making Curve better.

The latest stable version of the Curve project has been published on GitHub together with deployment documentation. As a homegrown Chinese project, Curve provides Chinese documentation first, in the hope of lowering the barrier for Chinese users who want to try it. Readers interested in Curve can visit opencurve.github.io for more information.