What is the core of high performance computing?

Computing, of course. Yes, that is what most people would say.

A look at the twice-yearly Top500 list shows that today’s best supercomputer can already sustain 442 quadrillion floating-point operations per second (442,010 TFlop/s), a step closer to the exascale goal.

With heterogeneous computing spreading, both traditional HPC and emerging artificial intelligence workloads have begun to adopt it on a large scale. The rise of GPUs, FPGAs, Arm and other chip architectures has also turned the whole computing market into a scene where “a hundred flowers bloom”.

The same is true at the network level. Contrary to the impression many people have that HPC systems run only on InfiniBand, the Top500 lists show that Ethernet still has a very high penetration rate, with 100GbE now effectively “standard” for Ethernet deployments. On the InfiniBand side, which prioritizes transmission efficiency and low latency, 200G HDR has become mainstream, and Mellanox recently announced 400G NDR products at SC20, covering the full range of networking components, cables included.

In this sense, HPC is evolving rapidly. Advances in both computing and networking are making data processing and transmission more efficient. Yet amid this rapid development we seem to have overlooked another essential piece: storage.

How important is storage for high performance computing?

In the past, discussions of high performance computing always centered on the speed of computation, because computing power was in short supply. Now heterogeneous computing has made computation dramatically more efficient, and high-speed networks move the results around quickly enough to make the data itself more valuable…

But we found that storage was a bottleneck in many applications.

One of the traditional applications of high performance computing is biological and genetic engineering. Since the 1990s, topics such as the Human Genome Project, cloning and biochips have been everywhere on television and in books. Dr. Chen Zhangliang, then vice president of Peking University, even declared that “the 21st century is the century of biology.” In recent years, the discovery of the PD-1 signaling pathway (a big step forward in the fight against cancer), induced pluripotent stem cell (iPSC) technology and the CRISPR gene-editing technology (raising hopes of a radical cure for type I diabetes) have all demonstrated the importance of bioengineering.

Unlike the biology laboratories most people imagine, the test tubes and beakers familiar from film and television have, with the help of advanced technology, gradually given way to high-performance computers, and simulation has become an important part of biochemical experiments.

Take BGI, the well-known domestic genomics company, as an example. This company, which has contributed significantly to genomics worldwide, operates hundreds of sequencers that generate 300 TB to 1 PB of data every month. Simply storing that data is a headache, to say nothing of analyzing and using it afterwards; the storage resources involved are “astronomical”.

Another area that relies heavily on storage is video editing and processing. Riding the wave of short-video applications, many people today shoot their own videos on platforms such as Douyin and Kuaishou, and bringing those clips to a wider audience takes storage-hungry video processing. 4K and 8K ultra-high-definition video gives us a visual feast, but it also places a heavy load on back-end storage devices.

The 2016 Rio Olympics reportedly used 8K video to cover events; 20 minutes of uncompressed Ultra HD footage took up a whopping 4 TB of storage space.

Capacity can always be expanded by adding more storage devices; the more important issue is storage efficiency. A 4K/8K video workflow needs not only enough space but also a storage system scalable and fast enough to keep up with the required read and write rates, which makes 4K/8K storage considerably harder.
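A quick back-of-envelope calculation, using only the 4 TB / 20-minute figure reported above and nothing else, shows the sustained throughput such a workflow implies:

```python
# Rough estimate of the sustained write rate implied by the Rio figure above:
# about 4 TB of uncompressed 8K footage recorded in 20 minutes.
capacity_bytes = 4 * 10**12        # 4 TB, as reported
duration_seconds = 20 * 60         # 20 minutes

throughput = capacity_bytes / duration_seconds
print(f"~{throughput / 10**9:.1f} GB/s sustained")  # roughly 3.3 GB/s
```

A single stream at that rate already saturates many storage devices, which is why scalability and read/write efficiency matter as much as raw capacity.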

Something similar is happening in the film industry. From films such as The Wandering Earth to Jin Gang Chuan (The Sacrifice), released after last year’s National Day holiday, visually striking special effects and rendering have begun to drive the development of the Chinese film industry, profoundly changing production cycles, production costs and the ease of making film and television works.

Special-effects rendering is a classic commercial scenario for high performance computing. In the production of an effects-heavy film, most of the time not spent shooting goes into the effects work, and within that, rendering takes the lion’s share. Before rendering, large-capacity storage is needed to hold the source data; during rendering, a great deal of processing is needed to generate the final output. In this sense, cloud rendering, which combines computing power with storage, is becoming an important part of China’s film industry and is helping it develop rapidly.

Different business scenarios place different demands on storage. Whether in scientific research or in emerging intelligent video applications, new requirements and challenges are being placed on storage devices, and storage products themselves need to make a breakthrough.

It is precisely because storage has become so important to high performance computing that, since November 2017, a list named IO500 has been released alongside the well-known TOP500 at the twice-yearly SC and ISC conferences (the latter held in Germany), and it is gradually becoming the performance “weather vane” of the storage industry.

What will the storage platform of the future look like?

If TOP500 is a computing performance leaderboard, IO500 is a storage system performance leaderboard.

Benchmarking the performance of high-performance storage systems has always been a complex task. Parallel I/O is affected not only by CPU and network latency but also by the underlying storage hardware and software. Results published by different vendors often differ widely because of differences in test methods, tools, parameters and even the order of test steps.

That’s where the IO500 comes in. As an internationally recognized benchmark, the IO500 defines a comprehensive suite of performance tests for measuring and comparing high-performance storage systems, giving users a standard basis for evaluation.

Specifically, the standard IO500 benchmark uses IOR, mdtest and the standard POSIX interface to evaluate workloads including tunable sequential I/O, random I/O and metadata operations.
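As a rough illustration of how those tools are typically driven (a sketch only: the process count, transfer and block sizes, and the /mnt/hpcfs paths below are placeholder assumptions, and the exact flags should be verified against the IOR and mdtest builds in use), a run might be launched from Python like this:

```python
import subprocess

# Illustrative sketch: run an IOR bandwidth phase and an mdtest metadata phase
# under MPI against a filesystem assumed to be mounted at /mnt/hpcfs.
subprocess.run(
    ["mpirun", "-np", "16",
     "ior",
     "-a", "POSIX",             # plain POSIX backend
     "-w", "-r",                # write pass, then read pass
     "-t", "1m", "-b", "4g",    # 1 MiB transfers, 4 GiB written per process
     "-o", "/mnt/hpcfs/ior.dat"],
    check=True,
)

subprocess.run(
    ["mpirun", "-np", "16",
     "mdtest",
     "-n", "10000",             # items created/stat-ed/removed per process
     "-d", "/mnt/hpcfs/mdtest"],
    check=True,
)
```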

The IO500 includes bandwidth benchmarks and metadata benchmarks, and the final score is the geometric mean of the two category scores.
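As a minimal sketch of that scoring rule (the two sub-scores below are invented numbers, used only to show the arithmetic):

```python
import math

# Hypothetical sub-scores for one submission: aggregate bandwidth from the
# IOR phases (GiB/s) and aggregate metadata rate from the mdtest phases (kIOP/s).
bandwidth_score = 120.0   # GiB/s, assumed for illustration
metadata_score = 350.0    # kIOP/s, assumed for illustration

# The published IO500 score is the geometric mean of the two category scores,
# so a system cannot rank highly on bandwidth or metadata alone.
final_score = math.sqrt(bandwidth_score * metadata_score)
print(f"IO500 score = {final_score:.2f}")
```

The geometric mean rewards balance: doubling bandwidth while metadata performance stagnates raises the final score by only about 41%.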

The results are published in two forms: the full list and the 10-node list. The 10-node list is closer to the scale at which real parallel programs actually run, so it better reflects the I/O performance a storage system can deliver to real applications and therefore has higher reference value.

As we know, most HPC applications today are written against the POSIX interface, so the standard POSIX interface used by the IO500 also captures the high-bandwidth, high-throughput, low-latency demands that storage applications actually make.
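To make “POSIX interface” concrete, the fragment below writes and reads a file through the low-level POSIX-style calls exposed by Python’s os module; the path and sizes are arbitrary assumptions, chosen only to show the kind of calls an HPC application ultimately issues:

```python
import os

PATH = "/tmp/posix_io_demo.dat"   # arbitrary demo path
BLOCK = b"\0" * (1 << 20)         # one 1 MiB block

# Write 16 MiB with POSIX open/pwrite, then force it to stable storage.
fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
for i in range(16):
    os.pwrite(fd, BLOCK, i * len(BLOCK))  # positional write, no seek needed
os.fsync(fd)                              # flush data to the device
os.close(fd)

# Read it back with positional reads.
fd = os.open(PATH, os.O_RDONLY)
total, offset = 0, 0
while True:
    chunk = os.pread(fd, len(BLOCK), offset)
    if not chunk:
        break
    total += len(chunk)
    offset += len(chunk)
os.close(fd)
print(f"read back {total >> 20} MiB")
```

Every layer between calls like these and the disks, including client caches, network transport and the parallel file system itself, shows up in an IO500 result.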

In other words, the IO500 reflects differences in storage performance under conditions close to real applications. Real HPC workloads are complex, requiring storage that combines high bandwidth, high OPS and multiple protocols, so the systems that make it onto the IO500, let alone lead it, are necessarily leaders in the storage field.

That is why, despite its short history, the IO500 is even livelier than the TOP500 in terms of participants and the intensity of competition.

Leading entries decouple the storage system from the compute system and adopt innovative Rust-based optimizations such as highly scalable concurrent access, coarse-grained data caching with bypass access, pipelined data access and disk writes, and zero-copy fast RPC processing, extracting maximum performance from NVMe SSDs.

Looking at today’s storage market, the coexistence of multiple storage protocols has long made upgrades and iteration difficult for users. Compared with traditional file and block storage, today’s popular object storage is better suited to unstructured data and intelligent applications. Even in HPC storage, which is dominated by the POSIX interface, object storage has earned a place thanks to its operational convenience.

For example, the video rendering industry uses object storage to hold massive volumes of video material, making later real-time query and retrieval of the data convenient. For users, then, whoever can let multiple storage protocols coexist, manage them effectively and process data conveniently will take the lead in the next stage of storage development.
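As a minimal sketch of the object interface such a workflow relies on (assuming an S3-compatible endpoint; the endpoint URL, credentials, bucket and key names below are all made up for illustration), storing and retrieving a piece of footage might look like this:

```python
import boto3

# Connect to an S3-compatible object store; endpoint, credentials, bucket and
# key names are placeholders, not a real service.
s3 = boto3.client(
    "s3",
    endpoint_url="http://objectstore.example.com:9000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Upload a rendered clip as a single object, tagged with simple metadata
# that later queries can use.
with open("shot_042_v3.mov", "rb") as f:
    s3.put_object(
        Bucket="render-assets",
        Key="project-x/shot_042_v3.mov",
        Body=f,
        Metadata={"scene": "042", "version": "3"},
    )

# Later, fetch the object back by key for review or further processing.
obj = s3.get_object(Bucket="render-assets", Key="project-x/shot_042_v3.mov")
data = obj["Body"].read()
print(f"retrieved {len(data)} bytes")
```

The flat key namespace and per-object metadata are what make this style of storage convenient for the later querying and retrieval described above.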

Beyond multi-protocol coexistence, multi-protocol interoperability is also a concern for the whole industry. Commonly used distributed parallel file systems include Lustre, GPFS, Gluster and Isilon OneFS; object stores include Ceph and Cleversafe; and data access protocols such as SMB, CIFS and NFS are involved as well. NAS, for example, is a typical case of multi-protocol interchange.

The genetic testing mentioned above, aided by big data and artificial intelligence, has now reached the level of intelligent testing, and the whole industry is trending toward personalization. But this also raises issues of data interoperability, such as how data held in distributed architectures can be coordinated with traditional sequencers or workstations.

Then there is intelligent driving, the hottest topic of the moment. Tesla CEO Elon Musk recently said in an interview that the company would reach L5 autonomy by 2022, meaning the car would completely replace the driver and deliver the kind of “autonomous driving” we see in films and on television.

Autonomous driving is graded from L1 to L5. At L5, the driver’s seat becomes a cockpit: the onboard computer can take control under any conditions, although the owner can still operate the vehicle.

But this first requires solving the problem of data coordination. Autonomous driving has long depended on sensors mounted on data-collection vehicles gathering massive amounts of road measurement data; repeated AI training and simulation on the processed data then teach the car to recognize and handle all kinds of road conditions and obstacles, making autonomous driving possible.

The workflow includes data collection and import, local preprocessing of the data, AI model training and, finally, HPC-based simulation. Through simulation, the in-vehicle AI system can be guided to make intelligent judgments, driving upgrades and iteration of the software.

(Figure: the autonomous driving research and development process)

The different stages of autonomous driving training use very different access protocols, so improving end-to-end efficiency requires storage that supports multiple protocols and minimizes data copying. Taking today’s mainstream L2 and L3 systems as an example, each vehicle already generates 2-64 TB of data per day, and as mileage accumulates, the total data reaches petabyte or even exabyte scale.

At L5, the volume of data transferred and computed becomes truly “astronomical”, requiring an integrated computing approach that spans the onboard processor and the back-end data center, with fast transmission and response over the Internet of Vehicles and 5G networks.

In other words, in autonomous driving scenarios the onboard system performs large-scale data computation and processing, demanding high bandwidth, high OPS and low latency, while massive datasets must also be exchanged collaboratively across systems.

Future high performance computing systems therefore need to satisfy high-bandwidth and high-OPS requirements simultaneously, and to avoid the redundant data copies that arise when different stages of the process use different storage architectures. Multi-protocol interoperability has become an inevitable choice.

To improve HPC storage performance, it seems, the old boundaries must be broken: the platform must be able to take in user data in all its forms and let data flow across protocols, enabling unified management of storage resources. That breaks hardware lock-in, lets data be distributed evenly across the resource pool, simplifies data-protection design and allows different business systems to share storage. It also cuts storage costs through consolidation, avoids building multiple redundant systems and fully taps the potential value of existing data.

On the long road toward exascale computing we face many challenges; the memory wall, the communication wall, the reliability wall and the power wall have become the four great obstacles. The memory-wall problem in particular requires compute, storage and I/O speeds to be matched in balance, so that the system architecture delivers balanced performance.

Future HPC storage therefore needs to deliver multi-protocol compatibility, interoperability across protocol architectures and intelligent management.

In other words, as high performance computing has developed, rapidly growing compute power and high-bandwidth, low-latency networking have entered a new stage, and the corresponding storage must likewise offer massive scalability and intelligent data management.

Traditional high-performance computing applications, whether in genomics, high-energy physics, fluid dynamics, video processing or other scientific fields, all work with massive data, and improving both storage capacity and data utilization will further advance high performance computing.

It’s time to choose a new storage platform for the exascale era. (This article is from DT Era; thanks.)