Why should we care about the I/O behavior of a business workload, that is, its I/O access model? The reason is simple: a storage system exists to serve upper-layer applications, and its overall design and architecture are shaped by many factors, including the workloads it serves. When a storage system reaches its performance limit, both storage developers and users want to understand how the I/O behaves. Developers want to find bottlenecks and optimize the system; users want to use the system more effectively and keep their services running stably.

I/O characteristics of common service types

Let's first look at the I/O model characteristics of several typical business scenarios.

Logs

Log files are usually written by appending. Before each write, stat is called to obtain the file size, which is used as the write position (the offset where the new content is inserted), so the ratio of write to stat requests is 1:1. Conversely, if the ratio of write to stat requests on a client is close to 1:1, you can assume that the workload on that client is mostly appending to files. If log I/O accounts for a large share of total storage access, consider enlarging the write buffer for log files so that more data is written per request. This reduces the number of write and stat requests and improves system performance.
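The write-to-stat ratio heuristic described above can be sketched as follows. This is only an illustration: the function names, the 10% tolerance, and the minimum request count are assumptions chosen for the example, not values taken from any storage product.

```python
def write_stat_ratio(write_ops: int, stat_ops: int) -> float:
    """Return write_ops / stat_ops, guarding against division by zero."""
    if stat_ops == 0:
        return float("inf") if write_ops else 0.0
    return write_ops / stat_ops


def looks_like_append_workload(write_ops: int, stat_ops: int,
                               tolerance: float = 0.1,
                               min_ops: int = 100) -> bool:
    """Guess whether a client is mostly appending to files.

    tolerance and min_ops are hypothetical thresholds: the ratio must be
    within +/- tolerance of 1.0, and both counters must be large enough
    for the ratio to be meaningful.
    """
    if write_ops < min_ops or stat_ops < min_ops:
        return False
    return abs(write_stat_ratio(write_ops, stat_ops) - 1.0) <= tolerance
```

For example, a client with 1000 write requests and 980 stat requests would be flagged as append-heavy, while a client with 1000 writes and 200 stats would not.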

System commands du/ls

Some operating system commands, such as ls and du, are not friendly to distributed file storage, and du is the worst offender. du calls readdir to get the list of entries in a directory and then calls stat on each entry to get its size; if an entry is a directory, du recurses into it. A single du run is therefore a combination of many interleaved readdir and stat requests, and each MDS ends up handling a large number of them. The ls command is also a combination of readdir and stat, but unlike du, ls only collects stat information within a single directory, so it issues one readdir and multiple stat requests. If routine inspection shows that an MDS is under high load, check which request types make up the load. If they are mostly readdir and stat, you can be fairly sure that some service is running these system commands. The client IP address of the requests identifies the client that issued them, and from there you can find out which business triggered the commands.
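To make the readdir-then-stat pattern concrete, here is a minimal sketch of a du-like traversal that counts how many requests of each kind it would issue. It is not the real du implementation: `du_like` is a hypothetical helper, `os.listdir` stands in for readdir, `os.lstat` stands in for stat, and only regular file sizes are summed.

```python
import os
import stat
from collections import Counter


def du_like(path, ops=None):
    """Walk a tree the way du does and count the requests issued.

    Returns (total_bytes, ops), where ops counts "readdir" and "stat" calls.
    """
    if ops is None:
        ops = Counter()
    total = 0
    ops["readdir"] += 1                   # one readdir per directory
    for name in os.listdir(path):
        entry = os.path.join(path, name)
        ops["stat"] += 1                  # one stat per directory entry
        info = os.lstat(entry)
        if stat.S_ISDIR(info.st_mode):
            sub_total, _ = du_like(entry, ops)   # recurse into subdirectories
            total += sub_total
        elif stat.S_ISREG(info.st_mode):
            total += info.st_size
    return total, ops
```

On a tree with many directories and files, the counters grow with the number of directories (readdir) and the number of entries (stat), which is the load the metadata service has to absorb.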

Databases

The read/write ratio of a database generally follows the 80-20 principle: 20% of operations are writes and 80% are reads. Writes are mostly 8 KB and 16 KB in size. On distributed file storage that uses buffered I/O through the page cache, fsync incurs high system overhead, resulting in poor database write performance. If you run database services, direct I/O is recommended. MySQL's storage engine, InnoDB, has a setting, innodb_flush_method, that controls the system calls InnoDB uses to write data. You are advised to use O_DIRECT_NO_FSYNC: InnoDB then opens files with O_DIRECT, bypasses the page cache, writes data directly to disk, and skips the fsync() call after each write operation. Starting with version 8.0.14, MySQL still calls fsync() to synchronize file system metadata after a file is created, after its size is increased, and after it is closed.

AI training

As we all know, more than 90% of the operations in AI training are reads, either sequential reads of small files or random reads of large files. During training, the data set is not modified or deleted. Metadata operations are limited to open, close, stat, and revalidate, with no unusual metadata operations, and data operations are all reads. Given this I/O pattern, performance can be improved further by selectively relaxing certain POSIX semantics or even weakening data consistency. For example, once consistency has been relaxed, a read cache can be added on the client to greatly speed up data reads and shorten training time.
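As a rough illustration of the client-side read cache idea, the sketch below keeps whole files in memory with least-recently-used eviction. It is an assumption-laden toy, not how any particular storage client works: `ClientReadCache` is a hypothetical name, and the cache never revalidates, so it is only safe when the data set does not change during training.

```python
from collections import OrderedDict


class ClientReadCache:
    """Whole-file LRU read cache; assumes files are read-only while cached."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.hits = 0
        self.misses = 0
        self._entries = OrderedDict()     # path -> bytes, least recent first

    def read(self, path):
        data = self._entries.get(path)
        if data is not None:
            self._entries.move_to_end(path)   # mark as most recently used
            self.hits += 1
            return data                       # served without touching storage
        self.misses += 1
        with open(path, "rb") as f:
            data = f.read()
        if len(data) <= self.capacity_bytes:  # skip files larger than the cache
            self._entries[path] = data
            self.used_bytes += len(data)
            while self.used_bytes > self.capacity_bytes:
                _, evicted = self._entries.popitem(last=False)
                self.used_bytes -= len(evicted)
        return data
```

The trade-off is visible in the code: a cached file is returned even if it has since changed on the storage side, which is exactly the weakened consistency the paragraph above describes.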

Methods for analyzing business I/O models

You can analyze a service's I/O model by reading its source code or by using the I/O analysis tools provided by the storage system. Reading business source code is impractical most of the time, and storage developers or operations staff should not need to do it, so in practice they rely on the I/O analysis tools that the storage system provides.

In a large-scale storage system, and especially in distributed file storage, analyzing service I/O behavior is a complex task. File storage has to provide a full set of standard POSIX file interfaces, and the downside of such a rich interface is that more I/O operation types need to be monitored and analyzed, which makes analysis harder. For file storage, we need to focus on two types of I/O: metadata and data. Metadata I/O mainly consists of file and directory metadata operations, such as open, close, mkdir, rmdir, stat, unlink, revalidate, hardlink, and rename. Data I/O mainly consists of read and write, and for these both IOPS and bandwidth (BW) statistics need to be collected. Unfortunately, the common file storage products and solutions on the market rarely provide convenient tools that help administrators systematically understand the characteristics of business I/O.
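The split between metadata I/O and data I/O, and the extra IOPS and bandwidth figures for reads and writes, might be summarized along these lines. The `Request` record and the `summarize` function are hypothetical and exist only for this sketch; a real system would collect these counters inside the client or server.

```python
from dataclasses import dataclass

METADATA_OPS = {"open", "close", "mkdir", "rmdir", "stat",
                "unlink", "revalidate", "hardlink", "rename"}
DATA_OPS = {"read", "write"}


@dataclass
class Request:
    op: str           # operation name, e.g. "stat" or "read"
    nbytes: int = 0   # payload size; only meaningful for read/write


def summarize(requests, window_seconds):
    """Summarize requests observed during a window of window_seconds."""
    meta_count = 0
    other_count = 0
    data = {op: {"count": 0, "bytes": 0} for op in DATA_OPS}
    for req in requests:
        if req.op in METADATA_OPS:
            meta_count += 1
        elif req.op in DATA_OPS:
            data[req.op]["count"] += 1
            data[req.op]["bytes"] += req.nbytes
        else:
            other_count += 1              # operation types not classified here
    summary = {
        "metadata_ops_per_s": meta_count / window_seconds,
        "other_ops": other_count,
    }
    for op, agg in data.items():
        summary[f"{op}_iops"] = agg["count"] / window_seconds
        summary[f"{op}_bw_bytes_per_s"] = agg["bytes"] / window_seconds
    return summary
```

Metadata operations only need a rate, while reads and writes need both a rate and a byte count, which is why the two classes are aggregated differently.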

How we do it

As the saying goes, to do a good job, you must first sharpen your tools. To better understand the I/O characteristics of various service systems, YRCloudFile implements an I/O statistics framework that lets administrators understand and analyze service I/O characteristics in real time and use them to optimize the storage system.

After a YRCloudFile client connects to the storage cluster, the cluster automatically starts monitoring the client's metadata and data requests and displays them in real time.

  • Metadata request: Requests made by clients to obtain metadata information. There are various types of requests. Generally speaking, common metadata requests can be classified into three types:

  • File data request: Refers to the request made by the client or metadata service to obtain file data or information. Note that not all file data requests read or write file data. There are eight types of file data requests:

The client monitoring function of YRCloudFile covers 35 metadata operation requests and 17 file data operation requests. Users can mark the OPS items they use most as key observation items according to their actual production environment. The function displays all OPS in real time, and it can also display, sort, and export daily and weekly average OPS, giving the CTO and the management team accurate data for allocating system resources and optimizing operations.

You can view the I/O statistics distribution of each client. You can also monitor specific operation types and clients as needed, and sort by a given operation type to quickly locate and analyze the clients responsible. Yanrong Technology's R&D team can also use the analysis of these I/O characteristics to carry out targeted system optimization for a user's business.
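Locating the clients behind one operation type can be pictured with a small ranking sketch like the one below. It does not reflect YRCloudFile's actual interface: the `(client_ip, op_name, count)` sample format and the `top_clients` function are assumptions made for illustration.

```python
from collections import defaultdict


def top_clients(samples, op, limit=3):
    """Rank clients by how many requests of type op they issued.

    samples is an iterable of (client_ip, op_name, count) tuples, a
    hypothetical format for per-client statistics.
    """
    totals = defaultdict(int)
    for client_ip, op_name, count in samples:
        if op_name == op:
            totals[client_ip] += count
    # Highest count first; ties are broken by client address for stable output.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

For instance, ranking by readdir would surface the client that is running du across a large directory tree.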