Public account: Operation and Maintenance Development Story. Author: Double_dong

Classification of I/O

Files can be read and written in different ways, which gives rise to several ways of classifying I/O. The most common are:

  • Buffered vs. unbuffered I/O, depending on whether the standard library's buffering is used:

1. Buffered I/O uses the standard library's buffer to speed up file access; internally, the standard library still reaches the file through system calls.

2. Unbuffered I/O accesses files directly through system calls, without passing through the standard library's buffer.
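
To make the difference concrete, here is a minimal C sketch (the file names are just placeholders): fwrite() goes through the standard library's buffer, while write() is a direct system call.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Buffered I/O: fwrite() stores data in the stdio buffer; the
     * underlying write() system call happens only when the buffer
     * fills up or the stream is flushed/closed. */
    FILE *fp = fopen("buffered.txt", "w");
    if (fp != NULL) {
        fwrite("hello\n", 1, 6, fp);
        fclose(fp); /* flush the buffer, then write() to the kernel */
    }

    /* Unbuffered I/O: every write() is a system call that goes
     * straight to the kernel, with no user-space cache in between. */
    int fd = open("unbuffered.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd >= 0) {
        write(fd, "hello\n", 6);
        close(fd);
    }
    return 0;
}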

  • Direct vs. indirect I/O, depending on whether the kernel's page cache is used (going through it is indirect I/O; direct I/O is requested with the O_DIRECT flag of the open system call):

1. Direct I/O skips the operating system's page cache and accesses the file by interacting with the file system directly.

2. Indirect I/O is the opposite: reads and writes pass through the system's page cache first, and the data is actually written to disk later by the kernel or by an additional system call.
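
Here is a minimal sketch of direct I/O, assuming a 4096-byte logical block size and a placeholder file name. O_DIRECT is Linux-specific and requires the buffer, file offset, and transfer size to be aligned, hence posix_memalign():

#define _GNU_SOURCE /* exposes O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* O_DIRECT asks the kernel to bypass the page cache entirely. */
    int fd = open("direct.dat", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0)
        return 1;

    /* Buffer, offset, and size must be aligned for O_DIRECT; 4096
     * bytes is assumed to match the device's logical block size. */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0)
        return 1;
    memset(buf, 'A', 4096);

    write(fd, buf, 4096); /* handed to the file system, not cached */

    free(buf);
    close(fd);
    return 0;
}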

  • Blocking vs. non-blocking I/O, depending on whether the application blocks itself while waiting:

1. Blocking I/O: if an I/O operation cannot complete immediately, the current thread is blocked until it does, and naturally cannot perform other tasks in the meantime.

2. Non-blocking I/O: the I/O call does not block the current thread but returns immediately. Setting the O_NONBLOCK flag requests non-blocking access; the application can continue with other tasks and later retrieve the result by polling or by event notification. Strictly speaking, this only moves the waiting point (to select, poll, epoll, and the like). It is mainly used on standard input/output and the network.
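
As a small illustration, the sketch below puts standard input into non-blocking mode; when no data is ready, read() returns immediately with EAGAIN instead of blocking:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Switch stdin to non-blocking mode with O_NONBLOCK. */
    int flags = fcntl(STDIN_FILENO, F_GETFL, 0);
    fcntl(STDIN_FILENO, F_SETFL, flags | O_NONBLOCK);

    char buf[128];
    ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        /* Nothing to read yet: the call returned instead of blocking,
         * so we can do other work and poll again later (or wait via
         * select/poll/epoll). */
        printf("no data ready, continuing with other work\n");
    } else if (n > 0) {
        printf("read %zd bytes\n", n);
    }
    return 0;
}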

  • Synchronous vs. asynchronous I/O, depending on whether the application waits for the result:

1. Synchronous I/O: after issuing an I/O operation, the application gets a response only once the entire I/O operation has completed.

2. Asynchronous I/O: after issuing an I/O operation, the application continues executing without waiting for a response; when the I/O completes, the result is delivered to the application as an event notification.
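
One way to see this in code is the POSIX AIO interface, one of several asynchronous APIs on Linux (the file name below is a placeholder, and a real program would use a completion notification rather than this polling loop):

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.txt", O_RDONLY); /* placeholder file */
    if (fd < 0)
        return 1;

    char buf[4096];
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    aio_read(&cb); /* queues the read and returns immediately */

    /* The application is free to do other work here; busy-polling is
     * used only to keep the sketch short. */
    while (aio_error(&cb) == EINPROGRESS)
        ;

    ssize_t n = aio_return(&cb); /* final result, as read() would give */
    printf("read %zd bytes\n", n);
    close(fd);
    return 0;
}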

File system I/O on Linux

For a process to read or write data through a file system, many layers of components have to cooperate. How do they work together? Let's take a look.

At the application layer, a process reads and writes files through system calls such as sys_open, sys_read, and sys_write. In the kernel, each process maintains data structures for the files it has opened, and the kernel also maintains system-wide data structures for all files open across the system.
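
A small sketch of that per-process bookkeeping: opening the same file twice yields two descriptors in the process's file descriptor table, each tracking its own offset (the path below is just a convenient example):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Two open() calls on the same file produce two entries in the
     * process's file descriptor table, each with its own offset. */
    int fd1 = open("/etc/hostname", O_RDONLY);
    int fd2 = open("/etc/hostname", O_RDONLY);
    if (fd1 < 0 || fd2 < 0)
        return 1;

    char c;
    read(fd1, &c, 1); /* advances fd1's offset only */

    printf("fd1 offset: %ld\n", (long)lseek(fd1, 0, SEEK_CUR)); /* 1 */
    printf("fd2 offset: %ld\n", (long)lseek(fd2, 0, SEEK_CUR)); /* 0 */

    close(fd1);
    close(fd2);
    return 0;
}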

Generic block layer

The generic block layer is the kernel component that handles the requests of all block devices in the system. Among other things, it:

  • Maps data between disk and memory. A page frame is mapped to a kernel linear address only while the CPU is accessing its data, and is unmapped when the access ends.

  • Implements "zero-copy" with the help of hardware such as DMA, where disk data lands directly in the user-mode address space rather than being copied into kernel address space first. This works because the pages of the buffers the kernel uses for the I/O transfer are mapped to the process's user-mode linear addresses.

  • Manages logical volumes, such as those used by LVM and software RAID.

The generic block layer is the heart of Linux disk I/O. Upward, it provides a standard interface through which file systems and applications access block devices, abstracting heterogeneous disk devices into one unified block device. Downward, it reorders and merges the I/O requests sent by file systems and applications, improving disk access efficiency.

I/O scheduler layer

The Linux kernel supports four I/O scheduling algorithms: NOOP, CFQ, Deadline, and Anticipatory. Let's introduce them one by one.

The first, NOOP, is the simplest I/O scheduling algorithm (in Linux all I/O schedulers are "elevators", and NOOP is the most basic of them). It is essentially a first-in, first-out queue that performs only basic request merging, and it is often used for SSDs.

The second, CFQ (Completely Fair Queueing), also called the completely fair scheduler, is the default I/O scheduler in many current distributions. It maintains an I/O scheduling queue for each process and distributes each process's I/O requests evenly across time slices.

Similar to CPU scheduling of processes, CFQ also supports priority-based scheduling of process I/O, so it suits systems running many processes, such as desktop environments and multimedia applications.

The third is the Deadline scheduling algorithm. It uses four queues: two sorted queues hold read and write requests respectively, ordered by starting sector, while the other two deadline queues hold the same read and write requests ordered by expiration time. This improves throughput on mechanical disks while guaranteeing that requests nearing their deadline are served first. The algorithm trades some global throughput for bounded latency, avoiding starved requests. When the system has a large number of sequential I/O requests, it may leave requests imperfectly ordered, causing frequent seeking.

The fourth is the Anticipatory algorithm. It borrows the basic machinery of the deadline scheduler (two deadline queues and two sorted queues); the I/O scheduler alternately scans the sorted queues for read and write requests, favoring reads, and scanning is nearly continuous unless a request times out. Each read I/O gets a wait window of roughly 7 ms: if the OS receives another read request for an adjacent position within those 7 ms, it is satisfied immediately. This suits mixed random and sequential I/O and write-heavy environments; it is not suitable for workloads dominated by random reads, such as MySQL.
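
Which scheduler is in effect for a given disk can be read from sysfs. The sketch below assumes a device named sda (adjust for your system); the name shown in square brackets is the active scheduler:

#include <stdio.h>

int main(void) {
    /* Each block device exposes its scheduler in sysfs; "sda" is an
     * assumed device name. */
    FILE *fp = fopen("/sys/block/sda/queue/scheduler", "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    char line[256];
    if (fgets(line, sizeof(line), fp) != NULL)
        printf("schedulers: %s", line); /* e.g. "noop deadline [cfq]" */
    fclose(fp);
    return 0;
}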

Disk I/O testing

The metrics of greatest concern in disk testing are:

  • IOPS (I/O operations executed per second), BW (bandwidth, i.e. throughput per second), and LAT (latency of a single I/O operation)

  • If the block size of each I/O operation is small (for example 512 bytes, 4 KB, or 8 KB), the test mainly measures IOPS

  • If the block size of each I/O operation is large (such as 256 KB, 512 KB, or 1 MB), the test mainly measures BW
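These metrics are tied together by BW = IOPS × block size. For example, 10,000 IOPS at a 4 KB block size amounts to only about 40 MB/s of bandwidth, whereas the same 40 MB/s at a 1 MB block size needs only about 40 IOPS. That is why small blocks stress IOPS and large blocks stress bandwidth.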

Using the FIO tool to test disk I/O

1. Introduction to FIO

FIO is a tool for testing disk performance. It can measure the major performance indicators, such as IOPS, throughput, and I/O latency, and it supports a variety of I/O engines.

2. Downloading FIO

http://brick.kernel.dk/snaps/

Open the URL above, pick the version you need, and download it. For example:

wget http://brick.kernel.dk/snaps/fio-3.5.tar.gz

3. Decompress the package and install it

# tar -xzvf ./fio-3.5.tar.gz
...
# cd fio-3.5
# make && make install
...
# which fio
/usr/local/bin/fio

4. Parameter descriptions

filename=/dev/sdb1   name of the test file; usually a path on the disk under test
direct=1             bypass the machine's own buffering so results are more realistic
rw=randwrite         test random-write I/O
rw=randrw            test mixed random read and write I/O
bs=16k               block size of a single I/O is 16 KB (e.g. bs=4k uses 4 KB per I/O)
numjobs=30           run 30 test threads
runtime=1000         run the test for 1000 seconds
ioengine=psync       use the psync I/O engine
rwmixwrite=30        in mixed read/write mode, writes account for 30%
group_reporting      display summarized results for the group of processes
lockmem=1g           use only 1 GB of memory for the test
zero_buffers         initialize the I/O buffers with zeros
nrfiles=8            number of files generated by each process

5. Test examples

Mixed random read/write test:

fio -filename=/tmp/test -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=512b -size=200m -numjobs=10 -runtime=60 -group_reporting -name=mytest

Sequential read test:

fio -filename=/dev/test -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=mytest

Random write test:

fio -filename=/dev/test -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=mytest

Sequential write test:

fio -filename=/dev/test -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=mytest

6. Reference script for I/O read/write testing

https://github.com/sunsharing-note/fio_test.git