This is the 28th day of my participation in the August Challenge

One, foreword

In recent years, I have been studying sequential and random disk writes, along with problems related to Java direct memory, so I keep running into flush and mmap in articles and source code. I broke them down one by one to understand what each does. But when I came back to summarize what I thought I had worked out, I found myself more confused than before. The following questions came up:

1. What is the difference between fsync and the fwrite/fflush combination?

2. What is the relationship between mmap and fsync?

3. Why do people say that data will not be lost after fsync? Is it true?

4. Is it safe to write data to disk?

5. Why can’t we close the file directly?

With these questions in mind I started a long stretch of research and eventually came to a new understanding of these functions and calls. Let’s go through it and answer the questions above.

PS: Detailed comparisons of these calls are hard to find online, so the content below is my own synthesis of many different materials. It may contain misunderstandings; corrections are welcome.

Two, introduction to each call

Most of the content comes from Baidu Baike.

1, fsync

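fsync forces the modified data of a file, together with its metadata such as the modification time, out of the kernel’s cache and onto the disk, and does not return until the device reports that the transfer is complete. Calling it after a write therefore lets you control exactly when that write is pushed to disk. Alternatively, you can open the file for synchronous I/O by passing the O_SYNC flag to open, in which case every write returns only after its data has been committed to disk.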
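The two approaches can be sketched as follows. This is a minimal illustration in Python rather than C, chosen only because its os module exposes the same open, write, and fsync calls; the helper names write_all, write_then_fsync, and write_with_o_sync are made up for this example, and O_SYNC is assumed to be available (it is on Linux, but not on every platform).

```python
import os
import tempfile


def write_all(fd: int, data: bytes) -> None:
    """Call write(2) until every byte has been handed to the kernel."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)  # may write fewer bytes than requested
        view = view[written:]


def write_then_fsync(path: str, data: bytes) -> None:
    """Ordinary write followed by an explicit fsync."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        write_all(fd, data)  # data is now in the kernel's Page Cache
        os.fsync(fd)         # block until the kernel has pushed it to the device
    finally:
        os.close(fd)


def write_with_o_sync(path: str, data: bytes) -> None:
    """Synchronous I/O: each write returns only after the data is committed."""
    # O_SYNC is not defined on every platform; fall back to no extra flag there.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o644)
    try:
        write_all(fd, data)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as workdir:
    write_then_fsync(os.path.join(workdir, "a.log"), b"record-1\n")
    write_with_o_sync(os.path.join(workdir, "b.log"), b"record-2\n")
```

The first variant pays the synchronization cost once, at a point the program chooses; the second pays it on every write.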

2, fwrite

fwrite() is a file function in the C standard library. It writes a given number of fixed-size items to the specified stream and returns the number of items actually written, which is less than the number requested only if an error occurs. It treats the data as raw bytes, so it works for binary files as well as text files.

3, fflush

fflush() is a function in the C standard I/O library that flushes a stream’s buffer and is typically used with disk files. It forces any data buffered in user space for the stream given as its argument to be written to the underlying file, that is, handed to the kernel.

4, mmap

mmap maps a file or other object into a process’s address space. The file is mapped in units of whole pages; if the file size is not a multiple of the page size, the unused remainder of the last page is zero-filled. mmap is the central system call for mapping files into user space.
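As a rough illustration of the page arithmetic and of reading a file through a mapping, here is a small Python sketch using the standard mmap module. The names pages_needed and read_via_mmap are hypothetical helpers for this example only, and the file is assumed to be non-empty.

```python
import mmap
import os
import tempfile


def pages_needed(size: int) -> int:
    """Number of whole pages that cover `size` bytes."""
    return -(-size // mmap.PAGESIZE)  # ceiling division


def read_via_mmap(path: str) -> bytes:
    """Read a non-empty file through a read-only memory mapping."""
    with open(path, "rb") as f:
        # Length 0 means "map the whole file"; an empty file cannot be mapped.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
            return mapping[:]  # slicing copies bytes out of the mapped pages


with tempfile.TemporaryDirectory() as workdir:
    demo_path = os.path.join(workdir, "data.bin")
    with open(demo_path, "wb") as f:
        f.write(b"hello mmap")
    content = read_via_mmap(demo_path)
    print(content, pages_needed(len(content)))
```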

Three, differences between the calls

Readers familiar with these calls know what each one does, but how they relate to one another underneath is probably still unclear to many. To sort this out, I drew the following diagram, based on material published online by experts:

As the diagram shows, a file write passes through many buffers on its way to disk: the IO Buffer, the Page Cache, the driver cache, and the disk cache. All of these exist to speed up file reads and writes. However, when data must be completely safe (as with a WAL), these buffers become one obstacle after another. To push data from the application layer all the way to disk, we need the calls introduced above, combined differently depending on the business requirement.

1, Writes that survive an application crash

As the diagram shows, data survives an application crash only once it has been written to the kernel’s Page Cache. There are generally two ways to get data into the kernel.

A, Write (write/flush/close)

When we call fwrite, the data only moves from the application into the C library’s IO Buffer, so it is still in user space. (The write system call, by contrast, copies data straight into the kernel.) If the process dies at this point, or the file descriptor is closed without the stream being flushed, the data never reaches the kernel, let alone the disk. Left alone, the C library passes data to the kernel’s Page Cache only when its IO Buffer fills up. As the diagram shows, we can instead call fflush to push the data into the kernel’s Page Cache ourselves. This is why we are usually advised to flush a file before closing it: fclose does flush the stream, but an explicit flush moves the data earlier and lets us check for errors. Once the data is in the kernel, it is safe with respect to the application. If the application crashes after that, the data is still intact and the kernel can write it to disk.
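The effect of the user-space buffer can be observed directly. The sketch below uses Python’s buffered file object as a stand-in for the C library’s IO Buffer (it is a separate implementation, but it sits in user space in the same way); sizes_around_flush is a hypothetical helper, and the payload is assumed to be smaller than the 64 KiB buffer.

```python
import os
import tempfile


def sizes_around_flush(path: str, data: bytes) -> tuple:
    """Return the file size the kernel reports before and after flush()."""
    # A 64 KiB user-space buffer; `data` must be smaller than this to stay buffered.
    with open(path, "wb", buffering=64 * 1024) as f:
        f.write(data)                   # copied into the user-space buffer only
        before = os.path.getsize(path)  # the kernel has not seen the bytes yet
        f.flush()                       # hand the buffer to the kernel via write(2)
        after = os.path.getsize(path)   # now the Page Cache holds the data
    return before, after


with tempfile.TemporaryDirectory() as workdir:
    print(sizes_around_flush(os.path.join(workdir, "out.bin"), b"hello world"))
    # prints (0, 11)
```

Before the flush the kernel still reports an empty file, which is exactly the window in which an application crash would lose the data.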

B, mmap

mmap, which comes up often in persistence code, maps a range of the application’s address space directly onto the kernel’s Page Cache. Every change the application makes to the mapped data therefore lands in the Page Cache itself. With mmap we do not need to call flush, and we do not need to worry about losing data when the application crashes.

Besides letting the application manipulate kernel data directly, mmap also avoids unnecessary context switches. With an ordinary write, for example, each flush is a system call that switches into the kernel and back, which has some overhead. This is the main reason mmap is so common in persistence scenarios.
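A minimal sketch of this behaviour, assuming a Unix-like system where a shared mapping and ordinary reads go through the same Page Cache: the function update_through_mmap (a made-up name) modifies a file through a mapping and then reads it back through a separate descriptor without any flush in between. The final mapping.flush() corresponds to msync and is only needed to push the pages on to the disk.

```python
import mmap
import os
import tempfile


def update_through_mmap(path: str, offset: int, data: bytes) -> bytes:
    """Overwrite bytes via a shared mapping and return what a plain read sees."""
    with open(path, "r+b") as f:
        # Default mapping on Unix is shared and read-write; length 0 maps the whole file.
        with mmap.mmap(f.fileno(), 0) as mapping:
            mapping[offset:offset + len(data)] = data  # a plain memory store
            # Read through an independent descriptor, with no flush call in between.
            with open(path, "rb") as reader:
                seen = reader.read()
            mapping.flush()  # msync: ask the kernel to write the dirty pages out
    return seen


with tempfile.TemporaryDirectory() as workdir:
    demo_path = os.path.join(workdir, "mapped.bin")
    with open(demo_path, "wb") as f:
        f.write(b"hello world")
    print(update_through_mmap(demo_path, 0, b"HELLO"))
```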

2, Writes that survive an operating system crash

As the diagram shows, data survives a system crash only once it has reached the disk cache or the disk medium (and if it is only in the disk cache, the disk must have a backup power supply).

To move data from the kernel’s Page Cache to the disk (or its cache), call fsync or fdatasync. After that, the data is safe even if the machine goes down. This is why so many WAL implementations flush with fsync.
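Putting the layers together, a WAL-style append might look like the following Python sketch. It is only an illustration under stated assumptions: append_record is a hypothetical helper, Python’s file buffer plays the role of the C library’s IO Buffer, and fdatasync is used where the platform provides it, with fsync as the fallback.

```python
import os
import tempfile


def append_record(path: str, record: bytes) -> None:
    """Append one record and return only after the kernel has synced it."""
    with open(path, "ab") as log:
        log.write(record)  # application -> user-space buffer
        log.flush()        # user-space buffer -> kernel Page Cache
        # fdatasync is missing on some platforms; fsync is the stricter fallback.
        sync = getattr(os, "fdatasync", os.fsync)
        sync(log.fileno())  # Page Cache -> disk (or the disk's own cache)


with tempfile.TemporaryDirectory() as workdir:
    wal_path = os.path.join(workdir, "wal.log")
    append_record(wal_path, b"begin\n")
    append_record(wal_path, b"commit\n")
    with open(wal_path, "rb") as f:
        wal_content = f.read()
    print(wal_content)
```

Skipping the flush step would sync a file the kernel has not yet received any data for, so the order of the three calls matters.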

3, Getting data from the disk cache onto the disk medium

After steps 1 and 2, the data has been handed to the disk, but that still does not guarantee it is safely stored. It may be sitting in the disk’s own cache, and if the power goes out it can be lost. There are currently two main solutions to this problem: a backup power supply, and enabling write barriers in the operating system.

A, Backup power supply

Many commercial disks have their own backup power supply. When the machine loses power, the disk uses it to write the data in its cache to the medium.

B, Write barriers

On Linux, ext3 and ext4 are journaling file systems, so called because they perform a WAL-like operation when writing data.

In a journaling file system, a write goes roughly as follows. The data is first written to the cache; then a log entry describing the write (the metadata that records every change to the data) is written to the disk medium; finally a commit record is written to mark the log entry as complete and the data as safe. Only then does the write return. For fsync, the call returns once the commit record has been written, even though the real data may still be in the cache. If the disk then loses power, the file system can recover the data from the log after restart. The log entries and commit records occupy contiguous space, so writing them is fast. This is how a journaling file system can write quickly without losing data on power failure. It is the same WAL idea we use elsewhere.

There is one catch. Both the log entry and the commit record are ordinary writes handed down to the drive, and modern drives generally reorder writes to improve performance. The two can therefore be swapped, so that the commit record reaches the medium before the log entry does. If the disk loses power right after the commit record lands, the data cannot be recovered, because the log entry it vouches for was never written.

This is why file systems use write barriers. A write barrier is issued before each commit record is written, which guarantees that everything written before the barrier has reached the medium. The log entry and the commit record can then no longer be reordered, and both land in the correct order.
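The ordering problem can be shown with a toy model. The Python sketch below is not how ext3 or ext4 are implemented; it simply models the medium as a dictionary, a crash as "only the first few writes landed", and recovery as a check of the commit record. All names (state_after_crash, recover, in_order, reordered) are invented for this illustration.

```python
def state_after_crash(writes: list, landed: int) -> dict:
    """Medium contents if only the first `landed` writes reached it before power loss."""
    return dict(writes[:landed])


def recover(medium: dict) -> str:
    """Toy recovery: trust the log entry only if a commit record is present."""
    if "commit" not in medium:
        return "discarded"  # incomplete transaction is ignored, which is safe
    if "log" not in medium:
        return "corrupt"    # commit record vouches for a log entry that never landed
    return medium["log"]


# With a barrier, the log entry always reaches the medium before the commit record.
in_order = [("log", "set x=1"), ("commit", True)]
# Without a barrier, the drive may swap the two writes.
reordered = [("commit", True), ("log", "set x=1")]

for name, writes in (("in order", in_order), ("reordered", reordered)):
    for landed in range(len(writes) + 1):
        print(name, landed, recover(state_after_crash(writes, landed)))
```

In the in-order case every crash point leads either to a clean discard or to a full recovery; only the reordered case can produce a commit record with nothing behind it.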

Four, answering the questions

At this point, we have a comprehensive understanding of the entire OS IO operation process. Now let’s go back and look at the questions we started with. Are these questions well answered?

1. What is the difference between fsync and the fwrite/fflush combination?

fwrite followed by fflush writes data from the application layer into the C library’s buffer and then flushes it into the kernel’s Page Cache.

fsync writes data from the kernel’s Page Cache to the disk (not necessarily to the medium).

2. What is the relationship between mmap and fsync?

mmap: creates a mapping between the application and the kernel’s Page Cache, so that the application can manipulate data in the Page Cache directly.

fsync: flushes data from the kernel’s Page Cache to the disk.

3. Why do people say that data will not be lost after fsync? Is it true?

As we now know, fsync flushes data out of the kernel to the disk. But the data is not necessarily safe once it has been flushed. If the file system does not support write barriers, or they are not turned on, and the disk has no backup power, then data still in the disk cache is lost when the system goes down through a power failure. So fsync alone does not necessarily protect against data loss.

4. Is it safe to write data to disk?

That depends on where on the disk the data ends up. If it is only in the disk cache, it is at risk. If it has reached the medium, or its log entry and commit record have been written successfully, it is safe.

In fact, even data that has really reached the medium is not necessarily safe, because the disk itself can be damaged. But that is beyond the scope of this article, so we won’t go into it.

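5. Why can’t we close the file directly?

Because data written with fwrite may still be in the application’s buffer. If the application crashes before that buffer is flushed, or the file is closed in a way that skips the flush (for example by calling close on the raw descriptor instead of fclose), the data is lost. Flushing explicitly before closing moves the data into the kernel’s Page Cache and gives us a chance to check the result.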
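The crash case can be reproduced with a small experiment. In this Python sketch a child process writes to a file and then exits through os._exit, which skips all cleanup and therefore stands in for a crash; Python’s file buffer again plays the role of the C library’s IO Buffer. CHILD_SOURCE and size_after_crash are hypothetical names used only here.

```python
import os
import subprocess
import sys
import tempfile

# Source of the child process: write, optionally flush, then die without cleanup.
CHILD_SOURCE = """
import os, sys
f = open(sys.argv[1], "wb")
f.write(b"important")
if sys.argv[2] == "flush":
    f.flush()
os._exit(1)  # simulated crash: buffers are not flushed, files are not closed
"""


def size_after_crash(path: str, mode: str) -> int:
    """Run the child with mode 'flush' or 'noflush'; return the surviving file size."""
    subprocess.run([sys.executable, "-c", CHILD_SOURCE, path, mode])
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as workdir:
    print(size_after_crash(os.path.join(workdir, "lost.bin"), "noflush"))
    print(size_after_crash(os.path.join(workdir, "kept.bin"), "flush"))
```

Only the run that flushed before the simulated crash leaves any bytes in the file.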

Reference:

Linux IO process: blog.csdn.net/caogenwangb…

Linux OS: Write Barriers: www.rosoo.net/a/201211/16…

Barriers and Journaling Filesystems: lwn.net/Articles/28…

Five, the convention

If you have any questions or comments about this article, please add lifeofCoder.