In the previous article, Best Practices for File IO Operations, I introduced some common Java interfaces for file manipulation and discussed PageCache and Direct IO. I recently packaged a Direct IO library of my own. This article focuses on what Direct IO means in Java and briefly introduces the small library I built.

Direct IO in Java

If you have read my previous articles, you already know that the common file manipulation interface in Java is FileChannel, and that the standard API (at least up to JDK 8) has no interface for Direct IO. In other words, Java code cannot bypass PageCache to read and write storage devices directly. Yet products written in Java, such as databases and message queues, do have reasons to bypass PageCache:

  • PageCache is an operating-system-level concept, and it is difficult to control from user space. A user-level buffer cache is clearly more controllable than the kernel PageCache.
  • Modern operating systems use as much free memory as possible as PageCache. When the operating system reclaims PageCache memory more slowly than the application fills it with writes, the disk write rate suffers. This shows up directly as increased write response time (RT), the so-called "burr phenomenon" of latency spikes.

PageCache is well intentioned, but it can backfire. Using Direct IO together with a custom memory management mechanism makes a product more controllable and can improve its performance.
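To make the idea of a user-level buffer cache concrete, here is a minimal sketch of an LRU block cache built on LinkedHashMap. The class name UserBlockCache and its methods are hypothetical and are not part of any library mentioned in this article. A real product would also need to handle dirty blocks, concurrency, and aligned memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A tiny user-space LRU cache of file blocks, keyed by block index (hypothetical example). */
final class UserBlockCache {
    private final LinkedHashMap<Long, byte[]> blocks;

    UserBlockCache(final int maxBlocks) {
        // accessOrder = true: iteration order runs from least to most recently used.
        this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                // Evict the least recently used block once the cache is over capacity.
                return size() > maxBlocks;
            }
        };
    }

    /** Returns the cached block or null; a hit marks the block as recently used. */
    byte[] get(long blockIndex) {
        return blocks.get(blockIndex);
    }

    void put(long blockIndex, byte[] block) {
        blocks.put(blockIndex, block);
    }

    boolean contains(long blockIndex) {
        return blocks.containsKey(blockIndex);
    }

    int size() {
        return blocks.size();
    }
}
```

The point of the sketch is that the application, not the kernel, decides how many blocks stay in memory and which one is evicted next.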

Direct IO restrictions

To use Direct IO in Java, you need to call native C functions: open the file with the O_DIRECT flag set, then read and write it with calls such as pread and pwrite. Using O_DIRECT comes with several restrictions:

  • Operating system: Linux supports the O_DIRECT flag in kernel versions 2.4.10 and later; earlier versions ignore the flag. Mac OS has a mechanism similar to O_DIRECT.
  • The memory address of the buffer used to transfer data must be aligned to an integer multiple of blockSize.
  • The size of the buffer used to transfer data must be an integer multiple of blockSize.
  • The starting point of the transfer, that is, the offset into the file or device, must be an integer multiple of blockSize.
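The three alignment rules can be checked with simple bit arithmetic when blockSize is a power of two. The sketch below is only an illustration: the class DirectIoAlignment and its methods are hypothetical names, and the buffer address is passed in as a plain long because standard Java does not expose it.

```java
/** Checks the O_DIRECT alignment rules for one I/O request (hypothetical helper). */
final class DirectIoAlignment {
    private DirectIoAlignment() {}

    /** True when value is an integer multiple of blockSize (blockSize must be a power of two). */
    static boolean isAligned(long value, int blockSize) {
        return (value & (blockSize - 1)) == 0;
    }

    /** Rounds value up to the next multiple of blockSize. */
    static long alignUp(long value, int blockSize) {
        return (value + blockSize - 1) & ~((long) blockSize - 1);
    }

    /** Rounds value down to the previous multiple of blockSize. */
    static long alignDown(long value, int blockSize) {
        return value & ~((long) blockSize - 1);
    }

    /** Throws if the buffer address, the length, or the file offset breaks an alignment rule. */
    static void checkRequest(long bufferAddress, long length, long fileOffset, int blockSize) {
        if (blockSize <= 0 || Integer.bitCount(blockSize) != 1) {
            throw new IllegalArgumentException("blockSize must be a power of two");
        }
        if (!isAligned(bufferAddress, blockSize)) {
            throw new IllegalArgumentException("buffer address is not block-aligned");
        }
        if (!isAligned(length, blockSize)) {
            throw new IllegalArgumentException("buffer size is not a multiple of blockSize");
        }
        if (!isAligned(fileOffset, blockSize)) {
            throw new IllegalArgumentException("file offset is not a multiple of blockSize");
        }
    }
}
```

For example, with a 4096-byte block, a request at file offset 100 is rejected, while offsets 0, 4096, and 8192 are accepted.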

To check the system blockSize, run stat /boot/ | grep "IO Block":

ubuntu@VM-30-130-ubuntu:~$ stat /boot/ | grep "IO Block"
  Size: 4096        Blocks: 8          IO Block: 4096   directory

It is usually 4 KB.
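The block size can also be queried from Java itself. The sketch below assumes JDK 10 or later, where FileStore.getBlockSize() is available. The class name BlockSizeProbe and the 4096-byte fallback are my own choices for illustration, and the reported value is not guaranteed to match the "IO Block" field of stat on every file system.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Queries the block size of the file store that holds a path (requires JDK 10+). */
final class BlockSizeProbe {
    /** Assumed fallback when the file store cannot report a block size. */
    static final long FALLBACK_BLOCK_SIZE = 4096L;

    static long blockSizeOf(Path path) {
        try {
            FileStore store = Files.getFileStore(path);
            return store.getBlockSize();
        } catch (IOException | UnsupportedOperationException e) {
            // Some file stores do not support this query; fall back to the common 4 KB.
            return FALLBACK_BLOCK_SIZE;
        }
    }

    public static void main(String[] args) {
        System.out.println("block size: " + blockSizeOf(Paths.get(".")));
    }
}
```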

Using Direct IO in Java

Project address

https://github.com/lexburner/kdio

Adding the dependency

<dependency>
    <groupId>moe.cnkirito.kdio</groupId>
    <artifactId>kdio-core</artifactId>
    <version>1.0.0</version>
</dependency>

Notes

// The file path matters: whether Direct IO is supported depends on the file system that the path lives on.
public static DirectIOLib directIOLib = DirectIOLib.getLibForPath("/");
// With Direct IO, always write an integer multiple of the block size to disk.
// On most systems the block size is 4 KB.
private static final int BLOCK_SIZE = 4 * 1024;
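Real payloads are rarely an exact multiple of BLOCK_SIZE, so they have to be padded before a Direct IO write. The following is a minimal sketch of one way to do that, using a 4-byte length prefix so the reader can strip the padding again. The class BlockPadding and its record layout are hypothetical and not part of kdio. Note that ByteBuffer.allocateDirect does not guarantee an aligned memory address, so for an actual Direct IO write the bytes would go into a buffer from DirectIOUtils.allocateForDirectIO instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Pads a payload to a whole number of blocks (hypothetical record layout). */
final class BlockPadding {
    static final int BLOCK_SIZE = 4 * 1024;

    /** Returns a zero-padded buffer: 4-byte length, payload, then padding up to a block multiple. */
    static ByteBuffer pad(byte[] payload) {
        int needed = Integer.BYTES + payload.length;
        int padded = ((needed + BLOCK_SIZE - 1) / BLOCK_SIZE) * BLOCK_SIZE;
        // Direct buffers start zero-filled, so the tail is already padding.
        // Assumption: only the size is block-aligned here, not the memory address.
        ByteBuffer buffer = ByteBuffer.allocateDirect(padded);
        buffer.putInt(payload.length);
        buffer.put(payload);
        buffer.position(0);
        buffer.limit(padded);
        return buffer;
    }

    /** Reads the length prefix and returns the payload without the padding. */
    static byte[] unpad(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate();
        view.clear();
        byte[] payload = new byte[view.getInt()];
        view.get(payload);
        return payload;
    }

    public static void main(String[] args) {
        ByteBuffer padded = pad("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(padded.remaining() + " bytes on disk for a 5-byte payload");
    }
}
```

The cost of this design is wasted space for small records, which is why products that use Direct IO usually batch many records into one block before writing.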

Direct IO write

private static void write() throws IOException {
    if (DirectIOLib.binit) {
        // Allocate a block-aligned buffer of 4 blocks.
        ByteBuffer byteBuffer = DirectIOUtils.allocateForDirectIO(directIOLib, 4 * BLOCK_SIZE);
        // BLOCK_SIZE ints of 4 bytes each fill the buffer exactly.
        for (int i = 0; i < BLOCK_SIZE; i++) {
            byteBuffer.putInt(i);
        }
        byteBuffer.flip();
        DirectRandomAccessFile directRandomAccessFile = new DirectRandomAccessFile(new File("./database.data"), "rw");
        // Write the whole buffer at file offset 0.
        directRandomAccessFile.write(byteBuffer, 0);
    } else {
        throw new RuntimeException("your system do not support direct io");
    }
}

Direct IO read

public static void read() throws IOException {
    if (DirectIOLib.binit) {
        // Allocate a block-aligned buffer of 4 blocks.
        ByteBuffer byteBuffer = DirectIOUtils.allocateForDirectIO(directIOLib, 4 * BLOCK_SIZE);
        DirectRandomAccessFile directRandomAccessFile = new DirectRandomAccessFile(new File("./database.data"), "rw");
        // Read from file offset 0 into the buffer.
        directRandomAccessFile.read(byteBuffer, 0);
        byteBuffer.flip();
        for (int i = 0; i < BLOCK_SIZE; i++) {
            System.out.print(byteBuffer.getInt() + " ");
        }
    } else {
        throw new RuntimeException("your system do not support direct io");
    }
}
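The read above starts at offset 0 with a buffer of exactly 4 blocks, so it satisfies the alignment restrictions trivially. To read an arbitrary byte range, a common approach is to read the smallest block-aligned window that covers the range and then copy out the requested bytes. The sketch below shows only that arithmetic. BlockReader is a hypothetical stand-in for an aligned positional read, and a heap buffer is used for simplicity where real Direct IO would need an aligned buffer.

```java
import java.nio.ByteBuffer;

/** Reads an arbitrary byte range through block-aligned reads (illustrative sketch). */
final class AlignedWindowRead {
    static final int BLOCK_SIZE = 4 * 1024;

    /** Hypothetical stand-in for an aligned positional read that fills dst from position 0. */
    interface BlockReader {
        void read(ByteBuffer dst, long alignedOffset);
    }

    static byte[] readRange(BlockReader reader, long offset, int length) {
        // Round the start down and the end up to block boundaries.
        long start = offset - (offset % BLOCK_SIZE);
        long end = ((offset + length + BLOCK_SIZE - 1) / BLOCK_SIZE) * BLOCK_SIZE;

        // Assumption: a heap buffer is enough for the sketch; Direct IO needs an aligned one.
        ByteBuffer window = ByteBuffer.allocate((int) (end - start));
        reader.read(window, start);

        // Copy the requested bytes out of the aligned window.
        byte[] result = new byte[length];
        window.clear();
        window.position((int) (offset - start));
        window.get(result);
        return result;
    }
}
```

With a 4 KB block, a request for 100 bytes at offset 5000 turns into a single read of the block at offset 4096, from which bytes 904 to 1003 are returned.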

Main API

  1. DirectIOLib.java: provides the native pwrite and pread calls.
  2. DirectIOUtils.java: provides utility methods, such as allocating a block-aligned ByteBuffer.
  3. DirectChannel/DirectChannelImpl.java: provides a Direct IO wrapper around the file descriptor (fd), with a read/write API similar to FileChannel.
  4. DirectRandomAccessFile.java: opens a file through Direct IO and exposes the IO interface.
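To show what a block-aligned ByteBuffer means in practice, here is a pure-JDK sketch that over-allocates a direct buffer and takes an aligned slice of it. It relies on ByteBuffer.alignedSlice and alignmentOffset, which exist from JDK 9 onward. This is not a description of how DirectIOUtils works internally; the class AlignedBuffers and the method allocateAligned are hypothetical names.

```java
import java.nio.ByteBuffer;

/** Allocates direct buffers whose start address is block-aligned (JDK 9+ sketch). */
final class AlignedBuffers {
    private AlignedBuffers() {}

    static ByteBuffer allocateAligned(int capacity, int blockSize) {
        if (blockSize <= 0 || Integer.bitCount(blockSize) != 1) {
            throw new IllegalArgumentException("blockSize must be a power of two");
        }
        if (capacity <= 0 || capacity % blockSize != 0) {
            throw new IllegalArgumentException("capacity must be a positive multiple of blockSize");
        }
        // Over-allocate by one block so an aligned window of `capacity` bytes always fits.
        ByteBuffer raw = ByteBuffer.allocateDirect(capacity + blockSize);
        // alignedSlice moves the start up to the next address that is a multiple of blockSize.
        ByteBuffer aligned = raw.alignedSlice(blockSize);
        // Trim the slice to exactly the requested capacity.
        aligned.limit(capacity);
        return aligned.slice();
    }

    public static void main(String[] args) {
        ByteBuffer buffer = allocateAligned(4 * 4096, 4096);
        System.out.println("capacity: " + buffer.capacity()
                + ", alignment offset: " + buffer.alignmentOffset(0, 4096));
    }
}
```

The returned buffer satisfies the first two O_DIRECT restrictions at once: its address is a multiple of blockSize and so is its size.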

Conclusion

This simple Direct IO framework is based on smacke/jaydio. That library defines its own Buffer interface, which is incompatible with the JDK classes, and its read/write implementation adds a buffer that caches content until it is block-aligned, which breaks the semantics of Direct IO. I am also grateful to Chen Yang for his guidance: this little library does not contain much code, and the initial code comes from a small demo of his, used with his permission.

Why is such a library needed? Mainly because of competitions such as the "Middleware Performance Challenge" and the "PolarDB Performance Challenge", where Java's own API may not be enough to be competitive. Wouldn't it be nice to have a library that narrows the gap between Java and C++ contestants? I have also published the library to the central repository so that you can reference it in your own code.

In the future, depending on demand, we will add mappings for system calls such as fadvise and mmap to this little library. Anyone interested in file operations is welcome to join in. Pull requests and issues are welcome!

Further reading

Best Practices for File IO operations

PolarDB Database Performance Contest Java Players share

You are welcome to follow my WeChat official account, "Kirito technology sharing". I will answer any questions about this article there and share more Java-related technical content.