Background

Both the earlier Middleware Performance Challenge and the first PolarDB Data Performance Competition, which is currently underway, involve heavy file manipulation; a sound architecture and squeezing the most out of the machine's read/write performance are the keys to ranking well. Several readers of my public account have sent feedback along the lines of: "I'm very interested in the competition but don't know how to get started", or "my solution runs, but it's more than 10x slower than the front-runners"… To encourage readers to take part in similar competitions in the future, I have compiled a short list of best practices for file IO operations, leaving overall system architecture design aside. I hope this article helps you enjoy entering such performance challenges.

Prerequisite knowledge

This article focuses on Java-related file operations, and a few concepts are needed to follow along: PageCache, mmap, DirectByteBuffer, sequential vs. random reads and writes… You don't have to understand them fully up front, but you should at least know what they are, because this article revolves around them.

Get to know FileChannel and MMAP

First of all, the most important decision in a file-IO competition is choosing the right way to read and write files. Java's native options fall roughly into three camps: ordinary IO, FileChannel, and mmap. For example, FileWriter and FileReader in the java.io package are ordinary IO. FileChannel lives in the java.nio package and is part of NIO, but note that NIO does not necessarily mean non-blocking: the FileChannel here is blocking. More special still is mmap, a memory-mapping technique for reading and writing files, obtained by calling FileChannel's map method.

How to obtain a FileChannel:

```java
FileChannel fileChannel = new RandomAccessFile(new File("db.data"), "rw").getChannel();
```

To obtain MMAP, proceed as follows:

```java
MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, fileChannel.size());
```

MappedByteBuffer is the Java class used to operate on an mmap region.

We won't dwell on the traditional byte-stream IO approach here; instead we focus on the differences between FileChannel and mmap.

Reading and writing with FileChannel

```java
// write
byte[] data = new byte[4096];
long position = 1024L;
// write 4KB of data at the specified position
fileChannel.write(ByteBuffer.wrap(data), position);
// write 4KB of data at the current file pointer
fileChannel.write(ByteBuffer.wrap(data));

// read
ByteBuffer buffer = ByteBuffer.allocate(4096);
long position = 1024L;
// read 4KB of data at the specified position
fileChannel.read(buffer, position);
// read 4KB of data from the current file pointer
fileChannel.read(buffer);
```

FileChannel is used together with the ByteBuffer class most of the time; you can think of ByteBuffer as a wrapper around byte[] that provides a rich API for manipulating bytes. If you don't know FileChannel yet, go familiarize yourself with its API. It's worth noting that both the write and read methods are thread-safe: FileChannel internally uses a lock, private final Object positionLock = new Object(), to control concurrency.
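As a minimal, self-contained sketch of the API just described (the temp file and the "hello-io" payload are made up for illustration), here is a write-then-read round trip, including the ByteBuffer flip() step that trips up many first-time users:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class FileChannelDemo {
    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("demo", ".data"); // illustrative scratch file
        file.deleteOnExit();
        try (FileChannel channel = new RandomAccessFile(file, "rw").getChannel()) {
            // write: wrap an existing byte[]; position starts at 0, limit at capacity
            byte[] payload = "hello-io".getBytes();
            channel.write(ByteBuffer.wrap(payload), 0L);

            // read: allocate a buffer, fill it, then flip() before consuming it
            ByteBuffer buffer = ByteBuffer.allocate(payload.length);
            channel.read(buffer, 0L);
            buffer.flip(); // switch from "being filled by read" to "ready to be drained"
            byte[] readBack = new byte[buffer.remaining()];
            buffer.get(readBack);
            System.out.println(new String(readBack)); // prints "hello-io"
        }
    }
}
```

Forgetting flip() (or clear() before reuse) is the most common ByteBuffer bug, so it is worth internalizing early.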

Why is FileChannel faster than ordinary IO? That statement may be a bit of a stretch: FileChannel only shines when you use it correctly, for example when you write an integer multiple of 4KB at a time. This is thanks to FileChannel's use of memory buffers such as ByteBuffer, which let you control the size of each write to disk very precisely, something ordinary IO cannot do. Is 4KB necessarily the fastest? Not necessarily; it depends on the disk layout of your machine and is also affected by the operating system, the file system, and the CPU. For example, the disks in the Middleware Performance Challenge needed writes of at least 64KB at a time to reach the highest IOPS.

The PolarDB disks, on the other hand, are extremely tough; since the competition is still in progress I won't give exact figures, but with the skill of Benchmark Everything we can find out.

Another thing that makes FileChannel efficient deserves a question before I introduce it: does FileChannel write the data in a ByteBuffer straight to disk? Think for a few seconds… The answer is: no. There is one layer between the data in the ByteBuffer and the data on disk. This layer is called the PageCache, and it acts as a buffer between user memory and the disk. We all know that disk IO and memory IO differ in speed by several orders of magnitude. We can think of fileChannel.write() as complete once the data reaches the PageCache; the operating system then performs the final flush to disk for us. Once you understand this concept, you should also understand why FileChannel provides a force() method: to tell the operating system to flush to disk promptly.
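A tiny sketch of that flush path (the temp file is illustrative): the write lands in the PageCache first, and force() asks the operating system to push it to the disk:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class FlushDemo {
    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("flush", ".data"); // illustrative scratch file
        file.deleteOnExit();
        try (FileChannel channel = new RandomAccessFile(file, "rw").getChannel()) {
            channel.write(ByteBuffer.wrap(new byte[4096]), 0L);
            // At this point the bytes may only live in the PageCache, not on disk.
            channel.force(false); // false: flush the data only, skip file metadata
            System.out.println(channel.size()); // prints 4096
        }
    }
}
```

Calling force(true) additionally flushes metadata (like the file length), at some extra cost.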

Similarly, when we read through FileChannel, the data passes through three stages: disk -> PageCache -> user memory. An everyday user can ignore the PageCache, but as a competition challenger you absolutely cannot ignore it during tuning. I won't say much about reads here; we'll come back to them in a later section. For now, treat this as the introduction of the PageCache concept.

Reading and writing with MMAP

```java
// write
byte[] data = new byte[4];
int position = 8;
// write 4 bytes at the current mmap pointer
mappedByteBuffer.put(data);
// write 4 bytes at the specified position
ByteBuffer subBuffer = mappedByteBuffer.slice();
subBuffer.position(position);
subBuffer.put(data);

// read
byte[] data = new byte[4];
int position = 8;
// read 4 bytes from the current mmap pointer
mappedByteBuffer.get(data);
// read 4 bytes at the specified position
ByteBuffer subBuffer = mappedByteBuffer.slice();
subBuffer.position(position);
subBuffer.get(data);
```

FileChannel is already powerful enough, so what extra tricks does MappedByteBuffer have up its sleeve? Allow me to first introduce how MappedByteBuffer is used.

When we run fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, (long) (1.5 * 1024 * 1024 * 1024)) and then look at the disk, we immediately see a 1.5GB file, but at this point its content is all zeros (zero bytes). Whatever we do to the MappedByteBuffer in memory will eventually be mapped into that file.
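The same effect can be observed at a small scale. The sketch below maps 4KB instead of 1.5GB so it runs anywhere, and the temp file is made up; the file is extended to the mapped size immediately, and a plain in-memory put is reflected back into the file:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapDemo {
    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("mmap", ".data"); // illustrative scratch file
        file.deleteOnExit();
        try (FileChannel channel = new RandomAccessFile(file, "rw").getChannel()) {
            MappedByteBuffer mmap = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            System.out.println(file.length()); // the file has grown to the mapped size: 4096
            mmap.put(0, (byte) 42);            // an ordinary memory write, mapped back to the file
            System.out.println(mmap.get(0));   // prints 42
        }
    }
}
```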

mmap maps the file into the process's virtual address space, skipping the copy from the kernel buffer to user space. Each position in the file has a corresponding address in virtual memory, so you can operate on the file as if it were memory. It is as though the entire file had been loaded into memory, yet before the data is actually touched no physical memory is consumed and no disk reads or writes occur. Only when the data is actually used does the virtual memory system, via the page-fault mechanism, load the corresponding blocks from disk into physical memory. This way of reading and writing files is efficient because it avoids copying data from the kernel cache to user space.

After that slightly more official description, you might be curious about mmap, and wonder why FileChannel exists at all given such black technology. Many articles online even claim that mmap beats FileChannel by an order of magnitude on large files! From my experience, however, mmap is not a silver bullet for file IO, and it only performs slightly better than FileChannel when writing very small amounts of data at a time. And now for something that will frustrate you: at least in Java, using MappedByteBuffer is cumbersome and painful, for three main reasons:

  1. mmap requires you to specify the mapped size when creating the mapping, and a single mapping is limited to roughly 1.5GB; mapping repeatedly brings problems of virtual-memory reclamation and re-allocation, which is very unfriendly for files of uncertain size.
  2. mmap uses virtual memory and, like the PageCache, is flushed to disk under the operating system's control. You can trigger a flush manually with force(), but the timing is hard to get right, which becomes a real headache in low-memory scenarios.
  3. When the mmap buffer is no longer needed, you can manually release the virtual memory it occupies, but… in a very weird way.
```java
public static void clean(MappedByteBuffer mappedByteBuffer) {
    ByteBuffer buffer = mappedByteBuffer;
    if (buffer == null || !buffer.isDirect() || buffer.capacity() == 0)
        return;
    invoke(invoke(viewed(buffer), "cleaner"), "clean");
}

private static Object invoke(final Object target, final String methodName, final Class<?>... args) {
    return AccessController.doPrivileged(new PrivilegedAction<Object>() {
        public Object run() {
            try {
                Method method = method(target, methodName, args);
                method.setAccessible(true);
                return method.invoke(target);
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }
    });
}

private static Method method(Object target, String methodName, Class<?>[] args)
        throws NoSuchMethodException {
    try {
        return target.getClass().getMethod(methodName, args);
    } catch (NoSuchMethodException e) {
        return target.getClass().getDeclaredMethod(methodName, args);
    }
}

private static ByteBuffer viewed(ByteBuffer buffer) {
    String methodName = "viewedBuffer";
    Method[] methods = buffer.getClass().getMethods();
    for (int i = 0; i < methods.length; i++) {
        if (methods[i].getName().equals("attachment")) {
            methodName = "attachment";
            break;
        }
    }
    ByteBuffer viewedBuffer = (ByteBuffer) invoke(buffer, methodName);
    if (viewedBuffer == null)
        return buffer;
    else
        return viewed(viewedBuffer);
}
```

That’s right, you read that right. The code is so long that it does just one thing: recycle the MappedByteBuffer.

My recommendation: use FileChannel for your first code submission, and switch to an mmap implementation only where you must flush a very small amount of data (a few bytes, say) at a time. In every other scenario FileChannel covers your needs completely (provided you understand how to use it properly). As for why mmap performs better than FileChannel when writing a small amount of data at a time, I haven't found an authoritative explanation yet, so please leave a comment if you have clues. The rough theory: FileChannel also writes into memory, but it has one more copy between the kernel buffer and user space than mmap does, which is why mmap wins in those extreme scenarios. As for whether the virtual memory allocated by mmap is literally the PageCache, I think it can be treated approximately as PageCache.

Sequential read is faster than random read, and sequential write is faster than random write

This holds whether you are on a mechanical hard disk or an SSD, although the reasons behind it differ. We won't talk about mechanical disks, that ancient storage medium, today; let's focus on SSDs and see why random reads and writes on them are slower than sequential ones. Even though SSD internals and file systems vary, today's analysis still applies.

First of all: what counts as a sequential read, a random read, a sequential write, a random write? Perhaps when you first touched file IO you had no such doubts, but after writing for a while you start doubting your own understanding. I don't know if you've gone through a similar stage; I certainly doubted myself for a time. So let's look at two pieces of code:

Write mode 1: 64 threads, with an atomic variable recording the write position, writing concurrently.

```java
ExecutorService executor = Executors.newFixedThreadPool(64);
AtomicLong wrotePosition = new AtomicLong(0);
for (int i = 0; i < 1024; i++) {
    executor.execute(() -> {
        fileChannel.write(ByteBuffer.wrap(new byte[4 * 1024]), wrotePosition.getAndAdd(4 * 1024));
    });
}
```

Write mode 2: lock the write to make it synchronous.

```java
ExecutorService executor = Executors.newFixedThreadPool(64);
AtomicLong wrotePosition = new AtomicLong(0);
for (int i = 0; i < 1024; i++) {
    executor.execute(() -> {
        write(new byte[4 * 1024]);
    });
}

public synchronized void write(byte[] data) {
    fileChannel.write(ByteBuffer.wrap(data), wrotePosition.getAndAdd(data.length));
}
```

The answer is that only mode 2 counts as a sequential write, and the same reasoning applies to sequential reads. For file operations, locking is not scary at all; what's scary is not daring to synchronize writes and reads! You might ask: FileChannel already has a positionLock inside to make writes thread-safe, so why synchronize again, and why would that be faster? In plain words: with unsynchronized concurrent writes, even though each thread reserves its own position, the writes are not executed in position order, so the execution order may be:

Sequence 1: Thread1 writes position[0~4096]

Sequence 2: Thread3 writes position[8192~12288]

Sequence 3: Thread2 writes position[4096~8192]

So it is not strictly "writing in order". And don't worry about the performance cost of the lock; later we'll summarize an optimization, file sharding, that reduces lock contention when multiple threads read and write.

Why are sequential reads faster than random reads? Why are sequential writes faster than random writes? Both comparisons come down to the same thing: the PageCache, which we mentioned earlier, the cache layer between the application buffer and the disk file.

Taking sequential reads as an example, when the user initiates a fileChannel.read(4KB), two things actually happen:

  1. The operating system loads 16KB from the disk into the PageCache; this is called readahead (prefetching)
  2. 4KB is copied from the PageCache into user memory

In the end we get our 4KB in user memory. Why does this make sequential reads faster? As you can imagine, when the user goes on to access the next [4KB, 16KB) of the file, the data comes straight from the PageCache. Think about it: to access 16KB of disk content, which is faster, four disk IOs, or one disk IO plus four memory IOs? The answer is obvious; all of this is the optimization brought by the PageCache.

Deep thought: Does PageCache allocation suffer when memory is tight? How to determine the PageCache size, is it fixed at 16KB? Can I monitor the PageCache hit? In what scenarios does PageCache fail, and if it does, what are the remedies?

Here is a quick self-Q&A; the reasoning behind each answer is left for the reader to consider:

  • When memory is tight, PageCache readahead is affected; this is from my own measurements, and I have not found literature to back it up
  • The PageCache size is adjusted dynamically and can be tuned via Linux system parameters; by default it may take up to 20% of total memory
  • Github.com/brendangreg… hosts a tool that monitors PageCache hits
  • This is an interesting optimization point: if relying on the PageCache as a cache is too uncontrollable, why not do your own readahead?
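On that last point, application-level readahead could look roughly like the sketch below: fetch a 16KB window in one disk read, then serve 4KB requests from it. The window and block sizes mirror the example above, and all names are hypothetical:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Sketch of do-it-yourself readahead: one 16KB disk read serves four 4KB requests.
public class ReadAhead {
    private static final int BLOCK = 4 * 1024;
    private static final int WINDOW = 16 * 1024;

    private final FileChannel channel;
    private final ByteBuffer window = ByteBuffer.allocate(WINDOW);
    private long windowStart = -1;

    public ReadAhead(FileChannel channel) {
        this.channel = channel;
    }

    public byte[] read(long position) throws IOException {
        if (windowStart < 0 || position < windowStart || position + BLOCK > windowStart + WINDOW) {
            window.clear();
            channel.read(window, position); // one 16KB disk read instead of four 4KB reads
            windowStart = position;
        }
        byte[] block = new byte[BLOCK];
        System.arraycopy(window.array(), (int) (position - windowStart), block, 0, BLOCK);
        return block;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("ra", ".data"); // illustrative scratch file
        file.deleteOnExit();
        try (FileChannel ch = new RandomAccessFile(file, "rw").getChannel()) {
            byte[] content = new byte[WINDOW];
            content[BLOCK] = 7; // first byte of the second 4KB block
            ch.write(ByteBuffer.wrap(content), 0L);
            ReadAhead ra = new ReadAhead(ch);
            ra.read(0L); // loads the whole 16KB window
            System.out.println(ra.read(BLOCK)[0]); // served from the window, no second disk read
        }
    }
}
```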

Sequential writes follow the same principle as sequential reads; both are shaped by the PageCache. The details are left to the reader.

Direct (off-heap) memory vs. heap memory

In the earlier FileChannel sample code, we already used heap memory: ByteBuffer.allocate(4 * 1024). ByteBuffer also provides a way to allocate off-heap memory: ByteBuffer.allocateDirect(4 * 1024). This raises a series of questions: when should I use heap memory, and when should I use direct memory?

I won't spend too much space on this; here is a comparison:

|  | Heap memory | Off-heap (direct) memory |
| --- | --- | --- |
| Underlying implementation | A byte[] array in JVM heap memory | Unsafe.allocateMemory(size) returns direct memory |
| Allocation size limit | Tied to the -Xms/-Xmx JVM heap settings, and array sizes are themselves limited: even with more than 1.5G of free heap, ByteBuffer.allocate(900M) can still fail | Capped at the JVM level by -XX:MaxDirectMemorySize, and ultimately by the machine's virtual memory (saying "physical memory" would not be quite right) |
| Garbage collection | Needs no further explanation | When a DirectByteBuffer is no longer in use, its internal Cleaner hook frees the memory; to be on the safe side, consider recycling manually: ((DirectBuffer) buffer).cleaner().clean(); |
| Copy path | user mode <-> kernel mode | kernel mode only |
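The first rows of the comparison can be observed directly: heap buffers expose a backing array, while direct buffers do not (a small sketch):

```java
import java.nio.ByteBuffer;

public class DirectVsHeap {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(4 * 1024);         // backed by a byte[] on the JVM heap
        ByteBuffer direct = ByteBuffer.allocateDirect(4 * 1024); // native memory outside the heap
        System.out.println(heap.hasArray());   // prints true: a heap buffer wraps an array
        System.out.println(direct.hasArray()); // prints false: no backing array
        System.out.println(direct.isDirect()); // prints true
    }
}
```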

Some best practices for in-heap and out-of-heap memory:

  1. When large chunks of memory need to be allocated, heap memory is limited, so only off-heap memory can satisfy the allocation.
  2. Off-heap memory suits objects with medium or long lifetimes. (Short-lived objects are collected in YGC, so they avoid the FGC impact of large, long-lived objects in the heap.)
  3. For direct file copies or other IO operations, using off-heap memory avoids the extra copy from user memory to kernel memory.
  4. For objects that are short-lived but involved in IO, combine pooling with off-heap memory so the off-heap buffers are reused (Netty does exactly this). During a competition, try to avoid frequent new byte[]: allocating memory regions and then collecting them is also a large overhead, and using ThreadLocal<ByteBuffer> or ThreadLocal<byte[]> will often bring you a pleasant surprise ~
  5. Creating off-heap memory costs more than creating heap memory, so once off-heap memory is allocated, reuse it as much as possible.
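Point 4 in code: a sketch of the ThreadLocal<ByteBuffer> trick (the class name and 4KB size are illustrative). Each thread allocates its direct buffer once and recycles it with clear() on every use:

```java
import java.nio.ByteBuffer;

public class BufferPool {
    // One reusable 4KB direct buffer per thread: allocation happens once,
    // then every IO on that thread reuses the same native memory.
    private static final ThreadLocal<ByteBuffer> BUFFERS =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(4 * 1024));

    public static ByteBuffer acquire() {
        ByteBuffer buffer = BUFFERS.get();
        buffer.clear(); // reset position/limit; the memory itself is reused
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer a = BufferPool.acquire();
        ByteBuffer b = BufferPool.acquire();
        System.out.println(a == b);       // prints true: same thread, same buffer
        System.out.println(a.isDirect()); // prints true
    }
}
```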

UNSAFE

```java
public class UnsafeUtil {
    public static final Unsafe UNSAFE;
    static {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            UNSAFE = (Unsafe) field.get(null);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The dark magic of UNSAFE can do a great many things; here I'll only touch on a couple related to file IO.

Obtaining a direct memory address and copying memory:

```java
ByteBuffer buffer = ByteBuffer.allocateDirect(4 * 1024 * 1024);
long address = ((DirectBuffer) buffer).address();
byte[] data = new byte[4 * 1024 * 1024];
UNSAFE.copyMemory(data, 16, null, address, 4 * 1024 * 1024);
```

The copyMemory method copies between any two memory regions, on-heap or off-heap. Parameters 1 and 2 describe the source, parameters 3 and 4 the target, and the fifth parameter is the number of bytes to copy. If the source or target is a heap byte array, pass the array object plus 16, the fixed ARRAY_BYTE_BASE_OFFSET constant; if it is off-heap memory, pass null plus the raw direct-memory address, obtainable via ((DirectBuffer) buffer).address(). Why not just copy normally instead of using UNSAFE? Because it's fast, of course, young man! Likewise, a MappedByteBuffer can be copied with UNSAFE this way, and thereby write to or read from the disk.

There are plenty more dark arts around UNSAFE, which you are welcome to explore; I won't go into more detail here.

File sharding

I've already mentioned that sequential reads and writes require us to lock, and I cannot stress enough that locking is not scary; file IO does not depend that much on multithreading. However, locked sequential reads and writes may fail to saturate disk IO, leaving today's powerful CPUs idle. We can use file sharding to kill two birds with one stone: keep reads and writes sequential while reducing lock contention.

Which raises the question: how many files are appropriate? More files mean fewer lock conflicts; but too many files mean too much fragmentation, individual files become too small, and caches are harder to hit. How do you balance this trade-off? There is no theoretical answer: benchmark everything~
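A rough sketch of such sharding (the shard count, file names, and hashing rule are all illustrative): each shard gets its own channel, position counter, and lock, so writes to different shards proceed in parallel while each shard stays strictly sequential:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.atomic.AtomicLong;

public class ShardedWriter {
    private final FileChannel[] channels;
    private final AtomicLong[] positions;

    public ShardedWriter(File dir, int shards) throws IOException {
        channels = new FileChannel[shards];
        positions = new AtomicLong[shards];
        for (int i = 0; i < shards; i++) {
            channels[i] = new RandomAccessFile(new File(dir, "shard-" + i), "rw").getChannel();
            positions[i] = new AtomicLong(0);
        }
    }

    public void write(int key, byte[] data) throws IOException {
        int shard = Math.floorMod(key, channels.length);
        synchronized (channels[shard]) { // lock only this shard, not the whole store
            long pos = positions[shard].getAndAdd(data.length);
            channels[shard].write(ByteBuffer.wrap(data), pos);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "shards-" + System.nanoTime());
        dir.mkdirs();
        ShardedWriter writer = new ShardedWriter(dir, 4);
        writer.write(0, new byte[4096]);
        writer.write(4, new byte[4096]); // same shard as key 0, appended right after it
        System.out.println(new File(dir, "shard-0").length()); // two sequential 4KB writes
    }
}
```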

Direct IO

Finally, let's talk about an IO mode we haven't mentioned before: Direct IO. What, Java has that? Blogger, you lied to me! Didn't you tell me there were only three IO modes? Don't blame me; strictly speaking, Java does not support it natively, but it can be done by calling native methods through JNA/JNI. Direct IO bypasses the PageCache, but as we said before the PageCache is a good thing, so why skip it? On closer inspection, there are scenarios where Direct IO is genuinely useful, and yes, it's the one we've barely mentioned so far: random reads. An IO like fileChannel.read() triggers readahead into the PageCache, but for random reads we don't actually want the operating system to do all that for us, unless we get really lucky and the random read hits the PageCache. Direct IO, though repeatedly dismissed by Linus, has its own value in random-read scenarios: it reduces the overhead from the block IO layer to the page cache.

So how does Java use Direct IO, and are there restrictions? As mentioned, Java does not currently support it natively, but kind souls have wrapped a JNA library that implements Java Direct IO; github: github.com/smacke/jayd…

```java
int bufferSize = 20 * 1024 * 1024;
DirectRandomAccessFile directFile = new DirectRandomAccessFile(new File("dio.data"), "rw", bufferSize);
for (int i = 0; i < bufferSize / 4096; i++) {
    byte[] buffer = new byte[4 * 1024];
    directFile.read(buffer);
    directFile.readFully(buffer);
}
directFile.close();
```

Note, however, that only Linux supports Direct IO! So, boy, it's time to get your hands dirty and install Linux. It's also worth noting that Direct IO is said to be getting native support after JDK 10 is released; let's wait and see.

Conclusion

All of the above is personal experience accumulated from practice; some conclusions lack literature support, so please correct me if anything is wrong. As for the performance analysis of the PolarDB competition data, I'll write a dedicated article after the semi-finals, analyzing in detail how these optimization points are actually applied. Of course, many people already know these tips; what decides the final result is the overall architecture design, together with your understanding of file IO, the operating system, the file system, the CPU, and the language's features. Java is not the popular choice for performance challenges like this, but it is still a lot of fun. I hope this file-IO knowledge helps you, and I'll see you next time.

Welcome to follow my WeChat official account, "Kirito technology sharing"; I'll answer any questions about the articles there and share more Java-related technology.