Preface

Hello, everyone. I’m Tian Luo, a programmer.

Zero copy is a classic topic that big companies love to ask about in interviews. Why is Kafka fast? Why is RocketMQ fast? Questions like these all touch on zero-copy knowledge. Recently, several friends in my technology discussion group shared interview questions from Ali and Shopee that also involved zero copy, so in this article let's work through the zero-copy principle together.

  • What is zero copy
  • Traditional I/O execution process
  • Review of knowledge points related to zero copy
  • Several ways to implement zero copy
  • Zero-copy support provided by Java

WeChat official account: a boy picking up snails

1. What is zero copy

Literally, "zero copy" is made up of two parts, "zero" and "copy":

  • Copy: data is transferred from one storage area to another storage area.
  • Zero: the number of copies is zero, that is, the data is not copied at all.

Taken together, zero copy means there is no need to copy data from one storage area to another.

Zero copy is an I/O optimization technique: when a computer performs I/O operations, the CPU does not need to copy data from one storage area to another, which reduces context switches and CPU copy time.

2. Traditional I/O execution process

If you do server-side development, you have probably implemented a file download feature more than once. When a front-end request comes in to a web application, the server's job is to read a file from the host's disk and send it out through the already-connected socket. The key implementation code is as follows:

while ((n = read(diskfd, buf, BUF_SIZE)) > 0)  /* copy from the disk file into the user buffer */
    write(sockfd, buf, n);                     /* copy from the user buffer out to the socket  */

The traditional I/O flow consists of a read step and a write step:

  • read: reads data from the disk into the kernel buffer, then copies it to the user buffer.
  • write: copies the data into the socket buffer first, then writes it out to the NIC (network interface card).

The flow is as follows:

  • The user application process calls the read function, making an I/O call to the operating system; the context switches from user mode to kernel mode (switch 1).
  • The DMA controller reads data from the disk into the kernel buffer.
  • The CPU copies the kernel buffer data to the user application buffer; the context switches from kernel mode back to user mode (switch 2), and read returns.
  • The user application process calls the write function, making another I/O call; the context switches from user mode to kernel mode (switch 3).
  • The CPU copies the data in the application buffer to the socket buffer.
  • The DMA controller copies data from the socket buffer to the NIC; the context switches from kernel mode back to user mode (switch 4), and the write function returns.

As the flow above shows, a traditional I/O read/write cycle involves 4 context switches (4 switches between user mode and kernel mode) and 4 data copies (2 CPU copies and 2 DMA copies). What is a DMA copy? Let's review the operating-system knowledge behind zero copy.

3. Review of knowledge points related to zero copy

3.1 Kernel space and user space

The applications running on our computers have to go through the operating system to perform certain special operations, such as reading and writing disk files or reading and writing memory. Because these operations are relatively dangerous, applications cannot be allowed to do them arbitrarily; they are left to the underlying operating system.

Therefore, the operating system divides the memory space allocated to each process into two parts: user space and kernel space. Kernel space is the area accessed by the operating system kernel and is protected memory, while user space is the memory area accessed by user applications. For example, a 32-bit operating system allocates 4 GB (2^32 bytes) of address space to each process.

  • Kernel space: mainly provides functions such as process scheduling, memory allocation, and access to hardware resources.
  • User space: the space provided to each application process. It cannot access kernel-space resources directly; if an application needs kernel-space resources, it must go through a system call, during which the process switches from user mode into kernel mode and then back to user mode.

3.2 What are user mode and kernel mode

  • If a process is running in kernel space, it is said to be in kernel mode.
  • If a process is running in user space, it is said to be in user mode.

3.3 What is Context Switching

  • What is context?

CPU registers are small but very fast pieces of memory built into the CPU. The program counter stores the location of the instruction the CPU is currently executing, or of the next instruction to be executed. Together they form the environment the CPU must rely on to run any task, hence the name CPU context.

  • What is CPU context switching?

It means saving the CPU context of the previous task (the CPU registers and program counter), loading the context of the new task into the registers and program counter, and then jumping to the new location indicated by the program counter to run the new task.

Context switching refers to the kernel (the core of the operating system) switching processes or threads on the CPU. The transition from user mode to kernel mode is accomplished through a system call, and a CPU context switch occurs during that system call.

First, the user-mode instruction location in the CPU registers is saved. Then, in order to execute kernel code, the CPU registers are updated to the location of the kernel instructions. Finally, execution jumps into kernel mode to run the kernel task.

3.4 Virtual Memory

Modern operating systems use virtual memory, that is, virtual addresses instead of physical addresses. Using virtual memory brings two benefits:

  • Virtual memory space can be much larger than physical memory space
  • Multiple virtual memories can point to the same physical address

Because multiple virtual addresses can point to the same physical address, the virtual addresses of kernel space and user space can be mapped to the same physical memory, which reduces the number of copies during I/O.

3.5 DMA technology

DMA stands for Direct Memory Access. It is essentially an independent chip on the motherboard that allows peripheral devices and main memory to exchange I/O data directly, without involving the CPU.

Let's look at the read I/O flow again and see which parts DMA takes care of.

  • The user application process calls the read function, making an I/O call to the operating system, then blocks and waits for the data to return.
  • After the CPU receives the request, it issues an instruction to the DMA controller.
  • The DMA controller receives the I/O request and forwards it to the disk.
  • The disk places the data into the disk controller's buffer and notifies the DMA controller.
  • The DMA controller copies the data from the disk controller's buffer into the kernel buffer.
  • The DMA controller signals the CPU that the data has been read and hands the work back to the CPU, which copies the data from the kernel buffer to the user buffer.
  • The user application process switches from kernel mode back to user mode and is unblocked.

As you can see, what DMA does is quite clear: it essentially helps the CPU forward I/O requests and copy data. Why is it needed?

The main reason is efficiency: DMA takes work off the CPU, so the CPU is free to do other things. In plain English, the CPU was too busy and tired, so it hired an assistant (named DMA) to do some of the copying for it, freeing the CPU to move on to other tasks.

4. Several ways to implement zero copy

Zero copy does not mean that no data is copied at all; rather, it reduces the number of user/kernel mode switches and the number of CPU copies. Zero copy can be implemented in several ways:

  • mmap + write
  • sendfile
  • sendfile with DMA scatter/gather copy

4.1 Zero-copy implementation with mmap + write

The mmap function prototype is as follows:

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
  • addr: the virtual memory address at which to map
  • length: the length of the mapping
  • prot: the protection mode of the mapped memory
  • flags: the type of mapping
  • fd: the file descriptor of the file to map
  • offset: the offset within the file

In the knowledge review above we introduced virtual memory: the virtual addresses of kernel space and user space can be mapped to the same physical address, which reduces the number of data copies during I/O. mmap takes advantage of exactly this virtual-memory feature: it maps the kernel's read buffer into the user-space buffer, so the two share the same physical memory and the copy from the kernel buffer to the user buffer is no longer needed.
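
As a rough illustration, here is a minimal C sketch of the mmap + write pattern. It is only a sketch: error handling is omitted, and send_file_mmap, diskfd, sockfd and file_size are hypothetical names assumed to be prepared by the caller.

#include <sys/mman.h>
#include <unistd.h>

/* A minimal sketch of the mmap + write pattern; diskfd, sockfd and
 * file_size are assumed to be set up by the caller. */
void send_file_mmap(int diskfd, int sockfd, size_t file_size)
{
    /* Map the file into the process address space; the user buffer now
     * shares physical pages with the kernel's read buffer */
    void *addr = mmap(NULL, file_size, PROT_READ, MAP_SHARED, diskfd, 0);
    if (addr == MAP_FAILED)
        return;

    /* The kernel copies from the shared buffer to the socket buffer;
     * no extra kernel-to-user copy is needed */
    write(sockfd, addr, file_size);

    munmap(addr, file_size);
}

Compared with the traditional read + write loop shown earlier, the explicit read into a user buffer disappears.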

The zero-copy process of mmap + write is as follows:

  • The user process calls the mmap method, making an I/O call to the operating system kernel; the context switches from user mode to kernel mode.
  • The CPU uses the DMA controller to copy data from the hard disk into the kernel buffer.
  • The context switches from kernel mode back to user mode, and the mmap call returns.
  • The user process calls the write method, making an I/O call to the operating system kernel; the context switches from user mode to kernel mode.
  • The CPU copies data from the kernel buffer to the socket buffer.
  • The CPU uses the DMA controller to copy data from the socket buffer to the NIC; the context switches from kernel mode back to user mode, and the write call returns.

As you can see, with the mmap + write implementation of zero copy, an I/O operation involves 4 context switches between user space and kernel space and 3 data copies. The 3 data copies consist of 2 DMA copies and 1 CPU copy.

mmap maps the address of the kernel read buffer to an address in the user buffer, so the kernel buffer and the application buffer are shared and one CPU copy is saved. In addition, the user process's memory is virtual and simply maps onto the kernel's read buffer, which also saves half of the memory space.

4.2 Zero-copy implementation with sendfile

sendfile is a system call introduced after Linux kernel version 2.1. The API is as follows:

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
  • out_fd: the file descriptor to write to; here it is a socket descriptor.
  • in_fd: the file descriptor to read from. It must refer to a real file, not a socket or a pipe.
  • offset: the position in the file at which to start reading. If NULL, it means the file's default starting position.
  • count: the number of bytes to transfer between in_fd and out_fd.

sendfile transfers data between two file descriptors. It operates entirely inside the operating system kernel, avoiding the copy from the kernel buffer to the user buffer, which is why it can be used for zero copy.
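
As a rough sketch of how it might be called (error handling omitted; send_file_sendfile, diskfd, sockfd and file_size are hypothetical names assumed to be prepared by the caller):

#include <sys/types.h>
#include <sys/sendfile.h>

/* A minimal sketch: push file_size bytes from diskfd to sockfd entirely
 * inside the kernel; diskfd, sockfd and file_size are assumed to be
 * prepared by the caller. */
void send_file_sendfile(int diskfd, int sockfd, size_t file_size)
{
    off_t offset = 0;
    /* sendfile returns the number of bytes actually transferred; a real
     * server would loop until the whole file has been sent */
    sendfile(sockfd, diskfd, &offset, file_size);
}

Note that the user program never touches the file data at all; everything stays in the kernel.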

The zero-copy process of sendfile is as follows:

  1. The user process initiates the sendfile system call; the context switches from user mode to kernel mode (switch 1).
  2. The DMA controller copies data from the disk into the kernel buffer.
  3. The CPU copies the data from the read buffer to the socket buffer.
  4. The DMA controller asynchronously copies data from the socket buffer to the NIC.
  5. The context switches from kernel mode back to user mode (switch 2), and the sendfile call returns.

As you can see, with the sendfile implementation of zero copy, an I/O operation involves 2 context switches between user space and kernel space and 3 data copies. The 3 data copies consist of 2 DMA copies and 1 CPU copy. Can the number of CPU copies be reduced to zero? Yes: sendfile with DMA scatter/gather copy!

4.3 Zero-copy implementation with sendfile + DMA scatter/gather

Starting with Linux 2.4, sendfile was optimized to support SG-DMA, a scatter/gather operation for DMA copies that reads data from the kernel-space buffer directly into the NIC. Using this feature for zero copy saves one more CPU copy.

The zero-copy process of sendfile + DMA scatter/gather is as follows:

  1. The user process initiates the sendfile system call; the context switches from user mode to kernel mode (switch 1).
  2. The DMA controller copies data from the disk into the kernel buffer.
  3. The CPU sends the kernel buffer's file descriptor information (including its memory address and offset) to the socket buffer.
  4. The DMA controller copies data directly from the kernel buffer to the NIC based on that file descriptor information.
  5. The context switches from kernel mode back to user mode (switch 2), and the sendfile call returns.

As you can see, with the sendfile + DMA scatter/gather implementation of zero copy, an I/O operation involves 2 context switches between user space and kernel space and only 2 data copies, both of which are DMA copies. This is true zero-copy technology: the CPU never moves the data at any point; all copying is done by DMA.

5. Zero-copy support provided by Java

  • Java NIO support for mmap
  • Java NIO support for sendfile

5.1 Java NIO support for mmap

Java NIO provides the MappedByteBuffer class, which can be used to implement memory mapping. Under the hood it calls the Linux kernel's mmap API.

A small mmap demo is as follows:

import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapTest {
    public static void main(String[] args) {
        try {
            FileChannel readChannel = FileChannel.open(Paths.get("./jay.txt"), StandardOpenOption.READ);
            // Map the file into memory (read-only, up to 40 MB)
            MappedByteBuffer data = readChannel.map(FileChannel.MapMode.READ_ONLY, 0, 1024 * 1024 * 40);
            FileChannel writeChannel = FileChannel.open(Paths.get("./siting.txt"),
                    StandardOpenOption.WRITE, StandardOpenOption.CREATE);
            // Data transfer
            writeChannel.write(data);
            readChannel.close();
            writeChannel.close();
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}

5.2 Java NIO support for sendfile

FileChannel's transferTo()/transferFrom() methods are backed by the sendfile() system call. Kafka is an open-source project that uses them; when an interviewer asks why Kafka is so fast, you can mention sendfile-based zero copy. For example:

@Override
public long transferFrom(FileChannel fileChannel, long position, long count) throws IOException {
   return fileChannel.transferTo(position, count, socketChannel);
}

A small sendfile-style demo using transferTo is as follows:

import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SendFileTest {
    public static void main(String[] args) {
        try {
            FileChannel readChannel = FileChannel.open(Paths.get("./jay.txt"), StandardOpenOption.READ);
            long len = readChannel.size();
            long position = readChannel.position();
            FileChannel writeChannel = FileChannel.open(Paths.get("./siting.txt"),
                    StandardOpenOption.WRITE, StandardOpenOption.CREATE);
            // Data transfer; transferTo uses the sendfile system call under the hood where possible
            readChannel.transferTo(position, len, writeChannel);
            readChannel.close();
            writeChannel.close();
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}

References and thanks

  • Framework: even beginners can understand the principle of Linux zero copy
  • In-depth analysis of Linux IO principle and several zero copy mechanism implementation
  • Ali: What is MMAP?