Understanding zero-copy technology

Zero-copy is a key technique for writing high-performance servers. Before introducing it, I will explain user space and kernel space.

The user space

In layman’s terms, user space is the part of a process's virtual address space in which user-written applications run. On a 32-bit Linux system with the default configuration, each process has its own 4 GB virtual address space, and the range from 0 to 3 GB is user space.

The kernel space

Kernel space is the part of the virtual address space in which the operating system code runs. In the same 32-bit layout, the range from 3 GB to 4 GB is kernel space.
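The split described above can be written down as a simple address check. This is only a sketch of the 32-bit layout assumed in this article (3 GB of user space, 1 GB of kernel space); the names USER_SPACE_END and is_user_address are made up for illustration, and real kernels can be configured with a different boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the 3 GB / 1 GB split of a 32-bit process described above.
 * 0xC0000000 is 3 GB, i.e. the first address that belongs to kernel space. */
#define USER_SPACE_END 0xC0000000u

/* Hypothetical helper: true if a 32-bit virtual address lies in user space. */
static bool is_user_address(uint32_t addr) {
    return addr < USER_SPACE_END;
}
```

Under this assumption, an address such as 0x08048000 is a user-space address, while 0xC0000000 and everything above it belongs to the kernel.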

Figure 1 shows where user space and kernel space sit in a process's virtual address space:

Send a file

Why do user space and kernel space matter here?

Let’s start by recalling what a server typically does to send a file to a client. In general, it takes two steps:

  • First, call read to read the file’s data into a user-space buffer.

  • Then, call write to send the buffer’s data to the client socket.

The pseudocode is as follows:

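/* Pseudocode: error handling and short writes are omitted for brevity. */
char buf[4096];
ssize_t n;
while ((n = read(file, buf, sizeof(buf))) > 0) {
    write(sock, buf, n);
}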

The procedure above makes two system calls, read and write. The read system call reads data from the file into a buffer in user space, so it has to copy data from kernel space into user space, as shown in Figure 2:

Figure 2 shows the copy path: the data is first read from the file into the kernel’s page cache, and then copied from the page cache into the user-space buffer.

When the write system call is used to send the data in the user-space buffer to the client socket, the data is first copied into the kernel’s socket buffer, and the NIC driver then sends the socket buffer’s data out, as shown in Figure 3:

As the figure shows, sending the file to the client copies the data across the user/kernel boundary twice: first from the page cache in kernel space to the buffer in user space, and then from the buffer in user space to the socket buffer in kernel space.
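To make the two copies concrete, here is a fuller version of the read/write loop. It is a minimal sketch rather than production code: the function names send_with_read_write and write_all are hypothetical, and the comments mark where each copy happens when in_fd is a regular file and out_fd is a socket.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes, retrying on short writes. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        /* Copy 2: user-space buffer -> kernel (socket buffer for a socket). */
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;  /* interrupted, try again */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: send everything readable from in_fd to out_fd.
 * Returns the total number of bytes sent, or -1 on error. */
ssize_t send_with_read_write(int out_fd, int in_fd) {
    char buf[4096];
    ssize_t total = 0;
    for (;;) {
        /* Copy 1: kernel (page cache for a regular file) -> user-space buffer. */
        ssize_t n = read(in_fd, buf, sizeof(buf));
        if (n == 0) break;                 /* end of file */
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0) return -1;
        total += n;
    }
    return total;
}
```

Every 4096-byte chunk costs one read call and at least one write call, and the data passes through buf in user space each time even though the program never looks at it.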

A closer look shows that the data could be copied from the page cache directly into the socket buffer, without passing through the user-space buffer at all, as shown in Figure 4:

Techniques like the one shown above, which transfer data without passing it through user space, are called zero-copy techniques. So which system call can we use to implement it? The answer is sendfile. Let’s look at the sendfile system call prototype:

#include <sys/sendfile.h>

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

Here’s what the sendfile parameters do:

  • out_fd: the file descriptor that receives the data (usually a socket).

  • in_fd: the file descriptor that provides the data (usually a regular file).

  • offset: if not NULL, points to a variable holding the file offset from which data is read from in_fd.

  • count: the number of bytes to send.
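The parameters above fit together as in the following sketch, which sends a whole file with sendfile on Linux. The function name send_file_zero_copy is hypothetical, and using fstat to find the file size is just one reasonable choice. Because sendfile may transfer fewer bytes than requested, the call sits in a loop; when offset is not NULL, the kernel advances it past the bytes it has read.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send the file at `path` to out_fd using sendfile.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_file_zero_copy(int out_fd, const char *path) {
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0) return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }

    off_t offset = 0;  /* updated by sendfile after each call */
    while (offset < st.st_size) {
        size_t remaining = (size_t)(st.st_size - offset);
        /* Data moves inside the kernel; no user-space buffer is involved. */
        ssize_t n = sendfile(out_fd, in_fd, &offset, remaining);
        if (n < 0) {
            if (errno == EINTR) continue;  /* interrupted, try again */
            close(in_fd);
            return -1;
        }
        if (n == 0) break;  /* file became shorter than expected */
    }

    close(in_fd);
    return (ssize_t)offset;
}
```

A server would call this with a connected socket as out_fd. Note that there is no buf here at all: the program only tells the kernel which descriptors to connect and how many bytes to move.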

The way sendfile sends data is shown in Figure 5:

Comparing Figure 5 with Figure 3, we find that using sendfile saves one system call and one data copy.

Conclusion

This article introduced zero-copy through the sendfile system call, but sendfile is not the only way to achieve it: mmap, splice, and direct I/O are also implementations of zero-copy. For details, refer to the official Linux documentation or related materials.