Preface

  • Which IO models does Unix provide?
  • What are the characteristics of each IO model, and how do they differ?
  • What is the difference between blocking, non-blocking, synchronous and asynchronous IO?
  • Why is epoll efficient?

Overview

The steps involved in a normal input operation

  • Wait for the data to be ready
  • Copy data from the kernel to the process

The steps involved in a network input operation

  • Wait for data to arrive from the network; when it arrives, it is copied into a kernel buffer
  • Copy the data from the kernel buffer to the application buffer

IO Model Introduction

Blocking IO

  • The process makes a system call and blocks until the kernel has the data ready and has copied it from the kernel buffer to user space. The process can do nothing else while it waits
  • For example, a call to recvfrom does not return until the data has arrived and been copied from the kernel into the user program's buffer. This IO model is blocking IO
  • Blocking IO is the most commonly used IO model
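The blocking behaviour described above can be sketched in Python as follows. This is only an illustration, not code from the reference: the helper name blocking_read and the delay parameter are made up for the example, and a local socketpair stands in for a real network connection.

```python
import socket
import threading
import time


def blocking_read(delay: float = 0.2):
    """Block in recv() until a peer sends data; return (data, seconds waited)."""
    reader, writer = socket.socketpair()

    def send_later():
        # Simulate data arriving from the network after a delay.
        time.sleep(delay)
        writer.send(b"ping")

    threading.Thread(target=send_later, daemon=True).start()

    start = time.monotonic()
    # Phase 1 (wait for data) and phase 2 (copy to user space) both
    # happen inside this single call; the thread can do nothing else.
    data = reader.recv(1024)
    waited = time.monotonic() - start

    reader.close()
    writer.close()
    return data, waited
```

Calling blocking_read(0.2) should return the payload only after roughly the delay has passed, because the calling thread sleeps inside recv for the whole wait.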

Non-blocking IO

  • When the data is not ready, the kernel returns an error code instead of putting the caller to sleep. The calling program keeps polling the kernel to see whether the data is ready
  • For example, when recvfrom is called on a non-blocking socket and no data is ready, it returns immediately with the error EWOULDBLOCK instead of blocking. Once the data is ready, the call copies it and returns successfully
  • An application that sits in a loop calling recvfrom on a non-blocking descriptor like this is said to be polling
  • Polling wastes CPU time, so it is usually found only on systems dedicated to a single function. To use this model, set the non-blocking flag (O_NONBLOCK) on the socket descriptor
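A minimal sketch of such a polling loop, assuming Python's socket module: setblocking(False) sets the non-blocking flag on the descriptor, and Python reports EWOULDBLOCK/EAGAIN as BlockingIOError. The function name poll_until_ready and its parameters are hypothetical, chosen for this example.

```python
import socket
import time


def poll_until_ready(sock, interval: float = 0.01, max_attempts: int = 1000):
    """Poll a non-blocking socket until recv() succeeds.

    Returns (data, attempts), where attempts counts the recv() calls made.
    """
    sock.setblocking(False)  # set the non-blocking flag on the descriptor
    for attempt in range(1, max_attempts + 1):
        try:
            # Returns at once: either with data or with an error.
            return sock.recv(1024), attempt
        except BlockingIOError:
            # Data not ready yet; free to do other work before retrying.
            time.sleep(interval)
    raise TimeoutError("no data arrived while polling")
```

The short sleep between attempts is a concession for the example; a loop without it would spin at full CPU, which is the cost the text describes.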

IO multiplexing

  • Similar to non-blocking IO, except that the waiting is done by the kernel rather than by a polling loop in the user thread. The process blocks in select, poll or epoll instead of in the IO call itself; once the kernel reports that a descriptor is readable, the process calls recvfrom to copy the data into user space
  • The select system call acts as a proxy: it watches all the file descriptors registered with it. When one of them is ready, select returns and the application calls recvfrom itself to fetch the data
  • IO multiplexing needs two system calls (select, then recvfrom) instead of one. With a single descriptor it performs slightly worse than blocking IO; its advantage is that one thread can wait on many sockets at the same time
  • Multiplexing includes:
    • select: linearly scans all the descriptors it was given, whether they are active or not. The number of descriptors is capped by FD_SETSIZE (typically 1024, for example on Linux with glibc)
    • poll: works like select but describes the descriptors with an array of pollfd structures instead of fixed-size bit sets, so it has no built-in size limit. The whole array is still copied between user space and the kernel and scanned linearly on every call
    • epoll: the Linux replacement for select and poll, with no built-in size limit. A single epoll file descriptor manages many descriptors, which the kernel stores in a red-black tree. Polling is replaced by an event-driven design: descriptors are registered once with epoll_ctl, a callback adds a descriptor to the ready list when its event fires, and epoll_wait returns only the ready descriptors. Because the interest set lives in the kernel, it is not copied from user space on every call. epoll is often said to use mmap to share memory with user space, but the mainline Linux implementation does not; epoll_wait copies the ready events out to the caller
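The two-step pattern (ask the kernel which descriptors are ready, then read from them) might look like this with Python's select module. The function names are made up for the example, and the epoll variant assumes a Linux system where select.epoll is available.

```python
import select
import socket


def ready_with_select(socks, timeout: float = 1.0):
    """Return the sockets that select() reports as readable."""
    # The full list of descriptors is handed to the kernel on every call.
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable


def ready_with_epoll(socks, timeout: float = 1.0):
    """Same question answered with epoll (Linux only)."""
    by_fd = {s.fileno(): s for s in socks}
    ep = select.epoll()
    try:
        # Register each descriptor once (epoll_ctl under the hood).
        for fd in by_fd:
            ep.register(fd, select.EPOLLIN)
        # poll() wraps epoll_wait and yields only the ready descriptors.
        return [by_fd[fd] for fd, _events in ep.poll(timeout)]
    finally:
        ep.close()
```

Either function only reports readiness; the caller still has to call recv on each returned socket to copy the data. In a long-running server the epoll object would be created once and reused, which is where the saving over select comes from.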

Signal-driven IO

  • The kernel uses a signal to notify the process when the data is ready
  • First enable signal-driven IO on the socket and install a signal handler with the sigaction system call. The call returns immediately, so the process is not blocked and keeps running
  • When the data is ready, the kernel sends a SIGIO signal to the process. The handler can then call recvfrom to read the data, or tell the main loop that the data is ready to be read
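The sequence above can be approximated in Python, with signal.signal taking the place of sigaction. This is a sketch under several assumptions: a Linux system where os.O_ASYNC and fcntl.F_SETOWN exist, code running in the main thread (Python only allows signal handlers to be installed there), and a hypothetical helper name read_with_sigio.

```python
import fcntl
import os
import signal
import socket
import time


def read_with_sigio(payload: bytes, timeout: float = 2.0) -> bytes:
    """Send payload over a socketpair and read it from a SIGIO handler."""
    reader, writer = socket.socketpair()
    received = []

    def on_sigio(signum, frame):
        # The signal only says "data is ready"; we still do the copy.
        try:
            chunk = reader.recv(4096)
        except OSError:
            return  # spurious wake-up or socket already closed
        if chunk:
            received.append(chunk)

    previous = signal.signal(signal.SIGIO, on_sigio)
    try:
        reader.setblocking(False)
        # Deliver SIGIO for this descriptor to our own process...
        fcntl.fcntl(reader, fcntl.F_SETOWN, os.getpid())
        # ...and switch on signal-driven IO for the socket.
        flags = fcntl.fcntl(reader, fcntl.F_GETFL)
        fcntl.fcntl(reader, fcntl.F_SETFL, flags | os.O_ASYNC)

        writer.send(payload)
        # The main flow is not blocked in recv(); it just keeps running.
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        reader.close()
        writer.close()
        if previous is not None:
            signal.signal(signal.SIGIO, previous)
    return b"".join(received)
```

The handler is kept installed until both sockets are closed because the default action for SIGIO terminates the process.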

Asynchronous IO

  • Asynchronous IO also notifies the process, for example with a signal, but only once the whole operation has finished
  • The difference from the signal-driven model: with signal-driven IO the kernel tells us when an IO operation can be started (the data is ready), whereas with asynchronous IO the kernel tells us when the IO operation has completed
  • Asynchronous IO is truly non-blocking: the main process carries on with its own work and handles the data in a callback once the IO operation is complete (the data has been copied from the kernel buffer to the application buffer)
  • The POSIX asynchronous IO functions have names that begin with aio_ or lio_
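Python's standard library has no binding for aio_read, so the sketch below only imitates the contract of asynchronous IO: the caller hands over a buffer and a callback, returns at once, and is notified after the data is already in the buffer. It does this with a helper thread, which is an emulation and not the POSIX API; the names submit_read and on_complete are invented for the example.

```python
import socket
import threading


def submit_read(sock, buf: bytearray, on_complete):
    """Start a read in the background and return immediately.

    on_complete(nbytes) is called only after the data has been copied
    into buf, so the caller never waits for either phase itself.
    """

    def worker():
        # Both phases (wait for data, copy into buf) happen here,
        # outside the caller's flow of control.
        nbytes = sock.recv_into(buf)
        on_complete(nbytes)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Compared with the signal-driven model, the notification here means "the data is in your buffer", not "you may now start reading".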

Comparison of IO models

  • The first four IO models differ mainly in phase 1 (waiting for the data). Phase 2 is the same for all of them: the process blocks while the data is copied from the kernel buffer to the caller's buffer
  • The first four models are therefore synchronous IO: the IO operation causes the requesting process to block until the operation completes
  • Asynchronous IO: the IO operation does not block the requesting process

Reference

Unix Network Programming, Volume 1