Contents

  • Preface
  • I/O programming model
  • Data transmission and conversion costs
  • Data structure application
    • The buffer
    • I/O multiplexing model
  • Conclusion
  • Q&A
    • What is the difference between BIO, NIO, and AIO?
    • What is the difference between using coroutines for I/O multiplexing and using threads?


When we deal with network problems, we often deal with I/O issues: inputs and outputs. This may seem complicated, but it boils down to how data received by the network card reaches a specific program, and how that program's data is handed back to the network card to be sent.

When dealing with I/O, think about how to write programs based on specific scenarios. In terms of application API design, we often see three types of design: BIO, NIO, and AIO.

In essence, discussing the differences between BIO, NIO, and AIO is really a discussion of the I/O model, which we can approach from three angles.

  • Programming model: design the API sensibly so programs are more comfortable to write.
  • Data transmission and transformation costs: for example, reducing the number of data copies, compressing data sensibly, and so on.
  • Efficient data structures: make good use of buffers, red-black trees, and the like.

I/O programming model

Let’s start by discussing the differences between BIO, NIO, and AIO in terms of programming models.

A BIO (Blocking I/O) API is designed so that calls block the program. For example:

byte a = readKey()

Suppose the readKey method reads a key from the keyboard. You might ask: what if the user hasn't pressed a key yet? With a blocking I/O design, the thread blocks and waits until the user presses one. With a non-blocking I/O (NIO) design, readKey does not block the current thread: if no key has been pressed, it immediately returns an empty value, such as NULL.
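
The same blocking-versus-non-blocking contrast can be shown with a runnable sketch. This is an illustrative Python example (not the article's pseudocode API): a connected socket pair stands in for the "keyboard", and switching one side to non-blocking mode changes what a read with no data does.

```python
import socket

# A connected pair of sockets stands in for "keyboard" input.
reader, writer = socket.socketpair()
reader.setblocking(False)   # switch the reading side to non-blocking mode

# Non-blocking read with no data available: instead of waiting, the call
# returns control immediately (Python signals "no key yet" with an
# exception rather than a NULL-like return value).
try:
    first = reader.recv(1)
except BlockingIOError:
    first = None            # nothing was ready, and we did not block

writer.send(b"a")           # now "press a key"
second = reader.recv(1)     # data is ready, so the read succeeds

print(first, second)

reader.close()
writer.close()
```

With a blocking socket, that first `recv` would instead have parked the thread until data arrived, which is exactly the BIO behavior described above.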

Finally, an AIO (Asynchronous I/O) API design creates an additional timeline. For example:

func callBackFunction(byte keyCode) {
  // Handle the key
}

readKey(callBackFunction)

In asynchronous I/O, the readKey method returns immediately, but without a result; the result is delivered later to callBackFunction. From this perspective there are really two timelines. The first is the program's main timeline, which runs up to and past the readKey call. The callBackFunction, however, executes only when the user presses a key, at a time that is not determined in advance, so it lives on a second timeline. This is why we call it asynchrony: it describes timelines that cannot be synchronized. You don't know when callBackFunction is going to execute.

However, when we say that language X provides asynchronous I/O, we usually mean more than the style shown above, because writing programs that way creates a problem known as callback hell: the asynchronous program's timeline is distorted, and maintenance costs climb. For example:

request("/order/123", (data1) -> {
  // ...
  request("/product/456", (data2) -> {
    // ...
    request("/sku/789", (data3) -> {
      // ...
    })
  })
})

A program like the one above (callback hell) is expensive to maintain, so when an asynchronous API programming model is provided, syntax is usually provided alongside it to convert the asynchronous code back into a synchronous style. For example, the following pseudocode:

Future future1 = request("/order/123")
Future future2 = request("/product/456")
Future future3 = request("/sku/789")

// ... do other work ...

order = future1.get()
product = future2.get()
sku = future3.get()

Here request is a network call that asks for the order with ID=123. The request function itself does not block and returns immediately, whereas the underlying network call is asynchronous: it does not finish on the line after request("/order/123"), but at some point in the future. We therefore wrap the asynchronous operation in a Future object. future.get() is a blocking operation that waits until the network call returns.

Between request and future.get(), we can do plenty of other things, such as sending more requests. An operation like Future, which brings asynchronous results back onto the main timeline, is called converting asynchrony to synchrony, and this is what is commonly known as asynchronous programming.
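
The Future pseudocode above maps directly onto real libraries. Here is a minimal Python sketch using the standard concurrent.futures module; the request function is a hypothetical stand-in that simulates a slow network fetch, not a real API.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical stand-in for the article's request(): pretend each call
# is a slow network fetch that eventually returns some data.
def request(path):
    time.sleep(0.05)        # simulate network latency
    return {"path": path}

pool = ThreadPoolExecutor(max_workers=3)

# Each submit() returns immediately with a Future; the three "network
# calls" now run concurrently on their own timelines.
future1 = pool.submit(request, "/order/123")
future2 = pool.submit(request, "/product/456")
future3 = pool.submit(request, "/sku/789")

# ... the main timeline is free to do other work here ...

# result() blocks until the corresponding call finishes, pulling the
# asynchronous results back onto the main timeline.
order = future1.result()
product = future2.result()
sku = future3.result()

print(order["path"], product["path"], sku["path"])
pool.shutdown()
```

Because the three submits return immediately, the three simulated fetches overlap; the main timeline only blocks at the result() calls, which is exactly the async-to-sync conversion described above.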

Data transmission and conversion costs

Above we considered I/O in terms of the programming model. Next we look at BIO, NIO, and AIO from the perspective of their internal implementation. Regardless of the I/O model, data is either copied from the network card to the user program (receiving) or from the user program to the network card (sending).

In addition, some data needs to be encoded and decoded, such as data in JSON format; other data needs to be compressed and decompressed. Data is transferred twice on its way in: from the NIC to the kernel, and then from the kernel to the user program. Note that copying data from one region of memory to another is a CPU-intensive operation, since copying ultimately comes down to moving the data byte by byte.

The step from the NIC to kernel space can be handled by Direct Memory Access (DMA). A DMA controller is a small device that copies data without involving the CPU, saving computing resources. Unfortunately, when we write programs we usually cannot control DMA directly, so DMA is used only by devices to transfer data into memory.

The copy from kernel space to user space, however, can be avoided with memory-mapping techniques, which map data in kernel space directly into the user program's address space.
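
A small runnable sketch of the idea, using Python's mmap module: reading through a memory mapping pulls bytes straight from the page cache, instead of having read() copy them into a separate user-space buffer first. (The file and its contents here are made up for illustration.)

```python
import mmap
import os
import tempfile

# Create a small file to map (stand-in for data sitting in kernel space).
fd, path = tempfile.mkstemp()
os.write(fd, b"hello, mmap")
os.close(fd)

with open(path, "rb") as f:
    # Length 0 means "map the whole file". Slices read through the
    # mapping; no explicit read()-style copy into a user buffer.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        head = mm[:5]    # sliced straight out of the mapping

print(head)
os.remove(path)
```

Note that the `with` blocks release the mapping and file explicitly; as the conclusion below points out, mapped memory is not managed by the language's ordinary garbage collector.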

Data structure application

When dealing with network I/O, there is another key issue to pay attention to: the choice of data structures.

The buffer

Buffers are a common data structure for dealing with I/O problems.

  • On the one hand, a buffer smooths out load: when the instantaneous I/O volume is large, requests queue up in the buffer and are processed in turn.
  • On the other hand, a buffer enables batching: for example, 1000 I/O requests entering the buffer can be merged into 50 I/O requests, improving overall performance.

For example, if you have 1000 orders to write to MySQL, merging those 1000 requests into 50 greatly reduces the number of disk writes. Similarly, if there are 10000 network requests, sending them together reduces TCP handshake time and maximizes connection reuse. If the individual requests are small, they can also be coalesced so that multiple messages share TCP segments. When building web sites, it is likewise common to combine multiple HTTP requests into one to reduce overall network overhead.
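
The batching effect is easy to demonstrate. Below is a minimal, illustrative Python sketch (the BatchWriter class and its batch size are made up for this example): 1000 logical writes are buffered and flushed as 50 actual write() system calls, mirroring the 1000-into-50 figure above.

```python
import os
import tempfile

# A minimal write buffer: instead of one os.write() per record, collect
# records and flush them as a single system call once the batch is full.
class BatchWriter:
    def __init__(self, fd, batch_size=20):
        self.fd = fd
        self.batch_size = batch_size   # records per flush (tuning knob)
        self.pending = []
        self.flushes = 0               # how many real writes we issued

    def write(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            os.write(self.fd, b"".join(self.pending))  # one syscall per batch
            self.pending.clear()
            self.flushes += 1

fd, path = tempfile.mkstemp()
w = BatchWriter(fd, batch_size=20)
for i in range(1000):
    w.write(b"order\n")    # 1000 logical writes...
w.flush()
print(w.flushes)           # ...become 50 actual write() calls
os.close(fd)
size = os.path.getsize(path)
os.remove(path)
```

The same shape applies to network sends or database inserts: the batch size trades a little latency (records wait in the buffer) for far fewer expensive round trips.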

Beyond these two reasons, buffers can also reduce the actual demand on memory. It is advisable to use buffers both when data is transferred from the network card to the kernel and from the kernel to user space. When a large request arrives, abstracting it as a stream and processing it through a buffer relieves pressure on memory. This is because with buffers and streams, you do not really need to allocate memory as large as the request itself: the data is handled in buffer-sized pieces over multiple rounds, so the actual memory overhead is just the size of the buffer.
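
A small sketch of that last point, in Python: a 1 MB "request" (here simulated with an in-memory BytesIO stream) is processed through a fixed 4 KiB buffer, so the working set is bounded by the buffer size, not the payload size.

```python
import hashlib
import io

# Process a "large request" through a fixed-size buffer: memory use is
# bounded by the buffer size (4 KiB here), not the 1 MB payload.
BUF_SIZE = 4096
payload = io.BytesIO(b"x" * 1_000_000)   # stands in for a network stream

digest = hashlib.sha256()
total = 0
while True:
    chunk = payload.read(BUF_SIZE)       # never holds more than 4 KiB
    if not chunk:                        # empty read means end of stream
        break
    total += len(chunk)
    digest.update(chunk)

checksum = digest.hexdigest()
print(total)   # the whole payload was processed, 4 KiB at a time
```

Swap BytesIO for a socket or file object and the loop is unchanged; that interchangeability is exactly what the stream abstraction buys.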

I/O multiplexing model

When thinking about data structures, we also need to decide which model to use for I/O multiplexing.

Suppose you are dealing with a highly concurrent site, with a large number of requests coming to your server every second. How many threads do you use to handle I/O? For scenarios that do not require compression and decompression, the main overhead of processing I/O is copying data. So how many copies can a CPU core make per second?

Copying, in fact, means moving data in memory from one address to another. Combined with DMA, memory mapping, and similar techniques, copying is very fast: even without DMA or memory mapping, a 3 GHz CPU can copy hundreds of megabytes of data per second, though the speed is also limited by the memory itself. So overall, I/O does not require a lot of computing resources, and when handling high concurrency we usually do not need a large number of threads dedicated to I/O.

For most applications, the cost of processing I/O is less than the cost of processing business. Processing highly concurrent services may require a large amount of computing resources. Each transaction may also require more I/O, such as remote RPC calls.

Therefore, when dealing with high concurrency, a common I/O multiplexing pattern is that a small number of threads handle a large number of network receive and send tasks. More threads, usually a thread pool, then handle the specific business work.

In such a mode, a core problem needs to be solved: when the operating system kernel detects that an I/O operation has occurred, how does it tell which thread to call which program?

At this point, an efficient model requires registering with the kernel: which threads exist, which event types each thread listens for, and which program responds to them. Concretely, when a client sends a message to the server, we need to find out as quickly as possible which thread cares about that message (i.e., will process the data). epoll, for example, is such a model, built around a red-black tree: the file descriptors form a red-black tree, and each node holds the thread corresponding to the file descriptor, the event types that thread listens for, and the corresponding handler program.
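
The register-then-dispatch pattern can be sketched with Python's selectors module, which uses epoll on Linux (falling back to the best mechanism elsewhere). In this illustrative example, the data field attached at registration plays the role of the payload described above: it tells us which handler to run when the kernel reports the event.

```python
import selectors
import socket

# Register "who cares about which event" with the kernel, as epoll does.
sel = selectors.DefaultSelector()     # epoll-backed on Linux
client, server = socket.socketpair()
server.setblocking(False)

def on_readable(sock):
    # The handler that "cares about" this socket's incoming data.
    return sock.recv(1024)

# Registration: file descriptor + event type + associated handler.
sel.register(server, selectors.EVENT_READ, data=on_readable)

client.send(b"ping")                  # a "client message" arrives

received = None
for key, events in sel.select(timeout=1.0):
    handler = key.data                # look up the registered handler...
    received = handler(key.fileobj)   # ...and dispatch to it

print(received)

sel.unregister(server)
client.close()
server.close()
```

One selector like this can watch thousands of sockets; the select() call only returns the descriptors that actually have events, which is what lets a small number of threads serve a large number of connections.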

What do BIO, NIO, and AIO have to do with all this? There are two connections.

First, every programming model needs buffers: BIO, NIO, and AIO all rely on them, so buffers matter a great deal. Whatever model we use, if there is no buffer inside, we must add one outside. Second, a registration-plus-notification mechanism like epoll saves a great deal of time in locating the specific thread and event type. This is a general technique, not tied to any one I/O model.

From a capability standpoint, however, with a model like epoll there is no need to block the I/O processing thread: the operating system keeps pushing the events that need responses to the processing thread, so the thread can stay unblocked (as in NIO).


Conclusion

The I/O model can be discussed from three aspects.

  • The first is the programming model, where blocking, non-blocking, and asynchronous APIs are designed differently. What we usually call asynchronous programming is really converting asynchrony back to synchrony. The greatest value of this is code readability, and readability means lower maintenance costs and better scalability.
  • Second, when designing system I/O, another consideration is the cost of data transfer and transformation. Transfer is mainly copying; memory mapping, for example, can be used to reduce it. Note, however, that memory mappings use kernel-space buffer memory, so remember to release them, because this memory is often beyond the reach of the memory-reclamation mechanism (such as garbage collection) provided by the language we use.
  • Finally, there is the application of data structures: using different buffers for different scenarios, and choosing the right message notification mechanism, is also a core problem in handling high concurrency.

Looking at the I/O model from these perspectives, you will find that the programming model, the data transmission, and the message notification are separate concerns: different modules, fully decoupled, each selectable according to the characteristics of the business. Although a complete system design usually presents them as one package, we should still think about them separately, which produces better design ideas.


Q&A

What is the difference between BIO, NIO, and AIO?

Taken together, these are three I/O programming models. A BIO interface directly blocks the calling thread. An NIO interface is designed not to block the current thread. AIO adds asynchronous capability to I/O: the I/O responder runs on a separate timeline. AIO providers usually also supply an asynchronous programming model, implementing a data structure that encapsulates the asynchronous computation and provides the ability to synchronize it back to the main timeline.

Typically, all three APIs are combined with I/O multiplexing. If the underlying layer manages registered file descriptors and events with a red-black tree, the kernel can deliver I/O messages to the designated thread with very little overhead. In addition, I/O can be optimized with DMA, memory mapping, and so on.

What is the difference between using coroutines for I/O multiplexing and using threads?

A thread is the smallest unit of execution; in I/O multiplexing, a single thread handles a large number of I/O operations. There is also a lighter model for executing programs called coroutines, which are lightweight threads. The operating system allocates execution resources to threads and then schedules them to run. To implement coroutines, you create smaller execution units on top of the resources allocated to a thread: coroutines are not scheduled by the operating system and share the thread's execution resources.

The point of I/O multiplexing is to reduce the cost of switching between threads. In that respect, threads and coroutines are equivalent by design, as long as a single thread handles a large amount of I/O work. Whether to use coroutines when a single thread processes a large amount of I/O also depends on how well the corresponding thread can execute those coroutines.
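
The "many coroutines, one thread" arrangement can be seen in a short Python sketch using asyncio; the fetch coroutine is a made-up stand-in for a network request. Each await hands control back to the event loop instead of blocking, so one thread multiplexes all 100 waits.

```python
import asyncio
import threading

# Many coroutines, one thread: each "I/O wait" yields to the event loop
# instead of blocking, so a single thread multiplexes all of them.
async def fetch(order_id):
    await asyncio.sleep(0.01)        # stands in for a network wait
    # Record which thread ran this coroutine.
    return (order_id, threading.current_thread().name)

async def main():
    # 100 concurrent "requests" share one thread's execution resources.
    return await asyncio.gather(*(fetch(i) for i in range(100)))

results = asyncio.run(main())
threads = {name for _, name in results}
print(len(results), len(threads))    # 100 results, all from one thread
```

Note the flip side mentioned above: if any coroutine made a genuinely blocking call instead of awaiting, it would stall every other coroutine sharing that thread, which is why the thread's ability to execute coroutines matters.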