
1. Introduction to a series of articles

1.1 Purpose

As an instant messaging developer, you already know the buzzwords of high performance and high concurrency: thread pools, zero copy, multiplexing, event-driven I/O, epoll, and so on. Or perhaps you are proficient in a framework built around these features: Java's Netty, PHP's Workerman, Go's nget, and the like. But in an interview, or in real technical practice, the doubts never quite go away, and you realize that what you have mastered is only the surface.

What are the underlying principles behind these technical features? Understanding high performance and high concurrency at its roots is exactly what this series of articles sets out to do.

1.2 Article origin

I (Jack Jiang) have organized quite a number of resources and articles on IM, message push, and other instant-communication technologies for the instant messaging network: from the open-source IM framework MobileIMSDK at the beginning, to the online edition of the classic network-programming book “TCP/IP Illustrated”, to the programmatic IM-development guide “A Beginner’s Introduction Is Enough: Developing Mobile IM from Scratch”, as well as the network-programming series that goes from shallow to deep: “Network Programming for the Lazy”, “Brain-Dead-Simple Network Programming”, “High-Performance Network Programming”, and “Unknown Network Programming”.

The deeper I went, the more I realized how little I knew about instant messaging technology. Later, to help developers better understand the characteristics of networks (especially mobile networks) from the perspective of basic telecom technology, I collected a series of advanced articles titled “Introduction to Zero-Base Communication Technology for IM Developers”. That series already sits at the edge of what an average IM developer needs to know about network communication, and together with the network-programming materials above, it is almost enough to cover the blind spots in network communication knowledge.

Knowledge of network communication is certainly important for developing systems such as instant messaging, but let's return to the technical essentials of network communication itself: what is the nature of the thread pools, zero copy, multiplexing, and event-driven I/O mentioned above? What are the underlying principles? Answering that is the purpose of this series, which I hope you will find useful.

1.3 Article Contents

“Understanding High Performance and High Concurrency from the Roots (Part 1): Understanding Threads and Thread Pools” (* this article)

“Understanding High Performance and High Concurrency from the Roots (Part 2): Getting into the Operating System, Understanding I/O and Zero-Copy Technology” (to be released later)

“Understanding High Performance and High Concurrency from the Roots (Part 3): Getting into the Operating System, Understanding I/O Multiplexing from the Ground Up” (to be released later)

“Understanding High Performance and High Concurrency from the Roots (Part 4): Getting into the Operating System, Understanding Synchronization and Asynchrony” (to be released later)

“Understanding High Performance and High Concurrency from the Roots (Part 5): How High-Concurrency, High-Performance Servers Really Work” (to be released later)

1.4 Overview of this Article

This article, the first in the series, explains the principles of multithreading and thread pools starting from the CPU level. It deliberately avoids complex technical concepts so that anyone can follow it.

2. The author of this article

At the author’s request, no real name or personal photo is provided.

The author's main technical interests are Internet back ends, high-concurrency and high-performance servers, and search engine technology. His online handle is “The Desert Island Survival of the Code Nong”, which is also the name of his public account. Thanks to the author for sharing.

3. It all starts with the CPU

You may be wondering: why start with the CPU when we are talking about multithreading? The reason is simple: without the fancy concepts in the way, you can see the essence of the problem more clearly.

The reality is that the CPU has no notion of threads, processes, or any such concepts.

The CPU only knows two things:

  • 1) Fetch instructions from memory;
  • 2) Execute the instruction, then return to 1).

You see, the CPU simply has no idea what a process or a thread is.

The next question is: where does the CPU fetch instructions from? The answer is a register called the Program Counter (PC for short). Don't think of registers as mysterious here: you can simply think of a register as a piece of memory, just much faster to access.

What is stored in the PC register? The address of an instruction in memory. Which instruction? The next one the CPU will execute.

So who sets the address of the instruction in the PC register?

By default, the address in the PC register simply advances to the next instruction, which makes sense, because most of the time the CPU executes instructions one after another. When it encounters an if or else, however, this sequential flow is broken: the CPU dynamically updates the value in the PC register according to the result of the computation, so it can jump to the correct next instruction.

Being the smart reader you are, you may then ask: how is the initial value of the PC set?

Before we can answer that question, we need to know where the instructions the CPU executes come from. The instructions in memory are loaded from an executable file stored on disk, which is generated by the compiler. And what does the compiler generate those machine instructions from? The answer: the functions we define.

Functions are compiled into the instructions the CPU executes. So how do we get the CPU to execute a function? Obviously, we just need to find the first instruction the function was compiled into, that is, the function's entry point.

As you can see by now, if we want the CPU to execute a function, we simply write the address of the function's first machine instruction into the PC register, and the CPU will execute that function.

You might be wondering, what does this have to do with threads?

4. From CPU to operating system

In the previous section we learned how the CPU works: if we want the CPU to execute a function, we only need to load the address of the function's first machine instruction into the PC register. That way we can make the CPU execute a program even without an operating system.

To run a program, we need to:

  • 1) Find a suitably sized region in memory and load the program into it;
  • 2) Find the function's entry point and set the PC register so the CPU starts executing the program.

These two steps are by no means easy, and if programmers had to do them by hand every time they ran a program, they would go crazy. So a smart programmer would want to write a program to automate them.

Machine instructions need to be loaded into memory for execution, so we need to record the start address and length of that memory region. We also need to find the function's entry address and write it into the PC register. Think about it: don't we need a data structure to record this information?

The data structure would look roughly like this:

struct *** {
    void* start_addr;   // start address of the memory region
    int len;            // length of the region
    void* start_point;  // entry point of the program
    ...
};

Then it’s name time.

This data structure needs a name. What information does it record? What a program looks like once it has been loaded from disk into memory. Let's just call it a Process. Our guiding principle: the name should sound mysterious without being easy to understand. I call this the "rule of incomprehensibility."

And so the process was born.

The first function executed by the CPU also deserves a name. The first function to run sounds important, so let's call it main.

The program that completes the two steps above also needs a name. Following the "rule of incomprehensibility," this "simple" program is called the Operating System.

Thus the operating system was born, and programmers no longer had to manually load programs to run them.

Now that you have processes and an operating system, everything looks perfect.

5. From single core to multi-core, how to make full use of multi-core

One hallmark of humanity is that it never stops tinkering, and so the CPU went from single core to multi-core.

Now suppose we want to write a program that makes use of multiple cores. How should we do it?

Some students may say, “Isn’t there a process? Why not open a few more processes?”

It sounds reasonable, but there are mainly the following problems:

  • 1) Processes occupy memory space (as seen in the previous section). If multiple processes are started from the same executable, the contents of their memory regions are almost identical, which is obviously a waste of memory;
  • 2) Computing tasks can be fairly complex and may involve inter-process communication. Since each process lives in a different memory address space, inter-process communication necessarily has to go through the operating system, which increases both programming difficulty and system overhead.

What to do?

6. Process to thread

Let's keep thinking carefully about this problem. A so-called process is just a region of memory that holds the machine instructions the CPU executes and the runtime information of the function call stack. To make the process run, we write the address of the main function's first machine instruction into the PC register, and the process is up and running.

The drawback of a process is that it has only one entry function, main, so the machine instructions in a process can only be executed by one CPU. Is there a way to have multiple CPUs execute the machine instructions of the same process?

If we can write the address of main's first instruction into the PC register, what makes main different from any other function?

The answer is: nothing. main is special only in that it is the first function the CPU executes; there is nothing else special about it. Just as we can point the PC register at main, we can point it at any function.

When we point the PC register to a function other than main, a thread is born.

This breaks the old constraint: there can be multiple entry functions within a single process, which means machine instructions belonging to the same process can be executed by multiple CPUs at the same time.

Note: this is a different concept from a process. Creating a process means finding a suitable region of memory, loading the program into it, and then pointing the CPU's PC register at the main function, which means a process has only one flow of execution.

Now, however, multiple CPUs can simultaneously execute multiple entry functions under the same roof (the memory region occupied by the process), which means there can now be multiple execution flows within one process.

"Execution flow" sounds a little too easy to understand, so following the "rule of incomprehensibility," let's call it a thread instead.

That’s where threads come in.

The operating system maintains a bundle of information for each process, recording things like the memory space the process occupies. Let's call this bundle dataset A.

Similarly, the operating system maintains a bundle of information for each thread, recording things like the thread's entry function and stack. Let's call this bundle dataset B.

Clearly, dataset B holds less data than dataset A. Also, unlike creating a process, creating a thread does not require searching memory for a free region: a thread runs inside its process's address space, which was created when the program started. Since a thread is created while the program (the process) is running, that address space already exists by the time the thread starts and can be used directly. This is one reason, among others, why the textbooks say creating a thread is faster than creating a process.

It is important to realize that with the concept of threads, we only need to create an appropriate number of threads after the process starts to keep all the CPUs busy. This is the root of so-called high performance and high concurrency.

It’s as simple as creating the right number of threads.

Additional note: because all threads of a process share the same memory address space, communication between threads does not need to go through the operating system. This is a great convenience for programmers, but it also brings endless trouble: most multithreading bugs stem from the fact that inter-thread communication is so convenient that it is very easy to get wrong. The root cause is that the CPU has no concept of threads when executing instructions, so the mutual exclusion and synchronization problems of multithreaded programming must be solved by the programmer. Space does not permit covering mutual exclusion and synchronization in detail here; most operating systems materials treat them thoroughly.

A final note of caution: although the previous diagrams on threads showed multiple CPUs, multiple cores are not required to use multithreading. Multiple threads can be created even on a single core, because threads are implemented at the operating-system level and have nothing to do with how many cores there are. A CPU executing machine instructions is not even aware of which thread those instructions belong to. Even with only one CPU, the operating system can use thread scheduling to make every thread advance "simultaneously": it hands out CPU time slices back and forth among the threads, so the threads appear to run at the same time, when in fact at any given moment only one thread is running.

7. Threads vs. memory

In the previous discussion we saw the relationship between threads and the CPU: pointing the CPU's PC register at a thread's entry function makes the thread run. This is why you must specify an entry function when creating a thread.

Creating a thread in any programming language is much the same:

// Set the thread entry function DoSomething
thread = CreateThread(DoSomething);

// Get the thread running
thread.Run();

So what does thread have to do with memory?

We know that a running function generates data: function parameters, local variables, the return address, and so on. This information is stored on a stack. Before the idea of threads existed, a process had only one flow of execution, so it had only one stack, and at the bottom of that stack sat the process's entry function, main.

Suppose main calls funA and funcA calls funcB, as shown:

What about threads?

Once a process has threads, it has multiple entries, that is, multiple execution flows at the same time. A process with a single execution flow needs one stack to save its runtime information; obviously, a process with multiple execution flows needs multiple stacks to save the information of each flow. In other words, the operating system allocates a stack for each thread within the process's address space. It is critical to realize that each thread has its own stack.

It is also worth noting that creating threads consumes the process's memory space.

8. Use of threads

Now that we have the idea of threads, how do we as programmers use them?

From a lifecycle perspective, threads handle two types of tasks: long tasks and short tasks.

1) Long-lived tasks:

As the name implies, these are tasks that live for a long time. For example, while we edit text in Word, the text needs to be saved to disk, and writing data to disk is a task. A good approach is to create a dedicated disk-writing thread whose life cycle matches Word's: it is created when Word opens and destroyed when Word closes. This is a long-lived task.

This scenario is ideal for creating dedicated threads to handle specific tasks, which is relatively simple.

There are long tasks, and there are short tasks.

2) Short-lived tasks:

The concept is simple: tasks with a short processing time, such as a network request or a database query, which can be handled and completed quickly. Short tasks are common in servers of all kinds: web servers, database servers, file servers, mail servers, and so on, which is also the most common scenario for those working in the Internet industry. This is the scenario we will focus on.

This scenario has two characteristics: one is that each task takes a short time to process; the other is the sheer number of tasks.

What if you were asked to handle this type of task?

This is easy, you might think: when the server receives a request, create a thread to handle the task, and destroy the thread when the task is done.

This method is often referred to as thread-per-request, where a thread is created for each request:

This approach works well for long tasks, but for large numbers of short tasks, although it is simple to implement, it has drawbacks.

Specifically, the following shortcomings:

  • 1) As we can see from the previous sections, threads are a concept in the operating system (we will not discuss user-mode thread implementation, coroutines, etc.), so the creation of threads naturally needs to be accomplished by the operating system. The operating system creates and destroys threads, which takes time.
  • 2) Each thread needs to have its own stack, so creating a large number of threads will consume too much memory and other system resources.

It is like being a factory owner (enjoy the thought) with plenty of orders in hand. If for every batch of orders you recruit a group of workers, and the product is so simple that the workers finish it quickly, and you then fire all those newly recruited workers once the batch is done, only to go through the hardship of recruiting again when the next order arrives, then you are working five minutes and hiring for ten hours. Unless you were trying to put yourself out of business, you would never run a factory that way.

Therefore, a better strategy is to hire a group of workers and keep them on hand: they process orders when there are orders and sit idle when there are none.

That’s where thread pools come in.

9. From multithreading to thread pools

The concept of a thread pool is very simple: create a batch of threads up front and never release them. Tasks are submitted to these threads for processing, so there is no need to frequently create and destroy threads; and because the number of threads in the pool is usually fixed, it will not consume too much memory either. The idea, in short, is reuse and control.

10. How do thread pools work

You might ask, how do I submit tasks to a thread pool? How do these tasks get assigned to threads in the thread pool?

Obviously, the queue from data structures is a natural fit for this scenario: the side submitting tasks is the producer, and the threads consuming tasks are the consumers. This is in fact a classic producer-consumer problem.

Now you know why operating system courses and job interviews ask this question: if you don't understand the producer-consumer problem, you essentially can't write a thread pool correctly.

Space does not permit a detailed explanation of the producer-consumer problem here; refer to operating systems materials for that. Instead, let me describe what a typical task submitted to a thread pool looks like.

Typically, a task submitted to a thread pool consists of two parts:

    1) The data to be processed;
    2) The function that processes the data.

Pseudocode description:

struct task {
    void* data;      // the data carried by the task
    handler handle;  // how to process the data
};

(Note: you can also refer to structs in code as classes, or objects.)

The threads in the pool block on the queue. When a producer writes a task into the queue, one thread in the pool is woken up; it retrieves the structure (or object) from the queue, takes the data in the structure (or object) as a parameter, and calls the handler function.

The pseudocode is as follows:

while (true) {
    struct task* task = GetFromQueue();  // fetch a task from the queue
    task->handle(task->data);            // process the data
}

That’s the core of the thread pool.

Understanding this will give you an idea of how thread pools work.

11. Number of threads in the thread pool

Now that we have a thread pool, what is the number of threads in the pool?

Think about this for yourself before moving on. If you can see this, you’re not asleep.

Too few threads in the pool fail to fully utilize the CPU; too many cause performance degradation: excessive memory usage, overhead from thread switching, and so on. So the number of threads should be neither too many nor too few. What should it be?

To answer that question, you need to know what kinds of tasks a thread pool handles. Some of you might say: didn't you say there were two kinds, long tasks and short tasks? That was from the life-cycle perspective. From the perspective of the resources a task needs, there are also two types, and this is not hair-splitting for its own sake: CPU-intensive and I/O-intensive.

1) CPU intensive:

CPU-intensive tasks are those that do not rely on external I/O, such as scientific computing or matrix computation. In this case, as long as the number of threads is roughly equal to the number of cores, CPU resources can be fully utilized.

2) I/O intensive:

These tasks may spend only a small fraction of their time computing; most of the time goes to disk I/O, network I/O, and the like.

This case is a little more complicated. You need performance-testing tools to estimate the time a task spends waiting on I/O (call it WT) and its CPU computing time (call it CT). For an N-core system, the appropriate number of threads is then approximately N * (1 + WT/CT). Assuming I/O wait time equals computing time, you need roughly 2N threads to fully utilize the CPU. Note that this is only a theoretical value; the exact number should be determined by testing against real business scenarios.

Of course, CPU utilization is not the only consideration. As the number of threads grows, so do memory footprint, system-scheduling overhead, the number of open files, the number of open sockets, the number of open database connections, and so on.

So there’s no one-size-fits-all formula, it’s a case by case basis.

12. Thread pools are not a panacea

A thread pool is just one form of multithreading, so it cannot avoid the problems multithreading faces, such as deadlocks and race conditions. For this part, too, refer to operating systems materials; the fundamentals really do matter, folks.

13. Best practices for thread pool use

The thread pool is a powerful weapon in a programmer's hands; you can find thread pools in almost every server at Internet companies.

But before using thread pools you need to consider:

  • 1) Fully understand whether your task is long or short, CPU intensive or I/O intensive, and if both, it might be better to put the two types of tasks in separate thread pools to better determine the number of threads;
  • 2) If a task in the thread pool performs an I/O operation, be sure to set a timeout for it, otherwise the thread processing the task may block forever;
  • 3) Tasks in the thread pool should not wait for the results of other tasks synchronously.

14. Summary of this article

In this article, we started with the CPU and worked our way through the common thread pools, from bottom to top, from hardware to software.

Note: no specific programming language appears in this article, because threads are not a language-level concept (again, setting aside user-mode threads). Once you truly understand threads, you can use them in any language. What you need to grasp is the Tao, the underlying way; the techniques follow from it.

Hopefully this article has helped you understand threads and thread pools.

The next article covers another key technology that works closely with thread pools to deliver high performance and high concurrency: “Understanding High Performance and High Concurrency from the Roots (Part 2): Getting into the Operating System, Understanding I/O and Zero-Copy Technology”. Stay tuned.

(This article has been simultaneously published at: www.52im.net/thread-3272…)