The purpose of this article is not to teach you how to build rockets, but to help you develop a concurrent way of thinking through a focused analysis of principles and real-world cases, and to connect operating system theory with engineering practice as you study. Although we start from a practical point of view, in-depth knowledge that has practical value is always a strong advantage in interviews, and it impresses interviewers far more than rocket science that exists only in words.

Who is this article for:

  1. Beginners who want to understand the concept of concurrency
  2. Engineers who need to understand concurrency concepts and techniques
  3. Readers who are interested in the use of concurrency at work and its underlying implementation

In this article, you will learn a series of concepts related to concurrency and multithreading. We use examples to build an abstract understanding of each concept without getting bogged down in technical details. With these concepts in mind, the specific theories and technical details become much easier to learn.

What is concurrency?

Taobao has been booming in recent years, and many independent sellers have emerged. Imagine you are a start-up retailer running your own online clothing store, packing and shipping the orders yourself every day. At the beginning there is not much business, so you can handle it alone in about an hour a day.

As your business booms, the daily packing time slowly grows from one hour to two and then to four, and after a customer complains about a late delivery you finally decide to hire more people. Soon two friends join your business, and packing becomes dramatically faster. This is the most basic form of concurrency: each person can be regarded as a thread, and for the same workload, more hands naturally finish sooner.

Therefore, concurrency is a way to shorten execution time and improve efficiency by having multiple executors work on parts of a large task at the same time.
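
As a minimal sketch of this idea in Python, the snippet below packs the same batch of orders first with one worker and then with three. The `pack_order` function and the 0.05-second packing time are made up for illustration; only the standard-library `ThreadPoolExecutor` is real.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def pack_order(order_id):
    """Hypothetical packing step: sleeping stands in for the manual work."""
    time.sleep(0.05)
    return f"order-{order_id} packed"

def pack_all(order_ids, workers):
    """Pack every order with the given number of worker threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(pack_order, order_ids))
    return results, time.perf_counter() - start

orders = list(range(6))
alone, t_alone = pack_all(orders, workers=1)   # you, packing by yourself
team, t_team = pack_all(orders, workers=3)     # you and two friends
print(f"1 worker: {t_alone:.2f}s, 3 workers: {t_team:.2f}s")
```

Both runs produce the same packed orders; only the elapsed time differs. The work here is simulated waiting, which threads overlap well. CPU-bound code in standard CPython would usually need processes rather than threads to see a similar speed-up.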

Data races

But the good times do not last. During a weekend stock count, you notice that a lot of goods are missing. There has been no burglary at the office, so how can there be a shortage? After checking the courier waybills carefully, you find that some orders were shipped more than once. Over the next few days you watch the shipping process closely and finally realize what is happening: everyone takes a printed delivery slip and prepares the goods for it, so if a slip is accidentally printed twice, the same order may be shipped again by a different person. It does not happen often, but it hurts.

The cause of the problem is that the order status each person saw before preparing the goods (unshipped) had already changed by the time they actually shipped the order (someone else had shipped it). When a read-check-modify sequence on shared data (here, the shipping status of an order) is supposed to be exclusive but is executed concurrently, the result is called a data race. The read-check-modify sequence itself is called a critical section: a piece of code in which a data race can occur.

The root cause of a data race is that the read, check, and modify steps should be carried out from start to finish by a single executor. Under concurrency, nothing guarantees that the data still holds at the “modify” step the value it had at the “read” step.
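
The duplicate shipment described above can be reproduced in a few lines of Python. This is only a sketch: the order ID, the worker names, and the `threading.Barrier` are assumptions added for illustration. The barrier forces the unlucky timing (both workers read before either one modifies) so that the problem shows up on every run instead of only occasionally.

```python
import threading

order_status = {"A1001": "unshipped"}   # shared data
shipments = []                          # every parcel that actually went out
both_have_read = threading.Barrier(2)   # forces the unlucky interleaving

def ship_if_unshipped(worker, order_id):
    status = order_status[order_id]          # read
    both_have_read.wait()                    # meanwhile the other worker reads too
    if status == "unshipped":                # check, on a value that may be stale
        order_status[order_id] = "shipped"   # modify
        shipments.append((worker, order_id))

threads = [
    threading.Thread(target=ship_if_unshipped, args=(worker, "A1001"))
    for worker in ("alice", "bob")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shipments))   # the single order A1001 was shipped by both workers
```

Each worker acted correctly on what it had read; the error comes only from the status changing between its read and its modify.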

Once the cause is clear, someone comes up with a good idea: print a single master shipping list, and require everyone to check on that list whether an order has already been shipped before preparing and shipping it. Because there is only one list, only one person at a time can change the shipping status of an order. A mechanism that lets only one executor at a time modify the data, and so prevents data races, is called a mutex, or lock.
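
In Python, the single shared list can be sketched with `threading.Lock` from the standard library. The order ID and worker names are again hypothetical. The important point is that the read, the check, and the modify all happen while the lock is held.

```python
import threading

order_status = {"A1001": "unshipped"}   # shared data
shipments = []
list_lock = threading.Lock()            # plays the role of the one shared list

def ship_if_unshipped(worker, order_id):
    with list_lock:   # only one worker can hold the list at a time
        if order_status[order_id] == "unshipped":   # read and check
            order_status[order_id] = "shipped"      # modify
            shipments.append((worker, order_id))

threads = [
    threading.Thread(target=ship_if_unshipped, args=(f"worker-{i}", "A1001"))
    for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shipments))   # exactly one worker shipped the order
```

Whichever worker takes the lock first ships the order; the others see "shipped" when their turn comes and do nothing.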

Distributed concurrency concepts

Distributed

Because you have managed the business well, it has grown so fast that the office no longer has room for all the clothes. So you rent a second warehouse that does the same work. Both places ship goods, so each warehouse can be thought of as an independent computer. Having multiple computers work together on the same task is called distributed computing, and such a group of computers is called a cluster.

Now the master shipping list has to move to the cloud so that people can edit it over the web, still only one person at a time. In this picture, each warehouse is a computer (or process), and the people in each warehouse are the threads of that process. The master shipping list is a lock: because only one person can edit it at a time, it is a mutex. Because it is shared by multiple processes or machines (the warehouses), it is called a distributed lock.
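
The following Python sketch shows the shape of such a lock. Everything in it is hypothetical: `LockService` is an in-memory stand-in for a networked service that both warehouses would talk to, and the ownership token and expiry time (`ttl_seconds`) are assumptions that do not come from the story above. They are included because a lock shared across machines usually needs some way to recover when its holder disappears.

```python
import threading
import time
import uuid

class LockService:
    """In-memory stand-in for a shared, networked lock service (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()   # makes each call atomic
        self._owners = {}                # lock name -> (token, expiry time)

    def try_acquire(self, name, ttl_seconds):
        """Return an ownership token if the lock is free or expired, else None."""
        now = time.monotonic()
        with self._guard:
            holder = self._owners.get(name)
            if holder is not None and holder[1] > now:
                return None              # someone else still holds the lock
            token = uuid.uuid4().hex
            self._owners[name] = (token, now + ttl_seconds)
            return token

    def release(self, name, token):
        """Release the lock only if the caller still owns it."""
        with self._guard:
            holder = self._owners.get(name)
            if holder is not None and holder[0] == token:
                del self._owners[name]
                return True
            return False

service = LockService()
token_a = service.try_acquire("shipping-list", ttl_seconds=30)   # warehouse A gets it
token_b = service.try_acquire("shipping-list", ttl_seconds=30)   # warehouse B is refused
released = service.release("shipping-list", token_a)             # A is done editing
token_b_retry = service.try_acquire("shipping-list", ttl_seconds=30)
print(token_b is None, released, token_b_retry is not None)
```

The token ensures that only the current holder can release the lock, so a warehouse whose lock has already expired cannot release a lock that now belongs to someone else.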

What is a process/thread?

A process can be understood simply as a running application on a computer or mobile phone. Each app running on a phone is a process, and the same app running on different phones is a separate process on each of them. Two processes are like two warehouses, physically separate from each other, while threads are the people inside one warehouse, sharing the same office space. The office space corresponds to the virtual memory space in the operating system, but this article focuses on concurrency concepts and does not elaborate on it further.

Distributed data inconsistency

Because business is good, everyone is busy. Sometimes a person ticks off an order on the cloud list, then gets busy and forgets to ship it. The others, afraid of shipping it twice, do not touch orders that are already ticked off. The resulting undelivered orders have led to many complaints against the store, which has done real damage. When the state of the data is changed during concurrent work but the steps that should follow are never completed, the data becomes inconsistent: here, the order is ticked off but has not actually been shipped.

But as a smart boss, you come up with a solution. Every hour, one person in each warehouse checks whether every ticked-off order has actually been packed and labeled with its waybill. A task that runs periodically to find and repair such inconsistencies is called a fallback (safety-net) task. The guarantee that all orders will eventually reach a consistent state, here thanks to the fallback task, is called eventual consistency.
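
The hourly check described above can be sketched in Python as follows. The field names (`claimed_at`, `shipped`), the one-hour `CLAIM_TIMEOUT`, and the repair strategy (releasing the stale tick so that someone else can take the order) are all assumptions made for this example. A real store might instead ship the order on the spot or alert the person who ticked it off.

```python
CLAIM_TIMEOUT = 3600   # seconds; hypothetical one-hour limit

def find_stale_claims(orders, now):
    """Return the IDs of orders ticked off long ago but never shipped."""
    return [
        order_id
        for order_id, order in orders.items()
        if order["claimed_at"] is not None
        and not order["shipped"]
        and now - order["claimed_at"] > CLAIM_TIMEOUT
    ]

def reconcile(orders, now):
    """Release stale ticks so that another worker can pick the orders up."""
    stale = find_stale_claims(orders, now)
    for order_id in stale:
        orders[order_id]["claimed_at"] = None
    return stale

now = 10_000   # a fixed "current time" keeps the example reproducible
orders = {
    "A1001": {"claimed_at": 1_000, "shipped": True},    # done
    "A1002": {"claimed_at": 2_000, "shipped": False},   # ticked off, then forgotten
    "A1003": {"claimed_at": 9_500, "shipped": False},   # ticked off recently, in progress
    "A1004": {"claimed_at": None, "shipped": False},    # not yet taken by anyone
}
released = reconcile(orders, now)
print(released)   # only A1002 is stale
```

Running `reconcile` again on the same data finds nothing further to repair, which is the property that lets repeated runs bring the data back to a consistent state.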

A better approach

You may already be thinking that the master shipping list is a clumsy approach. It is enough to print exactly one delivery slip per order and have a single person hand out the slips, while everyone else handles only the orders assigned to them. A periodic fallback check then verifies the shipments a second time, so that orders are almost never missed or shipped twice. This pattern, in which one executor splits the work, a group of executors processes the pieces, and one executor or a group of executors merges the results, is known as map-reduce and is very popular today. It is widely used in big data systems and in programming language standard libraries, for example Hadoop and ForkJoinPool.
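
The split, process, and merge steps can be sketched in Python with a thread pool. This shows only the general shape of the pattern and is not the API of Hadoop or ForkJoinPool; the helper names and the sample orders (an order ID paired with an item quantity) are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def split(orders, parts):
    """One executor splits the work into roughly equal batches."""
    return [orders[i::parts] for i in range(parts)]

def count_items(batch):
    """Each worker processes only its own batch (the 'map' step)."""
    return sum(quantity for _, quantity in batch)

def total_items(orders, workers=3):
    batches = split(orders, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_items, batches))
    return sum(partial_counts)   # the 'reduce' step merges the partial results

orders = [("A1001", 2), ("A1002", 1), ("A1003", 5), ("A1004", 3), ("A1005", 4)]
total = total_items(orders)
print(total)   # 15 items across the five orders
```

Because every order lands in exactly one batch, no two workers ever touch the same order, so the processing step needs no lock at all.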

Review

In this article, we covered the following technical terms:

  1. Concurrency: a way to shorten execution time and improve efficiency by having multiple executors work on a large task at the same time.
  2. Data race: the concurrent execution of a read-check-modify sequence on shared data that should have been exclusive.
  3. Critical section: a piece of code in which a data race can occur.
  4. Mutex (also referred to simply as “lock”): an entity that only one executor can hold at a time, used to provide mutually exclusive (one at a time) access to a critical section.
  5. Distributed: multiple computers working together on the same task.
  6. Cluster: a group of machines that perform the same task.
  7. Distributed lock: a lock that provides mutual exclusion across different machines or processes.
  8. Data inconsistency: a series of operations is not atomic, so some operations succeed while others do not, leaving different pieces of data contradicting each other.
  9. Fallback task: the periodic check that finds and repairs data inconsistencies.
  10. Eventual consistency: through a fallback task or other means, data inconsistencies eventually disappear.
  11. Map-reduce: a split, execute, and merge mode of task execution that makes effective use of multiple machines and multi-core CPUs.

Afterword

Because concurrency is a large subject, and conveying abstract concepts inevitably takes some space, this topic will be covered in a series of articles:

  1. What is concurrency?
    • Skip the tedious technical details and get straight to the concepts behind concurrency.
  2. What is multithreading?
    • Multithreading is an important form of concurrency. Working through concrete multithreading problems, this part introduces the key points of multithreaded programming together with the tools and knowledge they require, so that multithreaded programming becomes easy to learn.
  3. Concurrency implementation in common tools
    • Gain insight into concurrent programming practice by analyzing how concurrency patterns are implemented in well-known open source tools.

The interested reader should stay tuned for further articles that will provide detailed theoretical and practical descriptions and examples of concurrent programming, operating system primitives, hardware primitives, and more.

Readers interested in database indexes can take a look at my previous articles:

  1. What is a database index? Xinhua Dictionary to help you understand
  2. Database index penetration – in-depth
  3. 20 minutes database index design practice – practice
  4. Why is database indexing implemented with B+ tree? –