Write OS kernel from scratch - lock synchronization with multithreading

Series directory

introductory
The preparatory work
BIOS starts in real mode
GDT and Protected Mode
Exploration of Virtual Memory
Load and enter the kernel
Display and print
The global descriptor table GDT
Interrupt handling
Virtual memory improvement
Implement heap and malloc
The first kernel thread
Multi-threading running and switching
The lock is synchronized with multithreading
Enter user mode
Implementation of a process
A simple file system
Load the executable program
Implementation of system calls
The keyboard driver
To run a shell

Multithreaded competition

In the last article, we finally started to run multithreading and initially established the task scheduling system scheduler, making the kernel finally show the appearance that an operating system should have. On the basis of multi-threads operation, next we need to enter user mode to run threads, establish the concept of process, and load user executable program.

Before that, however, there is an important and dangerous problem that has come with running multi-threads, which is the issue of thread contention and synchronization. You should have programming experience with multithreading in user mode, understand the issues and causes of contending synchronization between Threads, and understand the concept and use of Lock. This article examines and discusses Lock, and the implementation of the code, from a kernel perspective. It should be noted that Lock is a large and complex topic, and there are many differences in the use and implementation of Lock in kernel and in user mode (of course, there are also most of the common points). This article is just my personal superficial understanding and implementation, welcome to discuss and correct.

Lock the Lock

I don’t think there is any need to explain the data problems caused by the competition between Threads. After the first two Threads were up and running, it should be clear that Interrupt can occur at any time and at any instruction, and that any non-atomic operation can result in a data race between Threads.

For our current kernel, locks are needed in many places to protect access to public data structures, such as:

page faultThe bitmap of the processing function that allocates the physical frame obviously needs to be protected;
kheap, all threads are dug in memory;
schedulerIn the various task queues, such asready_tasks;
.

Lock concepts and tools are available in most programming languages that support multithreading, and as a poor kernel project, we needed to implement them ourselves.

Lock is a complex subject. On the basis of safety first, the design and implementation of lock and the way of use will greatly affect the performance of the system. Bad Lock design and use may lead to unreasonable scheduling of Threads and a large waste of CPU time, reduce the system throughput performance.

Next, we will start from the underlying principle of Lock, and discuss the classification and implementation of several common Lock, as well as their use scenarios.

Atomic command operation

Logically, the implementation of Lock is simple:

if (lock_hold == 0) {
  // Not locked, I get the lock!
  lock_hold = 1;
} else {
  // Already locked :(
}

Here LOCK_HOLD holds whether the current state is locked or not, with a value of true/false. If it is 0, it means that the lock has not been held by other people. Then I can get the lock and set it to 1, which means that the lock is on, so that no one else can get the lock.

The error with the above implementation, however, is that the if condition and the lock_hold = 1 operation below are not atomic, Two threads may both execute if and both pass if lock_hold is 0 and the other has not yet modified lock_hold = 1, get lock together and enter.

The core problem here is that the two operations of lock judgment and modification are not atomic, that is to say, they are not completed by a single instruction, so the intersecting of two threads may cause a data race.

Therefore, the lowest level implementation of any lock must be an atomic instruction, which uses a single instruction to verify and change the data. This ensures that only one thread can pass through the instruction, while other threads are blocked. For example:

uint32 compare_and_exchange(volatile uint32* dst,
                            uint32 src);

It must be implemented by assembly:

compare_and_exchange:
  mov edx, [esp + 4]
  mov ecx, [esp + 8]
  mov eax, 0
  lock cmpxchg [edx], ecx
  ret

The CMPXCHG directive is a compare and exchange directive that compares the first operand to the value of eax:

If the same, the second operand is loaded into the first operand;
If not, the first operand is assigned toeax;

CMPXCHG is exclusive and visible to other CPUs when executing this instruction on multicore CPUs. CMPXCHG is exclusive and visible to other CPUs when executing this instruction. CMPXCHG is exclusive and visible to other CPUs when executing this instruction. The LOCK prefix is not required for the single-core CPUs used in our project experiments.

This instruction actually implements the atomic merge for the check and modify operation. We use it to implement the lock logic by comparing the operand DST to eax = 0 to indicate whether the lock has been locked:

If it’s equal, then it’s the first case, 0 means it’s not locked, so I assign 1 todstGet, mean getlockAnd lock. The return value iseax = 0;
If not, then explaindstWhich is equal to 1,lockHas been locked by someone else, so that is the second case, willdst = 1The value assigned toeax, the return value is eax, which has been modified to 1;

int lock_hold = 0;

void acquire_lock() {
    if (compare_and_exchange(&lock_hold, 1) == 0) {
        // Get lock!
    } else {
        // NOT get lock.
    }
}

In addition to the CMPXCHG directive, there is another way to implement the XCHG directive, which I personally feel better to understand:

atomic_exchange:
  mov ecx, [esp + 4]
  mov eax, [esp + 8]
  xchg [ecx], eax
  ret

The XCHG instruction, which has two operands, swaps their values, and the atomic_exchange function returns the value of the second operand swapped, which is essentially the old value before the first argument was swapped.

So how do you implement lock with atomic_exchange? Same code:

int lock_hold = 0;

void acquire_lock() {
    if (atomic_exchange(&lock_hold, 1) == 0) {
        // Get lock!
    } else {
        // NOT get lock.
    }
}

The person trying to take the lock always swaps lock_hold with the value 1, so the atomic_exchange function, which always returns the old value of lock_hold, swaps out the old value of lock_hold and returns, Only if the lock_hold old value is 0 will the above judgment pass, indicating that the lock was not previously held and that the lock was successfully picked up.

You can see that the lock_hold is also modified and checked with only one instruction. Interestingly, it changes first and then checks, but that doesn’t affect its correctness at all.

Spinlocks spinlock

The underlying implementation of Lock’s acquire operations has been discussed above, but this is just the tip of the Lock related iceberg. The real complexity of locking lies in the handling of acquisition failures, which is also an important way to categorize locks, which can greatly affect the performance and usage scenarios of locks.

One of the simplest lock types we’ll discuss first is spinlock, which retries a acquire failure until it succeeds:

#define LOCKED_YES 1 #define LOCKED_NO 0 void spin_lock() { while (atomic_exchange(&lock_hold , LOCKED_YES) ! = LOCKED_NO) {} }

If the current thread is running on a busy wait, it can continue to hold the CPU and retry the acquire.

First of all, it should be clear that SpinLock cannot be used on single-core CPUs. On a single-core CPU, only one thread is executing at any one time. If the lock is not available, the spin is meaningless because the thread holding the lock cannot release the lock in the meantime.

However, on a multi-core CPU, SpinLock can be useful. If the acquire lock fails and retries for some time, then it is possible to wait for the thread holding the lock to release the lock, because it is most likely running on another core and must be inside a Critical Section:

This is particularly useful in situations where critical sections are small and lock contention is not strong, because in this case, the time of the spin wait is not likely to be very long. This may be more costly if the current thread abandons the CPU because it can’t get the lock, as we’ll discuss later.

However, if the critical section is large or the lock competition is intense, even on multi-core CPUs, the Spin lock is not applicable. It is unwise to waste a lot of CPU time by waiting and spinning idling.

yield lock

As mentioned above, SpinLock does not make any sense for single-core CPUs, but our kernel happens to be running on a single-core CPU emulator, so we need to implement a lightweight lock similar to SpinLock, which I’ll call yield lock.

Yield lock, as the name implies, yields the CPU after a acquire failure. That is to say, I cannot get LOCK temporarily, I will take a rest for a while and let other threads run first. I will come back and try again after they finish their rotation.

This behavior is essentially a spin, but instead of idling in place, it doesn’t waste any CPU time. Instead, it immediately cedes the CPU to someone else, so that the thread holding the lock can run, and by the time the next slice is finished, it will most likely have released the lock:

void yield_lock() {
  while (atomic_exchange(&lock_hold , LOCKED_YES) != LOCKED_NO) {
    schedule_thread_yield();
  }
}

Note that you have to put schedule_thread_yield inside the while loop, because even if the thread holding the lock releases the lock, it doesn’t mean that the current thread will get the lock later, because there might be other contenders. So when yield comes back, you have to compete again for the acquire lock;

Similar to SpinLock, Yield Lock is suitable for situations where the critical sections are small and the competition is not intense. Otherwise, many threads are thrown around and empty, which is also a waste of CPU resources.

Blocking locks

If a thread fails to acquire a acquire, it will not block, but will try again and again, or it will try again after a period of time. However, in cases where critical sections are large or lock contention is intense, it is likely to be a waste of CPU resources to keep retrying.

If a thread can’t get the lock, it will add itself to the queue and sleep on it. If the thread can’t get the lock, it will not be scheduled to run again during sleep. When the thread holding the lock releases the lock, a thread is taken from the queue and reawakened.

For example, let’s define the following blocking lock, named mutex:

struct mutex {
  volatile uint32 hold;
  linked_list_t waiting_task_queue;
  yieldlock_t ydlock;
};

Implementation of locking:

void mutex_lock(mutex_t* mp) { yieldlock_lock(&mp->ydlock); while (atomic_exchange(&mp->hold, LOCKED_YES) ! = LOCKED_NO) { // Add current thread to wait queue. thread_node_t* thread_node = get_crt_thread_node(); yieldlock_lock(&mp->ydlock); linked_list_append(&mp->waiting_task_queue, thread_node); // Mark this task status TASK_WAITING so that // it will not be put into ready_tasks queue // by scheduler. schedule_mark_thread_block(); yieldlock_unlock(&mp->ydlock); schedule_thread_yield(); // Waken up, and try acquire lock again. yieldlock_lock(&mp->ydlock); } yieldlock_unlock(&mp->ydlock); }

Lock implementation is a bit complicated here, but it is a standard Conditional Wait implementation. Conditional wait is to wait for a desired condition to be satisfied in a blocking manner. The expected condition here is that the lock is released and I can try to get the lock.

After a failed attempt to obtain the lock, the current thread adds itself to the mutex waiting_task_queue, marks itself as TASK_WAITING, and cesses the CPU. The schedule_thread_yield function is called in the same way as in the yield lock above, but there is a fundamental difference:

yield lockThreads that pass through yield will still be put inready_tasksQueues, which are still scheduled by the scheduler;
Here Thread marks itself asTASK_WAITINGSo that inschedule_thread_yieldThe Thread is not added to theready_tasksIn the queue, so it actually goes inblockingState is not scheduled again until the next time the thread holding the lockunlockWhen it is removed from mutexwaiting_task_queueTake out and wake up and put back inready_tasksThe queue;

void mutex_unlock(mutex_t* mp) {
  yieldlock_lock(&mp->ydlock);
  mp->hold = LOCKED_NO;

  if (mp->waiting_task_queue.size > 0) {
    // Wake up a waiting thread from queue.
    thread_node_t* head = mp->waiting_task_queue.head;
    linked_list_remove(&mp->waiting_task_queue, head);
    // Put waken up thread back to ready_tasks queue.
    add_thread_node_to_schedule(head);
  }
  yieldlock_unlock(&mp->ydlock);
}

The key element in both the lock and unlock code above is a yieldlock defined inside a mutex. This may seem odd, since mutex is essentially a lock, so the data inside that lock needs another lock to protect itself.

Implematically, mutex is already a complex lock that maintains a waiting queue, which obviously needs to be protected, hence the nesting doll paradox. The key point here is that the types and purposes of these two locks are fundamentally different:

mutexIs a heavy lock, it is provided for external use, its purpose and protection is uncertain, generally yescritical sectionA larger, more competitive area;
internalyield lockIs a lightweight lock, the purpose of this lock and the object of protection is determined, it protects ismutexInternal operations, this onecritical sectionIt can be controlled to a very small extent, so the introduction of this lock is necessary and reasonable;

The trade-off for internal Yield Lock is that it introduces new competition, making threads more competitive across mutex. However, such additional costs are unavoidable in terms of the design and usage of mutex, and can be tolerated or ignored in some sense: This is because the external critical sections protected by mutex are generally considered to be larger than the areas protected by their internal yield locks.

Kernel and user-mode lock

The above discussion covers several types of locking principles and implementations, as well as the different scenarios in which they are used. An important distinction between the size of a critical section and the intensity of contention is that this essentially reflects the ease (or probability) with which a thread attempts to obtain the lock. Based on this, we can divide it into two types of usage scenarios:

critical sectionSmall and less competitive, then use a Spin-type lock (includingspinlock 和 yieldlock), they are non-blocking;
critical sectionIf it is large, or the competition is intense, use blocking locks;

In fact, the choice of which lock to use in kernel mode is much more than that. It also involves where to use the lock, such as interrupt context or exception context, and many other considerations, which can lead to many limitations and differences. I will try to write a separate discussion on these issues sometime in the future, but dig a hole first.

However, in the user mode, the use of lock will be very different from the kernel mode. One of the most discussed is the choice of blocking lock and spinlock. As mentioned above, blocking locks are often used in cases where the critical section is large or the competition is intense, because they do not result in a large number of CPU idling, which seems to save CPU resources. However, in user mode, threads entering blocking sleep need to enter kernel state. This has a very large impact on CPU performance, and may not be as cost-effective as waiting for the in-place spin (if the CPU is multi-core), so the consideration here will be very different from the use of locking in kernel mode.

conclusion

This article discusses the principle and implementation of LOCK, limited to their own level, only my personal shallow understanding, hope to help readers, also welcome to discuss and comment. In this scroll project, performance is not a consideration for the time being. For the sake of simplicity and safety, I used Yieldlock as the main lock used in kernel.

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Write OS kernel from scratch – lock synchronization with multithreading

Series directory

Multithreaded competition

Lock the Lock

Atomic command operation

Spinlocks spinlock

yield lock

Blocking locks

Kernel and user-mode lock

conclusion

Write OS kernel from scratch – lock synchronization with multithreading

Series directory

Multithreaded competition

Lock the Lock

Atomic command operation

Spinlocks spinlock

yield lock

Blocking locks

Kernel and user-mode lock

conclusion

Related Posts

The principle of JIT is briefly introduced

Write OS kernel from scratch – interrupt handling

Write the OS kernel – GDT and guard mode from scratch