In CMU 15-445 (Database Systems), Lab 2 is a B+ tree, and Task 4 makes that index support concurrent operations. The operations on the B+ tree itself (insert, delete) will be written up later.

Contents

  • Some basics

    • Concurrency control for indexes

    • Locks and Latches

      • Lock
      • Latch
    • Latch implementations

      • Blocking OS Mutex
      • Test-and-Set Spin Latch (TAS)
      • Reader-Writer Latches
  • B+ tree locking algorithm

    • Examples

      • Lookup
      • Delete and insert
    • Optimization

    • Leaf Scan

  • Concrete implementation idea

    • Transactions

    • Find child pages recursively from the top down

      • Recursively find child pages
      • Handling page latches during the descent
    • Uniform release and delayed deletion of pages

    • Protecting root_page_id

    • Unification of Unpin operations

    • Other special cases

  • A gripe about the test cases

Some basics

Concurrency control for indexes

Concurrency control protocols are the methods that a DBMS uses to ensure the “correct” results of concurrent operations on shared objects.

The correctness criteria for the protocol may vary:

  • Logical correctness: This means that the thread can read the values it should be allowed to read.
  • Physical correctness: This means that there are no invalid pointers in the data structure that could cause a thread to read an invalid memory location.

Locks and Latches

Lock

A lock is a high-level logical primitive that protects the contents of the database (for example, tuples, tables, databases) from other transactions. Locks are held for the entire duration of the transaction. The DBMS can expose to users which locks are held as queries run. Locks must support rolling back changes.

Latch

A latch is a low-level protection primitive that protects critical sections of the DBMS’s internal data structures (for example, data structures, memory regions) from other threads. A latch is held only for the duration of the operation, and it does not need to support rolling back changes. A latch has two modes:

  • READ: Allows multiple threads to read the same item simultaneously. A thread can acquire a read latch even if another thread already holds it in read mode.
  • WRITE: Allows only one thread to access the item. A thread cannot acquire a write latch if another thread holds the latch in any mode, and a thread holding a write latch prevents other threads from acquiring the latch in any mode.

The implementation of the RWLatch provided in this section is really well written and is referenced at the end

Here’s a comparison of their differences:

  • What is protected: locks protect database contents; latches protect in-memory data structures
  • Duration: locks are held for the whole transaction; latches only for the critical section
  • Modes: locks have shared, exclusive, update, and intention modes; latches have read and write
  • Deadlock: lock deadlocks are detected and resolved (waits-for graphs, timeouts, aborts); latch deadlocks are avoided by coding discipline
  • Where kept: locks live in the lock manager; latches are embedded in the data structure they protect

Latch implementations

The basic primitive used to implement a latch is the atomic compare-and-swap (CAS) instruction provided by modern CPUs. With CAS, a thread checks whether a memory location holds a specific value; if so, the CPU swaps in a new value, otherwise the memory location is left unchanged. The compare and the swap are performed atomically (that is, no other thread can update the value between the compare step and the swap step).

There are several ways to implement a latch in a DBMS, each with different trade-offs in engineering complexity and runtime performance.

Blocking OS Mutex

One possible implementation of a latch is the mutex infrastructure built into the OS. Linux provides the futex (fast user-space mutex), which consists of (1) a spin latch in user space and (2) an OS-level mutex.

If the DBMS can obtain the user-space latch, the latch is set; although the futex consists of two internal latches, it appears as a single latch to the DBMS. If the DBMS fails to acquire the user-space latch, it goes into the kernel and tries to acquire the more expensive OS mutex. If it fails to acquire that second mutex as well, the thread notifies the OS that it is blocked on the mutex, and the OS deschedules it.

An OS mutex is usually a bad choice inside a DBMS because it is managed by the OS and has a large overhead.

Test-and-Set Spin Latch (TAS)

A spin latch is a more efficient alternative to an OS mutex because it is controlled by the DBMS. A spin latch is essentially a memory location that threads try to update (for example, a boolean set to true). A thread executes CAS in an attempt to update that location. If the latch is not available, the DBMS controls what happens next: it can retry (for example, in a while loop) or let the OS deschedule the thread. This approach thus gives the DBMS more control than an OS mutex, where a failed acquisition hands control to the OS.

Reader-Writer Latches

Mutex and spin latches do not differentiate between read/write (that is, they do not support different modes). The DBMS needs a way to allow concurrent reads, so if an application does a lot of reads, it will have better performance because readers can share resources without waiting.

A reader-writer latch allows the latch to be held in either read or write mode. It tracks how many threads hold the latch, and how many are waiting to acquire it, in each mode. A reader-writer latch uses one of the two latch implementations above as a primitive, with additional logic to manage read/write queues (the queues of requests for the latch in each mode). Different DBMSs can have different policies for how the queues are handled.

B+ tree locking algorithm

To let as many threads as possible use the B+ tree safely, we need a locking algorithm that latches only the part of the tree being operated on. The latch crabbing/coupling protocol will be used to allow multiple threads to access/modify the B+ tree simultaneously. The basic idea is as follows:

  1. Acquire the parent’s latch
  2. Acquire the child’s latch
  3. If the child is “safe,” release the parent’s latch. “Safe” means the node will not split or merge when it is updated (not full for an insertion, more than the minimum size for a deletion).

Of course, reads and writes are distinguished: lookups take read latches, while inserts and deletes take write latches.

Latching is per page: each internal page (including the root) is latched individually.

I recommend watching Lecture 9 of CMU 15-445, which explains this very clearly. Here are some examples:

Examples

Lookup

Crab downward with read latches: once the child page’s read latch is acquired, the parent’s read latch is released immediately.

Delete and insert

Delete and insert both take write latches, so they differ very little; the only difference is the specific check that decides whether a node is safe. Here’s an example of an unsafe insert:

Optimization

What we described above is a pessimistic locking scheme: whenever a node might be unsafe, we must write-latch it (note that a node that looks unsafe will not necessarily split or merge), so efficiency takes a hit. Here is the optimistic locking idea:

Assume that most operations are lookups and that most inserts and deletes will not split or merge (that is, they never modify the parent page). Traverse the tree optimistically using only read latches; only when an operation really does need to modify the parent page, restart it and perform the write the pessimistic way.

Leaf Scan

The threads described so far acquire latches in a top-down fashion: a thread can only acquire a latch on a node below its current node. If the required latch is unavailable, the thread waits until it becomes available. Because latches are always acquired in this order, deadlocks cannot occur.

But leaf scans are prone to deadlocks, because now threads try to acquire latches in two different directions at once (that is, left-to-right and right-to-left). Index latches do not support deadlock detection or avoidance.

So the only way to solve this problem is through coding discipline: the latches on leaf nodes must support a “no-wait” mode, meaning the B+ tree code must handle a failed latch acquisition. If a thread tries to acquire a latch on a leaf node but the latch is unavailable, it immediately aborts its operation (releasing all latches it holds) and then restarts the operation.

My thought: either restrict leaf scans to a single direction, or allow leaf latching in both directions but give one direction higher priority, so that a higher-priority acquisition can preempt latches held by the lower-priority direction.

Concrete implementation idea

Transactions

Each thread creates a new transaction for its database operation, and each transaction performs one operation. Each transaction carries a marker, an enum type indicating whether the transaction is a read, insert, delete, or update:

/** Type of operation */
enum class OpType { READ = 0, INSERT, DELETE, UPDATE };

Find child pages recursively from the top down

This is the basis of the whole pessimistic locking algorithm, as shown in the example above. As we recursively descend toward a leaf page, we latch each “unsafe” page along the way so it never needs to be latched again, and every latched page is stored in the transaction. That way, the read or write latches on the unsafe pages can all be released together after the final modification.

Here are two core functions:

Recursively find child pages

The latching logic needed during the recursive descent is factored out here; latches are released or retained depending on the transaction’s operation type.

/*
 * Find the leaf page containing a particular key; if leftMost == true,
 * find the left-most leaf page.
 */
INDEX_TEMPLATE_ARGUMENTS
Page *BPLUSTREE_TYPE::FindLeafPage(const KeyType &key, bool leftMost, Transaction *transaction) {
  {
    std::lock_guard<std::mutex> lock(root_latch_);
    if (IsEmpty()) {
      return nullptr;
    }
  }
  // Retrieve the root page from the buffer pool
  Page *page = buffer_pool_manager_->FetchPage(root_page_id_);
  BPlusTreePage *node = reinterpret_cast<BPlusTreePage *>(page->GetData());

  if (transaction->GetOpType() == OpType::READ) {
    page->RLatch();
  } else {
    page->WLatch();
  }
  transaction->AddIntoPageSet(page);

  while (!node->IsLeafPage()) {  // Descend until we reach a leaf
    InternalPage *internal_node = reinterpret_cast<InternalPage *>(node);
    page_id_t child_page_id;
    if (leftMost) {
      child_page_id = internal_node->ValueAt(0);
    } else {
      child_page_id = internal_node->Lookup(key, comparator_);
    }

    Page *child_page = buffer_pool_manager_->FetchPage(child_page_id);
    BPlusTreePage *new_node = reinterpret_cast<BPlusTreePage *>(child_page->GetData());
    // Handle the read/write latches here; both parent and child are latched at this point
    HandleRWSafeThings(page, child_page, new_node, transaction);
    page = child_page;
    node = new_node;
  }

  return page;  // The Page wrapper of the leaf page we found
}

Handling page latches during the descent

/**
 * Decide whether the parent latches can be released, based on whether the child page is safe.
 * 1. Read latch: read-latch the child, then release all parent read latches.
 * 2. Write latch: write-latch the child, then check whether the child is safe;
 *    if it is, release all parent write latches. Finally add the child to the PageSet.
 */
INDEX_TEMPLATE_ARGUMENTS
void BPLUSTREE_TYPE::HandleRWSafeThings(Page *page, Page *child_page, BPlusTreePage *child_node, Transaction *transaction) {
  if (transaction->GetOpType() == OpType::READ) {
    child_page->RLatch();                     // Read-latch the child
    ReleaseAllPages(transaction);             // A read never modifies the parent, so release it right away
    transaction->AddIntoPageSet(child_page);  // Add the child to the PageSet
  } else if (transaction->GetOpType() == OpType::INSERT) {
    child_page->WLatch();                     // Write-latch the child
    if (child_node->IsInsertSafe()) {         // The child is insert-safe; the parent write latches can go
      ReleaseAllPages(transaction);
    }
    transaction->AddIntoPageSet(child_page);
  } else if (transaction->GetOpType() == OpType::DELETE) {
    child_page->WLatch();
    if (child_node->IsDeleteSafe()) {
      ReleaseAllPages(transaction);
    }
    transaction->AddIntoPageSet(child_page);
  }
}

Uniform release and delayed deletion of pages

All latched pages are added to the transaction’s PageSet; at the end, latches are released and pages are unpinned according to whether the transaction reads or writes.

/** End of the operation: release all latches */
INDEX_TEMPLATE_ARGUMENTS
void BPLUSTREE_TYPE::ReleaseAllPages(Transaction *transaction) {
  auto qu = transaction->GetPageSet();
  if (transaction->GetOpType() == OpType::READ) {
    for (Page *page : *qu) {
      page->RUnlatch();
      buffer_pool_manager_->UnpinPage(page->GetPageId(), false);
    }
  } else {
    for (Page *page : *qu) {
      page->WUnlatch();
      buffer_pool_manager_->UnpinPage(page->GetPageId(), true);  // writes dirty the page
    }
  }
  qu->clear();

  // Deletion is delayed until the end of the transaction
  auto delete_set = transaction->GetDeletedPageSet();
  for (page_id_t page_id : *delete_set) {
    buffer_pool_manager_->DeletePage(page_id);
  }
  delete_set->clear();
}

Protecting root_page_id

Because root_page_id is modified in several places, an additional synchronization variable is used to make these processes mutually exclusive when accessing and modifying root_page_id.

std::mutex root_latch_;

Unification of Unpin operations

To let the buffer pool run its replacement algorithm correctly, it is important to Unpin every page as soon as it is no longer needed. With multiple threads, all latched pages are unpinned together at the end of the transaction.

Other special cases

Dealing with concurrency can be a tricky business and requires consideration of various scenarios. Here are a few more error-prone possibilities:

  1. Transaction 1 deletes from leaf node A and A is safe, while transaction 2 deletes from A’s neighbor B and B is unsafe. B may then need to borrow an element from, or merge with, its sibling A, so transaction 2 must acquire A’s write latch before it can continue.

A gripe about the test cases

I have to complain that the test cases for the tasks in Lab 2B are really weak; most bugs go undetected, so I could only build large-scale tests myself. I suggest constructing a test of 1,000,000 operations to check that the B+ tree still executes the page-replacement algorithm correctly under heavy load. Only tests like this can really verify the B+ tree’s correctness.