The problem

The problem is simple: efficient concurrency control for data that is written by a single thread and read by many threads, i.e. a read-mostly, write-rarely workload.

Lock-free double buffer + wait after switching

A double buffer works like this:

  1. Allocate two buffers: one read buffer and one write buffer.
  2. Reader threads read only from the read buffer.
  3. The writer thread modifies the write buffer and then swaps the read and write buffers.

The pseudo-code implementation is as follows:

const int kBufferCnt = 2;
std::atomic<int> curr_idx{0};
T buffers[kBufferCnt];

// Called by readers
const T* Get() {
  return &buffers[curr_idx];
}

// Called by the writer
T* Mutable() {
  return &buffers[(curr_idx + 1) % kBufferCnt];
}

// The writer switches the buffers by calling Switch()
void Switch() {
  // Only one writer thread, so this read-modify-write does not race with itself.
  curr_idx = (curr_idx + 1) % kBufferCnt;
}

// Reader: multiple threads execute Read()
void Read() {
  const T* buf = Get();
  // Read something from *buf...
}

// Writer: only one thread executes Write()
void Write() {
  T* buf = Mutable();
  // Write something to *buf...
  Switch();
}


This logic has a thread-safety problem: once a reader has obtained a pointer, the writer has no way of knowing when the reader will finish with it, yet the writer may call Switch at any time. For example:

  1. After thread R gets the pointer, it starts reading.
  2. Thread W calls Switch after writing.
  3. Thread W writes again… => the buffer that thread R is still reading gets written to, causing a read/write conflict.

The proposed fix is as follows: the writer waits for a period of time after switching.

void Write() {
  T* buf = Mutable();
  // Write something to *buf...
  Switch();
  // Sleep for a while so that in-flight readers can finish...
}

  1. After thread R gets the pointer, it starts reading.
  2. Thread W calls Switch after writing.
  3. Thread W sleeps for a while… => thread R is assumed to finish reading within this time.
  4. Thread W writes again.

But how long must the writer sleep for readers to get through? A common practice is to sleep for a time "well beyond" a reader's normal execution time, but this is scenario-specific and there is no universal sleep duration. There are also sources of uncertainty, such as:

  1. If the language has GC, a reader may be stalled "for a long time" by a GC pause.
  2. The reader's logic for consuming the buffer may itself be complex and time-consuming.

To sum up, "double buffer + wait after switching" cannot fundamentally solve the problem of read/write conflicts.

DoublyBufferedData

brpc implements a double buffer, DoublyBufferedData[1], that is safe for concurrent reads and writes. In theory, both reads and writes must take a lock to guarantee absolute safety, but DoublyBufferedData uses a special locking scheme:

A read acquires only its own thread-local lock; a write acquires every reader's thread-local lock.

After the switch, the writer acquires all of the readers' thread-local locks one by one, releasing each lock immediately after it is acquired. The pseudocode is as follows:

// Reader: multiple threads execute Read()
void Read() {
  LockGuard guard(local_lock);  // this thread's own thread-local lock
  const T* buf = Get();
  // Read something from *buf...
}

// Writer: only one thread executes Write()
void Write() {
  T* buf = Mutable();
  // Write something to *buf...
  Switch();
  // Acquire and immediately release every reader's thread-local lock;
  // after this loop, no reader can still be using the old buffer.
  for (auto& local_lock : reader_locks) {
    LockGuard guard(local_lock);
  }
}


If the writer manages to acquire a reader's local_lock, that reader cannot be in the middle of a read; any read it starts afterwards will already see the new buffer.

Intuitively, this scheme is not very different from using a read-write lock directly: reads do not block each other, but reads and writes do. Its advantage over a read-write lock is that the writer blocks at most one reader thread at a time, rather than all of them.

Of course, the actual performance can only be determined by benchmarking the specific scenario. Also note that the writer may be blocked for a long time if a reader's critical section is long.
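To make "the writer acquires every reader's thread-local lock" more concrete, here is a minimal sketch of the idea. It is my own simplification, not brpc's actual DoublyBufferedData: Data, LocalLock and reader_locks are invented names, reader threads are assumed to live for the whole program, and lock unregistration on thread exit is omitted.

#include <atomic>
#include <memory>
#include <mutex>
#include <vector>

struct Data { /* ... */ };  // stand-in for the article's T

Data buffers[2];
std::atomic<int> curr_idx{0};

std::mutex registry_mutex;                              // protects reader_locks
std::vector<std::shared_ptr<std::mutex>> reader_locks;  // one lock per reader thread

// Each reader thread lazily registers its own lock on first use.
std::mutex& LocalLock() {
  thread_local std::shared_ptr<std::mutex> lock = [] {
    auto m = std::make_shared<std::mutex>();
    std::lock_guard<std::mutex> guard(registry_mutex);
    reader_locks.push_back(m);
    return m;
  }();
  return *lock;
}

// Reader: multiple threads execute Read()
void Read() {
  std::lock_guard<std::mutex> guard(LocalLock());
  const Data* buf = &buffers[curr_idx.load(std::memory_order_acquire)];
  // Read something from *buf...
}

// Writer: only one thread executes Write()
void Write() {
  int back = 1 - curr_idx.load(std::memory_order_relaxed);
  Data* buf = &buffers[back];
  // Write something to *buf...
  curr_idx.store(back, std::memory_order_release);  // Switch()

  // Acquire and immediately release every reader's lock: once this loop is
  // done, no reader that started before the switch can still hold the old buffer.
  std::lock_guard<std::mutex> reg(registry_mutex);
  for (const auto& m : reader_locks) {
    std::lock_guard<std::mutex> guard(*m);
  }
}

brpc's real implementation additionally has to deal with details such as thread exit and multiple instances, but the core locking idea is the same.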

shared_ptr

A common alternative to the double buffer is:

  1. On read, return a shared_ptr that points to a read-only buffer.
  2. On write, create a new object and swap it in for the original.

Pseudo code:

std::shared_ptr<T> buffer;

// Called by readers
std::shared_ptr<T> Get() {
  return buffer;
}

// Called by the writer
void Update() {
  buffer.reset(new T);
}

// Reader: multiple threads execute Read()
void Read() {
  std::shared_ptr<T> buf = Get();
  // Read something from *buf...
}

// Writer: only one thread executes Write()
void Write() {
  // Write something to the new object...
  Update();
}


Here are a few issues to address:

  1. std::shared_ptr[2]'s copy constructor and operator= are not atomic.
  2. Each Get constructs a std::shared_ptr object.
  3. Each Update constructs a brand-new object T.

Question 1

  1. In C++20, std::shared_ptr<T> can be replaced with std::atomic<std::shared_ptr<T>>.
  2. In C++11, atomic operations on std::shared_ptr can be performed with the std::atomic_*[3] family of free functions.
  3. Under C++11, an equivalent implementation from the Boost library can also be used.
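As a rough sketch of option 2 (with the C++20 alternative from option 1 noted in a comment), the Get/Update pair above could be made atomic along these lines; Data is a stand-in for T, and this Update takes the already-built object as a parameter purely for illustration:

#include <memory>

struct Data { /* ... */ };  // stand-in for T

std::shared_ptr<Data> buffer = std::make_shared<Data>();

// C++11: atomic free functions on shared_ptr
// (deprecated in C++20 in favor of std::atomic<std::shared_ptr<Data>>).
std::shared_ptr<Data> Get() {
  return std::atomic_load(&buffer);              // atomic read of the pointer
}

void Update(std::shared_ptr<Data> fresh) {
  std::atomic_store(&buffer, std::move(fresh));  // atomic publish
}

// C++20 alternative:
//   std::atomic<std::shared_ptr<Data>> buffer;
//   buffer.load() / buffer.store(fresh);

The writer would typically fill in a fresh object first and then call something like Update(std::make_shared<Data>(...)) to publish it.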

Question 2

The main overhead of copying a std::shared_ptr object is:

  1. Copying two pointers: one to the actual data and one to the control block.
  2. Atomically incrementing the reference count by 1.

Personally, I think this overhead is acceptable in most cases.
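For intuition, here is a toy snippet (Data is an illustrative stand-in) showing that copying a shared_ptr, which is all Get does, just bumps the shared reference count, and that the copy keeps the old object alive even after the writer replaces the buffer:

#include <cassert>
#include <memory>

struct Data { int value = 0; };  // stand-in for T

int main() {
  std::shared_ptr<Data> buffer = std::make_shared<Data>();
  assert(buffer.use_count() == 1);

  // What Get() returns: two internal pointers are copied and the
  // control block's reference count is atomically incremented.
  std::shared_ptr<Data> snapshot = buffer;
  assert(buffer.use_count() == 2);

  // If the writer now replaces buffer, the snapshot still owns the old
  // object, so an in-flight reader is never left with a dangling pointer.
  buffer = std::make_shared<Data>();
  assert(snapshot.use_count() == 1);
  return 0;
}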

Question 3

In a read-mostly scenario with a low update frequency, I think this part of the cost is negligible.

The advantage of the shared_ptr approach is that the window in which the writer and readers can collide is small: only while the shared_ptr is being copied or replaced, independent of how long the writer's or readers' own logic takes.

Summary

This article has introduced three concurrency-control approaches for the read-mostly, write-rarely scenario:

  1. Lock-free double buffer + wait after switching.
  2. brpc's DoublyBufferedData.
  3. shared_ptr.

"Lock-free double buffer + wait after switching" has the best performance, but it has a theoretical thread-safety hole: the sleep time must be tuned to the specific application scenario to avoid read/write conflicts.

DoublyBufferedData is thread-safe, but every read takes a lock, although in most cases that lock is uncontended.

The main disadvantage of the shared_ptr approach is that a shared_ptr must be constructed on every read, but that overhead is manageable.

References

[1] DoublyBufferedData: https://github.com/apache/incubator-brpc/blob/master/docs/cn/lalb.md#doublybuffereddata

[2] std::shared_ptr: https://en.cppreference.com/w/cpp/memory/shared_ptr/shared_ptr

[3] std::atomic_*: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic