Disruptor

1. What is Disruptor?

Disruptor is an open-source framework developed to avoid the lock contention of traditional queues under high concurrency. Originally built and used by LMAX, Disruptor performs concurrent queue operations without locks and is claimed to handle 6 million orders per second on a single thread.

Basic concepts

RingBuffer — the core class and Disruptor’s underlying data structure; a conduit for exchanging data between threads.

Sequencer — the sequence manager and the real implementer of producer/consumer synchronization; it manages and coordinates producer and consumer sequences and sequence barriers. Sequencer comes in single-producer and multi-producer variants, each implementing its own synchronization algorithm.

Sequence — a sequence number that tracks task progress and consumer consumption in the RingBuffer. Most of the concurrent code in Disruptor synchronizes on changes to Sequence values rather than on locks, which is a major reason for Disruptor’s high performance.

SequenceBarrier — the sequence barrier that manages and coordinates the producer cursor and the individual consumer sequences, ensuring that producers do not overwrite events consumers have not yet processed and that dependent consumers run in the correct order.

EventProcessor — listens for events in the RingBuffer and consumes the available ones; events read from the RingBuffer are handed to the actual consumer implementation for processing. It waits on the next available sequence number until the event for that sequence is ready.

EventHandler — the business processor: the interface of the actual consumer, where the concrete business logic is implemented by a third party; it represents the consumer.

Producer — the producer interface; a third-party thread acts as the producer and writes events into the RingBuffer.

WaitStrategy — the wait strategy determines how a consumer waits for the producer to place an event into the Disruptor.
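To make these moving parts concrete, here is a minimal sketch of a single-producer pipeline. It assumes the LMAX Disruptor 3.x library (com.lmax.disruptor) is on the classpath; the LongEvent class and the handler are illustrative, not part of the library.

```java
import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import java.util.concurrent.Executors;

public class DisruptorHelloWorld {

    // The event that travels through the RingBuffer; instances are pre-allocated and reused.
    static class LongEvent {
        long value;
    }

    public static void main(String[] args) {
        // Ring buffer size must be a power of 2 so slots can be located with a bit mask.
        int bufferSize = 1024;

        // The Disruptor DSL wires together RingBuffer, Sequencer, SequenceBarrier and EventProcessors.
        Disruptor<LongEvent> disruptor = new Disruptor<>(
                LongEvent::new,                    // EventFactory: pre-fills the ring buffer
                bufferSize,
                Executors.defaultThreadFactory());

        // EventHandler is the consumer-side business logic.
        disruptor.handleEventsWith(
                (EventHandler<LongEvent>) (event, sequence, endOfBatch) ->
                        System.out.println("consumed " + event.value + " at seq " + sequence));

        disruptor.start();

        // The producer claims a sequence, writes into the pre-allocated event, then publishes it.
        RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();
        for (long i = 0; i < 10; i++) {
            long seq = ringBuffer.next();          // claim the next slot
            try {
                ringBuffer.get(seq).value = i;     // write data into the reused event object
            } finally {
                ringBuffer.publish(seq);           // make it visible to consumers
            }
        }

        disruptor.shutdown();
    }
}
```

Note the claim/write/publish pattern on the producer side: the event object itself is never allocated per message, which is part of how Disruptor avoids garbage collection pressure.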

Wait strategies

BlockingWaitStrategy is Disruptor’s default strategy. Internally it uses a lock and a condition to control thread wake-up. It is the slowest of the strategies, but it consumes the least CPU and gives the most consistent performance across different deployment environments.

SleepingWaitStrategy performs similarly to BlockingWaitStrategy and uses a similar amount of CPU, but it has the least impact on producer threads. Its busy-wait loop is implemented with LockSupport.parkNanos(1).

YieldingWaitStrategy is one of the strategies usable in a low-latency system. It spins while waiting for the sequence to reach the required value, calling Thread.yield() in the loop body to let other queued threads run. It is recommended for scenarios that need high performance and where the number of event-handling threads is smaller than the number of logical CPU cores, for example when hyper-threading is enabled.

BusySpinWaitStrategy performs best and is suitable for low-latency systems; it simply keeps spinning. It is recommended for scenarios that need high performance and where the number of event-handling threads is smaller than the number of physical CPU cores.
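For illustration, a sketch of how a wait strategy is chosen when building a Disruptor, using the five-argument DSL constructor of Disruptor 3.x; the event type (a one-element long array) is only a placeholder.

```java
import com.lmax.disruptor.*;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
import java.util.concurrent.Executors;

public class WaitStrategyChoice {

    public static void main(String[] args) {
        // Pick one strategy depending on the latency/CPU trade-off:
        WaitStrategy strategy =
                new BlockingWaitStrategy();      // default: lock + condition, lowest CPU, highest latency
                // new SleepingWaitStrategy();   // spins, yields, then LockSupport.parkNanos(1); gentle on producers
                // new YieldingWaitStrategy();   // spins then Thread.yield(); low latency, needs spare logical cores
                // new BusySpinWaitStrategy();   // pure spin; lowest latency, threads < physical cores

        // The five-argument constructor takes the producer mode and the wait strategy explicitly.
        Disruptor<long[]> disruptor = new Disruptor<>(
                () -> new long[1],               // illustrative EventFactory
                1024,
                Executors.defaultThreadFactory(),
                ProducerType.SINGLE,             // or ProducerType.MULTI for several producer threads
                strategy);

        disruptor.handleEventsWith((event, sequence, endOfBatch) -> { /* consume event here */ });
        disruptor.start();
        disruptor.shutdown();
    }
}
```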

2. What are Disruptor’s benefits? Why is it so fast?

Advantages of Disruptor:

1. Ring array structure: an array is used instead of a linked list, which avoids garbage collection (entries are pre-allocated and reused) and is friendlier to the CPU cache. Cache-line padding is used to eliminate false sharing.

2. Lock-free design: each producer or consumer thread first claims the slot it may operate on in the array, then writes or reads data directly at that position. The whole process relies on atomic CAS operations to guarantee thread safety (a simplified sketch follows this list).

3. Element location: the array length is 2^N, so a slot can be located quickly with a bit mask instead of a modulo. Sequence numbers only ever increase, and index overflow is not a concern: the sequence is a long, so even at 1 million events per second it would take roughly 300,000 years to exhaust.

The lock-free coordination relies on three sequence constraints:

(1) a consumer’s sequence must be less than the producer’s sequence;

(2) a consumer’s sequence must be less than the sequence of the consumer it depends on (its predecessor);

(3) the producer’s sequence minus bufferSize must not exceed the smallest consumer sequence, so that a producer running too fast cannot overwrite events that have not yet been consumed.
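The following is a simplified sketch of the lock-free claiming in point 2, the bit-mask location in point 3, and wrap constraint (3). It is loosely inspired by Disruptor’s multi-producer sequencer; the class, field and method names here are illustrative, not the library’s API.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Simplified multi-producer slot claiming: a CAS on a cursor instead of a lock.
class SimplifiedSequencer {
    final int bufferSize;                                  // must be a power of 2
    final int indexMask;                                   // bufferSize - 1, maps a sequence to a slot
    final AtomicLong cursor = new AtomicLong(-1);          // last claimed producer sequence
    final AtomicLong gatingSequence = new AtomicLong(-1);  // slowest consumer's sequence

    SimplifiedSequencer(int bufferSize) {
        if (Integer.bitCount(bufferSize) != 1) {
            throw new IllegalArgumentException("bufferSize must be a power of 2");
        }
        this.bufferSize = bufferSize;
        this.indexMask = bufferSize - 1;
    }

    // Claim the next sequence. Constraint (3): next - bufferSize must not pass the
    // slowest consumer, otherwise the producer would overwrite unconsumed events.
    long next() {
        while (true) {
            long current = cursor.get();
            long next = current + 1;
            long wrapPoint = next - bufferSize;
            if (wrapPoint > gatingSequence.get()) {
                LockSupport.parkNanos(1L);   // producer is too far ahead; wait for consumers
                continue;
            }
            if (cursor.compareAndSet(current, next)) {  // CAS guarantees thread safety without locks
                return next;
            }
        }
    }

    // Point 3: locate the slot with a bit mask instead of the slower modulo operation.
    int indexOf(long sequence) {
        return (int) (sequence & indexMask);
    }
}
```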

How does Disruptor prevent reading elements that have not yet been written?

With multiple producers, Disruptor introduces a buffer of the same size as the ring buffer: the available buffer. When a write succeeds, the corresponding slot in the available buffer is marked as successfully written. When reading, the available buffer is checked to determine whether the element is ready.
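A hedged sketch of the available-buffer idea, loosely modeled on the library’s MultiProducerSequencer; the class and field names here are illustrative.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Tracks, per ring-buffer slot, which "lap" of the ring has been fully written.
class AvailableBuffer {
    final int indexMask;                 // bufferSize - 1
    final int indexShift;                // log2(bufferSize)
    final AtomicIntegerArray available;  // one flag per slot, -1 = never written

    AvailableBuffer(int bufferSize) {
        this.indexMask = bufferSize - 1;
        this.indexShift = Integer.numberOfTrailingZeros(bufferSize);
        this.available = new AtomicIntegerArray(bufferSize);
        for (int i = 0; i < bufferSize; i++) {
            available.set(i, -1);
        }
    }

    // Called by a producer after it has finished writing the event for `sequence`.
    void publish(long sequence) {
        int index = (int) (sequence & indexMask);        // which slot
        int flag = (int) (sequence >>> indexShift);      // which lap around the ring
        available.set(index, flag);                      // volatile write marks the slot ready
    }

    // Called by a consumer before reading the event for `sequence`.
    boolean isAvailable(long sequence) {
        int index = (int) (sequence & indexMask);
        int flag = (int) (sequence >>> indexShift);
        return available.get(index) == flag;             // ready only if written on this lap
    }
}
```

Storing the lap number rather than a simple written/unwritten bit is what lets the flag distinguish a slot written on this pass of the ring from one left over from a previous pass.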

3. What is cache false sharing?

An informal definition of false sharing: a cache system stores data in units of cache lines. When multiple threads modify variables that are logically independent of each other, but those variables happen to share the same cache line, the threads unintentionally hurt each other’s performance; this is false sharing.

CPU caches are divided into L1, L2 and L3 caches according to how close they are to the CPU and the order in which data is looked up. The data held in each level is also present in the next level, and the closer a cache is to the CPU, the faster and smaller it is. The L1 cache is small but fast and sits right next to the core that uses it. L2 is larger and slower but still private to a single core. L3, common in modern multi-core machines, is larger and slower still and is shared by all the cores on a socket. Finally there is main memory, shared by all the cores on all the sockets. On a CPU with an L3 cache, roughly 95% of accesses are served by the caches, with less than 5% of the data having to be fetched from memory.

Suppose a thread running on core 1 wants to update variable X while another thread on core 2 wants to update variable Y, and both frequently modified variables sit on the same cache line. The two threads take turns sending Request For Ownership (RFO) messages to claim ownership of that cache line. When core 1 gains ownership and starts updating X, core 2’s copy of the cache line must be set to the I (invalid) state; when core 2 gains ownership and starts updating Y, core 1’s copy must be invalidated in turn. Passing ownership back and forth not only generates a large number of RFO messages, but also means that when a thread needs to read the line, its L1 and L2 copies are invalid and only L3 can serve it. As noted above, reading from L3 is already costly, and the worst case is a cross-socket read, where L3 also misses and the data can only be loaded from memory.

On the surface, X and Y are operated on by separate threads and the two operations are unrelated; they merely share a cache line, yet all the contention happens as if the variables were truly shared.

False sharing is easy to run into in multi-core programming and is very hard to spot. For example, the JDK’s LinkedBlockingQueue has a reference head pointing to the head of the queue and a reference tail pointing to its tail. This kind of queue is often used in asynchronous programming, where the two references are frequently modified by different threads, yet they are likely to end up on the same cache line, producing false sharing. The more threads and the more cores, the greater the negative impact on performance.

Note: since JDK 1.8 there is a dedicated annotation, @Contended, that avoids false sharing and solves the problem more elegantly.
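A sketch of the two common fixes. The padding-by-inheritance layout mirrors the style of Disruptor’s Sequence class (the class names here are illustrative); the annotation refers to sun.misc.Contended as shipped in JDK 8, which for application classes also requires the JVM flag -XX:-RestrictContended.

```java
// Padding via inheritance: the superclass padding fields sit before `value` in memory and
// the subclass padding sits after it, so writes to `value` cannot land on the same 64-byte
// cache line as another hot field.
class LhsPadding { protected long p1, p2, p3, p4, p5, p6, p7; }
class Value extends LhsPadding { protected volatile long value; }
class RhsPadding extends Value { protected long p9, p10, p11, p12, p13, p14, p15; }

class PaddedSequence extends RhsPadding {
    long get() { return value; }
    void set(long v) { value = v; }
}

// JDK 8 alternative: let the JVM insert the padding around the annotated field.
class ContendedCounter {
    @sun.misc.Contended
    volatile long value;
}
```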

4. MESI Protocol

1. M (Modified): the local processor has modified the cache line (a “dirty” line); its contents differ from main memory, and this cache holds the only copy.

2. E (Exclusive): the cache line’s contents are the same as in main memory, and no other processor holds the line.

3. S (Shared): the cache line’s contents are the same as in main memory, and other processors may also hold copies of the line.

4. I (Invalid): the cache line is invalid and cannot be used.
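To make the four states concrete, here is a deliberately simplified model of the MESI transitions as seen from one core’s copy of a cache line; it abstracts away bus messages and write-backs and is only a sketch.

```java
// Simplified MESI state machine for a single cache line in one core's cache.
enum MesiState {
    MODIFIED, EXCLUSIVE, SHARED, INVALID;

    // This core writes the line: every state moves to MODIFIED
    // (from SHARED/INVALID this first requires an RFO that invalidates other copies).
    MesiState onLocalWrite() {
        return MODIFIED;
    }

    // This core reads the line: an INVALID line must be fetched; it becomes EXCLUSIVE
    // if no other cache holds it, otherwise SHARED. Other states are unchanged.
    MesiState onLocalRead(boolean otherCachesHoldCopy) {
        return this == INVALID ? (otherCachesHoldCopy ? SHARED : EXCLUSIVE) : this;
    }

    // Another core reads the line: a MODIFIED line is written back and degrades to SHARED;
    // EXCLUSIVE also degrades to SHARED.
    MesiState onRemoteRead() {
        return (this == MODIFIED || this == EXCLUSIVE) ? SHARED : this;
    }

    // Another core wants to write (it sends an RFO): our copy becomes INVALID.
    MesiState onRemoteWrite() {
        return INVALID;
    }
}
```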