Moment For Technology

False sharing in Java: details and solutions

Posted on Oct. 1, 2023, 4:34 a.m. by 葛雅琪
Category: Back-end Tags: java, jvm

1. What is false sharing

The CPU cache stores data in units of cache lines; on current mainstream CPUs a cache line is 64 bytes. In a multi-threaded program, threads that modify independent variables which happen to sit on the same cache line can inadvertently hurt each other's performance. This is called false sharing.

2. The cache line

Because shared variables are stored in the CPU cache in units of cache lines, one cache line can hold multiple variables (as many as fit within the line). And since the cache line is also the smallest unit in which the CPU modifies the cache, the false sharing problem described above arises.

A cache line can be simply understood as the smallest unit of the CPU cache. Today's CPUs do not access memory byte by byte, but in 64-byte chunks called cache lines. When you read a particular memory address, the entire cache line containing it is swapped from main memory into the cache, and subsequently accessing other values in the same cache line costs almost nothing.
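To make the 64-byte granularity concrete, here is a minimal sketch (the 64-byte line size and the array size are assumptions for illustration, not values from the article). Touching one int out of every 16 (16 × 4 bytes = 64 bytes) still pulls in every cache line, so it commonly costs far more than 1/16 of a dense pass:

```java
// Sketch: memory moves in whole 64-byte cache lines, so a stride-16 pass
// over ints loads exactly as many lines as a stride-1 pass.
public class CacheLineDemo {
    static final int SIZE = 4 * 1024 * 1024; // 16 MB of ints (illustrative)

    // Increment every stride-th element; returns elapsed nanoseconds.
    static long touch(int[] a, int stride) {
        long start = System.nanoTime();
        for (int i = 0; i < a.length; i += stride) {
            a[i]++;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] a = new int[SIZE];
        // Stride 16 touches 1/16 of the elements but all of the cache lines,
        // so its time is typically much more than 1/16 of the stride-1 time.
        System.out.println("stride 1 : " + touch(a, 1) + " ns");
        System.out.println("stride 16: " + touch(a, 16) + " ns");
    }
}
```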

3. The CPU's three cache levels

Because the CPU is much faster than main memory, CPU designers added the CPU cache so that computation is not slowed down to memory speed. The CPU cache is divided into three levels: L1, L2, and L3. The closer a level is to the CPU core, the faster and smaller it is. The L1 cache is small but fast and sits right next to the core that uses it. L2 is larger and slower, and is still private to a single core. L3, common in modern multicore machines, is larger and slower again, and is shared by all the cores on a single socket. Finally, there is main memory, shared by all the cores on all the sockets.

When the CPU performs an operation, it first looks for the data in L1, then L2, then L3, and finally goes to main memory if the data is in none of the caches. The farther out it has to go, the longer the operation takes. So if you are doing something very frequently, you want the data it needs to stay in the L1 cache.

4. Cache associativity

The commonly used cache design today is the N-way set associative cache. The cache is divided into equal-sized sets of N cache lines each, and each memory block can be mapped to any of the cache lines in one particular set. For example, in a 16-way cache, each set contains 16 cache lines, and a given block of memory may be placed in any of the 16 lines of its set. In general, memory blocks whose addresses share the same low-order bits map to the same set.
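The mapping can be sketched in a few lines of arithmetic. The cache size below (512 KB) and the class and method names are assumptions chosen for illustration; only the 64-byte line size and 16 ways come from the article:

```java
// Sketch of how an address maps into an N-way set associative cache.
public class SetMapping {
    static final int LINE_SIZE = 64;               // bytes per cache line
    static final int WAYS = 16;                    // lines per set
    static final int CACHE_BYTES = 512 * 1024;     // assumed total cache size
    static final int SETS = CACHE_BYTES / (LINE_SIZE * WAYS); // = 512 sets

    // The lowest bits of the address select the byte within the line; the
    // next bits select the set. Any of the 16 ways in that set may hold it.
    static int setIndex(long address) {
        return (int) ((address / LINE_SIZE) % SETS);
    }

    public static void main(String[] args) {
        // Addresses whose line numbers differ by a multiple of SETS land in
        // the same set and compete for its 16 ways.
        System.out.println(setIndex(0));
        System.out.println(setIndex((long) SETS * LINE_SIZE)); // same set as 0
    }
}
```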

The following figure shows a 2-way cache. Indexes 0, 2, and 4 in main memory map to different cache lines of Way 0, and indexes 1, 3, and 5 map to different cache lines of Way 1.

5. The MESI protocol

Each core of a multicore CPU has its own private caches (typically L1 and L2), plus a cache shared among the cores on the same socket (typically L3). The same data will inevitably be loaded into the caches of different cores, and the MESI protocol is what keeps those copies consistent.

In the MESI protocol, each cache line is in one of four states, which can be encoded in two bits:

M (Modified): the line is valid and the data has been modified; it is inconsistent with main memory and exists only in this cache.
E (Exclusive): the line is valid, the data is consistent with main memory, and it exists only in this cache.
S (Shared): the line is valid, the data is consistent with main memory, and copies exist in several caches.
I (Invalid): the line is invalid.

Now suppose a variable i = 3 (more precisely, the cache-line-sized block containing i) has been loaded into the caches of cores A, B, and C, so the line is in state S in each of them. If core A then changes the value of i, the line in core A's cache moves to state M, while the copies in B and C become I (Invalid). The diagram below:
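The transitions in this scenario can be modeled with a toy state machine. This is a sketch of the protocol's bookkeeping only, not a real cache, and all names here are invented for illustration:

```java
// Toy model of the four MESI states and the transitions described above.
enum MesiState { MODIFIED, EXCLUSIVE, SHARED, INVALID }

class CacheLineState {
    MesiState state = MesiState.INVALID;

    // Local read: an invalid line must be fetched from memory; it becomes
    // Shared if other caches also hold it, Exclusive otherwise.
    void localRead(boolean otherCachesHoldIt) {
        if (state == MesiState.INVALID) {
            state = otherCachesHoldIt ? MesiState.SHARED : MesiState.EXCLUSIVE;
        }
    }

    // Local write: this copy becomes Modified (core A in the example).
    void localWrite() { state = MesiState.MODIFIED; }

    // A write observed from another core invalidates the local copy
    // (cores B and C in the example).
    void remoteWrite() { state = MesiState.INVALID; }
}
```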

6. Solution principle

To avoid false sharing repeatedly invalidating cache lines and reloading them through L1, L2, and L3 from main memory, we can use data padding, so that each hot variable fills a cache line by itself. This is essentially trading space for time.

7. Java's traditional solution to false sharing

public final class FalseSharing implements Runnable {
    public final static int NUM_THREADS = 4; // change to match your core count
    public final static long ITERATIONS = 500L * 1000L * 1000L;
    private final int arrayIndex;

    private static VolatileLong[] longs = new VolatileLong[NUM_THREADS];
    static {
        for (int i = 0; i < longs.length; i++)
            longs[i] = new VolatileLong();
    }

    public FalseSharing(final int arrayIndex) {
        this.arrayIndex = arrayIndex;
    }

    public static void main(final String[] args) throws Exception {
        final long start = System.nanoTime();
        runTest();
        System.out.println("duration = " + (System.nanoTime() - start));
    }

    private static void runTest() throws InterruptedException {
        Thread[] threads = new Thread[NUM_THREADS];
        for (int i = 0; i < threads.length; i++)
            threads[i] = new Thread(new FalseSharing(i));
        for (Thread t : threads)
            t.start();
        for (Thread t : threads)
            t.join();
    }

    public void run() {
        long i = ITERATIONS + 1;
        while (0 != --i)
            longs[arrayIndex].value = i;
    }

    // Reads the pad fields so the JIT cannot optimise them away.
    public static long sumPaddingToPreventOptimisation(final int index) {
        VolatileLong v = longs[index];
        return v.p1 + v.p2 + v.p3 + v.p4 + v.p5 + v.p6;
    }

    // Unpadded version: adjacent array elements share cache lines.
    //public final static class VolatileLong {
    //    public volatile long value = 0L;
    //}

    // Padded version: value plus six longs and the object header fill 64 bytes.
    public final static class VolatileLong {
        public volatile long value = 0L;
        public long p1, p2, p3, p4, p5, p6; // cache line padding
    }

    // JDK 7 may eliminate the unused trailing fields, so pad on both sides:
    //public final static class VolatileLong {
    //    public long p1, p2, p3, p4, p5, p6, p7; // cache line padding
    //    public volatile long value = 0L;
    //    public long p8, p9, p10, p11, p12, p13, p14; // cache line padding
    //}
}

8. Solution in Java 8

An official solution arrived in Java 8 with a new annotation: @sun.misc.Contended. Classes (or fields) carrying this annotation are automatically padded out to their own cache lines. Note that the annotation has no effect on user code by default; you must start the JVM with -XX:-RestrictContended for it to apply outside the JDK.

@sun.misc.Contended
public final static class VolatileLong {
    public volatile long value = 0L;
    //public long p1, p2, p3, p4, p5, p6;
}


Coders, how do you understand and solve false sharing? Feel free to leave a comment!


