Preface

If you’ve only studied the theory, you might think that swapping HashMap for ConcurrentHashMap is the perfect answer to concurrency, or that CopyOnWriteArrayList always means better performance. Talk is cheap; when facing a devil of an interviewer (or a production incident), what matters is whether it’s actually true.

Thread reuse leads to user information confusion

In a production environment, the user information retrieved is sometimes someone else’s. Looking at the code, you find that it caches user information with ThreadLocal.

ThreadLocal suits scenarios where a variable must be isolated between threads but shared between methods or classes. If retrieving user information is expensive (for example, a DB query), caching it in ThreadLocal is reasonable. So why does the user information sometimes get mixed up?

Case

Use ThreadLocal to store an Integer representing the user information held by the thread, starting at null. We read the value from ThreadLocal once, then set the externally passed parameter into ThreadLocal to simulate resolving user information from the current context, read the value again, and finally print both values along with the thread name.
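The original demo is a web controller; here is a hedged standalone sketch of the same steps, with a single-thread executor standing in for Tomcat’s worker pool (all class and method names are illustrative, not from the original source):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalReuseDemo {
    // Per-thread "cache" of the current user's id, starting at null
    private static final ThreadLocal<Integer> currentUser = ThreadLocal.withInitial(() -> null);

    // Simulates handling one HTTP request on a pooled worker thread
    static String handleRequest(int userId) {
        Integer before = currentUser.get();   // expected null... unless the thread was reused
        currentUser.set(userId);              // simulate resolving the user from the request context
        Integer after = currentUser.get();
        return before + "->" + after + " on " + Thread.currentThread().getName();
    }

    public static void main(String[] args) throws Exception {
        // A single-thread pool mimics server.tomcat.max-threads=1
        ExecutorService pool = Executors.newFixedThreadPool(1);
        System.out.println(pool.submit(() -> handleRequest(1)).get()); // null->1 ...
        System.out.println(pool.submit(() -> handleRequest(2)).get()); // 1->2 ... user 2 sees user 1's leftover
        pool.shutdown();
    }
}
```

Because the pool reuses its single thread, the second "request" already finds user 1’s value in ThreadLocal before setting its own.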

The intuition is that the first value, read before any user information is set, should always be null. But remember that the program runs on Tomcat, and the thread executing it is a Tomcat worker thread drawn from a thread pool. Thread pools reuse a fixed set of threads, and once a thread is reused, the first value read from ThreadLocal may well be a leftover from a previous request by another user. In that case, the user information in ThreadLocal belongs to someone else.

Reproducing the bug

In the configuration file, set the Tomcat parameter for the maximum number of worker threads in the pool to 1, so that the same thread always processes the requests:

`server.tomcat.max-threads=1` 

First, let user 1 request the interface. The first and second user IDs are null and 1 respectively, matching expectations.

Then user 2 requests the interface, and the bug reappears: the first and second user IDs are 1 and 2 respectively. The first user ID is clearly wrong because the Tomcat thread pool reused the thread; both requests were served by the same thread: http-nio-45678-exec-1.

When writing business code, the first step is to understand which threads the code will run on:

  • Business code running under a Tomcat server executes in a multithreaded environment (otherwise the interface could not support high concurrency). You cannot assume there are no thread-safety issues just because you never explicitly started a thread

  • Thread creation is expensive, so web servers use thread pools to handle requests, which means threads are reused. When using a tool like ThreadLocal to store data, you need to explicitly clear that data once the request has been handled.

The solution

Explicitly clear ThreadLocal in a finally block. Then even if a new request reuses the previous thread, it cannot read the wrong user information. Revised code:
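A minimal standalone sketch of that fix, again simulating the Tomcat pool with a single-thread executor (names are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalCleanupDemo {
    private static final ThreadLocal<Integer> currentUser = ThreadLocal.withInitial(() -> null);

    static String handleRequest(int userId) {
        try {
            Integer before = currentUser.get();  // now always null, even on a reused thread
            currentUser.set(userId);
            return before + "->" + currentUser.get();
        } finally {
            currentUser.remove();  // clear the slot before the thread goes back to the pool
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        System.out.println(pool.submit(() -> handleRequest(1)).get()); // null->1
        System.out.println(pool.submit(() -> handleRequest(2)).get()); // null->2
        pool.shutdown();
    }
}
```

With remove() in the finally block, both simulated requests start from a clean null, regardless of thread reuse.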

ThreadLocal sidesteps thread-safety problems by giving each thread its own exclusive copy of a resource. What if a resource must be shared between threads? Then you need a thread-safe container. But using thread-safe concurrency tools does not mean all thread-safety problems are solved.

Can a ThreadLocalRandom instance be stored in a static variable and reused across threads?

nextSeed generates a new seed each time from the previous seed:

`UNSAFE.putLong(t = Thread.currentThread(), SEED, r = UNSAFE.getLong(t, SEED) + GAMMA);`

If you call current() once on the main thread to create a ThreadLocalRandom instance and save it, other threads will never get a properly initialized seed: each thread must initialize its own seed the first time it uses the instance. You can set a breakpoint on nextSeed to see this:

`UNSAFE.getLong(Thread.currentThread(), SEED);`
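To make the pitfall concrete, here is a hedged sketch of the anti-pattern: the instance is obtained once on the main thread and cached, so the child threads’ seed fields are never initialized and, on the JDK versions this behavior was reported for, the threads can all produce the same sequence:

```java
import java.util.concurrent.ThreadLocalRandom;

public class SharedThreadLocalRandomDemo {
    // Anti-pattern: current() initializes only the *calling* thread's seed
    static final ThreadLocalRandom random = ThreadLocalRandom.current();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () ->
                // Each child thread reads its own, never-initialized seed field,
                // so these threads may all print the same "random" number
                System.out.println(Thread.currentThread().getName() + ": " + random.nextInt(1000));
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Correct usage: call ThreadLocalRandom.current() inside each thread instead
    }
}
```

The fix is simply to call ThreadLocalRandom.current() at the point of use in every thread rather than caching the instance.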

Is ConcurrentHashMap really thread-safe?

We all know that ConcurrentHashMap is a thread-safe hash table, but it only guarantees that individual read and write operations are atomic.

Case

You have a Map holding 900 elements, and now 100 more elements must be added to it, with the work done concurrently by 10 threads.

The developer mistakenly assumed that using ConcurrentHashMap alone would prevent thread-safety problems and wrote the following logic without much thought: each thread calls the size method to get the current element count, calculates how many elements still need to be added to the ConcurrentHashMap, logs that number, and then adds the missing elements via putAll.
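A hedged sketch of that buggy logic (class and method names are my own; the size-then-putAll pattern is the one described above):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class UnsafeGapFillDemo {
    static final int ITEM_COUNT = 1000;

    // Generate `count` entries with random keys (as in the original case)
    static Map<Integer, Long> getData(int count) {
        return IntStream.rangeClosed(1, count).boxed()
                .collect(Collectors.toMap(
                        i -> ThreadLocalRandom.current().nextInt(Integer.MAX_VALUE),
                        i -> 1L, (a, b) -> a));
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<Integer, Long> map = new ConcurrentHashMap<>(getData(900));
        System.out.println("init size: " + map.size());

        Thread[] workers = new Thread[10];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                // Bug: size() and putAll() are each thread-safe, but this
                // check-then-act sequence as a whole is not atomic
                int gap = ITEM_COUNT - map.size();
                System.out.println(Thread.currentThread().getName() + " fills " + gap);
                if (gap > 0) map.putAll(getData(gap));
            });
        }
        for (Thread t : workers) t.start();
        for (Thread t : workers) t.join();
        System.out.println("final size: " + map.size()); // frequently well above 1000
    }
}
```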

To see the problem, we print out the number of elements at the beginning and end of the Map.

Calling the interface and analyzing the log output gives the following results:

  • The initial size of 900 is as expected, leaving 100 elements to fill

  • The worker13 thread found 49 elements still to fill, not the expected 100

  • In the end, the total number of elements in the Map is 1549, which also fails the expectation of exactly 1000

Bug analysis

Think of the ConcurrentHashMap as a big basket that already holds 900 oranges. We want to fill it to 1,000, i.e., add another 100. Ten workers share the job: each arrives, counts how many oranges are still missing, and then puts that many into the basket.

ConcurrentHashMap by itself ensures that multiple workers don’t corrupt the basket while loading it, but it doesn’t guarantee that the count worker A took is still valid by the time A starts loading: between A’s count and A’s load, worker B may already have added oranges. The operation “count the gap, then put that many oranges in” is not atomic, and other workers may observe intermediate states such as 964 oranges in the basket with 36 still to fill.

Limitations of what ConcurrentHashMap guarantees to its callers:

  • Using it does not mean that a sequence of operations on it sees a consistent state, or that no other thread is operating on it in between. Lock manually if necessary

  • Aggregate methods such as size, isEmpty, and containsValue may reflect intermediate states of the ConcurrentHashMap under concurrency. Their return values can therefore only serve as hints, not as inputs to flow control. Using size to compute the remaining gap is exactly such flow control

  • Aggregate operations such as putAll are not atomic either: a reader running concurrently with putAll may observe only part of the data

The solution

Lock the entire logical section:
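A hedged sketch of the locked version (names are mine; sequential keys from an AtomicInteger keep the final count deterministic):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SafeGapFillDemo {
    static final int ITEM_COUNT = 1000;

    static int run() throws InterruptedException {
        ConcurrentHashMap<Integer, Long> map = new ConcurrentHashMap<>();
        AtomicInteger keyGen = new AtomicInteger();
        for (int i = 0; i < 900; i++) map.put(keyGen.incrementAndGet(), 1L);

        Thread[] workers = new Thread[10];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                // Hold the map's monitor across the whole check-then-act sequence
                synchronized (map) {
                    int gap = ITEM_COUNT - map.size();
                    System.out.println(Thread.currentThread().getName() + " fills " + gap);
                    for (int j = 0; j < gap; j++) map.put(keyGen.incrementAndGet(), 1L);
                }
            });
        }
        for (Thread t : workers) t.start();
        for (Thread t : workers) t.join();
        return map.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("final size: " + run()); // exactly 1000
    }
}
```

Whichever worker enters the synchronized block first fills the whole gap of 100; the rest see a gap of 0 and do nothing.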

Only one thread finds that 100 elements need to be filled; the other 9 threads find nothing to fill. The final Map size is 1000, as expected.

Since ConcurrentHashMap here requires locking around the whole operation anyway, why not just use a HashMap? Not quite.

ConcurrentHashMap provides atomic compound operations that can be used to full effect. This points to another common problem in code: when adopting advanced utility classes, developers often keep using the new classes in the old way, never exploiting the features that make them powerful.

Know your enemy and win every battle

Case

Consider a scenario where a Map is used to count how often each Key occurs:

  • Use ConcurrentHashMap; the Key range is 10

  • Use 10 threads concurrently, looping 10 million times in total, each iteration incrementing the count of a random Key

  • If the Key does not exist, set its value to 1 on first occurrence

Show me the code:

After the lesson of the previous section, we directly lock the Map and then:

  • Check whether the Key already exists

  • Read the current cumulative value

  • Add 1

  • Write the cumulative value back
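A sketch of that locked implementation (scaled down to 100,000 iterations here so it runs quickly; the original loops 10 million times):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class LockedCountDemo {
    static Map<String, Long> count(int loopCount, int itemRange) throws InterruptedException {
        ConcurrentHashMap<String, Long> freqs = new ConcurrentHashMap<>(itemRange);
        Thread[] workers = new Thread[10];
        int perThread = loopCount / workers.length;
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    String key = "item" + ThreadLocalRandom.current().nextInt(itemRange);
                    // Lock the map around the whole check-read-increment-write sequence
                    synchronized (freqs) {
                        if (freqs.containsKey(key)) {
                            freqs.put(key, freqs.get(key) + 1); // read, add 1, write back
                        } else {
                            freqs.put(key, 1L); // first occurrence
                        }
                    }
                }
            });
        }
        for (Thread t : workers) t.start();
        for (Thread t : workers) t.join();
        return freqs;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> result = count(100_000, 10);
        long total = result.values().stream().mapToLong(Long::longValue).sum();
        System.out.println(result.size() + " keys, total " + total); // 10 keys, total 100000
    }
}
```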

This code is functionally correct, but it does not exploit ConcurrentHashMap’s performance at all. Optimized:
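A sketch of the optimized version, replacing the whole locked check-read-increment-write sequence with computeIfAbsent plus a LongAdder value (again scaled down to 100,000 iterations):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

public class ComputeIfAbsentCountDemo {
    static Map<String, LongAdder> count(int loopCount, int itemRange) throws InterruptedException {
        ConcurrentHashMap<String, LongAdder> freqs = new ConcurrentHashMap<>(itemRange);
        Thread[] workers = new Thread[10];
        int perThread = loopCount / workers.length;
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    String key = "item" + ThreadLocalRandom.current().nextInt(itemRange);
                    // Atomically create the LongAdder on first sight of the key,
                    // then increment it without any explicit lock
                    freqs.computeIfAbsent(key, k -> new LongAdder()).increment();
                }
            });
        }
        for (Thread t : workers) t.start();
        for (Thread t : workers) t.join();
        return freqs;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, LongAdder> result = count(100_000, 10);
        long total = result.values().stream().mapToLong(LongAdder::sum).sum();
        System.out.println(result.size() + " keys, total " + total); // 10 keys, total 100000
    }
}
```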

ConcurrentHashMap’s atomic method computeIfAbsent performs the compound check-then-act operation: it determines whether a value exists for Key K, and if not, stores the result of the lambda as the value V (here, a newly created LongAdder object) and returns V.

Since the V returned by computeIfAbsent is a LongAdder, a thread-safe accumulator, its increment method can be called directly.

This achieves both thread safety and excellent performance, with far fewer lines of code.

Performance test

StopWatch is used to time the two implementations; a final assertion checks that the number of elements in the Map and the sum of all values meet expectations, verifying correctness.

Performance test results:

The optimized version performs at least 5 times better than the locked one.

Why computeIfAbsent is so fast

Internally it uses Java’s Unsafe to implement CAS, which guarantees atomicity of the write at the JVM level and is more efficient than locking:

`static final <K,V> boolean casTabAt(Node<K,V>[] tab, int i, Node<K,V> c, Node<K,V> v) { return U.compareAndSetObject(tab, ((long)i << ASHIFT) + ABASE, c, v); }`

So don’t assume that merely using a concurrency tool like ConcurrentHashMap yields a high-performance, highly concurrent program.

Telling computeIfAbsent and putIfAbsent apart

  • With putIfAbsent, the value argument is evaluated even when the Key is already present, wasting the cost of computing an expensive Value; computeIfAbsent only runs its function when the Key is absent

  • When the Key does not exist, putIfAbsent returns null (beware of null pointer exceptions), whereas computeIfAbsent returns the computed value

  • When a Key is not present, putIfAbsent allows storing a null value, whereas computeIfAbsent stores nothing if its function returns null. This matters for a later containsKey check (on HashMap, that is; ConcurrentHashMap does not allow null values at all).
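The three differences above can be checked directly on a plain HashMap (`expensiveValue` is a hypothetical stand-in for a costly computation):

```java
import java.util.HashMap;
import java.util.Map;

public class AbsentMethodsDemo {
    static String expensiveValue() {
        System.out.println("computing expensive value...");
        return "expensive";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", "1");

        // putIfAbsent: the value argument is evaluated eagerly, even though "a" exists
        String prevA = map.putIfAbsent("a", expensiveValue()); // expensiveValue() still runs
        System.out.println(prevA); // "1" -- the existing value is kept and returned

        // putIfAbsent returns null when the key was absent: beware of NPEs
        String prevB = map.putIfAbsent("b", "2");
        System.out.println(prevB); // null

        // computeIfAbsent is lazy and returns the value now mapped to the key
        String c = map.computeIfAbsent("c", k -> "3");
        System.out.println(c); // "3"

        // HashMap's putIfAbsent accepts a null value; computeIfAbsent treats a
        // null function result as "store nothing". ConcurrentHashMap rejects
        // null keys and values outright.
        map.putIfAbsent("d", null);
        System.out.println(map.containsKey("d")); // true, mapped to null
    }
}
```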

The CopyOnWriteArrayList trap

Another example: a simple piece of business logic with no DB access took far longer than expected, and updating the local cache turned out to be much slower than writing back to the DB. It turned out someone was using CopyOnWriteArrayList to cache a large amount of data, in a business scenario where the data changes frequently.

CopyOnWriteArrayList is a thread-safe variant of ArrayList, but it copies the underlying data on every modification, so it is only suitable for read-mostly, write-rarely, lock-free-read scenarios.

So if you use CopyOnWriteArrayList, it must be because the scenario fits, not to show off.

**CopyOnWriteArrayList vs. synchronized ArrayList read/write performance**

Test concurrent write performance

Test result: under highly concurrent writes, CopyOnWriteArrayList is about a hundred times slower than a synchronized ArrayList

Test the concurrent read performance

Test result: under highly concurrent reads (1 million get operations), CopyOnWriteArrayList is about 24 times faster than a synchronized ArrayList
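A minimal sketch of the write-side comparison (much smaller n than the original test, and the exact ratio varies by machine and JDK):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.IntStream;

public class CopyOnWriteBenchSketch {
    // Time n parallel adds into the given list
    static long timeAdds(List<Integer> list, int n) {
        long start = System.nanoTime();
        // Each add on a CopyOnWriteArrayList copies the entire backing array
        IntStream.range(0, n).parallel().forEach(list::add);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 10_000;
        long cow = timeAdds(new CopyOnWriteArrayList<>(), n);
        long locked = timeAdds(Collections.synchronizedList(new ArrayList<>()), n);
        // On a typical machine the copy-on-write list is dramatically slower to write
        System.out.printf("copyOnWrite: %d us, synchronized: %d us%n", cow / 1_000, locked / 1_000);
    }
}
```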

Why is CopyOnWriteArrayList so slow under concurrent writes? Because every add calls Arrays.copyOf to create a new array, so frequent adds burn a great deal of time allocating and releasing memory.

Conclusion

  • Don’t use concurrency tools without being familiar with threading basics

  • Don’t assume that using concurrency tools automatically makes your code thread-safe

  • Without understanding how a concurrency tool is optimized, it is hard to realize its true performance

  • Don’t apply concurrency tools without considering the business scenario, or system performance may get worse

  • Read the official documentation carefully to understand each tool’s intended scenarios and its APIs, and test them yourself before relying on them

  • Concurrency bugs are hard to reproduce, so run your own performance and stress tests