Java concurrency – ThreadLocal

ThreadLocal introduction:

Concurrency problems tend to occur when multiple threads access the same shared variable, especially when several threads write to it. To keep such access thread-safe, we normally have to add synchronization every time the shared variable is accessed.

ThreadLocal is a way, besides locking, to avoid thread-safety problems under multithreaded access. Those problems arise when several threads operate on the same variable at the same time: without extra restrictions, the threads compete to write it and the resulting data is not what we expect. If instead we create a variable of which each thread, whenever it accesses it, gets its own copy, there is nothing to compete for and no thread-safety problem.
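A minimal sketch of this idea (PerThreadCopyDemo and its methods are illustrative names, not from any library): one ThreadLocal instance is shared by all threads, yet each thread reads and writes only its own copy.

```java
import java.util.concurrent.atomic.AtomicReference;

class PerThreadCopyDemo {
    // One ThreadLocal object shared by every thread; each thread only sees its own value.
    static final ThreadLocal<String> NAME = new ThreadLocal<>();

    // Runs a task on a new thread and waits for it to finish.
    static void runInNewThread(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Sets a value on a new thread and reads it back on that same thread.
    static String setAndGetInNewThread(String value) {
        AtomicReference<String> seen = new AtomicReference<>();
        runInNewThread(() -> {
            NAME.set(value);
            seen.set(NAME.get());
        });
        return seen.get();
    }

    // Reads the value on a new thread that never called set.
    static String getInNewThread() {
        AtomicReference<String> seen = new AtomicReference<>("unset");
        runInNewThread(() -> seen.set(NAME.get()));
        return seen.get();
    }

    public static void main(String[] args) {
        NAME.set("main");
        System.out.println(setAndGetInNewThread("worker")); // worker
        System.out.println(getInNewThread());               // null
        System.out.println(NAME.get());                     // main
    }
}
```

A thread that never called set sees null, because the default initialValue returns null, and the worker's set did not touch the main thread's copy.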

Typical application scenarios of ThreadLocal:

Typical Scenario 1:

Each thread needs its own exclusive instance of an object, usually a utility class such as SimpleDateFormat or Random.

For example, using SimpleDateFormat to convert a timestamp into a formatted date string:

public static void main(String[] args) {
    ExecutorService threadPool = Executors.newFixedThreadPool(10);
    for (int i = 0; i < 1000; i++) {
        int finalI = i;
        threadPool.execute(new Runnable() {
            @Override
            public void run() {
                String result = toDate(1000L + finalI);
            }
        });
    }
    threadPool.shutdown();
}

public static String toDate(long seconds) {
    Date currentDate = new Date(seconds * 1000);
    // A new formator is created on every call ("HH" is the 24-hour clock)
    SimpleDateFormat formator = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    return formator.format(currentDate);
}

The code above is thread-safe, but every call to toDate creates a new SimpleDateFormat, so the 1000 tasks create 1000 objects. To avoid this, we can move the SimpleDateFormat out of toDate and make it a shared static field:

static SimpleDateFormat formator = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

public static String toDate(long seconds) {
    Date currentDate = new Date(seconds * 1000);
    // Every thread now shares this single formator
    return formator.format(currentDate);
}
// output (note the repeated 08:32:47):
1970-01-01 08:33:06
1970-01-01 08:33:07
1970-01-01 08:32:47
1970-01-01 08:32:47
1970-01-01 08:33:10
1970-01-01 08:33:11

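The duplicated 08:32:47 shows that the single shared formator is not thread-safe: SimpleDateFormat keeps its working state in an internal Calendar field, so concurrent format calls overwrite each other's state. Since nothing guards it, one fix is to lock it: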

public static String toDate(long seconds) {
    Date currentDate = new Date(seconds * 1000);
    // Only one thread at a time may use the shared formator
    synchronized (formator) {
        return formator.format(currentDate);
    }
}

With synchronized, every call has to acquire and release the lock, and when many threads call toDate at once they queue up waiting for it, so the formatting work is effectively serialized. This is not cost-effective. ThreadLocal solves the problem easily:

The ThreadLocal version is as follows:

class FormatorThreadLocalGetter {
  // Each thread gets its own formator, created lazily on its first get()
  // (SSimpleDateFormat is the hashCode-tracking subclass defined below)
  public static final ThreadLocal<SimpleDateFormat> formator = new ThreadLocal<SimpleDateFormat>() {
    @Override
    protected SimpleDateFormat initialValue() {
      return new SSimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    }
  };
  // Or, more concisely, with a lambda
  public static final ThreadLocal<SimpleDateFormat> formator2 =
      ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));
}
public class ThreadLocalTest {
  // A copy-on-write set records the hash code of every formator that gets used
  static CopyOnWriteArraySet<String> hashSet = new CopyOnWriteArraySet<>();

  public static void main(String[] args) throws InterruptedException {
    ExecutorService threadPool = Executors.newFixedThreadPool(10);
    for (int i = 0; i < 1000; i++) {
      int finalI = i;
      threadPool.execute(new Runnable() {
        @Override
        public void run() {
          System.out.println(Thread.currentThread().getName());
          System.out.println(toDate(1000L + finalI));
        }
      });
    }
    threadPool.shutdown();
    // Wait until every task has finished, then see how many formator objects were used
    threadPool.awaitTermination(1, TimeUnit.MINUTES);
    System.out.println(hashSet.size());
    hashSet.forEach(new Consumer<String>() {
      @Override
      public void accept(String s) {
        System.out.println(s);
      }
    });
  }

  public static String toDate(long seconds) {
    Date currentDate = new Date(seconds * 1000);
    // Get this thread's formator from the ThreadLocal
    SimpleDateFormat formator = FormatorThreadLocalGetter.formator.get();
    // Record its hash code so we can count how many distinct formators were used
    hashSet.add(String.valueOf(formator.hashCode()));
    return formator.format(currentDate);
  }
}
// SimpleDateFormat.hashCode() returns the hash code of its pattern string, so every
// formator with the same pattern would report the same value. The subclass below
// overrides hashCode to give each instance its own identity.
class SSimpleDateFormat extends SimpleDateFormat {
  // A random identity chosen once per instance, so every call returns the same value
  private final int identity = UUID.randomUUID().hashCode();

  SSimpleDateFormat(String pattern) {
    super(pattern);
  }

  @Override
  public int hashCode() {
    return identity;
  }
}
// output (the hash codes differ on every run):
1970-01-01 08:33:15
1970-01-01 08:33:10
1970-01-01 08:33:18
10 // 10 formator objects served all 1000 tasks
// ...followed by the 10 hash codes

newFixedThreadPool(10) creates 10 worker threads and reuses them for all 1000 tasks, so initialValue runs once per thread and only 10 formator objects are created instead of 1000. The pool only replaces a worker with a new thread if a task throws, so in this test the count stays at 10.

Typical Scenario 2:

Each thread needs a place to keep its own thread-wide "global" variables, which different methods can use directly, avoiding both the hassle of passing parameters around and thread-safety problems.

Typical Scenario 2 is less common in client-side code, but it demonstrates another way of using ThreadLocal. In Scenario 1, the object each thread stores is produced by the initialValue supplied when the ThreadLocal is constructed.

The following scenario instead demonstrates explicitly assigning a value to a ThreadLocal with set:

For example, as shown in the figure below, each request is handled by one thread and passes through several layers of handlers that process the user information.

Within one thread the user information is the same, but different threads serve different users, so we cannot simply store it in a global variable, because a global variable is visible to all threads. Instead, we could declare a map that saves each thread's own user information, with the thread as the key and the content we want to save as the value. To make this thread-safe, there are two options:

  1. Lock the map operations (with synchronized, etc.).
  2. Implement the map with a thread-safe Map, such as ConcurrentHashMap.
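For comparison, a minimal sketch of the second option, assuming a String stands in for the user information (ThreadKeyedUserStore and its methods are hypothetical names):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ThreadKeyedUserStore {
    // Key: the thread handling the request; value: that request's user info.
    private static final Map<Thread, String> USERS = new ConcurrentHashMap<>();

    static void setUser(String user) {
        USERS.put(Thread.currentThread(), user);
    }

    static String getUser() {
        return USERS.get(Thread.currentThread());
    }

    // Must be called when the request ends; otherwise the entry, and the Thread key with it, stays in the map.
    static void clearUser() {
        USERS.remove(Thread.currentThread());
    }

    static int size() {
        return USERS.size();
    }

    public static void main(String[] args) {
        setUser("alice");
        System.out.println(getUser()); // alice
        clearUser();
        System.out.println(size());    // 0
    }
}
```

Every read and write goes through the one shared ConcurrentHashMap, and each entry has to be removed by hand when the request ends, or the map keeps the Thread and its data reachable.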

However, both locking and a ConcurrentHashMap (CHM) still pay for synchronization between threads. ThreadLocal is a very good fit for this scenario: it needs no mutual exclusion, it stores the user information for the current thread (that is, the current request), and the user does not have to be passed down layer by layer through method parameters, all without hurting performance.

A simple implementation is as follows:

class User {
  String name; // minimal User class assumed for this example
}

class UserContextHolder {
  public static final ThreadLocal<User> holder = new ThreadLocal<>();
}

class Handler1 {
  public void handle() {
    User user = new User();
    user.name = "UserInfo" + user.hashCode();
    // Handler1 binds the user to the current thread with set
    UserContextHolder.holder.set(user);
  }
}

class Handler2 {
  public void handle() {
    // Handler2 uses get to read the user bound to the current thread
    System.out.println("UserInfo:" + UserContextHolder.holder.get().name);
  }
}

From the examples above, we can summarize several benefits of ThreadLocal:

  1. Thread-safe storage: each thread has its own copy of the data, so there is nothing for threads to contend over.
  2. No locking is needed, so it typically performs better than lock-based synchronization, especially under contention.
  3. Memory is used more efficiently: in Scenario 1, the 1000 tasks share one SimpleDateFormat per pool thread instead of creating 1000 objects, and they avoid the locking cost of a single shared instance.
  4. As Scenario 2 shows, it can also remove tedious parameter passing and reduce coupling between pieces of code.

How ThreadLocal works:

To understand ThreadLocal, you need to know the relationship between Thread, ThreadLocal, and ThreadLocalMap.

Each Thread object has a field named threadLocals of type ThreadLocalMap. This map is a hash table: the key of each Entry is a ThreadLocal object, and the value is the object stored for that ThreadLocal on this thread.

ThreadLocalMap is a hash table built directly on an array; it does not implement the java.util.Map interface. When two ThreadLocals hash to the same slot, it resolves the collision with linear probing, moving on to the next slot. Each array slot holds an Entry that pairs a ThreadLocal key with its value:

static class Entry extends WeakReference<ThreadLocal<?>> {
    /** The value associated with this ThreadLocal. */
    Object value;

    Entry(ThreadLocal<?> k, Object v) {
        super(k);
        value = v;
    }
}
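To make linear probing concrete, here is a simplified model of the idea, not the JDK implementation: keys are strong references, there is no stale-entry cleanup and no resizing, and ProbingTableModel and Key are hypothetical names. It does reuse one real detail: the JDK gives each ThreadLocal a hash code by adding the constant 0x61c88647 to a shared counter.

```java
import java.util.concurrent.atomic.AtomicInteger;

// A simplified model of ThreadLocalMap's open addressing, not the JDK code.
class ProbingTableModel {
    // The same increment the JDK uses to generate ThreadLocal hash codes.
    static final int HASH_INCREMENT = 0x61c88647;
    private static final AtomicInteger nextHashCode = new AtomicInteger();

    // Stand-in for a ThreadLocal: each new key takes the next hash in the sequence.
    static final class Key {
        final int hash = nextHashCode.getAndAdd(HASH_INCREMENT);
    }

    private final Key[] keys;
    private final Object[] values;

    // capacity must be a power of two so that "hash & (capacity - 1)" is a valid index
    ProbingTableModel(int capacity) {
        keys = new Key[capacity];
        values = new Object[capacity];
    }

    private int nextIndex(int i) {
        return (i + 1) & (keys.length - 1); // wrap around at the end of the array
    }

    void set(Key k, Object v) {
        int i = k.hash & (keys.length - 1);
        // Linear probing: walk forward until we find this key or an empty slot.
        for (int n = 0; n < keys.length; n++, i = nextIndex(i)) {
            if (keys[i] == null || keys[i] == k) {
                keys[i] = k;
                values[i] = v;
                return;
            }
        }
        throw new IllegalStateException("table is full (the real map resizes before this)");
    }

    Object get(Key k) {
        int i = k.hash & (keys.length - 1);
        for (int n = 0; n < keys.length; n++, i = nextIndex(i)) {
            if (keys[i] == null) {
                return null; // reached an empty slot: the key is absent
            }
            if (keys[i] == k) {
                return values[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ProbingTableModel table = new ProbingTableModel(16);
        Key a = new Key();
        Key b = new Key();
        table.set(a, "A");
        table.set(b, "B");
        System.out.println(table.get(a) + "" + table.get(b)); // AB
    }
}
```

Because 0x61c88647 is odd, multiplying by it is a bijection modulo any power of two, so any 16 consecutive hash codes fall into 16 different slots of a 16-slot table; probing only kicks in once the table gets crowded.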

Note in particular that the data lives in the ThreadLocalMap held by each thread, not in the ThreadLocal object itself: the ThreadLocal only serves as the lookup key. This is a bit convoluted, so see the analysis below.

Core API analysis:

initialValue:

This method returns the initial value of the current thread's data and is invoked lazily: it is not called when the ThreadLocal object is constructed, but when a thread calls ThreadLocal#get and has no value yet.

get:

Returns the current thread's value; if the thread has no value yet (for example, on its first call), initialValue is called to create one.

set:

Sets a new value for the current thread.

remove:

Removes the current thread's value; if get is called afterwards, initialValue runs again.

The source of get (with comments added) is as follows:

public T get() {
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    // The two lines above are equivalent to:
    // ThreadLocalMap map = t.threadLocals;
    if (map != null) {
        // Look up the Entry in the current thread's ThreadLocalMap;
        // the Entry's key is this ThreadLocal instance itself.
        ThreadLocalMap.Entry e = map.getEntry(this);
        if (e != null) {
            @SuppressWarnings("unchecked")
            T result = (T) e.value;
            // Return the value this ThreadLocal holds for the current thread
            return result;
        }
    }
    // Nothing found above, so initialize it
    return setInitialValue();
}

setInitialValue is implemented as follows:

private T setInitialValue() {
    // Call initialValue to create the object this ThreadLocal should hold
    T value = initialValue();
    Thread t = Thread.currentThread();
    // Get the ThreadLocalMap of the current thread
    ThreadLocalMap map = getMap(t);
    // Thread#threadLocals starts out as null
    if (map != null) {
        // If the map exists, store the value in it
        map.set(this, value);
    } else {
        // Otherwise create the current thread's threadLocals (ThreadLocalMap)
        // with the value created above
        createMap(t, value);
    }
    if (this instanceof TerminatingThreadLocal) {
        TerminatingThreadLocal.register((TerminatingThreadLocal<?>) this);
    }
    // The value is also the return value of get
    return value;
}

In other words, if the thread already has a ThreadLocalMap, the value is stored in it directly; otherwise the map is created and the value stored in it. set works the same way:

public void set(T value) {
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    if (map != null) {
        map.set(this, value);
    } else {
        createMap(t, value);
    }
}

This also shows that setInitialValue, and therefore initialValue, only runs from get when the thread has no entry yet. If a thread stores its first value with set, initialValue is never called for it.

Also note that initialValue is normally called only once per thread: repeated get calls on the same thread do not re-run it. However, if the value is deleted with remove, the next get triggers initialValue again. The source of remove is:

public void remove() {
    ThreadLocalMap m = getMap(Thread.currentThread());
    if (m != null) {
        m.remove(this);
    }
}
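A small check of these rules (InitialValueCalls is a hypothetical name, and the counter exists only to observe the calls): initialValue is lazy, runs once per thread across repeated get calls, runs again after remove, and is skipped when set comes first.

```java
import java.util.concurrent.atomic.AtomicInteger;

class InitialValueCalls {
    // Counts how many times initialValue has run, across all threads.
    static final AtomicInteger INIT_CALLS = new AtomicInteger();

    static final ThreadLocal<StringBuilder> BUFFER = new ThreadLocal<StringBuilder>() {
        @Override
        protected StringBuilder initialValue() {
            INIT_CALLS.incrementAndGet();
            return new StringBuilder();
        }
    };

    public static void main(String[] args) {
        System.out.println(INIT_CALLS.get()); // 0: constructing the ThreadLocal does not call it
        BUFFER.get();
        BUFFER.get();
        System.out.println(INIT_CALLS.get()); // 1: repeated get on one thread
        BUFFER.remove();
        BUFFER.get();
        System.out.println(INIT_CALLS.get()); // 2: get after remove
        BUFFER.remove();
        BUFFER.set(new StringBuilder("preset"));
        BUFFER.get();
        System.out.println(INIT_CALLS.get()); // 2: set came first, so no new call
    }
}
```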

If we do not override initialValue, it returns null by default. It is usually overridden with an anonymous inner class (or supplied through ThreadLocal.withInitial), so the value can be used directly after get. Note, though, that initialValue only runs when the current thread has no entry, that is, on the first get or the first get after remove. If some code stores null with set(null), get returns that null without calling initialValue again, so a null check may still be needed.
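A short sketch of both null cases (NullAfterSet is a hypothetical name):

```java
class NullAfterSet {
    // No initialValue override: get() on a thread with no entry returns null.
    static final ThreadLocal<String> PLAIN = new ThreadLocal<>();

    // With an initial value, get() lazily creates "default"...
    static final ThreadLocal<String> NAME = ThreadLocal.withInitial(() -> "default");

    public static void main(String[] args) {
        System.out.println(PLAIN.get()); // null
        System.out.println(NAME.get());  // default
        NAME.set(null);
        // ...but a stored null is an existing entry, so initialValue is not called again.
        System.out.println(NAME.get());  // null
        NAME.remove();
        System.out.println(NAME.get());  // default
    }
}
```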

ThreadLocal memory leak:

One of the most discussed aspects of ThreadLocal is its potential for memory leaks.

Let’s look at the definition of ThreadLocalMap#Entry:

static class Entry extends WeakReference<ThreadLocal<?>> {
    Object value;

    Entry(ThreadLocal<?> k, Object v) {
        super(k);
        value = v;
    }
}

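ThreadLocalMap holds its Entry objects through strong references. An Entry holds its key, the ThreadLocal, through a weak reference, but holds the value through a strong reference. Once nothing else strongly references the ThreadLocal, the GC can clear the key, yet the value stays reachable from the thread, which may result in a memory leak.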

Normally, a thread's threadLocals field is set to null when the thread terminates, so the values become collectable and this seems fine.

But:

If the thread does not terminate, or lives for a long time (a pooled worker thread, for example), the value object is never reclaimed. If the value in turn references another object, such as an Activity on Android, that Activity leaks too: as long as the thread the value is bound to keeps running, the GC cannot collect it.

At this point the chain of references becomes the following:

Thread -> ThreadLocalMap -> Entry (key is null, value is not null) -> Value -> Activity

The JDK takes this into account: ThreadLocalMap looks for entries whose key is null (stale entries) in its set, remove, and rehash methods, and sets their value to null so that the object the value referred to can be reclaimed.

Take resize as an example:

private void resize() {
    Entry[] oldTab = table;
    int oldLen = oldTab.length;
    int newLen = oldLen * 2;
    Entry[] newTab = new Entry[newLen];
    int count = 0;

    for (Entry e : oldTab) {
        if (e != null) {
            ThreadLocal<?> k = e.get();
            // If the key is null, set value to null.
            if (k == null) {
                e.value = null; // Help the GC
            } else {
                int h = k.threadLocalHashCode & (newLen - 1);
                while (newTab[h] != null)
                    h = nextIndex(h, newLen);
                newTab[h] = e;
                count++;
            }
        }
    }

    setThreshold(newLen);
    size = count;
    table = newTab;
}
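The cleanup step can be modeled without depending on the GC, assuming a hypothetical StaleEntryModel class: calling clear() on a WeakReference simulates the GC clearing a ThreadLocal key that is no longer strongly reachable, and a pass modeled on the resize loop then drops the stale value.

```java
import java.lang.ref.WeakReference;

// A simplified model of how ThreadLocalMap drops stale entries (hypothetical names).
class StaleEntryModel {
    // Like ThreadLocalMap.Entry: weak reference to the key, strong reference to the value.
    static final class Entry extends WeakReference<Object> {
        Object value;

        Entry(Object key, Object value) {
            super(key);
            this.value = value;
        }
    }

    // Modeled on the resize loop: entries whose key has been cleared lose their value
    // and are dropped from the table; the number of live entries is returned.
    static int expungeStale(Entry[] table) {
        int live = 0;
        for (int i = 0; i < table.length; i++) {
            Entry e = table[i];
            if (e == null) {
                continue;
            }
            if (e.get() == null) {
                e.value = null; // the value can now be garbage collected
                table[i] = null;
            } else {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Object liveKey = new Object();
        Entry[] table = { new Entry(new Object(), "stale"), new Entry(liveKey, "live") };
        Entry stale = table[0];
        stale.clear(); // simulate the GC clearing a ThreadLocal nobody references any more
        System.out.println(expungeStale(table));       // 1
        System.out.println(stale.value);               // null
        System.out.println(table[1].get() == liveKey); // true
    }
}
```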

But still, there’s a problem:

If the thread keeps running but stops using its ThreadLocalMap, set, remove, and rehash are never called, so the stale values are never cleaned up and the memory leak remains.

According to the Alibaba Java Coding Guidelines, the best practice is to call remove on a ThreadLocal once you are finished with its value. In typical Scenario 2, that means calling UserContextHolder.holder.remove() at the end of Handler2; and if the handler chain may stop before reaching the end of Handler2 (for example, because of an exception), remove must also be called on that path.
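A minimal sketch of that practice, assuming a single-threaded pool so that consecutive tasks run on the same thread (RemoveInFinally and its methods are hypothetical names, and a String stands in for User): without remove, the next task on the pooled thread sees the previous request's user.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class RemoveInFinally {
    static final ThreadLocal<String> USER = new ThreadLocal<>();

    // Simulates one request: bind the user, run the handlers, optionally clean up.
    static void handleRequest(String user, boolean cleanUp) {
        USER.set(user);
        try {
            // Handler1, Handler2, ... would read USER.get() here
        } finally {
            if (cleanUp) {
                USER.remove();
            }
        }
    }

    // Runs one request, then a second task on the same pooled thread,
    // and returns what that second task finds in USER.
    static String leftoverAfterRequest(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> handleRequest("alice", cleanUp)).get();
            Callable<String> readUser = USER::get;
            return pool.submit(readUser).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(leftoverAfterRequest(false)); // alice: leaked into the next task
        System.out.println(leftoverAfterRequest(true));  // null
    }
}
```

Putting remove in a finally block covers both the normal path and the exception path.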

NPE problems with boxing and unboxing:

If you use ThreadLocal to hold primitive values, watch out for NullPointerException. A ThreadLocal can only store objects, so a primitive such as int is boxed into its wrapper type (Integer), and unboxing a null wrapper throws an NPE. For example:

public class ThreadLocalNPE {
  static ThreadLocal<Integer> intHolder = new ThreadLocal<>();

  static int getV() {
    // get() returns null here; unboxing it to int throws NullPointerException
    return intHolder.get();
  }

  public static void main(String[] args) {
    getV(); // throws NullPointerException
  }
}

In getV, intHolder.get() returns a null Integer because nothing has been set, and auto-unboxing it to the int return type throws a NullPointerException.
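Two common ways to avoid the NPE, sketched with hypothetical names: give the ThreadLocal a non-null initial value, or check for null before unboxing.

```java
class ThreadLocalNoNPE {
    // Option 1: give the ThreadLocal a non-null initial value.
    static final ThreadLocal<Integer> intHolder = ThreadLocal.withInitial(() -> 0);

    // Option 2: keep a plain ThreadLocal and check for null before unboxing.
    static final ThreadLocal<Integer> plainHolder = new ThreadLocal<>();

    static int getV() {
        return intHolder.get(); // unboxes 0 instead of null
    }

    static int getPlainOrDefault(int defaultValue) {
        Integer v = plainHolder.get();
        return v != null ? v : defaultValue;
    }

    public static void main(String[] args) {
        System.out.println(getV());                // 0
        System.out.println(getPlainOrDefault(-1)); // -1
    }
}
```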