One: Background

1. Tell a story

Someone asked me where ThreadStatic variables are actually stored, and whether I could help dig out the answer 😂😂😂. The question goes quite deep. People who work in high-level languages rarely touch this level, even though many of them know how to use the feature. I had not studied it either, but to answer the question I had to. To keep things accessible, let's start with the simple parts!

Two: How to use ThreadStatic

1. Plain static variables

A plain static variable can serve as a process-level cache to improve performance, for example as a first-level cache:


    public class Test
    {
        public static Dictionary<int, string> cachedDict = new Dictionary<int, string>();
    }

As mentioned, this is a process-level cache that every thread can see, so in a multi-threaded environment you need to pay special attention to synchronization: either take a lock or switch to ConcurrentDictionary. I think this is also a kind of fixed mindset. Most of the time we patch things on top of the existing foundation instead of stepping back and rethinking the foundation itself. What do I mean by all this? Let me give you an example:

Common distributed tracing frameworks, such as Zipkin and SkyWalking, use a collection to store the call chain being traced on the current thread, for example A -> B -> C -> D -> B -> A. The conventional approach is to define a global cachedDict and protect it with various synchronization mechanisms. Wouldn't it be better to narrow cachedDict's access scope from global to thread-level?
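For comparison, here is a minimal sketch of the conventional global approach described above, using ConcurrentDictionary for synchronization. The names SharedCache, Fill, and SharedCacheDemo are made up for illustration and do not come from any tracing framework.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class SharedCache
{
    // One dictionary for the whole process; ConcurrentDictionary
    // synchronizes concurrent writers internally.
    public static readonly ConcurrentDictionary<int, string> cachedDict =
        new ConcurrentDictionary<int, string>();

    // Writes `count` entries, starting at key `start`, from the calling thread.
    public static void Fill(int start, int count)
    {
        for (int i = start; i < start + count; i++)
            cachedDict[i] = "item" + i;
    }
}

public static class SharedCacheDemo
{
    public static void Main()
    {
        // Two tasks write to the same dictionary at the same time.
        Task.WaitAll(
            Task.Run(() => SharedCache.Fill(0, 1000)),
            Task.Run(() => SharedCache.Fill(1000, 1000)));

        Console.WriteLine($"shared records: {SharedCache.cachedDict.Count}");
    }
}
```

Every write here goes through the dictionary's internal synchronization, which is exactly the cost a thread-level collection avoids.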

2. Mark static variables with ThreadStatic

Making it thread-level is easy: just mark cachedDict with the ThreadStatic attribute:


    public class Test
    {
        [ThreadStatic]
        public static Dictionary<int, string> cachedDict = new Dictionary<int, string>();
    }

Then you can start several threads that each add data to cachedDict and check whether the dictionary is thread-scoped. The code is as follows:


    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;

    class Program
    {
        static void Main(string[] args)
        {
            // Note: Task.Run uses pool threads. If both tasks happen to land on
            // the same thread, they share that thread's dictionary.
            var task1 = Task.Run(() =>
            {
                // The field initializer runs on one thread only, so other threads start with null.
                if (Test.cachedDict == null) Test.cachedDict = new Dictionary<int, string>();

                Test.cachedDict.Add(1, "mary");
                Test.cachedDict.Add(2, "john");

                Console.WriteLine($"thread={Thread.CurrentThread.ManagedThreadId}, dict records: {Test.cachedDict.Count}");
            });

            var task2 = Task.Run(() =>
            {
                if (Test.cachedDict == null) Test.cachedDict = new Dictionary<int, string>();

                Test.cachedDict.Add(3, "python");
                Test.cachedDict.Add(4, "jaskson");
                Test.cachedDict.Add(5, "elen");

                Console.WriteLine($"thread={Thread.CurrentThread.ManagedThreadId}, dict records: {Test.cachedDict.Count}");
            });

            Console.ReadLine();
        }
    }

    public class Test
    {
        [ThreadStatic]
        public static Dictionary<int, string> cachedDict = new Dictionary<int, string>();
    }

The result shows that cachedDict really is thread-level, which avoids the synchronization overhead between threads. 😄
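The null checks in the code above are needed because the initializer of a ThreadStatic field runs only once, on the thread that initializes the type. The sketch below illustrates this and shows ThreadLocal<T> as an alternative that runs its factory once per thread. The names ThreadStaticInitDemo, Marked, PerThreadDict, and the two helper methods are hypothetical and exist only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThreadStaticInitDemo
{
    // The initializer below runs once, inside the static constructor.
    [ThreadStatic]
    public static int Marked = 42;

    // ThreadLocal<T> calls the factory the first time each thread reads Value.
    public static readonly ThreadLocal<Dictionary<int, string>> PerThreadDict =
        new ThreadLocal<Dictionary<int, string>>(() => new Dictionary<int, string>());

    // An explicit static constructor makes initialization timing precise:
    // it runs on the first thread that touches any member of this type.
    static ThreadStaticInitDemo() { }

    // Reads Marked on a brand-new thread and returns what that thread saw.
    public static int ReadMarkedOnNewThread()
    {
        int seen = -1;
        var t = new Thread(() => seen = Marked);
        t.Start();
        t.Join();
        return seen;
    }

    // Adds one entry on a brand-new thread and returns that thread's count.
    public static int CountOnNewThread()
    {
        int count = -1;
        var t = new Thread(() =>
        {
            PerThreadDict.Value.Add(1, "mary");   // no null check needed
            count = PerThreadDict.Value.Count;
        });
        t.Start();
        t.Join();
        return count;
    }
}

public static class ThreadStaticInitProgram
{
    public static void Main()
    {
        // The new thread did not run the initializer, so it sees default(int).
        Console.WriteLine($"Marked on new thread: {ThreadStaticInitDemo.ReadMarkedOnNewThread()}");
        Console.WriteLine($"ThreadLocal count on new thread: {ThreadStaticInitDemo.CountOnNewThread()}");
    }
}
```

Whether to keep the explicit null check or switch to ThreadLocal<T> is a design choice; the ThreadStatic attribute itself is what the rest of this article investigates.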

Three: Digging into ThreadStatic with WinDbg

1. Understanding TEB and TLS

  • TEB (Thread Environment Block)

Each thread has its own private data, stored in the thread's TEB. You can print it out in WinDbg if you want to see it.


0:000> !teb
TEB at 0000001e1cdd3000
    ExceptionList:        0000000000000000
    StackBase:            0000001e1cf80000
    StackLimit:           0000001e1cf6e000
    SubSystemTib:         0000000000000000
    FiberData:            0000000000001e00
    ArbitraryUserPointer: 0000000000000000
    Self:                 0000001e1cdd3000
    EnvironmentPointer:   0000000000000000
    ClientId:             0000000000005980 . 0000000000005aa8
    RpcHandle:            0000000000000000
    Tls Storage:          000001b599d06db0
    PEB Address:          0000001e1cdd2000
    LastErrorValue:       0
    LastStatusValue:      c0000139
    Count Owned Locks:    0
    HardErrorMode:        0


The TEB structure shows that it holds the thread-local storage pointer (Tls Storage), the exception list (ExceptionList), and other related information.

  • TLS (Thread Local Storage)

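After startup, a process has a total of 1088 TLS slots available (64 built-in slots plus 1024 expansion slots). A TLS index names the same slot in every thread, but each thread holds its own value in that slot, so every thread effectively has its own set of slots. You can verify this with WinDbg.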


0:000> !tls
Usage:
tls <slot> [teb]
  slot:  -1 to dump all allocated slots
         {0-0n1088} to dump specific slot
  teb:   <empty> for current thread
         0 for all threads in this process
         <teb address> (not threadid) to dump for specific thread.
0:000> !tls -1
TLS slots on thread: 5980.5aa8
0x0000 : 0000000000000000
0x0001 : 0000000000000000
0x0002 : 0000000000000000
0x0003 : 0000000000000000
0x0004 : 0000000000000000
...
0x0019 : 0000000000000000
0x0040 : 0000000000000000
0:000> !t
                                                                                                 Lock
 DBG   ID OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   0    1 5aa8 000001B599CEED90    2a020 Preemptive  000001B59B9042F8:000001B59B905358 000001b599cdb130 1     MTA
   5    2  90c 000001B599CF4930    2b220 Preemptive  0000000000000000:0000000000000000 000001b599cdb130 0     MTA (Finalizer)
   7    3   74 000001B59B7272A0  102a220 Preemptive  0000000000000000:0000000000000000 000001b599cdb130 0     MTA (Threadpool Worker)
   9    4 2058 000001B59B7BAFF0  1029220 Preemptive  0000000000000000:0000000000000000 000001b599cdb130 0     MTA (Threadpool Worker)



The usage line "{0-0n1088} to dump specific slot" matches the 1088 slots mentioned above (0n is WinDbg's prefix for decimal numbers).

All right, with the basic concepts covered, it’s time to take a look at assembly code.

2. Look for answers in assembly code

To make debugging in WinDbg easier, I'll define a simple ThreadStatic int variable as follows:


    class Program
    {
        [ThreadStatic]
        public static int i = 0;

        static void Main(string[] args)
        {
            i = 10;   // line 12

            var num = i;

            Console.ReadLine();
        }
    }

Next, use !U to disassemble the Main function, focusing on line 12, where i = 10; is executed.


0:000> !U /d 00007ffbe0ae0ffb
E:\net5\ConsoleApp5\ConsoleApp5\Program.cs @ 12:
00007ffb`e0ae0fd6 48b9b0fbb7e0fb7f0000 mov rcx,7FFBE0B7FBB0h
00007ffb`e0ae0fe0 ba01000000      mov     edx,1
00007ffb`e0ae0fe5 e89657a95f      call    coreclr!JIT_GetSharedNonGCThreadStaticBase (00007ffc`40576780)
00007ffb`e0ae0fea c7401c0a000000  mov     dword ptr [rax+1Ch],0Ah


The last instruction writes 10 (0Ah) as a 32-bit value to the address rax+1Ch. So where does the address in rax come from? It is the return value of JIT_GetSharedNonGCThreadStaticBase, so the core logic lives inside that method and we need to study it.

3. Debugging the core function JIT_GetSharedNonGCThreadStaticBase

Next, set a breakpoint on line 12 with !bpmd Program.cs:12. The simplified assembly code of the method is as follows:

coreclr!JIT_GetSharedNonGCThreadStaticBase:
00007ffc`2c38679a 448b0dd7894300         mov     r9d, dword ptr [coreclr!_tls_index (00007ffc`2c7bf178)]
00007ffc`2c3867a1 654c8b042558000000     mov     r8, qword ptr gs:[58h]
00007ffc`2c3867aa b908000000             mov     ecx, 8
00007ffc`2c3867af 4f8b04c8               mov     r8, qword ptr [r8+r9*8]
00007ffc`2c3867b3 4e8b0401               mov     r8, qword ptr [rcx+r8]
00007ffc`2c3867b7 493b8060040000         cmp     rax, qword ptr [r8+460h]
00007ffc`2c3867be 732b                   jae     coreclr!JIT_GetSharedNonGCThreadStaticBase+0x6b (00007ffc`2c3867eb)
00007ffc`2c3867c0 4d8b8058040000         mov     r8, qword ptr [r8+458h]
00007ffc`2c3867c7 498b04c0               mov     rax, qword ptr [r8+rax*8]
00007ffc`2c3867cb 4885c0                 test    rax, rax
00007ffc`2c3867ce 741b                   je      coreclr!JIT_GetSharedNonGCThreadStaticBase+0x6b (00007ffc`2c3867eb)
00007ffc`2c3867d0 8bca                   mov     ecx, edx
00007ffc`2c3867d2 f644011801             test    byte ptr [rcx+rax+18h], 1
00007ffc`2c3867d7 7412                   je      coreclr!JIT_GetSharedNonGCThreadStaticBase+0x6b (00007ffc`2c3867eb)
00007ffc`2c3867d9 488b4c2420             mov     rcx, qword ptr [rsp+20h]
00007ffc`2c3867de 4833cc                 xor     rcx, rsp
00007ffc`2c3867e1 e89a170600             call    coreclr!__security_check_cookie (00007ffc`2c3e7f80)
00007ffc`2c3867e6 4883c438               add     rsp, 38h
00007ffc`2c3867ea c3                     ret


Let's take a closer look at the mov instructions here.

1) dword ptr [coreclr!_tls_index (00007ffc`2c7bf178)]

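This one is simple: it loads _tls_index, the TLS index that the loader assigned to coreclr. The index is the same for every thread; what differs per thread is the array it indexes into.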

2) qword ptr gs:[58h]

What does gs:[58h] mean? On x64 Windows, the gs segment register points at the TEB of the current thread, and 58h is an offset from the TEB address. So what sits at that offset? You can print out the TEB's data structure to find out.


0:000> dt teb
coreclr!TEB
   +0x000 NtTib            : _NT_TIB
   +0x038 EnvironmentPointer : Ptr64 Void
   +0x040 ClientId         : _CLIENT_ID
   +0x050 ActiveRpcHandle  : Ptr64 Void
   +0x058 ThreadLocalStoragePointer : Ptr64 Void
   +0x060 ProcessEnvironmentBlock : Ptr64 _PEB
   ...


The line "+0x058 ThreadLocalStoragePointer : Ptr64 Void" above shows that gs:[58h] actually reads ThreadLocalStoragePointer.

3) qword ptr [r8+r9*8]

With the previous two steps in place, this instruction is simple. It performs an index operation, ThreadLocalStoragePointer[tls_index], and thereby obtains the TLS block that belongs to the current thread. The ThreadStatic variable is stored in one of the memory blocks this array points to.
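The index operation can be pictured with a small toy model. This is only a conceptual sketch: FakeTeb, TlsModel, the block sizes, and the index value 1 are all invented for illustration, and the real lookup in coreclr goes through further indirections (such as the reads at [r8+458h] in the assembly above) that are not modeled here.

```csharp
using System;

// Toy model only: hypothetical names, not the real Windows or CoreCLR layout.
public sealed class FakeTeb
{
    // Stands in for TEB.ThreadLocalStoragePointer: one TLS block per module.
    public readonly int[][] ThreadLocalStoragePointer;

    public FakeTeb(int moduleCount, int blockSize)
    {
        ThreadLocalStoragePointer = new int[moduleCount][];
        for (int m = 0; m < moduleCount; m++)
            ThreadLocalStoragePointer[m] = new int[blockSize];
    }
}

public static class TlsModel
{
    // Stands in for coreclr!_tls_index: one value shared by the whole process.
    public const int TlsIndex = 1;

    // Mirrors "mov r8, qword ptr [r8+r9*8]": pick this module's block
    // out of the given thread's array.
    public static int[] GetBlock(FakeTeb teb) => teb.ThreadLocalStoragePointer[TlsIndex];

    public static void Write(FakeTeb teb, int offset, int value) => GetBlock(teb)[offset] = value;

    public static int Read(FakeTeb teb, int offset) => GetBlock(teb)[offset];
}

public static class TlsModelDemo
{
    public static void Main()
    {
        // Two "threads", each with its own array of blocks.
        var tebA = new FakeTeb(moduleCount: 3, blockSize: 8);
        var tebB = new FakeTeb(moduleCount: 3, blockSize: 8);

        // Like "i = 10" running on thread A; offset 7 is an arbitrary choice.
        TlsModel.Write(tebA, offset: 7, value: 10);

        Console.WriteLine(TlsModel.Read(tebA, 7));   // 10: thread A sees its own write
        Console.WriteLine(TlsModel.Read(tebB, 7));   // 0: thread B's block is untouched
    }
}
```

The same index applied to two different arrays yields two different blocks, which is why no lock is needed when each thread writes its own copy.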

The offset calculations that follow all build on ThreadLocalStoragePointer[tls_index] above. With all the jumping between method calls, the assembly becomes hard to read 😂😂😂

Four: Summary

On the whole, we can conclude that ThreadStatic variables are stored in the TEB's ThreadLocalStoragePointer array. I have not managed to build the .NET 5 CoreCLR these past few days. If you are interested, you can debug the CoreCLR source together with the assembly to dig deeper!

For more high-quality, practical content, see my GitHub: dotnetfly