A: background

1. Tell a story

In the middle of this month, a friend added me to WX for help. His program thread occupancy was very high, and he sought to solve it. The screenshot is as follows:

To be honest, it’s fun to chat with programmers from different industries, make friends and expand your circle. My friend said he even lost a project because of this bug. 😂 😂 😂

Ha ha, it seems that the customer does not buy, acceptance, harm… Early find me, this customer is not retrieved back, this may be the value of technology! 😁 😁 😁

Since you found me, let this hanging problem completely disappear, winDBG talk.

Two: Windbg analysis

1. Check the status of threads

Since the friend says thread is high, then start with thread, use! The t command is sufficient.


0:000> !t
ThreadCount:      1006
UnstartedThread:  0
BackgroundThread: 1005
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                                                                                            Lock  
 DBG   ID     OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   0    1     10c8 00000000004D89A0    2a020 Preemptive  0000000000000000:0000000000000000 00000000002f1070 0     MTA 
   2    2     13c0 000000000031FF70    2b220 Preemptive  0000000000000000:0000000000000000 00000000002f1070 0     MTA (Finalizer) 
   3    3     12cc 000000000032B780  102a220 Preemptive  0000000000000000:0000000000000000 00000000002f1070 0     MTA (Threadpool Worker) 
   4    5     138c 000000000039E3C0  8029220 Preemptive  00000000B6D3CCA0:00000000B6D3D260 00000000002f1070 0     MTA (Threadpool Completion Port) 
   6    6     106c 0000000019E562A0  3029220 Preemptive  0000000000000000:0000000000000000 00000000002f1070 0     MTA (Threadpool Worker) 
   8   11      7f0 0000000019F8F9E0    20220 Preemptive  0000000000000000:0000000000000000 00000000002f1070 0     Ukn 
   9 1949     323c 000000009AA69E40  8029220 Preemptive  00000000B6BB8AD0:00000000B6BB94E0 00000000002f1070 0     MTA (Threadpool Completion Port) 
  10 1637      b3c 000000009AA1C260  8029220 Preemptive  00000000B6CD4220:00000000B6CD47E0 00000000002f1070 0     MTA (Threadpool Completion Port) 
  11 1947     223c 000000009ADB72E0  8029220 Preemptive  00000000B6D88D68:00000000B6D89550 00000000002f1070 0     MTA (Threadpool Completion Port) 
  12 1968     2e74 000000009AA1E330  8029220 Preemptive  00000000B6A8CD40:00000000B6A8D300 00000000002f1070 0     MTA (Threadpool Completion Port) 
...
 994  313     1fa4 000000009A81FFC0  8029220 Preemptive  00000000B6BFC1B8:00000000B6BFC410 00000000002f1070 0     MTA (Threadpool Completion Port) 
 995 1564     18ec 000000009A835510  8029220 Preemptive  00000000B6AC1ED0:00000000B6AC2490 00000000002f1070 0     MTA (Threadpool Completion Port) 
 996 1581      4ac 000000001C2E36E0  8029220 Preemptive  00000000B6C51500:00000000B6C51AC0 00000000002f1070 0     MTA (Threadpool Completion Port) 
 997  814     2acc 000000009A73B5E0  8029220 Preemptive  00000000B6D67BF8:00000000B6D683E0 00000000002f1070 0     MTA (Threadpool Completion Port) 
 998  517     25dc 000000009A838990  8029220 Preemptive  00000000B6D2CA10:00000000B6D2CFD0 00000000002f1070 0     MTA (Threadpool Completion Port) 
 999  670     2a10 000000001C2E4400  8029220 Preemptive  00000000B6CD0490:00000000B6CD0A50 00000000002f1070 0     MTA (Threadpool Completion Port) 
1000  183     1704 000000009A81F930  8029220 Preemptive  00000000B6AE8670:00000000B6AE8C30 00000000002f1070 0     MTA (Threadpool Completion Port) 
1001  117     1bcc 000000009A73BC70  8029220 Preemptive  00000000B6B92780:00000000B6B92D40 00000000002f1070 0     MTA (Threadpool Completion Port) 
1002 1855     1d68 000000009A81E580  8029220 Preemptive  00000000B6B28460:00000000B6B28A20 00000000002f1070 0     MTA (Threadpool Completion Port) 
1003 1070     2ef0 000000009A73C300  8029220 Preemptive  00000000B6B8F640:00000000B6B8FC00 00000000002f1070 0     MTA (Threadpool Completion Port) 
1004 1429     210c 000000001C2E4A90  8029220 Preemptive  00000000B6D5F488:00000000B6D5FC70 00000000002f1070 0     MTA (Threadpool Completion Port) 
1005 1252     2f38 000000009A838300  8029220 Preemptive  00000000B6A99240:00000000B6A99800 00000000002f1070 0     MTA (Threadpool Completion Port) 
1006 1317     3118 000000001C2E5120  8029220 Preemptive  00000000B6DA3A30:00000000B6DA4440 00000000002f1070 0     MTA (Threadpool Completion Port) 
1007 1837     3120 000000009A8375E0  8029220 Preemptive  00000000B6D38F10:00000000B6D394D0 00000000002f1070 0     MTA (Threadpool Completion Port) 
1009 1964     2f64 000000009A81DEF0  1029220 Preemptive  0000000000000000:0000000000000000 00000000002f1070 0     MTA (Threadpool Worker) 

Copy the code

As you can see, there are currently 1006 threads, of which 1000 are Threadpool Completion ports. This is the first time that so many I/O threads are stuck, 🐂👃.

Honestly, seeing Threadpool Completion Port reminds me of a callback to an asynchronous operation, so why are so many I/O threads stuck? To find out, take a look at the thread.


0:1000> ~1000s ntdll! NtNotifyChangeDirectoryFile+0xa:
00000000`77c7a75a c3              ret
0:1000> !clrstack 
OS Thread Id: 0x1704 (1000)
        Child SP               IP Call Site
00000000A99FF4C0 0000000077c7a75a [InlinedCallFrame: 00000000a99ff4c0] Interop+Kernel32.ReadDirectoryChangesW(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Boolean, Int32, Int32 ByRef, System.Threading.NativeOverlapped*, IntPtr)
00000000A99FF4C0 000007fe8e87bd20 [InlinedCallFrame: 00000000a99ff4c0] Interop+Kernel32.ReadDirectoryChangesW(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Boolean, Int32, Int32 ByRef, System.Threading.NativeOverlapped*, IntPtr)
00000000A99FF470 000007fe8e87bd20 DomainBoundILStubClass.IL_STUB_PInvoke(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Boolean, Int32, Int32 ByRef, System.Threading.NativeOverlapped*, IntPtr)
00000000A99FF560 000007fef19dab6e System.IO.FileSystemWatcher.Monitor(AsyncReadState) [E:\A\_work\322\s\corefx\src\System.IO.FileSystem.Watcher\src\System\IO\FileSystemWatcher.Win32.cs @ 141]
00000000A99FF5E0 000007fef19dae6c System.IO.FileSystemWatcher.ReadDirectoryChangesCallback(UInt32, UInt32, System.Threading.NativeOverlapped*) [E:\A\_work\322\s\corefx\src\System.IO.FileSystem.Watcher\src\System\IO\FileSystemWatcher.Win32.cs @ 227]
00000000A99FF630 000007feedbe0af9 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [E:\A\_work\191\s\src\mscorlib\shared\System\Threading\ExecutionContext.cs @ 167]
00000000A99FF6B0 000007feede094dc System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*) [E:\A\_work\191\s\src\mscorlib\src\System\Threading\Overlapped.cs @ 108]
00000000A99FF7F0 000007feee359ed3 [GCFrame: 00000000a99ff7f0] 
00000000A99FF9D0 000007feee359ed3 [DebuggerU2MCatchHandlerFrame: 00000000a99ff9d0] 

Copy the code

Last month, I analyzed a.net streaming unicorn API handle leak analysis, which is caused by FileSystemWatcher file handle explosion. It’s a small world, there could be a collision… Let’s focus on that.

2. To explore FileSystemWatcher

For further analysis, use! The dso command looks at the current thread stack object.


0:1000> !dso
OS Thread Id: 0x1704 (1000)
RSP/REG          Object           Name
00000000A99FF508 00000000263285d8 System.Byte[]
00000000A99FF518 00000000242aeb10 System.Threading._IOCompletionCallback
00000000A99FF560 00000000242ae1b0 Microsoft.Win32.SafeHandles.SafeFileHandle
00000000A99FF568 00000000242aeaa8 System.Threading.PreAllocatedOverlapped
00000000A99FF578 00000000242aeb10 System.Threading._IOCompletionCallback
00000000A99FF5E0 00000000242a8538 System.IO.FileSystemWatcher
00000000A99FF5E8 00000000242aea10 System.IO.FileSystemWatcher+AsyncReadState
00000000A99FF608 00000000242aea10 System.IO.FileSystemWatcher+AsyncReadState
00000000A99FF610 0000000023206e30 System.Threading.ExecutionContext
00000000A99FF618 0000000001032928 System.Threading.ContextCallback
00000000A99FF630 00000000242a8538 System.IO.FileSystemWatcher
00000000A99FF678 00000000b6a69a40 System.Threading.Thread
00000000A99FF688 00000000242aeb10 System.Threading._IOCompletionCallback
00000000A99FF690 0000000023206e30 System.Threading.ExecutionContext
00000000A99FF6C0 0000000021fa55d8 System.Threading._IOCompletionCallback
00000000A99FF6C8 000000002052e6e0 System.Threading.ExecutionContext
00000000A99FF7E0 000000000560d2b0 System.Threading.OverlappedData

Copy the code

Since objects on the thread stack scale small, look at the smallest address of system.byte [] to see what the current callback is.

This is another example of building a ConfigurationRoot with reloadOnChange: ConfigurationRoot and FileSystemWatcher are not ConfigurationRoot. The configuration file is not ConfigurationRoot. The configuration file is not ConfigurationRoot. Listen as I slowly break it down and verify the number of these two classes on the managed heap.


0:1000> !dumpheap -stat -type FileSystemWatcher
Statistics:
              MT    Count    TotalSize Class Name
000007fe8ed5bc90        2          160 System.Collections.Generic.Dictionary`2[[System.String, System.Private.CoreLib],[System.IO.FileSystemWatcher, System.IO.FileSystem.Watcher]]
000007fe8e9f11a0    34480      1930880 System.IO.FileSystemWatcher+AsyncReadState
000007fe8e9d69c8    34480      4137600 System.IO.FileSystemWatcher
Total 68962 objects

0:1000> !dumpheap -stat -type ConfigurationRoot
Statistics:
              MT    Count    TotalSize Class Name
000007fe8e9f1e70    34480       827520 Microsoft.Extensions.Configuration.ConfigurationRoot+<>c__DisplayClass2_0
000007fe8e999560    34480      1103360 Microsoft.Extensions.Configuration.ConfigurationRoot
Total 68960 objects

Copy the code

Sure enough, the managed heap has a 3.4W FileSystemWatcher and ConfigurationRoot, so you need to communicate with your friends.

3. What code is causing this?

ConfigurationRoot ConfigurationRoot ConfigurationRoot ConfigurationRoot ConfigurationRoot ConfigurationRoot ConfigurationRoot ConfigurationRoot ConfigurationRoot

FileSystemWatcher monitors appSettings. json in real time, causing 3.4W objects to fail to be released.

Three: why log changes cause the program to freeze

1. Initial confusion

Even if there is a 3.4W FileSystemWatcher in memory, it is only monitoring appsettings.json. As long as the file is not changed, it will not trigger the 3.4W callback, right? But after I analyzed the ConfigurationRoot source code, I realized how damn naive I was.

2. Look for the answer in the source code

What does FileSystemWatcher monitor? You can set a breakpoint in its constructor, as shown below:

Clearly see that the watch is application root directory, this could explain why there are changes to the log file will trigger the callback function of the file changes, to test, I can be in the middle and lower a breakpoint ReadDirectoryChangesCallback method, and then throw a log file to the root directory, See if it can trigger… Screenshot below:

If the log changes 100 times, 340W callbacks will be triggered. If the log changes don’t stop, the program will be overwhelmed by a flood of callbacks. Right…

Four:

ConfigurationRoot: reloadOnChange: ConfigurationRoot: int ConfigurationRoot: int ConfigurationRoot: int ConfigurationRoot: int ConfigurationRoot: int ConfigurationRoot: int ConfigurationRoot: int ConfigurationRoot: int ConfigurationRoot: int ConfigurationRoot: int ConfigurationRoot: int ConfigurationRoot: int ConfigurationRoot: int ConfigurationRoot: int ConfigurationRoot True causes the IO thread to be unable to handle the massive callbacks caused by the log file changes in a timely manner, resulting in the application hanging.

Once you know the whole story, optimization is simple, and there are two ways to do it.

  1. Change reloadOnChange: true to reloadOnChange: false.

  2. Try to inject Configuration into the DataBaseConfig class, or make it a static variable. 😁 😁 😁

The last egg, my friend is too polite.