One: Background

1. The story

A few days ago a friend complained to me over WeChat that his program's CPU was always inexplicably high during peak hours, and asked for help analyzing it.





After chatting with this friend, it turned out the problem had troubled them for a couple of years 😂. They had even asked an engineer from Microsoft to come over and look at it, to no avail; presumably they never found the right expert at Microsoft…

Regular readers will know that I have written several articles about programs whose CPU spikes, and they basically fall into two categories:

  • A GC was triggered
  • Heavy lock contention (lots of lock statements)

The remaining cases are all kinds of inexplicable one-offs that cannot be enumerated one by one 🤣. Now that my friend has come to me, I have to find a way to solve it for him. Without further ado, let's get into WinDbg.

Two: WinDbg analysis

1. View the synchronization block table

When I run into this kind of problem, checking the synchronization block table first is already a reflex for me, and the command is very simple: !syncblk.

0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
-----------------------------
Total           20779
CCW             16
RCW             21
ComClassFactory 0
Free            16490

Well, that came up empty: there is nothing in the synchronization block table, so locks are off the hook (for contrast, a small sketch of the kind of code !syncblk does catch follows below). Since the problem has nothing to do with locks, let's look at the threads, since it's the threads that are burning the CPU.
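For context, !syncblk only reports sync blocks whose monitors are actually contended, i.e. threads fighting over lock objects. Here is a minimal, hypothetical sketch (my own illustration, not the friend's code) of the kind of workload that would have produced a non-zero MonitorHeld in the table above:

using System;
using System.Threading.Tasks;

class LockContentionDemo
{
    // Hypothetical shared lock object; heavy contention on it would show up
    // in !syncblk as a sync block with a large MonitorHeld count.
    private static readonly object _gate = new object();
    private static long _counter;

    static void Main()
    {
        // Many threads fighting over one monitor => sync block contention.
        var tasks = new Task[32];
        for (int i = 0; i < tasks.Length; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                for (int j = 0; j < 1_000_000; j++)
                {
                    lock (_gate)   // contended Monitor.Enter / Monitor.Exit
                    {
                        _counter++;
                    }
                }
            });
        }
        Task.WaitAll(tasks);
        Console.WriteLine(_counter);
    }
}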

2. Look at the threads

To view the managed threads, use the !t command. There are a lot of threads, so the output below is trimmed a little.

0:000> !t
ThreadCount:      135
UnstartedThread:  0
BackgroundThread: 132
PendingThread:    0
DeadThread:       1
Hosted Runtime:   no
                                                                                    Lock
    ID OSID ThreadOBJ          State   GC Mode    GC Alloc Context                 Domain           Count Apt Exception
34   1 25d4 000001ea28702130   28220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn
74   2 3ed0 000001ea286fa940   2b220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     MTA (Finalizer)
76   3 4a70 000001f4447d7810 102a220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     MTA (Threadpool Worker)
77   4 326c 000001f4447dbe60   21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn
78   6 2dac 000001f4447d9750 1020220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn (Threadpool Worker)
79   7 1468 000001f444a2d6f0   21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 1     Ukn (GC)
80   8   f0 000001f444a2cf20   21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn
81   9 3118 000001f444a2f630   21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn

Let me keep you in suspense for a moment: can you spot anything unusual among these threads? Yes, the last column of thread 79 carries a Ukn (GC) tag, and you may be wondering what that means. The underlying GC state has changed in some way, but in any case it tells you, at some level, that your program is in the middle of a GC. For further verification, use !t -special to look at thread 79's category and a few more details.

0:000> !t -special
ThreadCount:      135
UnstartedThread:  0
BackgroundThread: 132
PendingThread:    0
DeadThread:       1
Hosted Runtime:   no
                                                                                    Lock
    ID OSID ThreadOBJ          State   GC Mode    GC Alloc Context                 Domain           Count Apt Exception
34   1 25d4 000001ea28702130   28220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn
79   7 1468 000001f444a2d6f0   21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 1     Ukn (GC)

       OSID Special thread type
   41  38d8 DbgHelper
   42  1c88 GC
   74  3ed0 Finalizer
   75  402c ProfilingAPIAttach
   76  4a70 Timer
   79  1468 GC SuspendEE

In the last line of the output you can see that thread 79 is a GC thread, and there is a strange SuspendEE flag behind it. Again you may wonder what that means.

SuspendEE = Suspend CLR Execution Engine (EE)

In other words, thread 79 has frozen the CLR execution engine so that the other 31 GC threads can work over the current managed heap undisturbed. That implies 32 GC threads in total, which should line up with the number of cores on the machine; let's verify with !cpuid.

0:000> !cpuid
CP  F/M/S   Manufacturer     MHz
 0  6,62,4  <unavailable>    2600
 1  6,62,4  <unavailable>    2600
 2  6,62,4  <unavailable>    2600
 3  6,62,4  <unavailable>    2600
xxx
31  6,62,4  <unavailable>    2600
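A side note on why the GC thread count tracks the core count: this dump presumably comes from a process running the server GC flavor, which creates one dedicated GC thread (and one heap) per logical core. If you can run code inside the app, a small sketch like this (my own illustration, not part of the original analysis) confirms which GC flavor is active:

using System;
using System.Runtime;

class GcModeCheck
{
    static void Main()
    {
        // True when server GC is enabled (e.g. <gcServer enabled="true"/> in
        // the config); server GC uses one dedicated GC thread/heap per core.
        Console.WriteLine($"Server GC:    {GCSettings.IsServerGC}");
        Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
        Console.WriteLine($"Cores:        {Environment.ProcessorCount}");
    }
}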

Now that the GC is the prime suspect, the next step is to dump the managed and unmanaged stacks of all threads.

3. Look at each thread's stack

Viewing the managed and unmanaged stacks of every thread is as simple as running !eestack, and then you can search the output for the keyword WaitUntilGCComplete to determine how many threads are waiting for the GC to finish.

As you can see from the search results, there are currently 40 threads blocked there waiting. The problem is becoming clearer. Next, let's see which threads did something they shouldn't have and triggered the GC.



It is obvious that threads 170 and 187 are currently allocating large objects in gc_heap::allocate_large_object, which is what triggered the GC. The large object heap is a daunting thing in its own right: collecting and cleaning it is very CPU-heavy. That also matches my friend's observation that the CPU dropped back down after about a minute.

4. Look for the culprit

Now the focus is on these two threads. I checked both and their stacks end in the same method, so I'll pick thread 187 to analyze. Use !clrstack to look at its managed stack.

0:187> !clrstack
OS Thread Id: 0x1ef0 (187)
        Child SP               IP Call Site
00000054ce631e30 00007ffc4021bde2 System.String.FillStringChecked(System.String, Int32, System.String)
00000054ce631e70 00007ffc402222a8 System.String.ConcatArray(System.String[], Int32)
00000054ce631ed0 00007ffc40223528 System.String.Concat(System.String[])
00000054ce631f40 00007ffbe6dbdafb BLL.xxx.xxx.GetRowString(System.String, Boolean, Boolean, System.String, System.String, System.String, System.String, System.String, System.String, Int32, System.String, System.String, System.String, System.String, System.String, System.String, Int32, Int32, System.String, System.Nullable`1, System.Nullable`1, System.String, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.String, System.String, System.String, System.String, System.String ByRef)
00000054ce65cf40 00007ffbe6ab3187 BLL.xxx.xxx.ExpZB(System.String, Boolean, Boolean, System.String, System.String, System.String, System.String, Int32, System.String, System.String, System.String, Int32, System.String, System.String, System.String, System.String, System.String, System.String, Int32, Int32, System.String, System.Nullable`1, System.Nullable`1, System.String, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.String, System.String, System.String, System.String)

From the stack it appears the code is using System.String.Concat to concatenate strings. Good grief: string concatenation has been criticized for years, yet people keep stepping into this pit 😓. To verify further, we can use !clrstack -p plus !da -details xxx to see what exactly is inside this System.String[].

0:187> !clrstack -p
OS Thread Id: 0x1ef0 (187)
00000054ce631e70 00007ffc402222a8 System.String.ConcatArray(System.String[], Int32)
    PARAMETERS:
        values (<CLR reg>) = 0x000001ea69e8d2f8
        totalLength = <no data>

0:187> !da -details 0x000001ea69e8d2f8
Name:        System.String[]
Size:        128(0x80) bytes
Array:       Rank 1, Number of elements 13, Type CLASS
Element Methodtable: 00007ffc403d6948
[0] 000001f2391a83f0
    Name:        System.String
    MethodTable: 00007ffc403d6948
    EEClass:     00007ffc3fcd50e0
    Size:        445950(0x6cdfe) bytes
    File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
    String:      XXXXX

The String[] contains 13 elements, the largest of which is a string of 445950 bytes ≈ 445K, far beyond the 85K threshold for large objects, so this Concat is indeed the problem, and thread 170 looks exactly the same. The next question to answer is: why is there such a large string, and what is the code actually doing? To find out, we have to export the code from the dump.
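Before moving on, a quick sanity check on that 85K figure: the LOH threshold is 85,000 bytes, and a .NET string takes roughly 2 bytes per character plus a small object header, so any string beyond about 42,500 characters is already a large object. A minimal sketch (my own illustration) that demonstrates it:

using System;

class LohThresholdDemo
{
    static void Main()
    {
        // ~2 bytes per char + object header; anything >= 85,000 bytes
        // is allocated directly on the large object heap (LOH).
        string small = new string('x', 1_000);    // ordinary small-object allocation
        string large = new string('x', 250_000);  // ~500 KB, lands on the LOH

        // The runtime reports LOH objects as generation 2.
        Console.WriteLine(GC.GetGeneration(small));  // typically 0 right after allocation
        Console.WriteLine(GC.GetGeneration(large));  // 2
    }
}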

5. Review the problem code

To analyze the problem code, use !ip2md plus !savemodule to export the module containing the BLL.xxx.xxx.GetRowString method.

0:187> !ip2md 00007ffbe6dbdafb
MethodDesc:   00007ffbe5342118
Method Name:  BLL.xxx.xxx.GetRowString(System.String, Boolean, Boolean, System.String, System.String, System.String, System.String, System.String, System.String, Int32, System.String, System.String, System.String, System.String, System.String, System.String, Int32, Int32, System.String, System.Nullable`1<Int32>, System.Nullable`1<Int32>, System.String, System.Nullable`1<Single>, System.Nullable`1<Single>, System.Nullable`1<Single>, System.Nullable`1<Single>, System.String, System.String, System.String, System.String, System.String ByRef)
Class:        00007ffbe52fe328
MethodTable:  00007ffbe53421d8
mdToken:      0000000006000096
Module:       00007ffbe471a890

0:187> !savemodule 00007ffbe471a890 E:\dumps\RowString.dll
3 sections in file
section 0 - VA=2000, VASize=f7fcc, FileAddr=200, FileSize=f8000
section 1 - VA=fa000, VASize=3bc, FileAddr=f8200, FileSize=400
section 2 - VA=fc000, VASize=c, FileAddr=f8600, FileSize=200

Then decompile the exported RowString.dll with ILSpy.

Sure enough, it is string concatenation all the way down; with allocations coming this fast, I suspect gen0 and the LOH can barely keep up allocating new memory segments.

Three: Summary

Actually, this is a textbook problem with a textbook solution. I looked at the method: it has over 600 lines of code, almost all of it string concatenation. The final fix:

  • Refactor this method, replacing String concatenation with StringBuilder as far as possible, to minimize the number of GCs triggered (a hypothetical before/after sketch follows below).
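I obviously can't reproduce the friend's 600-line method here, so the following is only a hypothetical before/after sketch of the kind of change involved. Repeated String.Concat copies every intermediate result (and once the result crosses 85,000 bytes, each copy is a fresh LOH allocation), while a single StringBuilder grows one buffer and allocates the final string once:

using System;
using System.Text;

class RowStringSketch
{
    // Before: every Concat allocates a brand-new string and copies all
    // previous content; large intermediates land on the LOH.
    static string BuildRowConcat(string[] fields)
    {
        string row = string.Empty;
        foreach (var f in fields)
        {
            row = string.Concat(row, f, "|");   // O(n^2) copying overall
        }
        return row;
    }

    // After: one growable buffer, one final string allocation.
    static string BuildRowBuilder(string[] fields)
    {
        var sb = new StringBuilder(capacity: 1024);
        foreach (var f in fields)
        {
            sb.Append(f).Append('|');
        }
        return sb.ToString();
    }

    static void Main()
    {
        var fields = new[] { "a", "b", "c" };
        Console.WriteLine(BuildRowConcat(fields) == BuildRowBuilder(fields)); // True
    }
}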

One final little Easter egg: it seems the result of this analysis coincides with what my friend had long suspected…

For more high-quality content, see my GitHub: dotnetfly