Background

One night at 8 o'clock, I received an alert about CPU overload.

Shortly afterward, several business lines began reporting that the system was slowing down.

I first applied an emergency fix to the machine so that the business could return to normal.

I then checked the monitoring and tracked down the service responsible.

This service is an important legacy system, a Web API project built on the .NET Framework.

The next step was to go back and find the root cause of the sustained 100% CPU usage, so that this landmine could be defused.

To analyze this, you need to capture a dump file while CPU usage is consistently high and then examine it with WinDbg.

Let’s analyze it and find out.

WinDbg analysis

Analyzing high CPU usage with WinDbg mostly comes down to a handful of commands.

Run them in order and the answer usually falls out.

The first step is to use !threadpool to view the current CPU utilization and thread pool information.

The key figure in the output is the CPU utilization: 76%.

Then use !runaway to see how much CPU time each thread has consumed.

The output shows that threads 32, 34, 38 and 39 are suspicious.

Next, switch to each of these threads to see what it is doing.

Use ~34s to switch to thread 34; for the other threads, replace the number as needed.

Then use !clrstack to see what this thread is doing.

The call stack clearly shows a ConverAgeMonth method that uses a regular expression. Regular expressions can easily go wrong if they are not used carefully.

This is very likely where the problem lies.

Now we need to look at the actual argument values to make the picture clearer.

The !clrstack -p command is used for this.

You can see that the ConverAgeMonth method takes two arguments, age and ageMonth.

Click the address of age, or manually enter !do followed by the address, to see the string content.

The string is extremely long, close to 20,000 characters…

I checked the other suspicious threads in the same way and they were exactly the same, so it is safe to conclude that the regular expression caused the trouble.
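To make it concrete why a regular expression fed a very long input can pin a CPU core, here is a minimal Python sketch. The pattern is a hypothetical one with nested quantifiers, chosen only because it is known to backtrack badly. It is not the pattern from ConverAgeMonth, and the .NET regex engine differs from Python's in its details, so treat this as an illustration of the general failure mode rather than a reproduction of the incident.

```python
import re
import time

# Hypothetical pattern with nested quantifiers; NOT the one from ConverAgeMonth.
NESTED = re.compile(r"^(\d+)+$")


def time_failed_match(n: int) -> tuple[bool, float]:
    """Try to match n digits followed by one non-digit and time the attempt."""
    text = "1" * n + "x"  # the trailing "x" forces the overall match to fail
    start = time.perf_counter()
    matched = NESTED.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Keep n small: the work grows very quickly with each extra digit.
    for n in (10, 14, 18):
        matched, seconds = time_failed_match(n)
        print(f"n={n:2d} matched={matched} seconds={seconds:.4f}")
```

The match fails every time, but the engine has to try a rapidly growing number of ways to split the digits before giving up. With a pattern like this, an input far shorter than the one found in the dump would already keep a thread busy for a very long time.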

After this piece of code was adjusted, the CPU never spiked again.
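One common kind of adjustment for this sort of problem is to reject oversized input before it ever reaches the regex, and to keep the pattern itself simple. The sketch below shows that idea in Python. The function name, the length limit, the pattern, and the assumption that the method converts an age string into months are all hypothetical and only for illustration; they are not the actual change made to the service.

```python
import re
from typing import Optional

# Assumed upper bound for a sane "age" value; hypothetical, tune to real data.
MAX_AGE_LENGTH = 32

# Hypothetical, deliberately simple pattern: a short number plus a unit.
AGE_PATTERN = re.compile(r"(\d{1,3})\s*(years?|months?)")


def convert_age_to_months(age: Optional[str]) -> Optional[int]:
    """Return the age in months, or None if the input is missing or invalid."""
    # Cheap length check first, so a huge string never reaches the regex.
    if age is None or len(age) > MAX_AGE_LENGTH:
        return None
    match = AGE_PATTERN.fullmatch(age.strip())
    if match is None:
        return None
    value = int(match.group(1))
    unit = match.group(2)
    return value * 12 if unit.startswith("year") else value
```

With this guard, a 20,000-character string is rejected by the length check without running the regex at all. In .NET specifically, the Regex class also accepts a match timeout, which can serve as an additional safety net for patterns that cannot easily be simplified.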

Final thoughts

WinDbg is very pleasant to use, but the overall process is relatively complex. It amounts to offline analysis and cannot be used to observe and analyze the problem in real time.

There is still room for improvement in this area.