Feel free to bookmark this article, and come back to it when you run into a really nasty bug…

Programming is an elegant craft, and programmers, like artists, are supposed to keep producing beautiful and creative work. However, when I started working, I found that most programmers around me were bogged down in rushing to meet iteration deadlines and fixing urgent bugs. Scenes like the following are common:

  • Fixing one bug, only to introduce a more serious one
  • Modifying the code and restarting the server again and again, with the bug still not fixed
  • Applying a patch, finding that the bug is still there, hunting for a long time, and finally discovering that the patch was applied in the wrong place

We should do our best to ensure that the code we submit has as few bugs as possible. Yet when bugs do appear, we tend to panic and debug like headless chickens.

That was the case until recently, when I came across the book The Nine Methods of Debugging, the first book on debugging methodology that I have ever seen. I recalled the times when I was at a loss while solving bugs, ran around like a chicken with its head cut off, and finally fixed the bug by sheer luck, like a blind cat stumbling on a dead mouse. I bought the book without hesitation.

It presents a systematic theory of debugging.

Hopefully the debugging knowledge introduced in this article will help you think more clearly, stay calmer, and solve the problem at an unhurried pace the next time you encounter a bug.

Debugging rules overview

This methodology can be summed up in one sentence: debugging is a science, and any operation performed without understanding the principle behind it is just fooling around.

This statement can be put into practice in two steps:

Preparations:

  • If you want to debug the system, you must first understand it.
  • Before you start debugging, check the “plug”, so that you don’t waste effort on a trivial problem.
  • Once you have determined that the “plug” is fine, reproduce the bug.

Find causes and fix bugs:

  • Don’t assume your guess is right. Observe and test your guess.
  • Use divide and conquer to locate problems in multi-module systems and to fix problems caused by several sub-problems.
  • Change one place at a time
  • Keep a record of changes
  • If forward debugging doesn’t work, try reverse debugging
  • Last resort: Ask for help

Each of these principles is described in detail below.

Understand the system

Understanding the system is a prerequisite for locating bugs.

In the past, whenever an exception was thrown, I would often paste the exception message into a search engine and then apply whatever solution I found online. Sometimes it worked and all was well, but most of the time it did not. The reason is simple: perhaps the other person’s JDK version differs from yours; perhaps the two of you see the same error message but the exceptions are thrown for different reasons; or, more likely, the other person’s solution never worked in the first place.

This is why understanding the system is a prerequisite for locating bugs. If you are trying to track down a JDK exception, you should at least understand Java SE; if you also understand JVM garbage collection and the class loading mechanism, so much the better. If you are trying to figure out why a payment system isn’t crediting money to a customer’s account, you need to understand the payment process.

In short, before we start debugging, it is important to be clear that debugging is a science, not a game of chance, and that you need to understand the whole system to be able to debug it. Here are some ways to help you understand the system:

  • Read the manuals: requirements and design documents, product documents, user manuals, etc.
  • Read every detail of the manual: the solution to a bug may be hidden in a single paragraph
  • Master the basics: you will always end up reading the source code, so at the very least know the programming language involved
  • Understand the workflow: look at the system as a whole, not like a frog at the bottom of a well

It can be said that the ultimate goal of understanding the system is to understand the workflow, so that you can see things from the perspective of the whole. Otherwise you are like a frog at the bottom of a well, convinced that the problem must be in your own module when in fact it is in a module upstream. This echoes the point made in “The Training of Programmer’s Mind” that experts think from the whole.

Check the plug

Of course, this is not literally about whether the power plug is in. The “plug” here refers to all the basic requirements that make the product work. These requirements are often taken for granted, but sometimes they are not met. For example, suppose a whitelist must be configured in the configuration interface, otherwise upstream requests are rejected. When there is a problem, first check whether the whitelist is configured, because the other party may be a novice. Likewise, if something goes wrong that makes no sense, go to the directory where the software runs, such as Tomcat’s webapps directory, and check that the package is complete. And when you deploy new code and find that the bug still exists, check whether what is actually running is still the old code.
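One way to check whether the old code is still running is to compare a checksum of the package you built with a checksum of the package that is actually deployed. The following is only a minimal sketch: the function names and the file names (`app-new.war`, `app-deployed.war`) are hypothetical, and temporary files stand in for the real build output and the real deployment directory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_same_build(built_artifact, deployed_artifact):
    """True if both files have identical content."""
    return sha256_of(built_artifact) == sha256_of(deployed_artifact)


# Demo with stand-in files; in practice these would be the freshly built
# package and the package found in the server's deployment directory.
with tempfile.TemporaryDirectory() as workdir:
    built = Path(workdir, "app-new.war")
    deployed = Path(workdir, "app-deployed.war")
    built.write_bytes(b"new code")
    deployed.write_bytes(b"old code")

    stale = not is_same_build(built, deployed)
    same = is_same_build(built, built)
    print("deployment is stale:", stale)
```

If the digests differ, the server is not running what you think it is running, and no amount of code changes will appear to have any effect until that is fixed.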

The book puts this rule third from last; I have moved it to second place here. The reason is simple: when we find a bug ourselves, or someone else tells us there is one, we tend to panic and get nervous. Checking the plug first calms you down, and it also forces you to look at the system as a whole rather than confining yourself to one small module.

Reproduce the failure

This is almost a subconscious action. Even if you haven’t read the book, when you encounter a bug you will try to reproduce it, for two simple reasons:

  • Reproducing the failure allows you to examine the context in which it occurs and find its cause
  • Reproducing the failure allows you to determine if you have fixed the problem

Some problems are easy to reproduce, while others only arise with certain inputs. Most of our mistakes lie in how we go about reproducing the failure, and the author proposes two principles for reproduction:

  • Simulate the conditions under which the failure occurs, but do not simulate the mechanism of the failure, because the mechanism you think caused the failure is probably wrong. For example, if you believe a bug is caused by high concurrency, you simulate a high-concurrency environment, the problem reappears, and you conclude that concurrency is the cause. Yet it may be that high concurrency merely increases the chance of the problem occurring.
  • Change only how often the failure occurs, not how it occurs. A high-concurrency environment can be used to make the failure more frequent, but you still need to find the log information from when the problem reappears and locate the cause from it, rather than assuming that concurrency is the cause.
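The two principles can be sketched as a small reproduction harness: it raises the frequency of the failure by replaying many inputs, and it records the context of every failure instead of assuming a cause. This is an illustration only; `parse_amount`, `reproduce`, and the sample inputs are hypothetical and do not come from the book.

```python
import traceback


def parse_amount(text):
    """Hypothetical function under investigation."""
    return float(text)


def reproduce(operation, inputs):
    """Run the operation on every input and record the context of each failure."""
    failures = []
    for value in inputs:
        try:
            operation(value)
        except Exception as exc:
            failures.append({
                "input": value,                    # what triggered the failure
                "error": repr(exc),                # what was raised
                "trace": traceback.format_exc(),   # where it was raised
            })
    return failures


# Replaying many inputs increases how often the failure shows up,
# without changing how it happens.
inputs = ["12.50", "1,200.00", "7", "3,000"]
failures = reproduce(parse_amount, inputs)
for failure in failures:
    print(failure["input"], "->", failure["error"])
```

The recorded inputs, not the guess you started with, are what point to the cause: here every failing input contains a thousands separator.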

Don’t guess, observe

Now that we can reproduce the bug, suppose your gut tells you that a string encoding conversion is needed at a certain spot. Wait: before making this arbitrary attempt, let’s read what Sherlock Holmes says:

Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.

Use a guess only to decide where to focus your search, and confirm the guess by observation before starting the fix. So before we change the code, let’s look at the log from when the error occurs, step through the code in a debugger, open the source code if necessary, and dig deeper to make sure it really is a string encoding problem. Only then do we change the code.

Some people say: I’ll just change the code, and then I’ll see whether it was a string encoding problem. That does not work. A problem can be caused by code in several places. Even after you fix the string encoding issue, you may need to fix something else before the whole problem goes away. If you change the code without observing first and the bug remains, you may wrongly conclude that the change was useless, and the whole debugging process hits a dead end.
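For the string encoding guess, observing could mean inspecting the characters that actually appear in the log and testing whether they fit the theory, before touching the production code. The sketch below assumes one specific theory, namely that UTF-8 bytes were decoded as Latin-1; the helper name `repair_if_mojibake` and the sample value are hypothetical.

```python
def repair_if_mojibake(text):
    """Return the repaired text if `text` looks like UTF-8 decoded as Latin-1.

    Returns None when the observation does not support that theory.
    """
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either the text has characters outside Latin-1, or the bytes are
        # not valid UTF-8: the theory does not fit this value.
        return None
    return repaired if repaired != text else None


observed = "cafÃ©"  # hypothetical value copied from the log
print([hex(ord(ch)) for ch in observed])  # look at the actual code points
print(repair_if_mojibake(observed))
```

If the helper returns None for the values in the log, the encoding theory is not supported by the facts, and changing the encoding code would only add noise.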

Remember, debugging is a science, and any operation you perform without understanding it is just fooling around.

Divide and conquer

A system is usually composed of many modules, so we often have to check the logs of many modules to determine the cause of a problem. This is especially true with popular microservices frameworks: when a transaction goes wrong, you need to look for logs on many service machines.

However, if your business flow is linear, that is, if a failure at node A means that everything after node A also fails, you can use binary search to locate the failing node. Remember, to find a number between 1 and 100 you need at most seven guesses. With binary search you gradually narrow the range of suspicion and eventually reach the root of the problem.
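The halving idea can be sketched as follows, under the stated assumption that the flow is linear: every stage before the first bad one is healthy and every stage from it onward is bad. The function name and the `is_healthy` callback are hypothetical; in practice the callback would inspect the log or output of the given stage.

```python
def first_bad_stage(stage_count, is_healthy):
    """Binary search for the first failing stage in a linear flow.

    Assumes at least one stage is bad and that all stages after the first
    bad one are bad too. Returns (stage index, number of checks made).
    """
    low, high = 0, stage_count - 1
    checks = 0
    while low < high:
        mid = (low + high) // 2
        checks += 1
        if is_healthy(mid):
            low = mid + 1   # the fault lies further downstream
        else:
            high = mid      # the fault is here or further upstream
    return low, checks


# Hypothetical flow of 100 stages in which stage 37 is the first to fail.
bad_stage, checks = first_bad_stage(100, lambda stage: stage < 37)
print(bad_stage, checks)
```

With 100 stages the loop never needs more than seven checks, which is the same bound as the number-guessing example above.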

Of course, if the problem is caused by multiple sub-problems, remember: find one, fix one. That is what divide and conquer means here.

Change one place at a time

Based on your observations, you expect a change to work. If it turns out not to fix the problem, revert it immediately before trying anything else, so that you don’t introduce a new bug.

Keep a record of changes

Write down all of your debugging operations and their results in order, so that when you have made many changes and still haven’t solved the problem, you can go back and work out where you went wrong.

Reverse debugging method

The debugging rules above all start from the problem and look for the faulty code. But sometimes going the other way works better. You can find the last version that worked, compare it with the current version, and analyze the changed code to see which piece caused the problem.
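As a small illustration of comparing the last working version with the current one, the sketch below uses Python’s standard `difflib` module to list only the lines that changed. The two code snippets and the helper name `changed_lines` are made up for the example; in a real project the version control system’s own diff would normally do this job.

```python
import difflib


def changed_lines(last_good, current):
    """Return the removed ('- ') and added ('+ ') lines between two versions."""
    return [
        line
        for line in difflib.ndiff(last_good, current)
        if line.startswith(("- ", "+ "))
    ]


# Hypothetical versions of the same function.
last_good = [
    "def total(prices):",
    "    return sum(prices)",
]
current = [
    "def total(prices):",
    "    return sum(prices[1:])",
]

for line in changed_lines(last_good, current):
    print(line)
```

The output narrows the search to the one line that differs, which is where the analysis should start.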

Ask for help

Sometimes the problem is urgent, and in that case you might as well ask an expert. As “The Training of Programmer’s Mind” mentions, experts rely on intuition and will often go straight to the heart of the problem. If you have some understanding of the system, you can also check the vendor’s website, Google, Stack Overflow, etc. When asking for help, describe only the symptoms of the problem. Unless the other person asks, don’t bring up your own theory, so that you don’t pull them into your mindset. And by describing the problem to someone else, you may well be inspired yourself.

Conclusion

The above is the debugging methodology I summarized after reading “The Nine Methods of Debugging”. Of course, I still recommend reading the original work, as you may find new insights there. That said, the book lists a large number of examples, so many that I found them a little redundant. I suggest reading the beginning of each chapter and the summary at its end first. If anything is still unclear, then read the cases in the middle of the chapter.

While most of the examples in this book are about engineering, some of the ideas are transferable to life. For example, if a couple has an argument, it may seem like the husband won’t do the dishes, but if you look at the bigger picture, you’ll see that it’s really because the husband didn’t buy his wife a gift for Valentine’s Day.

Looking back on these rules, we actually use them from time to time in our work and life, but until now we have lacked a systematic framework for them. After mastering the debugging rules introduced in the book, we can locate the root causes of errors in a more orderly and unhurried way.

Finally, it must be said that professionals should ensure software quality as far as possible during development, rather than constantly relying on debugging to fix defects. Good, adequate unit testing is the key to code quality. Finding a bug means that your test cases are incomplete, so after fixing the bug, add the missing test case promptly.
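Adding the missing test case after a fix might look like the sketch below, written with Python’s standard `unittest` module. The function `parse_amount` and the bug it once had (rejecting amounts with a thousands separator) are hypothetical and serve only to show the shape of a regression test.

```python
import unittest


def parse_amount(text):
    """Hypothetical function after the fix: thousands separators are accepted."""
    return float(text.replace(",", ""))


class ParseAmountRegressionTest(unittest.TestCase):
    def test_plain_amount_still_works(self):
        self.assertEqual(parse_amount("12.50"), 12.5)

    def test_thousands_separator(self):
        # The input that triggered the original bug is now a permanent test.
        self.assertEqual(parse_amount("1,200.00"), 1200.0)


# Run the tests programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseAmountRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```

Keeping the failing input as a test means that if the same defect is ever reintroduced, it is caught during development instead of during debugging.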

References

  • Nine Methods of Debugging
  • The Training of Programmer’s Mind
  • Debugging – Wikipedia