Once code is written, we run it to see how it behaves. Much of our development involves stepping through code or stopping at breakpoints in a debugger. We use the debugger all day long, but have you ever thought about how it works?

This article answers the following questions:

  • How code runs at the lowest level
  • Why we need a debugger
  • How a debugger is implemented
  • How a debugger client works

How code runs

Code runs in one of two ways: direct execution or interpreted execution.

You may have noticed that an executable runs directly with ./xxx, while a JS file is run with node ./xxx and a Python file with python ./xxx.

Direct execution

The CPU provides an instruction set, and these instructions control the operation of the whole computer. Machine code consists of these instructions and their operands. It runs directly on the hardware, which means it can be executed directly. Files made up of machine code are called executable files.

The executable file format differs between operating systems. Windows uses PE (Portable Executable), Linux and Unix use ELF (Executable and Linkable Format), and macOS uses Mach-O. These formats specify where each kind of content is placed in the file: for example, .text holds code, while .data and .bss hold data. The part that actually executes, however, is made up of machine instructions defined by the CPU.
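Each of these formats starts with a recognizable "magic number", so you can tell them apart by reading the first few bytes of a file. The sketch below only checks those leading bytes. For PE it checks the "MZ" DOS header that precedes the real PE header, so a production check would follow that header further. The function names are hypothetical.

```python
from typing import Optional

# Leading "magic" bytes of each executable format.
MAGICS = [
    (b"\x7fELF", "ELF (Linux/Unix)"),
    (b"MZ", "PE (Windows)"),                          # DOS stub header in front of the PE header
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (macOS)"),   # 0xFEEDFACF stored little-endian
    (b"\xce\xfa\xed\xfe", "Mach-O 32-bit (macOS)"),   # 0xFEEDFACE stored little-endian
]


def detect_format(header: bytes) -> Optional[str]:
    """Return the executable format whose magic number starts `header`, if any."""
    for magic, name in MAGICS:
        if header.startswith(magic):
            return name
    return None


def detect_file_format(path: str) -> Optional[str]:
    """Read the first four bytes of a file and identify its format."""
    with open(path, "rb") as f:
        return detect_format(f.read(4))
```

For example, on Linux, detect_file_format("/bin/ls") would be expected to report ELF.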

A compiled language goes through three stages: compilation, assembly, and linking. Compilation converts source code into intermediate code written in assembly language. Assembly converts that intermediate code into object code. Linking combines the object code into an executable file. This executable can run directly on the operating system: because it consists of the CPU's machine instructions, it can drive the CPU directly. That is why it can be executed with ./xxx.

Interpreted execution

A compiled language produces an executable that runs directly on the operating system without an interpreter. Code in interpreted languages such as JS and Python, by contrast, needs an interpreter to run.

How can an interpreter run code without generating machine code, when the CPU does not understand that code?

The answer is that the interpreter itself is compiled into machine code. The CPU knows how to execute the interpreter, and the interpreter knows how to execute the higher-level script code. So the CPU interprets the interpreter's machine code, and the interpreter interprets the higher-level code. This is how scripting languages such as JS and Python work.
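To make this layering concrete, here is a minimal stack-machine interpreter. Its instruction set ("push", "add", "mul") is invented purely for illustration. Python is a convenient host because it adds one more visible layer: CPython, which is machine code, interprets the bytecode of this function, and this function in turn interprets the script.

```python
from typing import List, Tuple

# A hypothetical "script": each tuple is one high-level instruction and its operand.
Program = List[Tuple[str, int]]


def interpret(program: Program) -> List[int]:
    """Execute `program` on a value stack and return the final stack."""
    stack: List[int] = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


# (2 + 3) * 4: the operand of "add"/"mul" is unused, so it is 0.
script = [("push", 2), ("push", 3), ("add", 0), ("push", 4), ("mul", 0)]
```

The CPU never sees "push" or "add". It only executes the machine code of the host that runs interpret().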

However, the interpreter adds an extra layer, which costs performance, so code is sometimes compiled to machine code and executed directly. This is what a JIT (just-in-time) compiler does. A JS engine, for example, generally consists of a parser, an interpreter, a JIT compiler, and a garbage collector (GC). Most code is executed by the interpreter, while hot code is compiled by the JIT compiler into machine code that runs directly on the CPU to improve performance.
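The "interpret until hot, then compile" policy can be sketched as a toy tiered executor. This is an analogy, not how V8 works internally. The hotness threshold is a hypothetical constant, and the "compiled" tier generates a Python function whose bytecode stands in for the machine code a real JIT would emit.

```python
from typing import Callable, List, Optional, Tuple

Instr = Tuple[str, int]  # (opcode, operand) applied to an accumulator
HOT_THRESHOLD = 3        # hypothetical: calls before a function counts as "hot"


def interpret(program: List[Instr], x: int) -> int:
    """Slow tier: dispatch on every instruction on every call."""
    acc = x
    for op, n in program:
        if op == "add":
            acc += n
        elif op == "mul":
            acc *= n
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return acc


def jit_compile(program: List[Instr]) -> Callable[[int], int]:
    """Fast tier: translate the program once into straight-line Python code."""
    symbols = {"add": "+", "mul": "*"}
    lines = ["def compiled(acc):"]
    for op, n in program:
        lines.append(f"    acc = acc {symbols[op]} {int(n)}")
    lines.append("    return acc")
    namespace: dict = {}
    exec("\n".join(lines), namespace)
    return namespace["compiled"]


class TieredFunction:
    """Counts calls and switches to the compiled tier once the code is hot."""

    def __init__(self, program: List[Instr]) -> None:
        self.program = program
        self.calls = 0
        self.compiled: Optional[Callable[[int], int]] = None

    def __call__(self, x: int) -> int:
        if self.compiled is not None:
            return self.compiled(x)  # hot path: no per-instruction dispatch
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            self.compiled = jit_compile(self.program)
        return interpret(self.program, x)
```

Both tiers must give the same result. Only the cost per call changes.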

Whether code is compiled to machine code and executed directly or interpreted from source, it gets executed either way. Each approach has its own advantage: compilation gives speed, and interpretation gives portability across platforms. That is how code runs.

Wang Yin has said that a computer is essentially an interpreter. The CPU uses circuits to interpret machine code, and an interpreter uses machine code to interpret higher-level script code, so at every level the computer is an interpreter.

Why we need a debugger

A Turing-complete language can express any computable problem, so both compiled and interpreted languages can describe any computable business logic.

We describe business logic in various languages and then run it to see the result. When the logic gets complicated, mistakes are inevitable. At that point we want to run the code step by step, or stop at a certain point, inspect the variables in the environment at that moment, and even execute a script there. This is what a debugger does.

Many junior programmers probably debug only with console.log. Logs, however, cannot fully capture the environment at a given moment, and a debugger is the best tool for that.

Coyote has said that being able to use a debugger clearly separates Node.js developers by skill level.

How the debugger works

We know that the debugger is essential for debugging programs, so how does it work?

Debugger for executable files

In fact, both the CPU and the operating system were designed with debugger support in mind, which shows how important debuggers are. The CPU has four registers that implement hard interrupts, and the operating system provides system calls (for example, ptrace on Linux) that support soft interrupts. This is the basis for debuggers in compiled languages.

Interrupts

On its own, the CPU simply keeps executing the next instruction. A running program, however, inevitably has to deal with external events such as I/O, network traffic, and exceptions, so an interrupt mechanism was designed. After executing each instruction, the CPU checks an interrupt flag to see whether it needs to handle an interrupt. This is similar to the event loop checking whether it needs to render after each iteration.
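The ordering described above (execute one instruction, then check for pending interrupts) can be modeled in a few lines. This is a toy model, not real hardware behavior. The instruction names, the IRQ_NETWORK number, and the device_events parameter are all hypothetical stand-ins for a device signalling the CPU mid-program.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Deque, Dict, List

IRQ_NETWORK = 0x20  # hypothetical interrupt number for a network card


class ToyCPU:
    """Fetch-execute loop that checks for pending interrupts after every instruction."""

    def __init__(self, program: List[str],
                 handlers: Dict[int, Callable[[ToyCPU], None]]) -> None:
        self.program = program
        self.handlers = handlers            # interrupt number -> handler
        self.pending: Deque[int] = deque()  # plays the role of the interrupt flag
        self.log: List[str] = []

    def run(self, device_events: Dict[int, int]) -> None:
        """device_events maps an instruction index to the interrupt a device
        raises while that instruction executes."""
        for pc, instr in enumerate(self.program):
            self.log.append(f"exec {instr}")
            if pc in device_events:
                self.pending.append(device_events[pc])
            # Between instructions: if an interrupt is pending, handle it first.
            while self.pending:
                self.handlers[self.pending.popleft()](self)
```

In the run below, the interrupt raised during "add" is handled before "jmp" starts, just as rendering happens between event-loop tasks rather than in the middle of one.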

The INT instruction

The CPU provides the INT instruction to trigger interrupts. Interrupts are numbered, and each number has its own handler. The table that maps interrupt numbers to handlers is called the interrupt vector table. By convention, INT 3 (interrupt number 3) is the interrupt that hands control to a debugger.

So how does an executable use interrupt #3 to reach the debugger? At the location where a breakpoint is needed, the debugger overwrites the instruction with INT 3, whose machine code is the single byte 0xCC. When the program reaches it, execution stops, and the debugger can read the environment data for debugging.

0xCC (INT 3) stops the program, but how does execution resume? This part is fairly simple: the debugger records the original byte it overwrote and writes it back when the breakpoint is released.
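The save, patch, and restore cycle can be simulated on a byte array that stands in for the debuggee's code. On Linux, a real debugger would read and write that memory through ptrace, which this sketch does not do. One detail is added that the prose leaves out: on x86, after INT 3 executes, the instruction pointer points one byte past the 0xCC, so the debugger also has to step it back. The class and method names are hypothetical.

```python
from typing import Dict

INT3 = 0xCC  # x86 opcode for INT 3


class SoftwareBreakpoints:
    """Simulates how a debugger patches code bytes to set INT 3 breakpoints."""

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory              # stand-in for the process's .text section
        self.saved: Dict[int, int] = {}   # address -> original byte

    def set(self, addr: int) -> None:
        if addr in self.saved:
            return                        # breakpoint already set here
        self.saved[addr] = self.memory[addr]
        self.memory[addr] = INT3

    def clear(self, addr: int) -> None:
        # Put the original byte back.
        self.memory[addr] = self.saved.pop(addr)

    def on_trap(self, ip_after_trap: int) -> int:
        """Handle the trap: restore the byte and return the rewound instruction
        pointer so the original instruction executes next."""
        addr = ip_after_trap - 1
        self.clear(addr)
        return addr


# push rbp; mov rbp, rsp; ret
code = bytearray([0x55, 0x48, 0x89, 0xE5, 0xC3])
```

Real debuggers usually single-step over the restored instruction and then re-insert the 0xCC, so the breakpoint keeps working on later hits.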

This is how a debugger for executables works. Ultimately, it relies on the interrupt mechanism supported by the CPU.

Debug registers

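The INT 3 approach works by modifying machine code in memory, but sometimes the code cannot be modified, for example when it lives in ROM. For this case, the x86 CPU provides four debug registers (DR0-DR3) that hold breakpoint addresses. When execution reaches one of those addresses, the CPU raises a debug exception without any change to the code. This is called a hard interrupt.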

In short, there are two ways to implement a debugger for executables: soft interrupts via INT 3, and hard interrupts via the debug registers.
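The key differences of the hardware approach are that the code bytes stay untouched and that only four breakpoints fit. The toy model below captures both. It treats each byte as one instruction and leaves out DR7, the control register that actually enables each slot. All names are hypothetical.

```python
from typing import List, Optional


class DebugRegisters:
    """Toy model of x86 DR0-DR3: four slots that hold breakpoint addresses."""

    SLOTS = 4

    def __init__(self) -> None:
        self.dr: List[Optional[int]] = [None] * self.SLOTS

    def set(self, addr: int) -> int:
        """Place `addr` in the first free register and return its index."""
        for i, slot in enumerate(self.dr):
            if slot is None:
                self.dr[i] = addr
                return i
        raise RuntimeError("all four debug registers are in use")

    def clear(self, index: int) -> None:
        self.dr[index] = None

    def hits(self, pc: int) -> bool:
        # The CPU compares each instruction address with the registers.
        return pc in self.dr


def breakpoint_hits(code: bytes, regs: DebugRegisters) -> List[int]:
    """Walk read-only code (bytes, like ROM) and record where breakpoints fire."""
    return [pc for pc in range(len(code)) if regs.hits(pc)]
```

Because the check happens in the CPU rather than in the code, `code` can be an immutable bytes object. That immutability is exactly the ROM situation in which INT 3 patching is impossible.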

Interpreted language debugger

Compiled languages execute directly on top of the operating system, so their debuggers rely on the interrupt mechanisms and system calls of the CPU and the operating system. Interpreted languages, however, execute code through their own interpreter, so they do not need that machinery. The idea is still the same: insert something that breaks execution, allow inspecting environment data and executing code while paused, and continue executing once the breakpoint is released.

For example, JavaScript supports the debugger statement. When a debugger is attached, execution pauses as soon as the interpreter reaches that statement.
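Python offers a direct way to see an interpreter-level debugger at work. CPython calls a trace function registered with sys.settrace before it executes each line, and this hook is the basis of debuggers such as pdb. The sketch below uses that hook to build a line breakpoint. It chooses the line by its offset from the function's def line, and it "pauses" by handing a snapshot of the local variables to a callback. run_with_breakpoint and compute are hypothetical names.

```python
import sys
from types import FrameType
from typing import Any, Callable


def run_with_breakpoint(func: Callable[[], Any], line_offset: int,
                        on_break: Callable[[dict], None]) -> Any:
    """Run func(). Just before it executes the line `line_offset` lines below
    its `def` line, pass a snapshot of its local variables to on_break."""
    target = func.__code__

    def local_trace(frame: FrameType, event: str, arg: Any):
        if event == "line" and frame.f_lineno - target.co_firstlineno == line_offset:
            on_break(dict(frame.f_locals))  # the "paused" environment
        return local_trace

    def global_trace(frame: FrameType, event: str, arg: Any):
        # Trace only frames running the target function's code.
        if event == "call" and frame.f_code is target:
            return local_trace
        return None

    previous = sys.gettrace()
    sys.settrace(global_trace)
    try:
        return func()
    finally:
        sys.settrace(previous)


def compute():
    a = 1
    b = a + 2
    c = a * b
    return c
```

Setting a breakpoint at offset 3 (the line c = a * b) reports {"a": 1, "b": 3}, because the hook fires before that line runs. No machine code is touched at any point: the interpreter itself does the stopping.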

A debugger for an interpreted language is therefore relatively simple. It does not need to know anything about the CPU's INT 3 interrupt.

The debugger client

Above, we saw how debuggers are implemented for directly executed code and for interpreted code. We now know how code breaks, but what happens after it breaks? How is the environment data exposed, and how is external code executed?

This is where the Debugger client comes in.

For example, the V8 engine exposes the ability to set breakpoints, obtain environment information, and execute scripts through a socket. Messages on the socket use the V8 Debug Protocol format.

For example:

Set a breakpoint:

{
    "seq": 117, "type": "request", "command": "setbreakpoint",
    "arguments": {"type": "function", "target": "f"}
}

Clear a breakpoint:

{
    "seq": 117, "type": "request", "command": "clearbreakpoint",
    "arguments": {"type": "function", "breakpoint": 1}
}

Continue:

{
    "seq": 117, "type": "request", "command": "continue"
}

Evaluate code:

{
    "seq": 117, "type": "request", "command": "evaluate",
    "arguments": {"expression": "1 + 2"}
}

If you are interested, see the V8 Debug Protocol documentation for the full list of commands.
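A client for this protocol mostly just builds messages like the four examples above. In the sketch below, each request gets its own seq number, which lets a client match responses to requests. Transport details, meaning how the JSON is framed and sent over the socket, are deliberately left out. The class and method names are hypothetical.

```python
import itertools
import json
from typing import Any, Dict, Optional


class V8DebugClient:
    """Builds request messages in the V8 Debug Protocol format shown above."""

    def __init__(self) -> None:
        self._seq = itertools.count(1)  # unique sequence number per request

    def request(self, command: str, arguments: Optional[Dict[str, Any]] = None) -> str:
        msg: Dict[str, Any] = {"seq": next(self._seq), "type": "request", "command": command}
        if arguments is not None:
            msg["arguments"] = arguments
        return json.dumps(msg)

    def set_breakpoint(self, function_name: str) -> str:
        return self.request("setbreakpoint", {"type": "function", "target": function_name})

    def clear_breakpoint(self, breakpoint_id: int) -> str:
        return self.request("clearbreakpoint", {"type": "function", "breakpoint": breakpoint_id})

    def continue_(self) -> str:
        return self.request("continue")

    def evaluate(self, expression: str) -> str:
        return self.request("evaluate", {"expression": expression})
```

A debugger front end is essentially a UI on top of calls like these: a click in the gutter becomes set_breakpoint, and the watch panel becomes evaluate.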

With this protocol you can control the V8 debugger. Debugger front ends such as Chrome DevTools, the VS Code debugger, and various IDE debuggers are built on top of it.

Debugging Node.js code

Node.js can be debugged by adding the --inspect option, or --inspect-brk, which also breaks on the first line.

This starts a WebSocket server for the debugger. VS Code or Chrome DevTools can connect to it to debug Node.js code (see the Node.js debugger documentation).

➜ node --inspect test.js
Debugger listening on ws://127.0.0.1:9229/db309268-abe6-23a4-b19a-c4407ed8998d
For help see https://nodejs.org/en/docs/inspector

Under the hood, this exposes V8's debugging protocol. For --inspect, that is the V8 Inspector Protocol, which is based on the Chrome DevTools Protocol.
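A home-made tool connecting to node --inspect needs two things: the WebSocket URL from the "Debugger listening on" banner, and requests in the Chrome DevTools Protocol's JSON shape, {"id", "method", "params"}. The sketch below covers only those two pieces. Opening the WebSocket itself needs a client library and is not shown. Runtime.evaluate is a real CDP method. The helper names are hypothetical.

```python
import json
import re
from typing import Any, Dict

# Matches the banner Node.js prints when started with --inspect.
LISTENING = re.compile(r"Debugger listening on (ws://([\d.]+):(\d+)/([\w-]+))")


def parse_inspector_banner(line: str) -> Dict[str, Any]:
    """Extract the WebSocket URL, host, port, and target id from the banner."""
    m = LISTENING.search(line)
    if m is None:
        raise ValueError("not a Node.js inspector banner")
    url, host, port, target_id = m.groups()
    return {"url": url, "host": host, "port": int(port), "id": target_id}


def cdp_request(msg_id: int, method: str, **params: Any) -> str:
    """Build one Chrome DevTools Protocol request to send over the WebSocket."""
    return json.dumps({"id": msg_id, "method": method, "params": params})
```

As with the V8 Debug Protocol examples, the id field lets the client match the reply to the request that caused it.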

If we build our own debugging tools or IDEs, we need to integrate with this protocol.

Debug Adapter Protocol

The V8 debug protocol introduced above handles debugging JS code. Python, C#, and other languages have their own debugging protocols, so an IDE that wants to support all of them would have to integrate with each one separately, which is too cumbersome. To solve this, an intermediate protocol emerged: DAP (Debug Adapter Protocol).

As the name implies, the Debug Adapter Protocol is about adaptation. On one side, a debug adapter talks to each language's own debugger protocol. On the other side, it offers a single unified protocol to the client. This is a good application of the adapter pattern.
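Two pieces make this concrete. The first is DAP's wire format: each JSON message is preceded by a Content-Length header and a blank line. The second is an adapter that translates DAP requests into a specific debugger's protocol. V8Adapter is a hypothetical example that maps two DAP commands onto the V8 Debug Protocol requests from earlier. A real adapter would cover many more commands and would also translate responses and events back.

```python
import json
from typing import Any, Dict


def encode_dap(message: Dict[str, Any]) -> bytes:
    """Frame a DAP message: Content-Length header, blank line, JSON body."""
    body = json.dumps(message).encode("utf-8")
    return b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n" + body


def decode_dap(data: bytes) -> Dict[str, Any]:
    """Parse one framed DAP message."""
    header, _, body = data.partition(b"\r\n\r\n")
    length = int(header.split(b":", 1)[1])
    return json.loads(body[:length])


class V8Adapter:
    """Hypothetical adapter: unified DAP requests in, V8 Debug Protocol requests out."""

    def __init__(self) -> None:
        self.seq = 0

    def translate(self, request: Dict[str, Any]) -> Dict[str, Any]:
        self.seq += 1
        command = request["command"]
        if command == "evaluate":
            args = {"expression": request["arguments"]["expression"]}
            return {"seq": self.seq, "type": "request", "command": "evaluate", "arguments": args}
        if command == "continue":
            return {"seq": self.seq, "type": "request", "command": "continue"}
        raise NotImplementedError(f"no V8 mapping for DAP command {command!r}")
```

An IDE only ever speaks the left-hand side, framed DAP messages. Supporting a new language means writing a new adapter, not changing the IDE.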

Conclusion

In this article we looked at how debuggers are implemented and how they expose their capabilities through debugging protocols.

First, we looked at the two ways code runs: direct execution and interpreted execution. Then we looked at why we need a debugger.

Next, we explored how debuggers for directly executed code are built on the INT 3 interrupt, and how interpreted languages implement the debugger themselves.

The debugger's capabilities are then exposed to clients through a socket, using debugging protocols such as the V8 Debug Protocol, which various clients, including Chrome DevTools and IDEs, implement.

However, implementing every language's protocol in every client was too troublesome. So an adapter-layer protocol was developed to hide the differences between protocols and give clients a unified protocol interface.

Hopefully this article has helped you understand how debuggers work, which protocols to integrate with if you want to build a debugging tool, and why Chrome DevTools and VS Code can debug Node.js code.