As we all know, error logs in MySQL, slow query logs can help you locate problems quickly.

However, sometimes the log records too little information, or the information you are interested in is not written down, sometimes there are too many problems recorded, and a lot of invalid information interferes with your troubleshooting.

Therefore, this article introduces a new idea — probe technology, which can insert some probes into key nodes of the system to collect information without affecting MySQL operation and without breaking the field environment.

In theory, the probe can be inserted into any MySQL or Linux kernel function import or export, easily accessing parameters and other details with minimal resource loss and no loss once the probe is removed. Just like a doctor taking an X-ray for a patient, real-time observation of the patient’s internal conditions without affecting the patient’s health provides the basis and support for the analysis of the cause.

Part1 mechanism of the probe

The probes described in this article are like interrupt points when debugging a program, except that interrupt points are interactive and run in bytecode form in a kernel virtual machine (BPF).

A, abnormal

An exception is a mutation in the control flow in response to some change in the processor’s state. Understanding exceptions helps you understand probe technology. As the processor executes, as shown in the figure below, an important change occurs, which we call an event. Events may be directly related to the current instruction, such as missing page exceptions, arithmetic overflows, attempts to divide by 0. It may also be irrelevant as the timer generates a signal and I/O completes. In any case the handler makes an indirect procedure call through the exception table to a specialized exception handler.

Exceptions fall into four categories: interrupts, traps, faults, and terminations.

Interrupts occur asynchronously and are the result of signals from I/O (mouse, keyboard, network card, etc.) devices outside the processor. Hardware interrupts are asynchronous in the sense that they are not caused by any specific instruction. The remaining types of exceptions (traps, failures, terminations) occur synchronously as a result of executing the current instruction. We call this kind of instruction a fault instruction.

Traps are intentional exceptions that the programmer “actively” triggers, as if he or she had planted a trap in the code. The most common user of the trap is when a process makes a system call that traps from the user state into the kernel state via INT.

Faults are caused by error conditions and can be corrected by fault handlers. When a fault occurs, the processor transfers control to the fault handler. For example, when a page missing exception occurs, the fault handler can swap the corresponding page in the middle of the disk into the physical memory.

Termination is the result of an unrecoverable fatal error, usually some hardware error. When programmers debug code, they add breakpoints to make it stop where we want it to stop. The debugger can control the execution of the program at will, mainly by software interruption. The software breakpoint on X86 systems is the instruction INT 3. When the program executes to the INT 3 instruction, a software interrupt is raised. This is the trap mentioned above. Different from interactive breakpoints in Visual Studio and GDB, if the program automatically executes predefined and Handle records and statistics when trap occurs, it does not affect the normal operation of the program and achieves the purpose of observing MySQL.

Second, the probe

To capture what is going on in the program, we set some traps in the program and set up handlers, which we call probes. Some probes are predefined in the code, others are added dynamically at run time.

1. Static probe

Static probes are probes that are defined in the program and compiled into the program or kernel.

Static probes run only when they are enabled. Common static probes include the tracepoints and Userland Statically Defined Tracing (USDT) probes in the kernel. Tracepoints buries hooks in the code and invokes connected probes at run time.

It has two states: “on” (probe attached) and “off” (probe not attached).

When the trace point is in the “off” state, it has no effect and only adds a small time loss (check branch condition) and space loss. When the trace point is “open,” the connected probe is called each time the trace point is executed in the caller’s execution context. When the probe function completes, it returns to the caller. USDT is similar to tracepoint, but in user mode, insert DTRACE_PROBE() into the code.

2. Dynamic probe

Dynamic probes are probes that are not defined by the application and are added dynamically while the program is running.

Dynamic probe is similar to the exception handling mechanism. When the system generates an exception, it will jump to execute the corresponding Handle. The dynamic probe will insert some breakpoints at the entry and exit of the function, and the program will execute the corresponding Handle when it reaches the breakpoint, so as to observe the application program. The interrupt here refers to trap, which in X86 is the INT3 instruction.

KProbes are Linux kernel probes that can be used to monitor events in production systems. You can use it to solve performance bottlenecks, log specific events, track problems, and more.

KProbes can insert interrupt instructions into kernel code in real time, and while this may sound a little weird, KProbes actually does a lot of security, such as stop_machine to ensure that other cpus stop executing when they are being modified.

In fact, the biggest risk of Kprobes is to add probes to some functions that are called very frequently, such as network modules, where frequent interruptions may cause certain performance risks. KProbe needs to define pre-handler and post-handler. The pre-handler program is first executed when the detected instruction is to be executed. Similarly, post-handler is executed immediately after the probed instruction is executed.

Uprobes are dynamic probes that Linux provides in user mode and were incorporated into the Linux 3.5 kernel released in July 2012. Uprobes are similar to kprobes and are used in user mode.

Third, BPF

Berkeley Packet Filter (BPF), first developed in THE BSD operating system, is an industrial standard for TCP/IP Packet filtering and used by tcpdump.

It works in a somewhat special way: user-defined packet filtering expressions are injected into the kernel and run in BPF. This has the advantage of filtering in the kernel rather than copying packets to the user state, avoiding a large amount of data copied from the kernel state to the user state, so it has good performance.

Later came eBPF(EXTEND BPF), eBPF has its own language, users write their own program compiled through BPF call injected into the kernel OF the BPF virtual machine run, can safely access the kernel memory, it makes the kernel programmable. BPF is used by many performance tracking tools because running in the kernel does not require copying data into user space.

Part2 uses the probe to observe MySQL

Following the technical background above, I’ll look at bPFTrace using the trace tool, which is a front-end tool for performance analysis using BPF and is very easy to use, similar to awK. Since MySQL is running in user mode, only USDT and Uprobes can be used to track MySQL itself.

Use USDT to observe MySQL

Mysqld DTrace Probe Reference(DTrace is a dynamic tracing tool in Solaris) Bpftrace is the Linux version of DTrace) below is a script that logs the slow queries traced.

#! /usr/bin/bpftrace BEGIN { printf("Tracing mysqld queries slower than %d ms. Ctrl-C to end.\n", $1); printf("%-10s %-6s %6s %s\n", "TIME(ms)", "PID", "MS", "QUERY"); } usdt:/usr/sbin/mysqld:mysql:query__start { @query[tid] = str(arg0); @start[tid] = nsecs; } usdt:/usr/sbin/mysqld:mysql:query__done /@start[tid]/ { $dur = (nsecs - @start[tid]) / 1000000; if ($dur > $1) { printf("%-10u %-6d %6d %s\n", elapsed / 1000000, pid, $dur, @query[tid]); } delete(@query[tid]); delete(@start[tid]); }Copy the code

Read that BEGIN is the first time the script is executed, and print some hints. Usdt: / usr/sbin/mysqld: mysql: query__start is the probe give query__start this function, when the program execution here, record the first one parameter arg0 (query), the timestamp and the current time, This information is stored in a BPF map. When the query__done probe is executed, the current time is recorded, minus the start time (taken from BPF’s map), which is the time the query took, and printed if the threshold is exceeded. If you say MySQL has slow query logs, but the advantage is that you don’t need to restart MySQL, you can change the threshold in real time. Also, the USDT is removed in advanced versions of MySQL, and requires the compile-time parameter -denable_dtrace =1 to support the USDT.

Use uprobes to observe MySQL

Unlike USDT, which requires observation points to be set up in the code beforehand, Uprobes can be added dynamically in the program and can be inserted at the inlet and outlet of any function. The following shows the use of uprobes to trace dispatch_command and print slow query statements.

#! /usr/bin/bpftrace BEGIN { printf("Tracing mysqld queries slower than %d ms. Ctrl-C to end.\n", $1); printf("%-10s %-6s %6s %s\n", "TIME(ms)", "PID", "MS", "QUERY"); } uprobe:/usr/sbin/mysqld:*dispatch_command* { $COM_QUERY = 3; if (arg2 == $COM_QUERY) { @query[tid] = str(*arg1); @start[tid] = nsecs; } } uretprobe:/usr/sbin/mysqld:*dispatch_command* /@start[tid]/ { $dur = (nsecs - @start[tid]) / 1000000; if ($dur > $1) { printf("%-10u %-6d %6d %s\n", elapsed / 1000000, pid, $dur, @query[tid]); } delete(@query[tid]); delete(@start[tid]); }Copy the code

So let’s run this and use the parameter 10 to indicate that the slow query threshold is 10 ms.

sudo ./mysql_uprobe_slow.bt 10
Attaching 3 probes...
Tracing mysqld queries slower than 10 ms. Ctrl-C to end.
TIME(ms)   PID        MS QUERY
35976      1083      742 select employees.first_name, employees.last_name, titles.title 
93145      1083      224 select * from employees
125348     1083     1727 select * from salaries
Copy the code

Bpftrace also provides a histogram tool that records all the query time and finally prints a histogram of the time distribution.

#! /usr/bin/bpftrace BEGIN { printf("Tracing MySQL query... Hit Ctrl-C to end.\n"); } uprobe:/usr/sbin/mysqld:*dispatch_command* { @start[tid] = nsecs; } uretprobe:/usr/sbin/mysqld:*dispatch_command* /@start[tid]/ { @usecs = hist((nsecs - @start[tid]) / 1000000); delete(@start[tid]); } END { clear(@start); } sudo ./histo.bt Attaching 4 probes... Tracing MySQL query... Hit Ctrl-C to end. ^C @usecs: [0] 10 | @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ | [1] 0 | | 0 [2, 4) | | 0 [4, 8) | | [8, 16) 0 | | [16, 32) 0 | | [32, 64) 0 | | [64, 128) 0 | | [128, 256) 1 |@@@@@ | [256, 512) 1 |@@@@@ | [512, 1K) 0 | | [1K, 2K) 1 |@@@@@ |Copy the code

Part3 epilogue

In addition to USDT and UProbes shown in this paper, examples are relatively simple. Multiple probes can also be inserted at the same time to share information using BPF map to achieve stronger observation ability.

In addition to these two probes, you can also use TracePoints and KProbe to analyze the kernel state, such as network operation, disk I/O, of course, you need to be familiar with the program, otherwise you don’t know where to add these probes.