Setting breakpoints smoothly in binary libraries

1. Introduction

In today's component-based iOS development, shipping components as prebuilt binaries has become a fairly mainstream way to improve build efficiency. Binarization raises the question of how to debug such components with breakpoints, and many schemes can be found online. This article introduces and analyzes the schemes currently known on the Internet, including some less popular ones, and tries to extract the key points. I hope readers come away with a reasonably clear understanding of how to make binary components friendly to debug, and can propose more concise schemes to solve the remaining problems.

2. The DWARF file

Modern breakpoint debugging usually involves three roles: the executable, the debugger, and the debug information file. The debug information file is the one most of us have the least exposure to, so we start with a brief introduction to it, using DWARF, the dominant format.

2.1 A brief introduction

DWARF is a compact representation of the relationship between an executable program and its source code, in a form that debuggers can process efficiently.

Most modern programming languages are block-structured: each entity (such as a class definition or function definition) is contained within another entity, so it is natural for the compiler to represent the program internally as a tree. DWARF follows this model and is also block-structured. Every descriptive entity (DIE) in DWARF, except for the topmost DIE that describes the source file, is contained in a parent DIE and may itself contain child DIEs. If a DIE contains more than one child DIE, those children are siblings. So DWARF resembles the compiler's internal tree structure.

DWARF is usually associated with the ELF object file format, but it is independent of ELF and can be used with any other object file format (such as Mach-O). All that is required is that the sections making up the DWARF data can be identified in the object file or executable.

The basic unit of description in DWARF is the DIE (Debugging Information Entry). A DIE has a tag that specifies what is being described and a list of attributes that fill in the details. Attributes may contain various kinds of values: constants (such as a function name), variables (such as a function's starting address), or references to another DIE (such as the type of a function's return value).
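
To make the tree shape concrete, here is a minimal Python sketch of a DIE as a tag, an attribute dictionary, and a list of children. The DIE class, the sample function, and the address value are all hypothetical illustrations; this is not a DWARF parser and does not reflect the on-disk encoding.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple, Union


@dataclass
class DIE:
    """Toy stand-in for a DWARF Debugging Information Entry (illustration only)."""
    tag: str
    attributes: Dict[str, Union[str, int, "DIE"]] = field(default_factory=dict)
    children: List["DIE"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "DIE"]]:
        # Depth-first traversal: a parent is visited before its children.
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Hypothetical contents of a tiny compilation unit.
int_type = DIE("DW_TAG_base_type", {"DW_AT_name": "int"})
main_fn = DIE("DW_TAG_subprogram", {
    "DW_AT_name": "main",          # constant value
    "DW_AT_low_pc": 0x100003F50,   # made-up starting address
    "DW_AT_type": int_type,        # reference to another DIE
})
unit = DIE("DW_TAG_compile_unit", {"DW_AT_name": "main.c"}, [int_type, main_fn])

for depth, die in unit.walk():
    print("  " * depth + die.tag, die.attributes.get("DW_AT_name"))
```

The two children of the compile unit are siblings, and the return type of the function is expressed as a reference to the sibling DIE rather than as a copy.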

2.2 DWARF and Mach-O

DWARF can be used with a variety of object file formats; here we describe its use in Mach-O files. As in ELF, DWARF in Mach-O uses almost the same section names for the same purposes (in Mach-O they live in the __DWARF segment and start with __debug_ instead of .debug_), including:

.debug_abbrev    Abbreviations used in the .debug_info section
.debug_aranges   A mapping between memory addresses and compilation units
.debug_frame     Call stack frame information
.debug_line      Line number information
.debug_loc       Location descriptions
.debug_macinfo   Macro descriptions
.debug_pubnames  A lookup table for global objects and functions
.debug_ranges    Address ranges referenced by DIEs
.debug_str       String table used by .debug_info

The data relating the executable to its source paths is stored in the compile unit DIE in the .debug_info section, tagged DW_TAG_compile_unit. Keep this in mind for now.

Mach-O files can carry executable code in several forms, and each form stores its debugging information differently, so we describe them separately below.

Executable file

The most common case is of course an executable, whose debugging information is stored as DWARF in a separate file with the dSYM suffix.

Static library

Debugging information for a static library is stored directly in sections of its object files rather than in a separate file (it is eventually merged into the host project's DWARF file when the host project is linked).

Dynamic library

A dynamic library, like an executable, gets a separate dSYM file that stores its DWARF data.

The debug information for these three kinds of files is kept this way because each debug information file corresponds to one run-time image.

3. Implementation scheme and principle

The debugger locates the source file for a breakpoint by combining DW_AT_comp_dir and DW_AT_name of the DW_TAG_compile_unit DIE in DWARF (here we simply treat this as path concatenation). So what we need to do is make that path resolve to real source code on the debugging machine, so that users can debug binaries just as they debug source, ideally without even noticing the difference.

Now let's look at the three known schemes for breakpoint debugging of binaries:

Rebuild the source code path directly
Map paths with LLDB source-map
Use Clang's -fdebug-prefix-map

3.1 Directly rebuild the source code path

This scheme literally joins DW_AT_comp_dir and DW_AT_name of the DW_TAG_compile_unit DIE (only when DW_AT_name is a relative path) to get the source path. On the debugging machine, you recreate the compile directory from this path and download the source code into it. Used as-is, this is a rather crude solution.
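
As a rough illustration of that join, the sketch below combines the two attributes in Python. The function name source_path and the sample paths are hypothetical; real debuggers handle more cases than this simplified concatenation.

```python
import posixpath


def source_path(comp_dir: str, name: str) -> str:
    """Combine DW_AT_comp_dir and DW_AT_name as simple path concatenation."""
    if posixpath.isabs(name):
        # An absolute DW_AT_name already is the source path.
        return posixpath.normpath(name)
    # A relative DW_AT_name is resolved against the compile directory.
    return posixpath.normpath(posixpath.join(comp_dir, name))


# Hypothetical values as they might be recorded on a build machine.
print(source_path("/Users/builder/work/MyLib", "Sources/Foo.m"))
# prints /Users/builder/work/MyLib/Sources/Foo.m
```

To use this scheme, the printed path is the one that has to exist on the debugging machine, which is why the build machine's directory layout leaks into every developer's setup.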

3.2 Mapping through LLDB source-map

This method has already been described by other authors online, but in my opinion their procedures are cumbersome and not user-friendly: there are many manual steps, and the process cannot be made transparent to users. So here is my complete scheme.

This scheme is the most complex of the three, but also the most flexible. It uses LLDB's settings set target.source-map command, which is equivalent to GDB's set substitute-path command. With this command you can define multiple path replacement rules. When LLDB looks up source code, it tries the rules in the order they were defined, and the matching strategy is prefix matching.

// https://github.com/llvm-mirror/lldb
// (commit: d01083a850f577b85501a0902b52fd0930de72c7)
// PathMappingList.cpp
bool PathMappingList::RemapPath(llvm::StringRef path, std::string &new_path) const {
  if (m_pairs.empty() || path.empty())
    return false;
  LazyBool path_is_relative = eLazyBoolCalculate;
  for (const auto &it : m_pairs) {
    auto prefix = it.first.GetStringRef();
    if (!path.consume_front(prefix)) {
      // Relative paths won't have a leading "./" in them unless "." is the
      // only thing in the relative path so we need to work around "."
      // carefully.
      if (prefix != ".")
        continue;
      // We need to figure out if the "path" argument is relative. If it is,
      // then we should remap, else skip this entry.
      if (path_is_relative == eLazyBoolCalculate) {
        path_is_relative =
            FileSpec(path).IsRelative() ? eLazyBoolYes : eLazyBoolNo;
      }
      if (!path_is_relative)
        continue;
    }
    FileSpec remapped(it.second.GetStringRef());
    remapped.AppendPathComponent(path);
    new_path = remapped.GetPath();
    return true;
  }
  return false;
}
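
The core of that loop, first matching prefix wins, can be mimicked in a few lines of Python. This is a simplified sketch only: it ignores the special handling of "." for relative paths, and the function name remap_path and all sample paths are hypothetical.

```python
import posixpath
from typing import List, Optional, Tuple


def remap_path(path: str, pairs: List[Tuple[str, str]]) -> Optional[str]:
    """Return the remapped path for the first rule whose prefix matches."""
    for prefix, replacement in pairs:
        if path.startswith(prefix):
            # Drop the matched prefix and append the rest to the replacement.
            rest = path[len(prefix):].lstrip("/")
            return posixpath.join(replacement, rest) if rest else replacement
    return None  # no rule matched


# Hypothetical rules: the more specific prefix is listed first.
rules = [
    ("/Users/builder/work/MyLib", "/Users/me/src/MyLib"),
    ("/Users/builder", "/tmp/other"),
]
print(remap_path("/Users/builder/work/MyLib/Sources/Foo.m", rules))
# prints /Users/me/src/MyLib/Sources/Foo.m
```

Because matching stops at the first hit, reversing the two rules would send the same file to /tmp/other/work/MyLib/Sources/Foo.m, so more specific prefixes should come first.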

Knowing this matching rule, we do not need a replacement rule for every source file; a single rule for the compile directory is enough. Of course, knowing this command alone is not sufficient, because we want breakpoint debugging to behave just like source debugging, including breakpoints set before you hit Run in Xcode. To achieve that, we do the following:

Since we want to set breakpoints before running, the source code should already have been downloaded locally by your own tooling and integrated into the current project, so that breakpoints can be set on it with the mouse (for example, through a Pod that is not compiled).
Use an LLDB init file to run a script automatically at the start of each LLDB session. The script itself is not shown here, but you can write it yourself. It should do the following:
1. Set up the source map from the library's compile path (assumed to be known) to the local source path, according to your own scheme.
2. Scan the pending breakpoints of the current LLDB session (that is, the breakpoints set before running that could not be resolved) and save them.
3. Iterate over the pending breakpoints saved in the previous step and re-add them. Thanks to the mapping from step 1 they now resolve, so the breakpoints set before running are restored.
After this, the breakpoints behave the same as normal source code breakpoints.
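
As a starting point for step 1 only, the following Python sketch generates the text of the source-map command that an LLDB init file (such as ~/.lldbinit) could contain. The helper name source_map_command and the paths are hypothetical, and it assumes your own tooling already knows each library's compile directory. Steps 2 and 3 depend on LLDB's Python scripting API and are not sketched here.

```python
from typing import Dict


def source_map_command(mappings: Dict[str, str]) -> str:
    """Build one 'settings set target.source-map' line from old->new pairs."""
    parts = ["settings set target.source-map"]
    for compile_dir, local_dir in mappings.items():
        # Each rule is an (original prefix, replacement prefix) pair.
        parts.append(f'"{compile_dir}" "{local_dir}"')
    return " ".join(parts)


# Hypothetical mapping from the build machine path to the local checkout.
command = source_map_command({
    "/Users/builder/work/MyLib": "/Users/me/src/MyLib",
})
print(command)
# prints settings set target.source-map "/Users/builder/work/MyLib" "/Users/me/src/MyLib"
```

Generating the line from a mapping table keeps the rule list in one place, so the same data can drive both the source download and the LLDB configuration.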

This scheme is called the most flexible because you can map any path you want.

3.3 Clang -fdebug-prefix-map

This scheme is relatively simple and is an extension of the first one. As far as I know, Baidu's EasyBox also uses it. It relies on Clang's -fdebug-prefix-map option, which replaces the path prefix directly at compile time; you then rebuild that path on the debugging machine and download the source into it (the path can be made a little friendlier this way).
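
The option takes the form -fdebug-prefix-map=OLD=NEW. The Python sketch below builds such a flag and mimics its effect on a recorded path; the helper names and the directories are hypothetical, and the second function only imitates the prefix replacement for illustration, it does not invoke the compiler.

```python
def debug_prefix_map_flag(build_dir: str, mapped_dir: str) -> str:
    """Format the Clang flag that rewrites build_dir to mapped_dir."""
    return f"-fdebug-prefix-map={build_dir}={mapped_dir}"


def apply_prefix_map(path: str, build_dir: str, mapped_dir: str) -> str:
    """Imitate the rewrite on a recorded path such as DW_AT_comp_dir."""
    if path.startswith(build_dir):
        return mapped_dir + path[len(build_dir):]
    return path  # paths outside build_dir are left untouched


# Hypothetical directories: a CI workspace mapped to a stable, shorter path.
build_dir = "/Users/builder/work/MyLib"
mapped_dir = "/tmp/MyLib-src"

print(debug_prefix_map_flag(build_dir, mapped_dir))
# prints -fdebug-prefix-map=/Users/builder/work/MyLib=/tmp/MyLib-src
print(apply_prefix_map("/Users/builder/work/MyLib/Sources", build_dir, mapped_dir))
# prints /tmp/MyLib-src/Sources
```

Choosing a stable mapped directory means every developer can rebuild the same path regardless of where the binary was actually compiled.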

3.4 Special treatment of dynamic libraries

As we saw earlier, the DWARF file of a dynamic library is stored separately. Since none of the three roles can be missing, we must arrange in advance to store the DWARF file of each dynamic library, download it together with the library when debugging, and load it with the add-dsym command from the LLDB init file.
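
One way to produce those init file lines is to scan the directory where the dSYM files were downloaded. The Python sketch below does that; the function name add_dsym_commands, the directory layout, and the framework names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def add_dsym_commands(dsym_dir: str) -> List[str]:
    """Return one 'add-dsym' line per *.dSYM entry found in dsym_dir."""
    return [f'add-dsym "{p}"' for p in sorted(Path(dsym_dir).glob("*.dSYM"))]


# Demo with a throwaway directory standing in for the download location.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("LibA.framework.dSYM", "LibB.framework.dSYM"):
        (Path(tmp) / name).mkdir()
    (Path(tmp) / "notes.txt").write_text("not a dSYM")
    commands = add_dsym_commands(tmp)

for line in commands:
    print(line)
```

Each printed line can be appended to the init file, so every downloaded dynamic library has its debug information loaded when the session starts.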

3.5 Debug and Release

Everything discussed above applies to both Debug and Release binaries. Note, however, that Release code is optimized, so some strange things may happen during breakpoint debugging (I have not actually tested this or run into it myself, haha 😂).

4. Conclusion

Some details are only touched on in this article, because many more detailed articles already exist online. I have, however, described and pointed out the places that need attention, because the aim is to teach you how to fish: the small pitfalls in the details are best experienced firsthand, and how to choose among these schemes and put them to use is also up to you. My ability is limited, so the article will inevitably contain some mistakes. If you find any, please contact me so I can correct them. Thank you very much for reading!

The original address: www.notion.so/nakahira/dd…