Author: Lianfu Hao, a veteran computer technology expert, is currently the chief front-end architect at Agora. He previously served as Principal Engineer/Engineering Director (UTStarcom), Senior Architect (Intel), and T4 Architect (YY). He has designed and developed major projects such as a dedicated operating system for telecom core networks, a high-performance TCP/IP protocol stack, and the architectural refactoring of the Agora SDK.

Introduction

This article is the first installment of Dev for Dev (Developer for Developer), an interactive innovation practice jointly initiated by Agora and the RTC Developer Community. It is also a first-hand record from open source enthusiasts working on the front line. The situation described here is quite representative, so we are sharing it with readers.


Generally, there are three ways to implement hooking in an iOS application:

1. Method Swizzling: uses Objective-C (OC) runtime features to dynamically change the mapping between a SEL (method selector) and its IMP (method implementation), which changes how an OC method call is dispatched. It applies only to dynamic OC methods;

2. Fishhook: a tool from Facebook (now renamed Meta) that dynamically modifies linked Mach-O files. It relies on the way Mach-O files are loaded and rewrites the pointers in the lazy and non-lazy symbol pointer tables to hook C functions. It is suitable for static C functions;

3. Cydia Substrate: formerly known as Mobile Substrate, it is a powerful framework for hooking OC methods, C functions, and function addresses, and it also supports the Android platform.
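To make the second approach more concrete, the following C sketch imitates pointer rebinding with a hand-made table. It is not Fishhook's code: `binding_slot`, `rebind_slot`, `real_double`, and `hooked_double` are hypothetical names, and the array only stands in for the symbol pointer tables that Fishhook edits in a loaded Mach-O image.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*unary_fn)(int);

/* Hypothetical stand-in for one symbol pointer slot: callers never call
 * the function directly, they call through `target`. */
struct binding_slot {
    const char *name;   /* symbol name the slot is bound to */
    unary_fn    target; /* pointer that calls are routed through */
};

static int real_double(int x)   { return 2 * x; }
static int hooked_double(int x) { return 2 * x + 1; }

static struct binding_slot table[] = {
    { "double_it", real_double },
};

/* Swap the pointer stored for `name`. The previous target is handed back
 * through `replaced` (if non-NULL) so a hook can still reach the original.
 * Returns 0 on success, -1 if no slot carries that name. */
static int rebind_slot(struct binding_slot *slots, size_t count,
                       const char *name, unary_fn replacement,
                       unary_fn *replaced)
{
    assert(slots != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(slots[i].name, name) == 0) {
            if (replaced != NULL) {
                *replaced = slots[i].target;
            }
            slots[i].target = replacement;
            return 0;
        }
    }
    return -1;
}
```

Code that calls through `table[0].target` reaches `hooked_double` once `rebind_slot` has returned, while the pointer saved in `replaced` still reaches `real_double`.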

Fishhook is an open source third-party framework from Meta that dynamically rebinds symbols in Mach-O binaries running on iOS/macOS, both in the simulator and on devices, in order to dynamically modify C functions. It is often used for application debugging and tracing. The framework contains only two core files, fishhook.c and fishhook.h, so it is very lightweight and is used in many enterprise applications. But there is a hidden problem in this famously lean open source project…

With the beta release of iOS 15, many developers saw widespread app crashes. Crashes like these are often put down to compatibility issues, but as we worked through the troubleshooting process it became clear that the problem was not that simple. After developers reported the issue to the Fishhook project, various groups and individuals contributed several fixes as PRs, but none of them got to the root of the problem. After carefully analyzing the source code of XNU, the kernel of the iOS and macOS operating systems, we finally located the root cause.

Tracing the origin of the Fishhook crash

To locate a problem, we usually start by trying to reproduce it from the existing error logs. Through debugging and tracing, we found that on iOS 15 or macOS 12 the Fishhook code crashed 100% of the time when rebinding symbols. This crash made applications that integrate Fishhook unusable. Given the scale of the problem, some applications that used Fishhook removed the component as soon as the problem was discovered, to limit its impact.

The root cause of the Fishhook crash

Fishhook works by modifying the data segments that hold the dynamically bound symbol pointers it hooks. These data segments are generally read-only by default, so “write” permission must be added before they can be modified. This is exactly where the problem lies: during the investigation, we found a bug in the Fishhook code that adds the “write” permission. The relevant code is as follows:

There are three serious errors in this code. For ease of reading, we marked the relevant code with red, green, and blue boxes respectively. The errors are explained below:

1. First, it is not enough to add “write” permission only when the segment is __DATA_CONST. Since iOS 14.5 or even earlier, a segment called __AUTH_CONST also needs to be handled, so covering __DATA_CONST alone is insufficient;

2. Second, the wrong address was passed when querying the current VM protection. It should not be rebindings, because the address we want to write to is indirect_symbol_bindings;

3. Finally, the copy-on-write (COW) mechanism of the XNU kernel differs from that of the Linux kernel. For a read-only VM segment mapping, VM_PROT_COPY must be explicitly specified in order to add “write” permission. The XNU BSD mprotect system call does not do this. The key logic in the XNU Mach code is as follows:

The three errors in the Fishhook code combined to cause a write-protection fault when the indirect_symbol_bindings data was modified, resulting in a crash that brought down the entire application.
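The first error can be pictured with a small C sketch of a segment-name check. This is only an illustration, assuming (as stated above) that both __DATA_CONST and __AUTH_CONST can hold the bindings to be rewritten. The helper name `is_const_binding_segment` and the constant `SEGNAME_LEN` are hypothetical and do not come from Fishhook.

```c
#include <assert.h>
#include <string.h>

/* A Mach-O segment command stores its name in a fixed 16-byte field that
 * is not guaranteed to be NUL-terminated, so comparisons are bounded. */
#define SEGNAME_LEN 16

/* Hypothetical helper: returns 1 if `segname` is one of the constant
 * segments that may need "write" permission before rebinding, else 0. */
static int is_const_binding_segment(const char *segname)
{
    assert(segname != NULL);
    return strncmp(segname, "__DATA_CONST", SEGNAME_LEN) == 0 ||
           strncmp(segname, "__AUTH_CONST", SEGNAME_LEN) == 0;
}
```

A check that tests only for __DATA_CONST would return 0 for __AUTH_CONST and leave that segment read-only.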

The best way to fix the Fishhook crash

Now that we know where the bug is, the fix is simply to do the right thing:

  • Replace the wrong address, rebindings, with indirect_symbol_bindings;
  • Replace the mprotect system call with the vm_protect system call, and add the VM_PROT_COPY option;
  • Change the logic so that the “write” is performed only if the vm_protect system call succeeds.

Therefore, the core code of the bug fix is as follows:

Note that the VM_PROT_COPY option must be specified when adding “write” permission to the data segments that hold dynamically bound symbols; otherwise, the write operation will fail. In addition, the code now writes to these data segments only when the vm_protect system call returns success; otherwise it does nothing at all.
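The two rules in the note above can be sketched in C as follows. This is a minimal illustration, not the merged patch: `make_range_writable`, `guarded_rebind`, and `map_read_only_table` are hypothetical helpers, the anonymous read-only page only stands in for a real binding section, and the mprotect branch is an assumption added so that the sketch also builds on non-Apple systems.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#if defined(__APPLE__)
#include <mach/mach.h>
/* XNU path: VM_PROT_COPY is requested explicitly along with write access. */
static int make_range_writable(void *start, size_t length)
{
    kern_return_t kr = vm_protect(mach_task_self(), (vm_address_t)start,
                                  (vm_size_t)length, 0,
                                  VM_PROT_READ | VM_PROT_WRITE | VM_PROT_COPY);
    return kr == KERN_SUCCESS ? 0 : -1;
}
#else
/* Assumption: portable fallback for illustration only, not part of the fix. */
static int make_range_writable(void *start, size_t length)
{
    return mprotect(start, length, PROT_READ | PROT_WRITE);
}
#endif

/* Hypothetical helper: store `value` in bindings[index] only if the pages
 * holding the table could be made writable. Returns 0 on success and -1 on
 * failure, in which case the table is left untouched. */
static int guarded_rebind(void **bindings, size_t count, size_t index,
                          void *value)
{
    assert(bindings != NULL && index < count);
    uintptr_t page  = (uintptr_t)sysconf(_SC_PAGESIZE);
    /* Round out to page boundaries, which the mprotect fallback requires. */
    uintptr_t begin = (uintptr_t)bindings & ~(page - 1);
    uintptr_t end   = ((uintptr_t)(bindings + count) + page - 1) & ~(page - 1);

    if (make_range_writable((void *)begin, (size_t)(end - begin)) != 0) {
        return -1; /* protection change failed: do not write */
    }
    bindings[index] = value;
    return 0;
}

/* Demo helper: one anonymous page of pointer slots, mapped read-only. */
static void **map_read_only_table(void)
{
    void *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), PROT_READ,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : (void **)p;
}
```

Calling `guarded_rebind` on a table returned by `map_read_only_table` changes the protection first and performs the store only afterwards, so a failed protection change surfaces as a return value of -1 rather than as a write fault.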

After rigorous testing and repeated validation, we completely fixed the bug and submitted a PR to Fishhook on June 12, 2021 (github.com/facebook/fi…). The issue was finally resolved when our patch was merged into the main branch.

System “warming” (upgrade) brings the “frozen” bug to the surface

Readers will probably be wondering why this didn’t happen in versions prior to iOS 15 or macOS 12.

In fact, the defect was also present on systems prior to iOS 15 or macOS 12. Those systems did not protect these data segments rigorously: the “write” permission was not removed from data segments that should have been “read only”. We found relevant evidence as follows:

In the above evidence clip, protection 3 indicates that the permission is “readable and writable”, so the Fishhook “write” operation does not run into any problems on older versions of iOS/macOS. However, iOS 15/macOS 12 changed the protection of these data segments: the value of protection in the evidence fragment went from “readable and writable” to “read only”. The evidence is as follows:

The protection value 1 in the snippet above means “read only”, as it should. But it was this “fix” that logically conflicted with the original “improper” configuration, and the Fishhook bug was eventually exposed on iOS 15/macOS 12, causing serious crashes. From a code point of view, it is clear that the Fishhook bug had always existed, but earlier versions of iOS and macOS never triggered it, so it remained hidden until the conditions changed.
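The protection values 3 and 1 quoted above are bit masks. The C sketch below decodes them, assuming the standard Mach encoding in which read is 0x1, write is 0x2, and execute is 0x4. The enum names and `describe_protection` are hypothetical, and the constants are redefined locally so that the sketch builds without the Mach headers.

```c
#include <assert.h>
#include <string.h>

/* Values mirror VM_PROT_READ, VM_PROT_WRITE, and VM_PROT_EXECUTE from
 * <mach/vm_prot.h>; local names are used so the sketch builds anywhere. */
enum {
    PROT_BIT_READ    = 0x1,
    PROT_BIT_WRITE   = 0x2,
    PROT_BIT_EXECUTE = 0x4
};

/* Hypothetical helper: render a protection value as an "rwx"-style string
 * in the caller's 4-byte buffer and return that buffer. */
static const char *describe_protection(int prot, char out[4])
{
    assert(out != NULL);
    out[0] = (prot & PROT_BIT_READ)    ? 'r' : '-';
    out[1] = (prot & PROT_BIT_WRITE)   ? 'w' : '-';
    out[2] = (prot & PROT_BIT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
    return out;
}
```

With this encoding, a value of 3 has both the read and write bits set, while a value of 1 has only the read bit set.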

Conclusion

In application development, we often bring in third-party modules, especially widely used low-level open source components, in keeping with the principle of not reinventing the wheel and of shipping and iterating quickly. However, as IT infrastructure changes, the system environment keeps adding new features and discarding old implementations. Along the way, our applications will inevitably face outages caused by their dependencies. As application developers, we must keep improving our ability to trace problems into upstream components and stay true to the developer's original aspiration: take from open source and give back to open source.

The Dev for Dev column

Dev for Dev (Developer for Developer) is an interactive and innovative practice activity jointly initiated by Agora and the RTC Developer Community. Through technology sharing, exchange of ideas, project building, and other formats, all from the engineer's perspective, it brings developers together, uncovers and delivers the most valuable technical content and projects, and gives full play to the creativity of technology.