Concepts – What is MachO?

Apple developers should be familiar with MachO, and for anyone interested in reverse engineering it is essential study. Every installable .app bundle contains an executable file with the same name as the app. It might look like this (if a MachO file shows up with a white icon, the current user does not have execute permission on it):

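The app executable is often a fat (multi-architecture) binary. lipo can extract a single-architecture slice from it, or merge slices back into one file:

lipo AlipayWallet -thin armv7 -output alipayArmv7
lipo alipayArm64 alipayArmv7 -create -output AlipayWalletNew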
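
To find each slice, lipo relies on the fat header at the start of the file: a big-endian magic number, a slice count, and one fixed-size record per architecture. The following is a minimal sketch of reading that header, assuming the 32-bit fat format (magic 0xcafebabe). The function name list_fat_archs and the hand-built sample bytes are hypothetical and only for illustration.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # magic of a 32-bit fat header, stored big-endian
CPU_NAMES = {(12, 9): "armv7", (0x0100000C, 0): "arm64"}


def list_fat_archs(data: bytes):
    """Return (name, offset, size) for each slice described by a fat header."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat binary")
    archs = []
    for i in range(nfat_arch):
        # Each fat_arch record is 20 bytes: cputype, cpusubtype, offset, size, align.
        cputype, cpusubtype, offset, size, _align = struct.unpack_from(
            ">iiIII", data, 8 + i * 20
        )
        # The top byte of cpusubtype holds capability bits; mask it off.
        key = (cputype, cpusubtype & 0x00FFFFFF)
        archs.append((CPU_NAMES.get(key, f"cpu {cputype}/{cpusubtype}"), offset, size))
    return archs


# Hypothetical two-slice header built by hand (offsets and sizes are made up).
sample = struct.pack(">II", FAT_MAGIC, 2)
sample += struct.pack(">iiIII", 12, 9, 0x4000, 0x1000, 14)
sample += struct.pack(">iiIII", 0x0100000C, 0, 0x8000, 0x2000, 14)

for name, offset, size in list_fat_archs(sample):
    print(f"{name}: offset={offset:#x} size={size:#x}")
```

On a real file you would pass the first few hundred bytes of the executable instead of the sample buffer. A thin (single-architecture) file starts directly with a MachO header and would raise ValueError here.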

Structure – What is the internal structure of MachO?

I. In brief, a MachO file consists of the following four components:

  • Header
    • Used to quickly identify the file, for example its CPU type and file type
  • Load commands
    • Tell the loader how to set up and load the binary data
  • Data
    • Stores the actual data: code, string constants, classes, methods, etc.
    • There can be multiple segments, and each segment can contain zero or more sections. Each segment has a virtual address at which it is mapped into the process’s address space
  • Loader info (link information)
    • At the end of a complete user-level MachO file is a set of link information. It contains the symbol table, string table, etc. that the dynamic loader uses to link the executable with its dependencies

Verify the MachO file structure for this example with MachOView:

II. Explore the components one by one:

1. Header

In addition to MachOView, you can also use the otool command to view MachO file information. Let’s analyze the header first, using otool -h AlipayWallet:

  • magic: the magic number of the MachO file, which identifies it as 64-bit (0xfeedfacf) or 32-bit (0xfeedface).
  • cputype and cpusubtype: the CPU type and its subtype. In this example, the armv7 slice has cputype 12 (0xc) and cpusubtype 9, and the arm64 slice has cputype 16777228 (0x100000c) and cpusubtype 0. These values are defined as follows:
#define CPU_TYPE_ARM          ((cpu_type_t) 12)
#define CPU_SUBTYPE_ARM_V7    ((cpu_subtype_t) 9)
#define CPU_TYPE_ARM64        ((cpu_type_t) 16777228)
#define CPU_SUBTYPE_ARM64_ALL ((cpu_subtype_t) 0)
  • Next is filetype. The value 2 denotes an executable file:
#define MH_EXECUTE 0x2 /* demand paged executable file */
  • ncmds is the number of load commands: 75 in this example, numbered 0 to 74.
  • sizeofcmds is the total size, in bytes, of those 75 load commands. The load commands area immediately follows the header.
  • Finally, flags describes features of the binary, mainly ones that affect how the system loads and links it.
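
The two cputype values in this example are related by a single bit, which a little arithmetic makes visible. In Apple's mach/machine.h, CPU_TYPE_ARM64 is CPU_TYPE_ARM with the CPU_ARCH_ABI64 flag (0x01000000) set. The sketch below checks that, and also maps the magic number to a word size. The helper name describe is hypothetical.

```python
CPU_ARCH_ABI64 = 0x01000000  # flag bit marking a 64-bit ABI
CPU_TYPE_ARM = 12
CPU_TYPE_ARM64 = CPU_TYPE_ARM | CPU_ARCH_ABI64

MH_MAGIC = 0xFEEDFACE     # 32-bit MachO
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit MachO


def describe(magic: int, cputype: int) -> str:
    """Summarize word size and CPU family from two header fields."""
    bits = {MH_MAGIC: 32, MH_MAGIC_64: 64}.get(magic)
    if bits is None:
        raise ValueError(f"unknown magic {magic:#x}")
    # Clearing the ABI flag leaves the base CPU family.
    family = "ARM" if (cputype & ~CPU_ARCH_ABI64) == CPU_TYPE_ARM else "other"
    return f"{bits}-bit {family}, cputype {cputype} ({cputype:#x})"


assert CPU_TYPE_ARM64 == 16777228
print(describe(MH_MAGIC, CPU_TYPE_ARM))       # 32-bit ARM, cputype 12 (0xc)
print(describe(MH_MAGIC_64, CPU_TYPE_ARM64))  # 64-bit ARM, cputype 16777228 (0x100000c)
```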

Here is Apple’s official definition of mach_header (see loader.h for more details):
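
As a rough illustration of that layout, the header can be unpacked by hand. The sketch below assumes a little-endian file and the field order from loader.h (magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, plus a reserved word in the 64-bit variant). The name parse_mach_header is hypothetical, and the sizeofcmds and flags values in the sample are invented placeholders rather than values from the example app.

```python
import struct
from collections import namedtuple

MachHeader = namedtuple(
    "MachHeader", "magic cputype cpusubtype filetype ncmds sizeofcmds flags"
)


def parse_mach_header(data: bytes):
    """Return (MachHeader, header_size) for a little-endian MachO image."""
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic == 0xFEEDFACF:
        header_size = 32  # mach_header_64 has a trailing reserved word
    elif magic == 0xFEEDFACE:
        header_size = 28  # mach_header (32-bit)
    else:
        raise ValueError(f"unexpected magic {magic:#x}")
    # The first seven fields are identical in both variants.
    fields = struct.unpack_from("<IiiIIII", data, 0)
    return MachHeader(*fields), header_size


# Hand-built arm64 header: filetype 2 (MH_EXECUTE), 75 load commands.
# sizeofcmds=8000 and flags=0x00200085 are placeholder values.
sample = struct.pack("<IiiIIIII", 0xFEEDFACF, 0x0100000C, 0, 2, 75, 8000, 0x00200085, 0)

header, size = parse_mach_header(sample)
print(header)
# The load commands occupy the bytes right after the header.
print("load commands span", size, "to", size + header.sizeofcmds)
```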

2. Load commands

Use otool -l alipayArm64 to list the load commands. The first part of the output is annotated below:

Load command 1
      cmd LC_SEGMENT_64           // load command type; LC_SEGMENT_64 maps this 64-bit segment into the process address space
  cmdsize 712                     // size of this load command in bytes
  segname __TEXT                  // 16-byte segment name
   vmaddr 0x0000000100000000      // virtual memory start address of the segment
   vmsize 0x00000000036a4000      // virtual memory size of the segment
  fileoff 0                       // offset of the segment in the file
 filesize 57294848                // size of the segment in the file
  maxprot 0x00000005              // maximum memory protection for the segment's pages (4=r, 2=w, 1=x)
 initprot 0x00000005              // initial memory protection for the segment's pages
   nsects 8                       // number of sections in the segment
    flags 0x0                     // miscellaneous flag bits
Section
  sectname __text                 // section name; the first is __text, the main program code
   segname __TEXT                 // name of the segment the section belongs to
      addr 0x0000000100006110     // start address of the section in memory
      size 0x000000000358a268     // size of the section
    offset 24848                  // offset of the section in the file (0x6110)
     align 2^4 (16)               // alignment: 16 bytes
    reloff 0                      // file offset of the relocation entries
    nreloc 0                      // number of relocation entries
     flags 0x80000400             // section type and attributes
 reserved1 0                      // reserved
 reserved2 0                      // reserved
...

Beyond the annotated output, here is a summary of the common load commands (see loader.h for the full definitions):

  • LC_SEGMENT_64: maps a 64-bit segment into the process address space
  • LC_DYLD_INFO_ONLY: dynamic linking information (rebase addresses, weak binding, lazy binding, exported function offsets, etc.)
  • LC_SYMTAB: location of the symbol table
  • LC_DYSYMTAB: location of the dynamic symbol table
  • LC_LOAD_DYLINKER: the dynamic linker to load; the example uses /usr/lib/dyld
  • LC_UUID: unique identifier of the file, also used during crash symbolication to check that the dSYM file matches the crashed binary
  • LC_VERSION_MIN_MACOSX/LC_VERSION_MIN_IPHONEOS: minimum OS version the binary requires
  • LC_SOURCE_VERSION: version of the source code used to build the binary
  • LC_MAIN: entry point address and stack size of the program's main thread
  • LC_ENCRYPTION_INFO_64: encryption information
  • LC_LOAD_DYLIB: an additional dynamic library to load
  • LC_FUNCTION_STARTS: table of function start addresses, which lets debuggers and other tools tell whether an address falls inside a function
  • LC_DATA_IN_CODE: table of non-instruction (data) regions inside the code segment
  • LC_CODE_SIGNATURE: code signature information of the application
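
Every load command starts with the same two 32-bit fields, cmd and cmdsize, so a reader can walk the whole list without understanding each command's body. The following is a minimal sketch of that walk, assuming little-endian data and a small subset of the command constants from loader.h. The name walk_load_commands and the two-command sample buffer are hypothetical.

```python
import struct

LC_REQ_DYLD = 0x80000000  # flag: dyld must understand this command
LC_NAMES = {
    0x2: "LC_SYMTAB",
    0xB: "LC_DYSYMTAB",
    0xC: "LC_LOAD_DYLIB",
    0xE: "LC_LOAD_DYLINKER",
    0x19: "LC_SEGMENT_64",
    0x1B: "LC_UUID",
    0x1D: "LC_CODE_SIGNATURE",
    0x22 | LC_REQ_DYLD: "LC_DYLD_INFO_ONLY",
    0x28 | LC_REQ_DYLD: "LC_MAIN",
}


def walk_load_commands(data: bytes, offset: int, ncmds: int):
    """Return [(name, cmdsize)] by hopping from one command to the next."""
    commands = []
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            raise ValueError("cmdsize smaller than the command header")
        commands.append((LC_NAMES.get(cmd, hex(cmd)), cmdsize))
        offset += cmdsize  # cmdsize covers the whole command, header included
    return commands


# Hand-built buffer: an LC_UUID (16-byte uuid) followed by an LC_MAIN
# (entryoff, stacksize). The field values are made up.
sample = struct.pack("<II16s", 0x1B, 24, bytes(16))
sample += struct.pack("<IIQQ", 0x28 | LC_REQ_DYLD, 24, 0x6110, 0)

print(walk_load_commands(sample, 0, 2))
```

For a real file, offset would be the header size (32 for a 64-bit file) and ncmds would come from the header.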

3. Data and link information

The main purpose of the first two parts is to tell the kernel how to read the MachO file, to name the dynamic linker (dyld) that completes the subsequent loading of dynamic libraries, and to set up the program entry point and other information before the program starts. The Data and link information parts, by contrast, supply at run time the actual contents and resolved addresses behind everything that is mapped into virtual memory. The detailed process will be discussed in a separate article.
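
As one small piece of that picture, here is how a symbol table entry refers into the string table. The sketch assumes the 16-byte nlist_64 layout from nlist.h (n_strx, n_type, n_sect, n_desc, n_value) and little-endian data. In a real file symoff, nsyms, and stroff come from the LC_SYMTAB load command. The name read_symbols, the symbol names, and the addresses in the sample are invented for illustration.

```python
import struct

NLIST_64_SIZE = 16  # n_strx (4) + n_type (1) + n_sect (1) + n_desc (2) + n_value (8)


def read_symbols(data: bytes, symoff: int, nsyms: int, stroff: int):
    """Return [(name, value)] by resolving each n_strx against the string table."""
    symbols = []
    for i in range(nsyms):
        n_strx, _n_type, _n_sect, _n_desc, n_value = struct.unpack_from(
            "<IBBHQ", data, symoff + i * NLIST_64_SIZE
        )
        start = stroff + n_strx
        end = data.index(b"\0", start)  # names are NUL-terminated
        symbols.append((data[start:end].decode("ascii", "replace"), n_value))
    return symbols


# Hand-built tables: two external symbols defined in section 1.
strtab = b"\0_main\0_helper\0"
symtab = struct.pack("<IBBHQ", 1, 0x0F, 1, 0, 0x100006110)
symtab += struct.pack("<IBBHQ", 7, 0x0F, 1, 0, 0x100006200)
image = symtab + strtab

for name, value in read_symbols(image, symoff=0, nsyms=2, stroff=len(symtab)):
    print(f"{value:#x} {name}")
```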

Takeaways – What can you do once you are familiar with MachO?

Understanding the mechanics is important. Knowing the structure of the MachO format and how a MachO file is loaded and run not only helps us understand how app executables start on macOS and iOS, it also makes a lot more possible:

  • Bitcode analysis
  • Crash symbolization
  • Symbol module lookup
  • Hooking (swapping) non-Objective-C functions
  • Analysis of the architectures a package supports
  • Constant string analysis
  • Process start speed optimization
  • Learn classic data structures

I hope you have learned something! The next part will focus on the dynamic loading process around dyld. See you then~

