Follow the official WeChat account: FSA Full Stack Action 👋

Mach-O is the file format of applications on iOS and macOS. Understanding it helps with the later static analysis and dynamic debugging of applications.

Tools for analyzing Mach-O files

otool

otool is a command-line tool. Its man page lists the available options:

man otool
...
-h     Display the Mach header.
-l     Display the load commands.
...

The -h option displays the header:

otool -h /path/to/MachOFile

The -l option displays the load commands. If you are interested, try printing them yourself.

MachOView

MachOView is a free, open-source tool for inspecting Mach-O files.

  • GitHub link: gdbinit/MachOView
  • Blue Cloud link: machoview.dmg

010 Editor

  • Link: 010 Editor
  • MachO template: machotemplate.bt

010 Editor is a paid product with a powerful template feature, but a third-party template is required to analyze ARM64 Mach-O binaries.

Go to Templates -> View Installed Templates

Click the Add button, select the downloaded machotemplate.bt, fill in the Name and Category, and click OK.

Back in the main window, open the Mach-O file in 010 Editor and, from the Templates menu, select the template you just added.

The analysis results are shown in the figure

The structure of Mach-O

As shown in the figure above, a Mach-O file consists of three parts:

Part | Role
Header | Stores basic information such as the CPU architecture, byte order (endianness), file type, and number of load commands
Load Commands | Describe the logical structure of the file and its layout in virtual memory
Data | The raw data of the segments defined by the load commands

Header

The Mach-O header stores basic information such as the CPU architecture, byte order, file type, and number of load commands. It is used to check that a Mach-O file is valid and to determine the environment the file can run in.

To view the header definition, press ⌘ + Shift + O in Xcode and type mach-o/loader.h.

32-bit and 64-bit architectures describe the Mach-O header with the mach_header and mach_header_64 structures, respectively. This article focuses on 64-bit, which is defined as follows:

/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
    uint32_t    magic;      /* mach magic number identifier */
    cpu_type_t  cputype;    /* cpu specifier */
    cpu_subtype_t  cpusubtype; /* machine specifier */
    uint32_t    filetype;   /* type of file */
    uint32_t    ncmds;      /* number of load commands */
    uint32_t    sizeofcmds; /* the size of all the load commands */
    uint32_t    flags;      /* flags */
    uint32_t    reserved;   /* reserved */
};

/* Constant for the magic field of the mach_header_64 (64-bit architectures) */
#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */

Field | Role
magic | Magic number (signature) that identifies the file as Mach-O and indicates its byte order. iOS is little-endian, so the value is the constant MH_MAGIC_64, i.e. 0xfeedfacf
cputype | Identifies the CPU architecture; of type cpu_type_t, defined in mach/machine.h
cpusubtype | The specific CPU variant, distinguishing different versions of the processor; of type cpu_subtype_t, defined in mach/machine.h
filetype | Mach-O file type (executable, library, and so on); the definitions and values are in mach-o/loader.h. Common values are MH_OBJECT (intermediate object file), MH_EXECUTE (executable), MH_DYLIB (dynamic library), and MH_DYLINKER (dynamic linker)
ncmds | Number of load commands
sizeofcmds | Total size of all load commands, in bytes
flags | Flag bits; the definitions and values are in mach-o/loader.h. Note #define MH_PIE 0x200000 in particular: it only appears in files of type MH_EXECUTE and indicates that ASLR is enabled, which improves program security
reserved | Reserved field

ASLR: Address Space Layout Randomization, which loads the program at a randomized base address on each launch.
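As a minimal sketch of how the header fields are used, the following C code checks the magic number of an in-memory buffer and tests the MH_PIE flag. The struct header64 type and both helper functions are hypothetical names introduced for illustration. The constants are copied locally (MH_EXECUTE is 0x2 in mach-o/loader.h) so the sketch builds without the macOS SDK, and it assumes the file uses the host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Local copies of constants from mach-o/loader.h (assumption: no SDK available). */
#define MH_MAGIC_64 0xfeedfacfu
#define MH_CIGAM_64 0xcffaedfeu
#define MH_EXECUTE  0x2u
#define MH_PIE      0x200000u

/* Same layout as mach_header_64; cpu_type_t and cpu_subtype_t are 32-bit ints. */
struct header64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Hypothetical helper: copy the header out of buf if the buffer starts with a
 * host-order 64-bit Mach-O magic. Returns 1 on success, 0 otherwise. */
static int read_header64(const uint8_t *buf, size_t len, struct header64 *out)
{
    if (len < sizeof *out)
        return 0;
    memcpy(out, buf, sizeof *out);     /* memcpy avoids unaligned reads */
    return out->magic == MH_MAGIC_64;  /* MH_CIGAM_64 would need byte swapping */
}

/* Hypothetical helper: true when the file is an executable built with MH_PIE. */
static int is_pie_executable(const struct header64 *h)
{
    return h->filetype == MH_EXECUTE && (h->flags & MH_PIE) != 0;
}
```

A real parser would also handle MH_CIGAM_64 by swapping every field, and the 32-bit mach_header, which lacks the reserved field.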

Load Commands

Following the header, the load commands specify the logical structure of the file and its layout in virtual memory, explicitly telling the loader how to handle the binary data. Some commands are handled by the kernel and some by the dynamic linker (dyld).

The load commands can be regarded as a sequence of individual commands. Each one has a type, stored in its cmd field, whose constant is prefixed with LC_, such as LC_SEGMENT.

The header file mach-o/loader.h contains the definition of each command. Each command has its own structure, but the first two fields are always cmd and cmdsize:

struct load_command {
    uint32_t cmd;        /* type of load command */
    uint32_t cmdsize;    /* total size of command in bytes */
};

Field | Role
cmd | Type of the current load command, such as LC_SEGMENT
cmdsize | Size of the current load command in bytes, which ensures it can be parsed correctly

Depending on the command type (cmd), the kernel uses different functions for parsing.
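Because every command starts with cmd and cmdsize, a parser can walk the whole list without understanding each command type. The sketch below illustrates that idea on an in-memory copy of the load-command area. count_commands is a hypothetical helper, the two LC_ values are copied from mach-o/loader.h, and host byte order is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Values copied from mach-o/loader.h. */
#define LC_LOAD_DYLIB 0xcu
#define LC_SEGMENT_64 0x19u

struct load_command {
    uint32_t cmd;      /* type of load command */
    uint32_t cmdsize;  /* total size of command in bytes */
};

/* Hypothetical helper: count the commands of type `wanted` in the load-command
 * area `cmds` (sizeofcmds bytes holding ncmds commands).
 * Returns -1 if a command does not fit in the area. */
static int count_commands(const uint8_t *cmds, uint32_t sizeofcmds,
                          uint32_t ncmds, uint32_t wanted)
{
    uint32_t off = 0;
    int found = 0;

    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;

        if (sizeofcmds - off < sizeof lc)
            return -1;
        memcpy(&lc, cmds + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - off)
            return -1;
        if (lc.cmd == wanted)
            found++;
        off += lc.cmdsize;  /* cmdsize lets us skip commands we do not know */
    }
    return found;
}
```

In a real file, cmds would point just past the mach_header_64, and ncmds and sizeofcmds would come from that header.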

Several important command types are described below.

LC_SEGMENT

LC_SEGMENT and LC_SEGMENT_64 are segment load commands. Each segment defines a virtual memory region that the dynamic linker maps to the process address space. Its structure is defined as follows:

struct segment_command_64 { /* for 64-bit architectures */
    uint32_t	cmd;		/* LC_SEGMENT_64 */
    uint32_t	cmdsize;	/* includes sizeof section_64 structs */
    char	segname[16];	/* segment name */
    uint64_t	vmaddr;		/* memory address of this segment */
    uint64_t	vmsize;		/* memory size of this segment */
    uint64_t	fileoff;	/* file offset of this segment */
    uint64_t	filesize;	/* amount to map from the file */
    vm_prot_t	maxprot;	/* maximum VM protection */
    vm_prot_t	initprot;	/* initial VM protection */
    uint32_t	nsects;		/* number of sections in segment */
    uint32_t	flags;		/* flags */
};

Field | Description
cmd | Type of the current command
cmdsize | Size of the current command
segname | Segment name, 16 bytes
vmaddr | Virtual memory address of the segment
vmsize | Virtual memory size of the segment
fileoff | Offset of the segment in the file
filesize | Size of the segment in the file
maxprot | Maximum memory protection for the segment's pages
initprot | Initial memory protection for the segment's pages
nsects | Number of sections in the segment; a segment can contain zero or more sections
flags | Segment flags (SG_HIGHVM, SG_FVMLIB, etc.)

The system maps filesize bytes of the file, starting at fileoff, into the vmsize bytes of virtual memory that start at vmaddr. The segment's pages are initialized with the permissions in initprot. These can be changed later but cannot exceed maxprot.
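The mapping rule above can be sketched in C as an address translation. The struct seg_range type and both helpers are hypothetical and keep only the fields needed here. The VM_PROT_ values are copied from mach/vm_prot.h, and the example addresses in the usage note are made up.

```c
#include <stdint.h>

/* Values copied from mach/vm_prot.h. */
#define VM_PROT_READ    0x1
#define VM_PROT_WRITE   0x2
#define VM_PROT_EXECUTE 0x4

/* Only the fields of segment_command_64 that this sketch needs. */
struct seg_range {
    uint64_t vmaddr;
    uint64_t vmsize;
    uint64_t fileoff;
    uint64_t filesize;
};

/* Hypothetical helper: translate a virtual address inside the segment to a
 * file offset. Returns 1 and stores the offset in *off, or 0 if addr lies
 * outside the segment or in the part beyond filesize, which has no bytes
 * in the file. */
static int vm_to_file_offset(const struct seg_range *s, uint64_t addr,
                             uint64_t *off)
{
    if (addr < s->vmaddr || addr - s->vmaddr >= s->vmsize)
        return 0;
    uint64_t delta = addr - s->vmaddr;
    if (delta >= s->filesize)
        return 0;
    *off = s->fileoff + delta;
    return 1;
}

/* Hypothetical helper: a requested protection is acceptable only if it sets
 * no bit that is missing from maxprot. */
static int prot_allowed(int32_t requested, int32_t maxprot)
{
    return (requested & ~maxprot) == 0;
}
```

For example, with a segment at vmaddr 0x100004000 and fileoff 0x4000, the address 0x100005000 corresponds to file offset 0x5000.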

The four segments in the figure above serve the following purposes:

Segment | Description
__PAGEZERO | The static linker creates __PAGEZERO as the first segment of an executable. It is located at virtual address 0, occupies no space in the file, and cannot be read, written, or executed. It is used to catch null pointer accesses.
__TEXT | Contains executable code and other read-only data. The static linker sets the virtual memory permissions of this segment to readable and executable, so the process may execute the code but not modify it.
__DATA | Contains data that can be modified. The static linker sets the virtual memory permissions of this segment to readable and writable.
__LINKEDIT | Contains raw data used by the dynamic linker, such as symbol, string, and relocation table entries.

The 64-bit section structure is defined as follows:

struct section_64 { /* for 64-bit architectures */
    char	sectname[16];	/* name of this section */
    char	segname[16];	/* segment this section goes in */
    uint64_t	addr;		/* memory address of this section */
    uint64_t	size;		/* size in bytes of this section */
    uint32_t	offset;		/* file offset of this section */
    uint32_t	align;		/* section alignment (power of 2) */
    uint32_t	reloff;		/* file offset of relocation entries */
    uint32_t	nreloc;		/* number of relocation entries */
    uint32_t	flags;		/* flags (section type and attributes)*/
    uint32_t	reserved1;	/* reserved (for offset or index) */
    uint32_t	reserved2;	/* reserved (for count or sizeof) */
    uint32_t	reserved3;	/* reserved */
};

Field | Description
sectname | Section name, 16 bytes
segname | Name of the segment the section belongs to, 16 bytes
addr | Start address of the section in memory
size | Size of the section in memory, in bytes
offset | Offset of the section in the file
align | Alignment of the section, as a power of 2
reloff | File offset of the relocation entries
nreloc | Number of relocation entries
flags | Section type and attributes
reserved1/2/3 | Reserved fields
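Two details of section_64 are easy to get wrong: the 16-byte name fields are not NUL-terminated when a name uses all 16 bytes, and align stores an exponent rather than a byte count. The following sketch, with hypothetical helper names, shows one way to handle both.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare a 16-byte sectname or segname field with a
 * C string. The field is NUL-padded, but has no terminator when the name
 * fills all 16 bytes, so strcmp on the field would read past its end. */
static int name_equals(const char field[16], const char *name)
{
    return strlen(name) <= 16 && strncmp(field, name, 16) == 0;
}

/* Hypothetical helper: convert the align exponent to a byte alignment,
 * e.g. 4 means 16-byte alignment. Returns 0 for exponents that do not fit. */
static uint64_t section_alignment(uint32_t align)
{
    return align < 64 ? (uint64_t)1 << align : 0;
}
```

The same comparison applies to the segname field of segment_command_64.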

LC_LOAD_DYLIB

LC_LOAD_DYLIB describes how to load a dynamic library that the program depends on. These commands can be viewed using MachOView.

The structure used by LC_LOAD_DYLIB is dylib_command:

struct dylib {
    union lc_str  name;			/* library's path name */
    uint32_t timestamp;			/* library's build time stamp */
    uint32_t current_version;		/* library's current version number */
    uint32_t compatibility_version;	/* library's compatibility vers number*/
};

struct dylib_command {
    uint32_t		cmd;		/* LC_ID_DYLIB, LC_LOAD_{,WEAK_}DYLIB, LC_REEXPORT_DYLIB */
    uint32_t		cmdsize;	/* includes pathname string */
    struct dylib	dylib;		/* the library identification */
};

Field | Description
name | Full path of the dependent library. The dynamic linker uses this path to load the library
timestamp | Build timestamp of the dependent library
current_version | Current version number
compatibility_version | Compatibility version number

LC_LOAD_WEAK_DYLIB also uses the dylib_command structure. The difference is that the dependency it declares is optional: if the library is missing, the main program still runs. If a library declared with LC_LOAD_DYLIB cannot be found, the loader gives up and terminates the process.
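As a rough illustration of how dylib_command is read, the sketch below extracts the library path and formats a version number. It relies on two facts from mach-o/loader.h: the lc_str name holds an offset measured from the start of the command, and versions are packed as X.Y.Z in 16, 8, and 8 bits. The struct dylib_cmd type and both helpers are hypothetical names, and host byte order is assumed.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Layout of dylib_command with lc_str reduced to its offset member. */
struct dylib_cmd {
    uint32_t cmd;
    uint32_t cmdsize;
    uint32_t name_offset;  /* offset of the path from the start of the command */
    uint32_t timestamp;
    uint32_t current_version;
    uint32_t compatibility_version;
};

/* Hypothetical helper: return the path stored in the command, or NULL if
 * the offset is out of range or the string is not terminated inside the
 * command. cmd_bytes must point to at least cmdsize readable bytes. */
static const char *dylib_path(const uint8_t *cmd_bytes)
{
    struct dylib_cmd dc;

    memcpy(&dc, cmd_bytes, sizeof dc);
    if (dc.cmdsize < sizeof dc || dc.name_offset < sizeof dc ||
        dc.name_offset >= dc.cmdsize)
        return NULL;
    const char *s = (const char *)cmd_bytes + dc.name_offset;
    if (memchr(s, '\0', dc.cmdsize - dc.name_offset) == NULL)
        return NULL;
    return s;
}

/* Hypothetical helper: format a packed version, e.g. 0x00e40000 as "228.0.0". */
static void format_version(uint32_t v, char out[16])
{
    snprintf(out, 16, "%u.%u.%u", (unsigned)(v >> 16),
             (unsigned)((v >> 8) & 0xff), (unsigned)(v & 0xff));
}
```

The cmdsize of a dylib_command includes the path string, which is why the path can be bounds-checked against it.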

You can use otool -L to list the libraries a binary depends on:

otool -arch arm64 -L LXFProtocolTool_Example
LXFProtocolTool_Example:
	/System/Library/Frameworks/Accelerate.framework/Accelerate (compatibility version 1.0.0, current version 4.0.0)
	@rpath/Alamofire.framework/Alamofire (compatibility version 1.0.0, ...)
	/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
	/usr/lib/swift/libswiftCoreMIDI.dylib (compatibility version 1.0.0, current version 5.0.0, weak)
	...

In addition to system paths like /System/Library/ and /usr/lib, you may encounter paths that start with @rpath or @executable_path:

Path | Description
@executable_path | The directory that contains the executable
@rpath | Specified by the LC_RPATH load command. On iOS it usually points to the application's own framework files; the default is @executable_path/Frameworks
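The substitution these prefixes imply can be sketched as plain string replacement. This is a simplified illustration, not dyld's actual algorithm: it assumes a single LC_RPATH entry, whereas dyld tries each entry in turn and supports further prefixes. Both function names and the directory used in the usage note are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if path starts with prefix, write dir plus the rest
 * of path to out. Returns 1 if replaced, 0 if the prefix does not match,
 * -1 if out is too small. */
static int replace_prefix(const char *path, const char *prefix,
                          const char *dir, char *out, size_t outlen)
{
    size_t n = strlen(prefix);

    if (strncmp(path, prefix, n) != 0 || (path[n] != '/' && path[n] != '\0'))
        return 0;
    int w = snprintf(out, outlen, "%s%s", dir, path + n);
    return (w >= 0 && (size_t)w < outlen) ? 1 : -1;
}

/* Simplified resolution with one rpath entry. Returns 1 on success and 0
 * if a buffer was too small. */
static int resolve_install_name(const char *name, const char *exe_dir,
                                const char *rpath, char *out, size_t outlen)
{
    char tmp[1024];
    int r = replace_prefix(name, "@rpath", rpath, tmp, sizeof tmp);

    if (r < 0)
        return 0;
    if (r > 0)
        name = tmp;  /* the rpath itself may start with @executable_path */
    r = replace_prefix(name, "@executable_path", exe_dir, out, outlen);
    if (r != 0)
        return r > 0;
    int w = snprintf(out, outlen, "%s", name);  /* absolute path: copy as-is */
    return w >= 0 && (size_t)w < outlen;
}
```

With exe_dir set to /private/var/Demo.app and rpath set to @executable_path/Frameworks, the name @rpath/Alamofire.framework/Alamofire resolves to /private/var/Demo.app/Frameworks/Alamofire.framework/Alamofire.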

These paths can be modified with install_name_tool, which is provided on macOS. Note: this is essential when injecting dynamic libraries on non-jailbroken devices!

# Modify the dependent library path
install_name_tool -change @rpath/Alamofire.framework/Alamofire @executable_path/Alamofire.framework/Alamofire LXFProtocolTool_Example

Universal binary

A universal binary (also known as a fat binary) is really a package of Mach-O files for different architectures, with a fat_header structure added at the beginning of the file to describe the supported architectures and their offsets, as shown in the following figure:

The definitions for universal binaries are in the header file mach-o/fat.h:

#define FAT_MAGIC    0xcafebabe
#define FAT_CIGAM    0xbebafeca /* NXSwapLong(FAT_MAGIC) */

struct fat_header {
    uint32_t magic;     /* FAT_MAGIC or FAT_MAGIC_64 */
    uint32_t nfat_arch; /* number of structs that follow */
};

Field | Role
magic | Magic number (signature), defined as the constant FAT_MAGIC with the fixed value 0xcafebabe
nfat_arch | Number of architectures contained in the file

The fat_header is followed by fat_arch structures, one per architecture, each describing the corresponding Mach-O file:

struct fat_arch {
    cpu_type_t  cputype; /* cpu specifier (int) */
    cpu_subtype_t    cpusubtype; /* machine specifier (int) */
    uint32_t    offset;  /* file offset to this object file */
    uint32_t    size;    /* size of this object file */
    uint32_t    align;   /* alignment as a power of 2 */
};

Field | Role
offset | Offset of the corresponding architecture's data, relative to the beginning of the file
size | Size of the corresponding architecture's data
align | Alignment boundary of the data, as a power of 2

cputype and cpusubtype were covered above, so they are not repeated here.
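As a minimal sketch of how these two structures are used together, the code below looks up the slice for one CPU type in an in-memory copy of a universal binary. It relies on the fact that fat_header and fat_arch are stored big-endian on disk, so the fields are read byte by byte. find_slice and be32 are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC 0xcafebabeu

/* Read a big-endian 32-bit value. */
static uint32_t be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Hypothetical helper: find the slice for cputype in a universal binary.
 * Returns 1 and stores the slice's file offset and size, or 0 if the file
 * is not a fat binary, is truncated, or has no such architecture. */
static int find_slice(const uint8_t *file, size_t len, uint32_t cputype,
                      uint32_t *offset, uint32_t *size)
{
    if (len < 8 || be32(file) != FAT_MAGIC)
        return 0;
    uint32_t n = be32(file + 4);              /* nfat_arch */
    if (n > (len - 8) / 20)                   /* each fat_arch is 20 bytes */
        return 0;
    for (uint32_t i = 0; i < n; i++) {
        const uint8_t *arch = file + 8 + (size_t)i * 20;
        if (be32(arch) != cputype)
            continue;
        uint32_t off = be32(arch + 8);        /* fat_arch.offset */
        uint32_t sz = be32(arch + 12);        /* fat_arch.size */
        if (off > len || sz > len - off)
            return 0;
        *offset = off;
        *size = sz;
        return 1;
    }
    return 0;
}
```

The bytes at the returned offset form a complete Mach-O file, starting with its own mach_header or mach_header_64.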

References

  • XNU source
    • github
    • opensource.apple.com
  • OS X ABI Mach-O File Format Reference