Summary of basic principles of iOS

Virtual memory & physical memory

Early systems accessed data directly through physical addresses, which caused two problems:

  • 1. Insufficient memory
  • 2. Insecure memory data

Problem 1 was solved by adding an intermediate layer between processes and physical memory, called virtual memory. It manages physical memory when several processes run at the same time, improves CPU utilization, and lets multiple processes be loaded simultaneously and on demand. At its core, virtual memory relies on a mapping table that records which physical address each virtual address corresponds to.

  • Each process has its own separate virtual memory. Its addresses start from 0 and its size is fixed (4 GB on a 32-bit system). Virtual memory is divided into pages (16 KB on iOS, 4 KB on most other systems), and data is loaded one page at a time. Processes cannot access each other's memory, which keeps each process's data secure.
  • Only some of a process's functions are active at any moment, so only the active parts of the process are placed in physical memory. This avoids wasting physical memory.
  • When the CPU accesses data, it uses a virtual address. The system looks up the corresponding physical address in the mapping table and then accesses that physical address.
  • If the page containing the virtual address has not been loaded into physical memory at access time, a page fault occurs: the current process is blocked, the data is loaded into physical memory, and the access is then retried. Loading on demand like this avoids wasting memory.
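The lookup described above can be sketched as a toy, single-level page table in C. This is a simplified illustration, assuming 16 KB pages and an 8-page address space. The names (toy_pte, translate, TOY_PAGE_SHIFT) are hypothetical. Real ARM64 hardware walks multi-level page tables in the MMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 14u                     /* 2^14 = 16 KB pages, as on iOS */
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)  /* 16384 bytes */
#define TOY_NUM_PAGES  8u                      /* hypothetical tiny address space */

/* One entry of a hypothetical single-level page table. */
typedef struct {
    bool     present; /* is this virtual page currently in physical memory? */
    uint64_t frame;   /* physical frame number, valid only when present */
} toy_pte;

typedef enum { TRANSLATE_OK, TRANSLATE_PAGE_FAULT, TRANSLATE_OUT_OF_RANGE } translate_status;

/* Translate a virtual address into a physical one through the table. */
translate_status translate(const toy_pte table[TOY_NUM_PAGES], uint64_t vaddr, uint64_t *paddr)
{
    assert(table != NULL && paddr != NULL);
    uint64_t page   = vaddr >> TOY_PAGE_SHIFT;       /* virtual page number */
    uint64_t offset = vaddr & (TOY_PAGE_SIZE - 1u);  /* byte offset inside the page */

    if (page >= TOY_NUM_PAGES)
        return TRANSLATE_OUT_OF_RANGE;
    if (!table[page].present)
        return TRANSLATE_PAGE_FAULT;  /* the OS would load the page, then retry */

    *paddr = (table[page].frame << TOY_PAGE_SHIFT) | offset;
    return TRANSLATE_OK;
}
```

Suppose virtual page 1 is mapped to frame 5. Then the address TOY_PAGE_SIZE + 0x10 translates to 5 * TOY_PAGE_SIZE + 0x10. Any page whose present flag is false yields TRANSLATE_PAGE_FAULT, the page fault case from the list above.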

The following figure shows the relationship between virtual memory and physical memory

Security of in-memory data: ASLR technology

When explaining virtual memory above, we mentioned that its starting address and size are fixed. This means a given piece of data sits at the same address on every run, which makes it easy for attackers to locate and exploit. To solve this problem, Apple introduced ASLR starting with iOS 4.3.

ASLR concept: Address Space Layout Randomization is a security technique against buffer overflow attacks. It randomizes the layout of the heap, the stack, and shared library mappings. This makes it harder for an attacker to predict target addresses and to jump directly to attack code.

Its purpose is to lay out the address space randomly. Sensitive code and data (such as login, registration, or payment logic) then sit at addresses a malicious program cannot know in advance, which makes attacks harder.

Because of ASLR, the executable and its dynamic libraries are loaded at a different virtual address on every launch. Pointers inside the image therefore cannot be fully fixed at compile time. Instead, they are fixed up (rebased) at load time so that they point to the correct address: correct memory address = ASLR address (the randomized load address) + offset.
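The formula above can be written out in C as a small sketch. The helper names (aslr_slide, runtime_address, rebase_pointer) are hypothetical. The addresses in the example are made up, except that 0x100000000 is the usual link-time __TEXT address of a 64-bit iOS executable. On Apple platforms, dyld reports the real slide of a loaded image through _dyld_get_image_vmaddr_slide() in <mach-o/dyld.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Slide = where the image actually landed - where the linker assumed it would be. */
int64_t aslr_slide(uint64_t linked_base, uint64_t actual_base)
{
    return (int64_t)(actual_base - linked_base);
}

/* correct memory address = randomized load address + offset inside the image */
uint64_t runtime_address(uint64_t load_address, uint64_t offset)
{
    assert(offset <= UINT64_MAX - load_address);  /* guard against wrap-around */
    return load_address + offset;
}

/* A pointer written into the binary at link time is fixed up by adding the slide. */
uint64_t rebase_pointer(uint64_t linked_pointer, int64_t slide)
{
    return linked_pointer + (uint64_t)slide;
}
```

For example, suppose the image was linked at 0x100000000 but loaded at 0x104a3c000. The slide is then 0x4a3c000, and a pointer linked as 0x100008000 is rebased to 0x104a44000. This is the same address as load address + offset 0x8000.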

Executable file

Different operating systems use different executable file formats. The kernel reads the executable into memory and determines the binary's format from the signature (magic number) in its header.

PE and ELF are both variants of the Common Object File Format (COFF), and Mach-O follows a similar segment-based design. COFF's main contribution was introducing the "segment" mechanism into object files: different object files can contain different numbers and types of segments.
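As a rough illustration of this header check, the C sketch below classifies a buffer by its first bytes. The enum and function names are hypothetical. The sketch only recognizes little-endian Mach-O, which covers arm64 and x86_64. Note that 0xcafebabe is also the magic number of Java class files, so real tools need further checks before treating a file as a universal binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { FMT_UNKNOWN, FMT_ELF, FMT_PE, FMT_MACHO_32, FMT_MACHO_64, FMT_FAT } exe_format;

/* Read 4 bytes as big-endian / little-endian, independent of host byte order. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
}

static uint32_t read_le32(const unsigned char *p)
{
    return p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classify a file by the magic number at the start of its header. */
exe_format detect_format(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 4) {
        if (buf[0] == 0x7f && buf[1] == 'E' && buf[2] == 'L' && buf[3] == 'F')
            return FMT_ELF;                 /* 0x7f followed by "ELF" */
        if (read_be32(buf) == 0xcafebabeu)
            return FMT_FAT;                 /* FAT_MAGIC, stored big-endian */
        if (read_le32(buf) == 0xfeedfaceu)
            return FMT_MACHO_32;            /* MH_MAGIC on a little-endian CPU */
        if (read_le32(buf) == 0xfeedfacfu)
            return FMT_MACHO_64;            /* MH_MAGIC_64 on a little-endian CPU */
    }
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return FMT_PE;                      /* DOS header that precedes a PE file */
    return FMT_UNKNOWN;
}
```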

Universal binary

Different CPU platforms, such as ARM64 and x86, support different instruction sets. Apple's universal binary format therefore packages Mach-O files for several architectures into one file, and the system picks the Mach-O that matches its CPU. For this reason, the universal binary format is also known as the fat binary format, as shown in the following figure.

The universal binary format is defined in <mach-o/fat.h>, which can be found in xnu -> EXTERNAL_HEADERS -> mach-o. A universal binary starts with a fat_header. Its nfat_arch field gives the number of Mach-O files the universal binary contains, and each of those Mach-O files is described by a fat_arch structure. The two structures are defined as follows:

```c
#include <stdint.h>
#include <mach/machine.h>   /* cpu_type_t, cpu_subtype_t */

/*
 * magic:     lets the kernel recognize the file as a universal binary
 * nfat_arch: number of fat_arch structures (Mach-O files) that follow
 */
struct fat_header {
    uint32_t magic;      /* FAT_MAGIC */
    uint32_t nfat_arch;  /* number of structs that follow */
};

/*
 * fat_arch describes one Mach-O in the universal binary:
 * cputype and cpusubtype identify the CPU it was built for.
 */
struct fat_arch {
    cpu_type_t    cputype;    /* cpu specifier (int) */
    cpu_subtype_t cpusubtype; /* machine specifier (int) */
    uint32_t      offset;     /* file offset to this object file */
    uint32_t      size;       /* size of this object file */
    uint32_t      align;      /* alignment as a power of 2 */
};
```
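Every field of fat_header and fat_arch is stored big-endian on disk, regardless of the host CPU. A reader on a little-endian machine therefore has to byte-swap them. The sketch below parses the header and the slice table from an in-memory buffer. fat_arch_info and parse_fat are hypothetical names. The code relies on struct fat_arch being 20 bytes (five 32-bit fields).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC_VALUE 0xcafebabeu   /* same value as FAT_MAGIC in <mach-o/fat.h> */
#define FAT_ARCH_SIZE   20u           /* five 32-bit fields */

/* Host-side copy of struct fat_arch (cpu_type_t is a 32-bit int in the real header). */
typedef struct {
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t offset;
    uint32_t size;
    uint32_t align;
} fat_arch_info;

static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
}

/* Parse fat_header plus its fat_arch entries.
 * Returns the number of slices, or -1 if the buffer is not a valid universal binary. */
int parse_fat(const unsigned char *buf, size_t len, fat_arch_info *out, int max_out)
{
    assert(buf != NULL && out != NULL && max_out >= 0);
    if (len < 8 || read_be32(buf) != FAT_MAGIC_VALUE)
        return -1;                                   /* not a universal binary */
    uint32_t nfat_arch = read_be32(buf + 4);
    if (nfat_arch > (uint32_t)max_out || len < 8 + (size_t)nfat_arch * FAT_ARCH_SIZE)
        return -1;                                   /* too many slices or truncated */
    for (uint32_t i = 0; i < nfat_arch; i++) {
        const unsigned char *a = buf + 8 + (size_t)i * FAT_ARCH_SIZE;
        out[i].cputype    = (int32_t)read_be32(a);
        out[i].cpusubtype = (int32_t)read_be32(a + 4);
        out[i].offset     = read_be32(a + 8);
        out[i].size       = read_be32(a + 12);
        out[i].align      = read_be32(a + 16);
    }
    return (int)nfat_arch;
}
```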

So, to sum up,

  • A universal binary is a binary storage structure introduced by Apple. It stores machine code for several architectures at once, so the system can detect the CPU and select the most suitable architecture to load.

  • Because a universal binary contains several architectures, it is much larger than a single-architecture binary and takes up more disk space. However, the system loads only the most suitable architecture, so code for the other architectures does not occupy memory at run time.

  • You can also inspect, merge, and split Mach-O files with the lipo command:

    • View the architectures of a Mach-O file: lipo -info <MachO file>
    • Merge: lipo -create <MachO1> <MachO2> -output <output file path>
    • Split (extract one architecture): lipo <MachO file> -thin <architecture> -output <output file path>
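The selection step can be illustrated with a deliberately simplified C sketch that picks the first slice whose cputype matches the host. The real kernel and dyld logic also considers cpusubtype and a preference order. Treat pick_slice and the slice type as hypothetical. The MY_CPU_* constants reproduce the values from <mach/machine.h>: CPU_ARCH_ABI64 = 0x01000000, x86 = 7, ARM = 12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as CPU_ARCH_ABI64, CPU_TYPE_X86_64 and CPU_TYPE_ARM64. */
#define MY_CPU_ARCH_ABI64  0x01000000
#define MY_CPU_TYPE_X86_64 (7 | MY_CPU_ARCH_ABI64)
#define MY_CPU_TYPE_ARM64  (12 | MY_CPU_ARCH_ABI64)

/* Minimal description of one slice in a universal binary. */
typedef struct {
    int32_t  cputype;
    uint32_t offset;
    uint32_t size;
} slice;

/* Return the index of the first slice built for host_cputype, or -1 if none matches. */
int pick_slice(const slice *slices, size_t count, int32_t host_cputype)
{
    assert(slices != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (slices[i].cputype == host_cputype)
            return (int)i;
    return -1;
}
```

Once a slice is chosen, the loader only needs the bytes in [offset, offset + size). This is why the other architectures never occupy memory.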

Mach-O files

Mach-O is short for the Mach Object file format, a file format for executables, dynamic libraries, and object code. As a replacement for the a.out format, Mach-O offers greater extensibility and faster access to symbol table information.

Being familiar with the Mach-O format helps you understand how Apple's platforms work under the hood and how dyld loads Mach-O files.

Viewing a Mach-O file

You can view information about a Mach-O file in either of the following ways. The second method is recommended.

  • [Method 1] otool terminal command: otool -l <Mach-O file name>
  • [Method 2] MachOView tool (recommended): drag the Mach-O executable file into MachOView to open it

Mach-O file format

For OS X and iOS, Mach-O is the executable file format, which includes the following file types

  • Executable: Executable file
  • Dylib: dynamic link library
  • Bundle: dynamic library that cannot be linked and can only be loaded at run time using dlopen
  • Image: any one of Executable, Dylib, or Bundle
  • Framework: a collection of Dylib, resource files, and header files

The following illustration shows the Mach-O image file format

The figure above shows the format of a Mach-O file. A complete Mach-O file consists of three main parts:

  • Header: the Mach-O header, which records the CPU architecture, the file type, and the number and size of the load commands
  • Load Commands: describe how the data in the file is organized; different kinds of data are described by different load commands
  • Data: stores the data for each segment. A segment here is similar to a segment in an ELF file. Each segment contains one or more sections that hold specific code and data, such as symbol tables and dynamic symbol tables

Header

The Mach-O header contains the key information about the whole Mach-O file, so that the CPU type and other basic facts can be determined quickly. It is defined in <mach-o/loader.h> (in the same directory as fat.h). The mach_header and mach_header_64 structures describe the header for 32-bit and 64-bit architectures, respectively. The header is the first structure read when the file is loaded. It identifies the architecture, the file type, the number of load commands, and so on. The mach_header_64 structure for 64-bit architectures is shown below. Compared with the 32-bit mach_header, it only adds a reserved field.

```c
#include <stdint.h>
#include <mach/machine.h>   /* cpu_type_t, cpu_subtype_t */

/*
 * magic:      0xfeedface (MH_MAGIC, 32-bit) or 0xfeedfacf (MH_MAGIC_64, 64-bit)
 * cputype:    CPU type, such as ARM or ARM64
 * cpusubtype: specific CPU variant, such as armv7 or arm64e
 * filetype:   executables, object files, and dynamic libraries all use the
 *             Mach-O format, so filetype records which kind of file this is
 * ncmds:      number of load commands
 * sizeofcmds: total size in bytes of all load commands
 * flags:      features supported by the binary, mainly related to loading and linking
 * reserved:   reserved field, present only in the 64-bit header
 */
struct mach_header_64 {
    uint32_t      magic;      /* mach magic number identifier */
    cpu_type_t    cputype;    /* cpu specifier */
    cpu_subtype_t cpusubtype; /* machine specifier */
    uint32_t      filetype;   /* type of file */
    uint32_t      ncmds;      /* number of load commands */
    uint32_t      sizeofcmds; /* the size of all the load commands */
    uint32_t      flags;      /* flags */
    uint32_t      reserved;   /* reserved */
};
```
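The sketch below decodes a mach_header_64 from raw bytes to show how these fields sit in the file. It assumes a little-endian 64-bit Mach-O, as produced for arm64 and x86_64. header64, parse_header64, and MY_MH_MAGIC_64 are hypothetical names. The struct mirrors the eight 32-bit fields above, 32 bytes in total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MY_MH_MAGIC_64 0xfeedfacfu  /* same value as MH_MAGIC_64 */

/* Host-side copy of the eight 32-bit fields of mach_header_64. */
typedef struct {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
} header64;

static uint32_t read_le32(const unsigned char *p)
{
    return p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if buf does not start with a little-endian 64-bit header. */
int parse_header64(const unsigned char *buf, size_t len, header64 *h)
{
    assert(buf != NULL && h != NULL);
    if (len < 32)                               /* sizeof(struct mach_header_64) */
        return -1;
    h->magic      = read_le32(buf);
    h->cputype    = (int32_t)read_le32(buf + 4);
    h->cpusubtype = (int32_t)read_le32(buf + 8);
    h->filetype   = read_le32(buf + 12);
    h->ncmds      = read_le32(buf + 16);
    h->sizeofcmds = read_le32(buf + 20);
    h->flags      = read_le32(buf + 24);
    h->reserved   = read_le32(buf + 28);
    return h->magic == MY_MH_MAGIC_64 ? 0 : -1;
}
```

After a successful parse, ncmds and sizeofcmds tell the reader how many load commands follow and how many bytes they span.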

The filetype field records the type of the Mach-O file. Common values include:

```c
#define MH_OBJECT   0x1 /* relocatable object file */
#define MH_EXECUTE  0x2 /* executable file */
#define MH_DYLIB    0x6 /* dynamic library */
#define MH_DYLINKER 0x7 /* dynamic linker */
#define MH_DSYM     0xa /* companion file holding symbol information for debugging */
```

The corresponding Header, as displayed in MachOView, is shown below.

Load Commands

In a Mach-O file, the load commands tell the loader how to set the file up in memory. Their number (ncmds) and total size (sizeofcmds) are given in the Header. Every load command begins with the following structure, defined in <mach-o/loader.h>:

```c
#include <stdint.h>

struct load_command {
    uint32_t cmd;     /* type of load command */
    uint32_t cmdsize; /* total size of command in bytes */
};
```
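Each command starts with cmd and cmdsize, and cmdsize covers the whole command. A reader can therefore walk the load commands that follow the header without understanding every command type. The sketch below walks a raw buffer of load commands and counts the LC_SEGMENT_64 commands. count_segments and MY_LC_SEGMENT_64 are hypothetical names. The constant reproduces the value of LC_SEGMENT_64 (0x19), and little-endian byte order is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MY_LC_SEGMENT_64 0x19u   /* same value as LC_SEGMENT_64 in <mach-o/loader.h> */

static uint32_t read_le32(const unsigned char *p)
{
    return p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the load commands that follow mach_header_64 and count LC_SEGMENT_64 entries.
 * Returns -1 if a command's cmdsize is malformed. */
int count_segments(const unsigned char *cmds, uint32_t ncmds, uint32_t sizeofcmds)
{
    assert(cmds != NULL || ncmds == 0);
    uint32_t pos = 0;
    int segments = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (sizeofcmds - pos < 8)                 /* need room for cmd + cmdsize */
            return -1;
        uint32_t cmd     = read_le32(cmds + pos);
        uint32_t cmdsize = read_le32(cmds + pos + 4);
        if (cmdsize < 8 || cmdsize > sizeofcmds - pos)
            return -1;                            /* command runs past sizeofcmds */
        if (cmd == MY_LC_SEGMENT_64)
            segments++;
        pos += cmdsize;                           /* jump to the next command */
    }
    return segments;
}
```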

In MachOView, you can see that the Load Commands record a lot of information, as shown below. This includes the location of the dynamic linker, the program entry point, the dependent libraries, the location of the code, and the location of the symbol table.

The segment_command_64 structure, used by LC_SEGMENT_64 load commands, is defined as follows:

```c
#include <stdint.h>
#include <mach/vm_prot.h>   /* vm_prot_t */

/*
 * cmd:      load command type, LC_SEGMENT_64
 * cmdsize:  size of this command, including the nsects section_64 structs that follow it
 * segname:  16-byte segment name
 * vmaddr:   virtual memory address of the segment
 * vmsize:   size of the segment in virtual memory
 * fileoff:  offset of the segment in the file
 * filesize: size of the segment in the file
 * maxprot:  maximum memory protection allowed for the segment's pages
 * initprot: initial memory protection of the segment's pages
 * nsects:   number of sections in the segment
 * flags:    other miscellaneous flag bits
 *
 * - The loader takes filesize bytes starting at file offset fileoff and maps them
 *   to vmsize bytes at vmaddr in memory.
 * - All pages of a segment share the same permissions (data with the same
 *   permissions is grouped into one segment at build time), initialized from
 *   initprot, whose read/write/execute bits set the pages' protection level.
 * - A segment's protection can change at run time, but never beyond maxprot.
 */
struct segment_command_64 { /* for 64-bit architectures */
    uint32_t  cmd;          /* LC_SEGMENT_64 */
    uint32_t  cmdsize;      /* includes sizeof section_64 structs */
    char      segname[16];  /* segment name */
    uint64_t  vmaddr;       /* memory address of this segment */
    uint64_t  vmsize;       /* memory size of this segment */
    uint64_t  fileoff;      /* file offset of this segment */
    uint64_t  filesize;     /* amount to map from the file */
    vm_prot_t maxprot;      /* maximum VM protection */
    vm_prot_t initprot;     /* initial VM protection */
    uint32_t  nsects;       /* number of sections in segment */
    uint32_t  flags;        /* flags */
};
```
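The mapping rule described above, filesize bytes from fileoff mapped to vmaddr, also works in reverse. A tool that inspects a Mach-O file can turn a link-time virtual address into a file offset. The sketch below assumes a hypothetical seg_range struct holding only the four relevant fields. When vmsize is larger than filesize, the tail of the segment is zero-filled in memory and has no bytes in the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Just the fields of segment_command_64 that the mapping needs. */
typedef struct {
    uint64_t vmaddr;
    uint64_t vmsize;
    uint64_t fileoff;
    uint64_t filesize;
} seg_range;

/* Translate a link-time virtual address into a file offset.
 * Returns false for addresses outside the segment or in its zero-filled tail. */
bool vmaddr_to_fileoff(const seg_range *seg, uint64_t addr, uint64_t *fileoff)
{
    assert(seg != NULL && fileoff != NULL);
    if (addr < seg->vmaddr)
        return false;
    uint64_t delta = addr - seg->vmaddr;              /* offset inside the segment */
    if (delta >= seg->filesize || delta >= seg->vmsize)
        return false;                                 /* not backed by file bytes */
    *fileoff = seg->fileoff + delta;
    return true;
}
```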

Data

The Load Commands are followed by the Data area, which holds the actual read-only and read-write contents. Examples include method code, symbol tables, string tables, code data, and the data the linker needs (relocations, symbol bindings, and so on). Most Mach-O files contain the following three segments:

  • __TEXT (code segment): read-only, containing functions and read-only strings
  • __DATA (data segment): readable and writable, containing global variables that can be read and written
  • __LINKEDIT: contains metadata for methods and variables (locations, offsets), as well as information such as the code signature

Sections make up a large part of the Data area. In <mach-o/loader.h>, a section is represented by the section_64 structure (for 64-bit architectures such as arm64), defined as follows:
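These read/write/execute permissions are stored as bits in a segment's initprot and maxprot fields. The sketch below formats such a value the way inspection tools usually print it. The bit values mirror VM_PROT_READ (1), VM_PROT_WRITE (2), and VM_PROT_EXECUTE (4) from <mach/vm_prot.h>. prot_string is a hypothetical helper. With these bits, a typical __TEXT segment prints as r-x, __DATA as rw-, and __LINKEDIT as r--.

```c
#include <assert.h>
#include <stddef.h>

#define MY_VM_PROT_READ    0x1  /* same value as VM_PROT_READ */
#define MY_VM_PROT_WRITE   0x2  /* same value as VM_PROT_WRITE */
#define MY_VM_PROT_EXECUTE 0x4  /* same value as VM_PROT_EXECUTE */

/* Write a 3-character "rwx"-style string plus a terminating NUL into out. */
void prot_string(int prot, char out[4])
{
    assert(out != NULL);
    out[0] = (prot & MY_VM_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & MY_VM_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & MY_VM_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}
```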

```c
#include <stdint.h>

/*
 * sectname:  name of this section, e.g. __text
 * segname:   name of the segment the section belongs to, e.g. __TEXT
 * addr:      start address of the section in memory
 * size:      size of the section in bytes
 * offset:    file offset of the section
 * align:     alignment, as a power of 2
 * reloff:    file offset of the relocation entries
 * nreloc:    number of relocation entries
 * flags:     section type and attributes
 * reserved1: reserved (used as an offset or index)
 * reserved2: reserved (used as a count or size)
 * reserved3: reserved
 */
struct section_64 { /* for 64-bit architectures */
    char     sectname[16]; /* name of this section */
    char     segname[16];  /* segment this section goes in */
    uint64_t addr;         /* memory address of this section */
    uint64_t size;         /* size in bytes of this section */
    uint32_t offset;       /* file offset of this section */
    uint32_t align;        /* section alignment (power of 2) */
    uint32_t reloff;       /* file offset of relocation entries */
    uint32_t nreloc;       /* number of relocation entries */
    uint32_t flags;        /* flags (section type and attributes) */
    uint32_t reserved1;    /* reserved (for offset or index) */
    uint32_t reserved2;    /* reserved (for count or sizeof) */
    uint32_t reserved3;    /* reserved */
};
```
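A common task is finding one section, for example __TEXT,__text, by its segment and section names. Section names are 16-byte fields that are NUL-padded, but a name of exactly 16 characters has no terminating NUL. The sketch therefore compares names with strncmp over at most 16 bytes. sect_info and find_section are hypothetical names. For an image already loaded in the current process, Apple's <mach-o/getsect.h> provides getsectiondata() for the same lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-side copy of the name and address fields of section_64. */
typedef struct {
    char     sectname[16];
    char     segname[16];
    uint64_t addr;
    uint64_t size;
} sect_info;

/* Find a section by segment and section name; returns NULL if absent.
 * strncmp with a limit of 16 handles names that fill all 16 bytes without a NUL. */
const sect_info *find_section(const sect_info *sects, size_t count,
                              const char *segname, const char *sectname)
{
    assert((sects != NULL || count == 0) && segname != NULL && sectname != NULL);
    for (size_t i = 0; i < count; i++)
        if (strncmp(sects[i].segname, segname, 16) == 0 &&
            strncmp(sects[i].sectname, sectname, 16) == 0)
            return &sects[i];
    return NULL;
}
```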

Sections can be inspected in MachOView. They appear mainly under the __TEXT and __DATA segments, as shown below.

Common sections include the following:

__TEXT sections:

  • __TEXT.__text: main program code
  • __TEXT.__cstring: C string literals
  • __TEXT.__const: constants declared with the const keyword
  • __TEXT.__stubs: stub code, small placeholder routines used to call functions in dynamic libraries
  • __TEXT.__stub_helper: helper code a stub falls back to when the real symbol address has not been resolved yet
  • __TEXT.__objc_methname: Objective-C method names
  • __TEXT.__objc_methtype: Objective-C method type encodings
  • __TEXT.__objc_classname: Objective-C class names

__DATA sections:

  • __DATA.__data: initialized mutable data
  • __DATA.__la_symbol_ptr: lazy binding pointer table; each pointer initially points into __stub_helper
  • __DATA.__nl_symbol_ptr: non-lazy binding pointer table; each pointer refers to a symbol the dynamic linker resolves during loading
  • __DATA.__const: constant data that needs load-time fixups, such as const pointers
  • __DATA.__cfstring: Core Foundation strings (CFStringRefs) used in the program
  • __DATA.__bss: uninitialized (zero-filled) global and static variables
  • __DATA.__common: uninitialized external (common) symbol declarations
  • __DATA.__objc_classlist: Objective-C class list
  • __DATA.__objc_protolist: Objective-C protocol list
  • __DATA.__objc_imageinfo: Objective-C image information
  • __DATA.__objc_selrefs: Objective-C selector references
  • __DATA.__objc_protorefs: Objective-C protocol references
  • __DATA.__objc_superrefs: Objective-C superclass references

So, to sum up, the format diagram for Mach-O is shown below