This article covers virtual memory, the virtual memory remap mechanism, and how to build thunk code blocks from statically compiled instructions.

👉Thunk program implementation principle and application entry in iOS.

A thunk is simply a block of code that can be constructed either dynamically at run time or statically at compile time. Besides the use described in the first article, a thunk can serve as trampoline code in front of a real function, and it can bridge calls whose argument lists do not match. From a design-pattern perspective, a thunk is an Adapter. This article focuses on building thunks from static code at compile time, which removes the restriction on dynamically constructed instructions on iOS that the previous article ran into.

A brief introduction to virtual memory implementation

Before introducing statically constructed thunks, it helps to review virtual memory. Virtual memory is a core memory-management technology in modern operating systems. Through virtual memory mapping, each process gets a very large address space that is completely isolated from other processes. The operating system allocates and manages virtual memory in pages. When an executable file or dynamic library is loaded, the operating system maps the code and data segments of the file into the corresponding virtual memory regions as a memory-mapped file.

Program code always executes from a region of virtual memory that has execute permission, and operating systems differ in what else they require of that region. On iOS, a region that holds executable code must be executable and must not be writable; violating this rule crashes the process. This means instructions cannot be constructed dynamically in a readable and writable region (such as heap or stack memory) and then executed by the CPU. It also means that iOS does not support marking a region read-write, filling it with data, and then switching it to executable, which is how dynamic instruction construction (so-called JIT) usually works.

Fortunately, the operating system provides a virtual memory remap mechanism that solves this problem. Remapping means that a newly allocated virtual memory page can be mapped onto an already allocated virtual memory page. After the remap, the new page has the same contents as the existing page and can inherit its protection. The remap mechanism lets the same physical memory be shared between processes or within a single process.

Some conclusions can be drawn from the figure above:

  1. Both physical and virtual memory are managed in pages, and a physical page and a virtual page are generally the same size.
  2. The operating system creates a page table for each process. The page table records the mapping from virtual pages to physical pages along with the associated permissions, and the page table itself is stored in physical memory pages. Allocating virtual memory essentially means creating a mapping from virtual pages to physical pages in the page table. A remap simply maps different virtual page numbers to the same physical page number. For example, pages 1 and 4 of process 1 are mapped to the same physical page 6.
  3. Virtual pages in different processes can be mapped to the same physical page. One application is shared loading of dynamic libraries. For example, the UIKit framework is loaded into memory when the first process that uses it runs. When a second process runs and needs UIKit, the library does not have to be loaded from the file again; the second process shares the copy that is already in physical memory. In the figure, page 5 of process 1 and page 7 of process 2 share physical page 9.
  4. The operating system also maintains a global table of free physical pages that records the physical memory not currently allocated. When a process needs to allocate virtual memory, free pages are looked up in this table so the allocation is fast.
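The page-table view in the list above can be sketched as a toy model in C. This is only an illustration: `toy_page_table`, `toy_map`, `toy_remap`, and `toy_translate` are hypothetical names, and a real page table also stores permissions and is walked by hardware.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGES 8         /* virtual pages per toy process (assumption) */
#define TOY_UNMAPPED (-1)   /* marks a virtual page with no physical page */

/* Toy per-process page table: index = virtual page number,
   value = physical page number or TOY_UNMAPPED. */
typedef struct {
    int phys[TOY_PAGES];
} toy_page_table;

static void toy_init(toy_page_table *pt)
{
    for (size_t i = 0; i < TOY_PAGES; i++)
        pt->phys[i] = TOY_UNMAPPED;
}

/* "Allocating" a virtual page only records a mapping in the table. */
static void toy_map(toy_page_table *pt, int vpage, int ppage)
{
    assert(vpage >= 0 && vpage < TOY_PAGES);
    pt->phys[vpage] = ppage;
}

/* A remap makes dst_vpage refer to the physical page behind src_vpage. */
static void toy_remap(toy_page_table *pt, int dst_vpage, int src_vpage)
{
    assert(dst_vpage >= 0 && dst_vpage < TOY_PAGES);
    assert(src_vpage >= 0 && src_vpage < TOY_PAGES);
    pt->phys[dst_vpage] = pt->phys[src_vpage];
}

/* Translate a virtual page number to a physical page number. */
static int toy_translate(const toy_page_table *pt, int vpage)
{
    assert(vpage >= 0 && vpage < TOY_PAGES);
    return pt->phys[vpage];
}
```

With one table per process, mapping page 1 of process 1 to physical page 6 and then calling `toy_remap(&p1, 4, 1)` makes pages 1 and 4 translate to the same physical page. Mapping page 5 of process 1 and page 7 of process 2 to physical page 9 models the shared library case.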

The iOS kernel contains a Mach subsystem. Mach is the kernel within the kernel, a microkernel. In Mach, tasks, threads, and memory are all treated as objects, and each object is identified by a port. All communication and function calls between objects go through Mach messages addressed by port.

Remap mechanism for virtual memory

The following code demonstrates allocating and deallocating virtual memory and the remap mechanism. The example uses remap to give the same function two different entry addresses:


    #import <Foundation/Foundation.h>
    #import <mach/mach.h>

    // Newly allocated virtual memory is page aligned, and the memory being
    // mapped must be page aligned too, so testfn starts on a page boundary.
    int __attribute__((aligned(PAGE_MAX_SIZE))) testfn(int a, int b)
    {
        int c = a + b;
        return c;
    }

    int main(int argc, char *argv[])
    {
        // vm_allocate allocates virtual memory in units of pages.
        vm_size_t page_size = 0;
        host_page_size(mach_host_self(), &page_size);
        vm_address_t addr = 0;
        // Allocate one page of virtual memory in a free area of the current
        // process. addr receives the starting address of the page.
        kern_return_t ret = vm_allocate(mach_task_self(), &addr, page_size, VM_FLAGS_ANYWHERE);
        if (ret == KERN_SUCCESS)
        {
            memcpy((void *)addr, "Hello World!\n", 14);
            printf("%s", (const char *)addr);
            // At this point the page at addr holds "Hello World!\n" followed by
            // blank bytes, and it is not an executable region.

            // Remap the page. After vm_remap succeeds, addr is mapped onto the
            // page that contains testfn and holds the same code as testfn.
            vm_prot_t cur, max;
            ret = vm_remap(mach_task_self(), &addr, page_size, 0,
                           VM_FLAGS_FIXED | VM_FLAGS_OVERWRITE,
                           mach_task_self(), (vm_address_t)testfn, false,
                           &cur, &max, VM_INHERIT_SHARE);
            if (ret == KERN_SUCCESS)
            {
                int c1 = testfn(10, 20);                     // call testfn directly
                int c2 = ((int (*)(int, int))addr)(10, 20);  // call the same code through addr
                // NSCAssert is the C-function variant of NSAssert.
                NSCAssert(c1 == c2, @"oops!");
            }

            vm_deallocate(mach_task_self(), addr, page_size);
        }

        return 0;
    }

First, vm_allocate allocates virtual memory from a free area of the address space, in units of the page size, and addr receives the starting address of the region. Once the allocation succeeds, the region can be read and written like ordinary memory: the memcpy call writes to addr and the printf call reads from it. The memory at addr is readable and writable. At this point it holds the greeting string, and the rest of the page is blank.

Next, vm_remap remaps the memory at addr. vm_remap takes two port parameters that specify the target task and the source task, so in principle it can map memory between any two processes, and this kind of mapping can also be used for inter-process communication. Cross-process mapping is not available on iOS, however, so the target and source must be the same port. Besides the two ports, you also specify a target address and a source address: vm_remap maps the target address onto the source address so that both refer to the same memory. Here the target address is addr and the source address is the start address of testfn. After the remap, addr refers to the same memory as testfn, and addr inherits the protection of the source address. Because testfn is compiled code, it lives in the code segment, which is executable and not writable. As a result, addr also becomes an executable region whose contents are the code of testfn. The two function calls that follow return the same result, which confirms this. Inspecting both addresses shows that addr and testfn hold exactly the same content.

The vm_remap function enables two different virtual memory addresses to refer to the same physical address.
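The effect of two virtual addresses backed by the same physical memory can also be observed outside Mach. The sketch below is an analogue, not vm_remap itself: it assumes a POSIX system and maps the same page of a temporary file twice with `mmap`. The helper names `map_page_twice` and `aliases_share_memory` are hypothetical.

```c
#define _DEFAULT_SOURCE /* expose POSIX declarations under strict -std modes on glibc */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first page of an unnamed temporary file at two different virtual
   addresses. Returns 0 on success and stores the results in the out parameters. */
static int map_page_twice(void **first, void **second, size_t *page_size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;

    FILE *tmp = tmpfile();
    if (tmp == NULL) return -1;
    int fd = fileno(tmp);
    if (ftruncate(fd, (off_t)page) != 0) { fclose(tmp); return -1; }

    void *a = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *b = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(tmp); /* existing mappings stay valid after the file is closed */

    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, (size_t)page);
        if (b != MAP_FAILED) munmap(b, (size_t)page);
        return -1;
    }
    *first = a;
    *second = b;
    *page_size = (size_t)page;
    return 0;
}

/* Write through one address and read through the other. Returns 1 if the
   write is visible through the second address, 0 if not, -1 on setup failure. */
static int aliases_share_memory(void)
{
    void *a = NULL, *b = NULL;
    size_t page = 0;
    if (map_page_twice(&a, &b, &page) != 0) return -1;
    assert(a != b); /* two distinct virtual addresses */

    const char msg[] = "written through the first mapping";
    memcpy(a, msg, sizeof msg);
    int same = memcmp(b, msg, sizeof msg) == 0;

    munmap(a, page);
    munmap(b, page);
    return same;
}
```

Calling `aliases_share_memory()` is expected to return 1 on a system where both mappings succeed, because the two addresses are different views of the same page.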

An interesting aside: in object-oriented systems the unique identity of an object is its memory address, and some systems implement equality by comparing object addresses. vm_remap breaks that assumption, because a single object can now be reached through several different addresses. It is worth considering whether this technique could solve some existing problems.

Virtual memory can be allocated with vm_allocate, and heap memory with malloc. How are the two related? vm_allocate is a much lower-level memory management API, and the sizes it allocates are multiples of the page size. malloc is a higher-level API. Its implementation uses vm_allocate to obtain large memory regions (the same applies to stack memory) and then manages those regions with partitioning and free-block reuse to serve small, arbitrarily sized allocations. Either way, the memory these functions return can be read and written.
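The layering described above can be sketched as a toy bump allocator that obtains whole pages from the operating system and hands out small pieces of them. This is an illustration under stated assumptions: it uses POSIX `mmap` in place of vm_allocate so it also builds outside Apple platforms, the `toy_arena` names are hypothetical, and a real malloc additionally reuses freed blocks.

```c
#define _DEFAULT_SOURCE /* expose sysconf and MAP_ANONYMOUS under strict -std modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* A region obtained from the OS in whole pages, carved into small blocks. */
typedef struct {
    unsigned char *base; /* start of the page-granular region */
    size_t capacity;     /* region size in bytes, a multiple of the page size */
    size_t used;         /* bytes handed out so far */
} toy_arena;

/* Low-level step: ask the OS for `pages` whole pages. Returns 0 on success. */
static int toy_arena_init(toy_arena *arena, size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || pages == 0) return -1;
    size_t bytes = (size_t)page * pages;
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    arena->base = p;
    arena->capacity = bytes;
    arena->used = 0;
    return 0;
}

/* High-level step: hand out a small block, rounded up to 16 bytes.
   Returns NULL when the region is exhausted. */
static void *toy_arena_alloc(toy_arena *arena, size_t size)
{
    assert(arena->base != NULL);
    if (size == 0 || size > arena->capacity) return NULL;
    size_t rounded = (size + 15) & ~(size_t)15;
    if (rounded > arena->capacity - arena->used) return NULL;
    void *p = arena->base + arena->used;
    arena->used += rounded;
    return p;
}

/* Return the whole region to the OS. */
static void toy_arena_destroy(toy_arena *arena)
{
    munmap(arena->base, arena->capacity);
    arena->base = NULL;
    arena->capacity = 0;
    arena->used = 0;
}
```

Two consecutive 10-byte requests come back 16 bytes apart inside the same page, which shows the small-block layer sitting on top of the page-sized allocation.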

Mapping addr to testfn gives addr the same capability as testfn, but only if the body of testfn satisfies certain constraints: testfn must not use constants, global variables, or function calls. The reason is that, once compiled to machine instructions, accesses to such data and functions use offsets relative to the current instruction. After the remap, the code reached through addr runs at a different base address, so the relative offsets encoded in the instructions resolve to the wrong locations, and the call crashes.
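The constraint comes down to simple address arithmetic, sketched below. The addresses are made-up values for illustration, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address reached by an instruction at `pc` using a fixed relative offset. */
static uintptr_t pc_relative_target(uintptr_t pc, intptr_t offset)
{
    return (uintptr_t)((intptr_t)pc + offset);
}

/* Offset baked into code at `code_addr` so that it reaches `data_addr`. */
static intptr_t baked_offset(uintptr_t code_addr, uintptr_t data_addr)
{
    return (intptr_t)(data_addr - code_addr);
}

/* Returns 1 if the same instruction, executed from `run_addr`, still
   reaches `data_addr`; returns 0 otherwise. */
static int still_reaches(uintptr_t code_addr, uintptr_t data_addr, uintptr_t run_addr)
{
    intptr_t offset = baked_offset(code_addr, data_addr);
    return pc_relative_target(run_addr, offset) == data_addr;
}

/* Worked example with illustrative addresses. */
static void pc_relative_demo(void)
{
    uintptr_t code = 0x1000;   /* where the function was compiled to live */
    uintptr_t global = 0x5000; /* a global variable it references */
    uintptr_t alias = 0x9000;  /* the remapped copy of the same code */

    assert(still_reaches(code, global, code));   /* original address works */
    assert(!still_reaches(code, global, alias)); /* alias lands 0x8000 bytes off */
}
```

The offset is fixed at compile time, so running the same bytes from the alias shifts every relative reference by the distance between the two code addresses, while the data itself has not moved.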

Statically construct the Thunk program

The previous article implemented a thunk by dynamically constructing machine instructions in memory, but that mechanism does not work in an iOS app signed with a distribution certificate. Look again at the hand-constructed thunk instructions:

    mov x2, x1
    mov x1, x0
    ldr x0, #0x0c
    ldr x3, #0x10
    br x3
  arg0:
    .quad 0
  realfn:
    .quad 0
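In C terms, the instruction block above shifts the two incoming arguments one slot to the right, loads a stored value as the new first argument, and jumps to the real function. The sketch below expresses that behaviour in portable C. It is not how the thunk is built: portable C cannot create a separate piece of code per instance, so the two data slots are emulated with static variables, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int age;
    const char *name;
} student_t;

/* The real three-argument function the thunk forwards to. */
static int age_idx_compare(const student_t *students, const int *i1, const int *i2)
{
    return students[*i1].age - students[*i2].age;
}

/* Stand-ins for the arg0 and realfn data slots that follow the instructions. */
static const student_t *thunk_arg0;
static int (*thunk_realfn)(const student_t *, const int *, const int *);

/* What the thunk does: a call (a, b) becomes realfn(arg0, a, b). */
static int thunk_equivalent(const void *a, const void *b)
{
    assert(thunk_realfn != NULL);
    return thunk_realfn(thunk_arg0, a, b);
}

/* Sort an index array by student age, using the two-argument callback
   that qsort expects. */
static void sort_indexes_by_age(const student_t *students, int *idxs, size_t n)
{
    thunk_arg0 = students;
    thunk_realfn = age_idx_compare;
    qsort(idxs, n, sizeof idxs[0], thunk_equivalent);
}
```

The static variables make this version non-reentrant, which is exactly the limitation a real thunk avoids by giving each instance its own data slots. Note that qsort does not specify the relative order of elements that compare equal.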

As you can see, the key instructions are the third and fourth. They load specific registers with data read from offsets 0x0c and 0x10 relative to the current instruction. That data can be set and adjusted while the thunk is being built in memory, which is what gives the thunk its run-time flexibility. Now change the code above:

     mov x2, x1
     mov x1, x0
     ldr x0, PAGE_MAX_SIZE - 8
     ldr x3, PAGE_MAX_SIZE - 4
     br x3

The offsets in the third and fourth instructions are now based on PAGE_MAX_SIZE, the size of one virtual memory page, so the distance between each instruction and the data it loads has grown to a full page. An obvious question follows: if only a small block of memory is built dynamically to hold the instructions, and no extra page is allocated to hold the data, what is the point?
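The offsets in both versions can be checked with a little arithmetic. The sketch below assumes 4-byte AArch64 instructions and, for illustration only, a 16 KB page; `load_target` is a hypothetical helper that returns the byte position, measured from the start of the instruction block, that a PC-relative load reads from.

```c
#include <assert.h>
#include <stddef.h>

#define INSN_SIZE 4u             /* each AArch64 instruction is 4 bytes */
#define TEMPLATE_INSNS 5u        /* the block has five instructions */
#define ASSUMED_PAGE_SIZE 16384u /* assumption: a 16 KB page */

/* Byte position (from the start of the block) read by the instruction at
   zero-based index `insn_index` when it loads from `pc_offset`. */
static size_t load_target(size_t insn_index, size_t pc_offset)
{
    assert(insn_index < TEMPLATE_INSNS);
    return insn_index * INSN_SIZE + pc_offset;
}
```

In the first version, `load_target(2, 0x0c)` is 20 and `load_target(3, 0x10)` is 28, the two 8-byte slots that directly follow the 20 bytes of instructions. In the second version, `load_target(2, ASSUMED_PAGE_SIZE - 8)` is exactly one page and `load_target(3, ASSUMED_PAGE_SIZE - 4)` is one page plus 8, so both loads read the first two pointer-sized slots of the page that follows the code page.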

Now imagine that the instructions above are not constructed dynamically but are statically compiled code. Then this code no longer fails to run on iOS because of signing. Going further, we can allocate two pages of virtual memory at run time. Once they are allocated, we remap the first page onto the address of the compiled code and use the second page to store the data that the instructions load at their offsets. As described in the remap section, the first page becomes executable after the remap, and the data read by the third and fourth instructions comes from the second page. This produces a thunk without dynamically constructing any instructions. The principle of the whole implementation is as follows:

As the flowchart shows, a thunk block can be built by remapping virtual memory, with no dynamically constructed instructions. Let's combine the quicksort example from the first article with the remap mechanism to build a thunk statically.

  1. Start by adding an assembly file with the .s suffix to your project. The code in this file supports only 64-bit ARM (arm64):

    //
    //  thunktemplate.s
    //  Thunktest
    //
    //  Created by YoungSoft on 2019/1/30.
    //  Copyright © 2019 YoungSoft. All rights reserved.
    //

    #if __arm64__

    #include <mach/vm_param.h>

    /*
      In the code segment, declare the symbol _thunktemplate and align
      its address to the page size.
    */
        .text
        .private_extern _thunktemplate
        .align PAGE_MAX_SHIFT
    _thunktemplate:
        mov x2, x1
        mov x1, x0
        ldr x0, PAGE_MAX_SIZE - 8
        ldr x3, PAGE_MAX_SIZE - 4
        br x3

    #endif
  2. Then implement the sorting code in another file:

    #include <mach/mach.h>
    #include <stdio.h>
    #include <stdlib.h>

    extern void *thunktemplate;

    typedef struct
    {
        int age;
        char *name;
    } student_t;

    // Compare two students by age, given indexes into the students array.
    int ageidxcomparfn(student_t students[], const int *idx1ptr, const int *idx2ptr)
    {
        return students[*idx1ptr].age - students[*idx2ptr].age;
    }

    int main(int argc, const char *argv[])
    {
        vm_address_t thunkaddr = 0;
        vm_size_t page_size = 0;
        host_page_size(mach_host_self(), &page_size);
        // Allocate 2 pages of virtual memory.
        kern_return_t ret = vm_allocate(mach_task_self(), &thunkaddr, page_size * 2, VM_FLAGS_ANYWHERE);
        if (ret == KERN_SUCCESS)
        {
            // Remap the first page onto the address of thunktemplate.
            vm_prot_t cur, max;
            ret = vm_remap(mach_task_self(), &thunkaddr, page_size, 0,
                           VM_FLAGS_FIXED | VM_FLAGS_OVERWRITE,
                           mach_task_self(), (vm_address_t)&thunktemplate, false,
                           &cur, &max, VM_INHERIT_SHARE);
            if (ret == KERN_SUCCESS)
            {
                student_t students[5] = {{20, "Tom"}, {15, "Jack"}, {30, "Bob"}, {10, "Lily"}, {30, "Joe"}};
                int idxs[5] = {0, 1, 2, 3, 4};
                // Fill in the data slots at the start of the second page.
                void **p = (void **)(thunkaddr + page_size);
                p[0] = students;
                p[1] = (void *)ageidxcomparfn;
                // Use thunkaddr as the address of the callback function.
                qsort(idxs, 5, sizeof(int), (int (*)(const void *, const void *))thunkaddr);
                for (int i = 0; i < 5; i++)
                {
                    printf("student:[age:%d, name:%s]\n", students[idxs[i]].age, students[idxs[i]].name);
                }
            }

            vm_deallocate(mach_task_self(), thunkaddr, page_size * 2);
        }

        return 0;
    }


As shown, the remap mechanism removes the drawback of implementing thunks by dynamically constructing instructions in memory. We never construct instructions ourselves; the thunk is built from instructions that already exist in the binary. Such code has no signing problem and runs safely on iOS under any signature. This technique can also be used on Linux/Unix systems.

Afterword.

The techniques described in this article are based on libffi's support for closures and on the iOS runtime's implementation of creating an IMP function pointer from a block object.


You are welcome to visit ouyang Big Brother 2013's GitHub page.