I. Program introduction

OCPack is a solution for making apps on the iOS platform dynamically updatable. Developers write the logic that needs to be delivered dynamically in Objective-C (as .m files) and then generate patch files (.bin format) with the tool chain provided by OCPack. The client embeds a virtual stack machine that runs in the native environment and can dynamically load and execute the methods stored in a patch file. Patch files can be downloaded, updated, reloaded, and run by the VM at any time, depending on business requirements.

The main advantages of this scheme:

  • Development efficiency

    Developers can develop and debug with the familiar Xcode IDE and Objective-C language, just as they would write ordinary feature code. Once development is done, the tool chain generates the patch file, which makes patch development more efficient.

  • Syntax coverage

    Balancing ease of use against development time, the current implementation of OCPack covers the basic syntax of C and the more common syntax of Objective-C. Most commonly used syntax is supported directly, and unsupported syntax has alternative ways of achieving the same result.

  • Problem locating

    • For syntax that is not yet supported, OCPack's tool chain clearly reports the cause of the error and the location of the offending code, which makes problems encountered during development easy to locate.

    • After a patch goes live, an interface can be called to obtain the call stack of every virtual machine thread. Combined with the symbol file generated at compile time, developers can pinpoint the source file line the virtual machine was executing at that moment, which helps locate and solve online problems. The tool chain also includes a symbol parsing tool.

  • Performance

    The virtual machine runs in the native environment, its custom stack machine instruction set is simple with clearly defined semantics, and most interactions with the native environment operate directly on memory addresses. This eliminates the frequent string parsing and box/unbox overhead found in JSPatch. The efficiency of OC bridge calls is close to native.

  • Memory and Stability

    • JavaScript's GC-based memory management means memory may not be freed in a timely manner, and forcibly releasing a JSContext can cause strange online crashes that are hard to locate and fix.

    • This solution supports ARC memory management and cooperates correctly with the client's ARC/MRC memory management, eliminating the problem of uncontrollable GC timing.

    • In line with the characteristics of the project, the main memory used is placed in file mappings (mmap) so that it counts as little as possible against the application's memory quota, which improves application stability.

II. Technical proposal

OCPack reuses the Clang front end to parse the target Objective-C code into a syntax tree, traverses the tree with a custom ASTFrontendAction, and generates an assembly program in a custom instruction set. On the client, the binary instructions produced from that assembly program are interpreted and executed by OCPack's own virtual stack machine.

The basic data flow for generating patch files is as follows:

  • The Objective-C source code is first compiled by OCPack into an assembly program (.s) in the custom instruction set. This is done mainly by traversing the syntax tree (AST) that Clang generates for the Objective-C code.

  • The assembly program (.s) is then converted into a binary patch file (.bin) by a custom assembler (SMC).

Note: OCPack's custom stack machine has 67 main assembly instructions. Apart from the basic instructions, they are designed mainly around the syntax tree node types.

At run time, the client's built-in virtual stack machine loads the specified patch file on request and can then execute any of its methods.

The main technical points are introduced module by module below:

Compiling module

Function: Objective-C program (.m) -> syntax tree -> assembly program (.s)

1. Independent compiler

The compiler is built on Clang's LibTooling interface and implements an ASTFrontendAction. Its AST consumer recursively traverses the syntax tree and generates executable assembly instructions for each node type.

  • Compiler options

    To have Clang generate a syntax tree for the target Objective-C source file (.m), you must supply the compile options needed to compile that .m file. In the integrating project, the target file may depend on many header files (.h) and other compile switches. To help integrators obtain a complete list of compile options, we developed a tool that extracts the full set of options for the target file from the build log of the integrating project.

  • Compilation errors and warnings

    OCPack supports the common syntax types of Objective-C. For unsupported syntax types, a log file is generated during compilation that states the error type and location, which helps development.

Note: To further improve development efficiency, OCPack also provides a standalone Clang plugin. By adding an .xcconfig file to the project (replacing the default clang), it displays compilation errors directly in Xcode and generates .bin files with one click. This removes the need to collect compile options and read error logs by hand, and simplifies the development process.

2. Stack machine assembly instruction set

To connect a syntax tree that contains the Objective-C code logic to the virtual machine running on the client, OCPack needs a reasonably complete assembly instruction set. The instruction set should satisfy two conditions:

  • Provide enough functionality to implement the predefined range of Objective-C syntax. Specifically, for the chosen set of syntax tree node types, the compiler must be able to generate a combination of assembly instructions that is equivalent to the logic of the original Objective-C code.

  • Keep the instruction set as simple as possible. On the one hand, the number of instructions should be kept small to reduce the complexity of the virtual machine implementation. On the other hand, the semantics of each instruction should be kept simple, so that every instruction performs a clear and limited function.

The following is a brief introduction to some typical instruction design schemes:

2.1 push and pop instructions

The most basic part of the stack machine is the operation stack, which is used to store the operands and operation results in progress. For example, to calculate 1 + 2, the stack machine needs to execute instructions like the following:

    push instant 1
    push instant 2
    add

The operands 1 and 2 are pushed onto the stack, and then the add instruction is executed: it pops both operands, adds them, and pushes the result. After these instructions run, the result 3 sits at the top of the operation stack.
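
The three instructions above can be traced with a very small interpreter. This is only an illustrative sketch, not OCPack's implementation: the tuple encoding of instructions and the name `run` are assumptions made for the example.

```python
def run(program):
    """Execute (opcode, operand...) tuples on an operation stack (illustrative encoding)."""
    stack = []
    for op, *args in program:
        if op == "push_instant":
            # Push an immediate value onto the operation stack.
            stack.append(args[0])
        elif op == "add":
            # Pop both operands, add them, push the result.
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


result = run([("push_instant", 1), ("push_instant", 2), ("add",)])
```

After the run, `result` holds the single value left on the operation stack.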

The operation stack alone is not enough. Program logic needs data such as local variables and method parameters to live at fixed memory addresses. OCPack therefore allocates a separate segment for each of local variables, static variables, constants, pointers, and immediate values, and every variable is identified by its index within the segment it belongs to.

  • Segment: stores the various kinds of non-temporary (addressable) data. Segments are either thread-specific or thread-independent. The thread-specific data segments are mainly:

            local    // Local variables
            arg      // Method arguments
            this     // Store self (for super implementation)
            that     // Member variables used to implement struct
            pointer  // Help implement this, that
            // Note: Thread-specific segment data is stored in thread_context(thread-local storage) and is visible only to the local thread

    Thread-independent data segments mainly include:

            const    // Constant string
            static   // Static variables
            instant  // Immediate values
            // Note: thread independent segment data is stored in machine, all threads are visible

As it traverses the syntax tree, the OCPack compiler maintains a symbol table, creating a symbol table entry for each variable declaration (VarDecl), holding its segment name and index.

For variable references in the syntax tree (VarDeclRef nodes), the OCPack compiler looks up the symbol table entry of the corresponding VarDecl and generates the matching push and pop instructions.

The arguments of the push and pop instructions are the segment name and the index:

  • push segment index: push the entry at index in segment onto the top of the stack
  • pop segment index: pop the data at the top of the stack into index in segment
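
A minimal sketch of such a symbol table follows. The class and method names are hypothetical, and it assumes the simplest possible policy of handing out indexes sequentially within each segment; the real compiler may allocate them differently.

```python
class SymbolTable:
    """Maps each declared variable to a (segment, index) pair (hypothetical helper)."""

    def __init__(self):
        self.entries = {}      # variable name -> (segment, index)
        self.next_index = {}   # segment -> next unused index

    def declare(self, name, segment):
        # Called when a variable declaration is visited.
        index = self.next_index.get(segment, 0)
        self.next_index[segment] = index + 1
        self.entries[name] = (segment, index)
        return self.entries[name]

    def emit_push(self, name):
        # Called when a reference to the variable is read.
        segment, index = self.entries[name]
        return f"push {segment} {index}"

    def emit_pop(self, name):
        # Called when the stack top is stored into the variable.
        segment, index = self.entries[name]
        return f"pop {segment} {index}"


table = SymbolTable()
table.declare("a", "local")
table.declare("b", "local")
table.declare("counter", "static")
```

With these declarations, `table.emit_push("b")` yields the text `push local 1` and `table.emit_pop("counter")` yields `pop static 0`.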

2.2 prolog instruction
  • prolog is the first instruction of every method in the virtual machine. Based on its parameters, it allocates a local segment of the corresponding size for the new stack frame and records the return address, then computes and records the starting address of the parameter list (arg), moves the caller's stack pointer back to just before the parameter list, and finally switches to the callee's stack frame.

  • Format: prolog arg_size local_size

    • arg_size is the total length of all parameters and is used to compute the starting address of the parameter list

    • local_size is the length of the local variable segment

The design of method calls and parameter passing needs some special consideration. The main requirements are:

  • The caller only pushes the parameters and the return address onto the stack as required and then jumps directly to the starting address of the called method. The program runs correctly without the caller taking on any other responsibility

  • Following the usual calling logic of a stack machine, by the time the called function returns, the parameters and return address that were pushed should already have been popped by the callee. At that point the top of the stack holds only the return value, and everything below it is data unrelated to the call

  • The callee needs to know exactly where the parameters and the return address are. Parameters must have fixed addresses and support random access; they cannot be temporary values obtainable only through pop

  • Local variables also need random access, and their space must be allocated when the function starts executing

To meet these requirements, OCPack introduces the prolog instruction:

  • The instruction is placed at the head of every method and executes as soon as the caller jumps to the method. All the necessary setup happens inside it, which minimizes what is required of the caller.

  • By convention, when prolog executes, the return address is at the top of the stack and the reversed parameter list lies below it. prolog first pops and saves the return address, and then moves the sp of the caller's stack frame back by the length of the parameter list, so that it points to the first parameter. This length is determined at compile time and supplied as an instruction parameter, so the caller does not need to pass it through the stack. Note that the callee's frame has not been created yet; all of these operations still happen in the context of the caller's frame. This guarantees that the caller's sp is in the right position when the callee returns, so the return value can be pushed directly.

  • prolog then creates a new stack frame from the return address and the caller's stack frame information. The arg segment of the new frame points directly at the start of the parameter list, so parameters can be accessed with push arg i or pop arg i.

  • The local segment for local variables is set up at the same time. Its size is determined at compile time and passed as the local_size parameter of prolog. Once the stack frame has been built and made current, the transition phase of the method call is complete and program flow continues.

2.3 ret instruction
  • ret is the last instruction of a method in the virtual machine and is the counterpart of prolog. It unwinds the stack frame and then copies the return value from the top of the callee's stack to the top of the caller's stack, implementing the calling convention that the return value is at the top of the stack once the call completes.

  • Format: ret retSize

Returning involves a data copy whose size is the size of the return value. To minimize the burden on callers, the compiler adds a retSize parameter to the ret instruction so that the copy can be performed when ret executes. After the stack frame falls back to the caller, the caller can rely on the return value being at the top of its stack. The subsequent logic is then uniform: it does not matter whether the current stack top value came from a method call or was pushed by the caller itself.
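
The prolog and ret conventions can be modelled together in a few lines. The sketch below is a simplified model under stated assumptions, not OCPack's code: every value occupies one slot, argument i lives at `arg_base + i`, the local segment is placed directly above the arguments, and the `Frame` and `Machine` classes are hypothetical.

```python
class Frame:
    def __init__(self, return_addr, caller, caller_sp, local_base):
        self.return_addr = return_addr  # where the caller resumes
        self.caller = caller            # previous frame in the linked list
        self.arg_base = caller_sp       # first argument; also the caller's sp after the call
        self.local_base = local_base    # first local variable slot


class Machine:
    def __init__(self, slots=32):
        self.mem = [0] * slots
        self.sp = 0                     # index of the next free slot
        self.frame = None

    def push(self, value):
        self.mem[self.sp] = value
        self.sp += 1

    def pop(self):
        self.sp -= 1
        return self.mem[self.sp]

    def prolog(self, arg_size, local_size):
        return_addr = self.pop()            # the return address is on top
        caller_sp = self.sp - arg_size      # rewind over the parameter list
        self.frame = Frame(return_addr, self.frame, caller_sp, self.sp)
        self.sp += local_size               # reserve the local segment

    def ret(self, ret_size):
        frame = self.frame
        result = self.mem[self.sp - ret_size:self.sp]
        self.sp = frame.arg_base            # unwind to the caller's sp
        for value in result:
            self.push(value)                # return value ends up on the caller's stack top
        self.frame = frame.caller
        return frame.return_addr


m = Machine()
m.push(99)      # unrelated data that belongs to the caller
m.push(10)      # argument 0
m.push(20)      # argument 1
m.push(1234)    # return address
m.prolog(arg_size=2, local_size=1)
m.push(m.mem[m.frame.arg_base] + m.mem[m.frame.arg_base + 1])
resume_at = m.ret(ret_size=1)
```

After `ret`, the caller's unrelated value is still in place, the sum of the two arguments sits directly above it, and `resume_at` is the saved return address, which is exactly the state the caller expects regardless of how the value got there.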

2.4 Jump instruction

To implement control flow syntax such as if/else and for loops, OCPack defines the jmp and jmp_if instructions. The compiler generates the corresponding jump instructions and jump labels according to each control flow node in the syntax tree. The textual jump labels are stored in the .s file and are replaced with the appropriate offset addresses in the next stage, when the assembler converts .s to .bin.
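
Label replacement is a classic two-pass job: record where each label points, then patch the jumps. The sketch below assumes a hypothetical text syntax in which a label is a line ending in a colon, and it uses instruction indexes where a real assembler would use byte offsets.

```python
def assemble(lines):
    """Resolve textual jump labels to instruction indexes (illustrative syntax)."""
    labels = {}
    instructions = []

    # Pass 1: record the position each label refers to.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)
        else:
            instructions.append(line.split())

    # Pass 2: replace label operands of jump instructions with addresses.
    resolved = []
    for ins in instructions:
        if ins[0] in ("jmp", "jmp_if"):
            resolved.append((ins[0], labels[ins[1]]))
        else:
            resolved.append(tuple(ins))
    return resolved


source = [
    "push instant 0",
    "jmp_if else_1",
    "push instant 1",
    "jmp end_1",
    "else_1:",
    "push instant 2",
    "end_1:",
    "ret 1",
]
program = assemble(source)
```

Two passes are needed because a jump may refer to a label that appears later in the file, as `jmp_if else_1` does here.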

2.5 switch instructions
1) Switch jump table

A switch has to decide at run time which case label's address to jump to. Using only jmp_if would mean inserting multiple comparison statements at the end of the case list code, with the stack machine pushing the corresponding operands before every comparison, which is cumbersome and performs poorly. OCPack therefore added the cmp_n, resolve_label, and jmp_tos instructions to the instruction set.

  • First, the OCPack compiler emits instructions that push the switch's comparison target onto the stack, followed by the value of each case, and then emits a cmp_n instruction. At run time, cmp_n n pops n values from the stack, compares each with the value now at the top of the stack (the switch target), and pushes i onto the stack if the target equals the i-th value.

  • Next comes the resolve_label label_prefix instruction. When it runs, label_prefix and the i at the top of the stack are concatenated into the target label name, which is looked up in the machine to find the corresponding label address; that address is pushed onto the stack. label_prefix is unique to each switch statement, so nested switches are supported.

  • Finally comes the jmp_tos instruction. When it runs, it jumps to the address at the top of the stack, which completes the switch.
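
The three instructions can be modelled as follows. Several details here are assumptions, since the text does not specify them: cmp_n also pops the switch target, a missing match pushes n (which is then treated as the index of the default label), and label names are formed as prefix plus index.

```python
def cmp_n(stack, n):
    # Pop the n case values (restoring push order), then the switch target.
    case_values = [stack.pop() for _ in range(n)]
    case_values.reverse()
    target = stack.pop()
    for i, value in enumerate(case_values):
        if value == target:
            stack.append(i)
            return
    stack.append(n)  # assumption: index n stands for the default label


def resolve_label(stack, label_prefix, labels):
    # Concatenate the prefix with the index on the stack top and look up the address.
    i = stack.pop()
    stack.append(labels[f"{label_prefix}{i}"])


def jmp_tos(stack):
    # The jump target is whatever address is on the stack top.
    return stack.pop()


def switch_target(value, case_values, label_prefix, labels):
    stack = [value]                 # push the switch target
    stack.extend(case_values)       # push each case value
    cmp_n(stack, len(case_values))
    resolve_label(stack, label_prefix, labels)
    return jmp_tos(stack)


# Hypothetical label table for one switch with cases 3 and 7 plus a default.
labels = {"sw0_0": 100, "sw0_1": 120, "sw0_2": 140}
```

For example, `switch_target(7, [3, 7], "sw0_", labels)` selects the address of the second case, and a value that matches no case falls through to the default label under the assumption above.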

2) Continue and break support:

The compiler maintains a stack of break and continue labels. The top element is the target label that the current break or continue should jump to, and labels are pushed and popped at the start and end of the corresponding statement. When a break or continue node is encountered in the syntax tree, the compiler takes the target label at the top of the stack and generates a jmp instruction to that label.
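
A sketch of that label stack during code generation is shown below. It handles only loops, uses a made-up tuple representation of the syntax tree, and invents the label naming scheme; it is meant to show the push and pop discipline, not OCPack's actual emitter.

```python
class Emitter:
    def __init__(self):
        self.out = []
        self.targets = []   # stack of (break_label, continue_label)
        self.counter = 0

    def emit(self, node):
        kind = node[0]
        if kind == "loop":
            n = self.counter
            self.counter += 1
            start, end = f"loop{n}_start", f"loop{n}_end"
            self.targets.append((end, start))   # entering the loop
            self.out.append(f"{start}:")
            for child in node[1]:
                self.emit(child)
            self.out.append(f"jmp {start}")
            self.out.append(f"{end}:")
            self.targets.pop()                  # leaving the loop
        elif kind == "break":
            self.out.append(f"jmp {self.targets[-1][0]}")
        elif kind == "continue":
            self.out.append(f"jmp {self.targets[-1][1]}")
        else:
            raise ValueError(f"unsupported node: {kind}")


# An outer loop containing a continue, a nested loop with a break, and a break.
tree = ("loop", [("continue",), ("loop", [("break",)]), ("break",)])
emitter = Emitter()
emitter.emit(tree)
```

Because the inner loop pops its labels when it ends, the final `break` in the outer loop body correctly targets `loop0_end` rather than the inner loop's end label.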

2.6 call instructions
  • C functions are called dynamically through libffi. For each C function called, the .s file contains a DECL_C_FUNC declaration with the function's name, its signature (including the number and types of its arguments), and so on.

  • Parameter types in the .s file are OCPack-defined strings that correspond to libffi types. For a struct, the compiler recursively collects the types of all struct members and assembles them into a string; at run time the string is parsed to construct the data types libffi requires.

  • For variadic functions, calls with the same function name but a different number or different types of arguments get separate entries in DECL_C_FUNC, and the VM constructs the libffi parameter data from the matching entry.
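
The idea of calling a C function purely from a declared name and signature can be tried out with Python's ctypes module, which is itself built on libffi. This is an analogy rather than OCPack's mechanism: the table format and type names below are invented for the example, and the library lookup assumes a POSIX system.

```python
import ctypes
import ctypes.util

# Hypothetical declaration table: function name -> (return type, argument types).
C_TYPES = {"int": ctypes.c_int, "double": ctypes.c_double}
DECL_C_FUNC = {"abs": ("int", ["int"])}

# Assumption: a POSIX system where the C library can be located this way.
libc = ctypes.CDLL(ctypes.util.find_library("c"))


def call_c(name, *args):
    """Look up a C function by name and call it using its declared signature."""
    ret_type, arg_types = DECL_C_FUNC[name]
    func = getattr(libc, name)
    func.restype = C_TYPES[ret_type]
    func.argtypes = [C_TYPES[t] for t in arg_types]
    return func(*args)


value = call_c("abs", -5)
```

The interpreter never sees a compiled call site for `abs`; the call is assembled at run time from the declaration, which is the same role DECL_C_FUNC entries play for the VM.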

2.7 Basic unary and binary operator instructions
  • The instruction set has instructions for the arithmetic, logical, shift, and other basic operators. Their parameters include the return value type, the operand types, and so on.

  • In the virtual machine implementation, each combination of operation and data types is mapped to a corresponding C implementation, and at run time the VM calls the implementation selected by the instruction and its parameters.

Note: These instructions support only basic data types such as integers and floating point numbers. Operator overloading for user-defined types is not supported.
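
One way to picture the mapping from operation and type combinations to implementations is a table of handlers keyed by both. The sketch below uses ctypes only to emulate C's fixed-width arithmetic; the type names, operator names, and table layout are assumptions for illustration.

```python
import ctypes
import operator

C_TYPES = {"i32": ctypes.c_int32, "i64": ctypes.c_int64, "f64": ctypes.c_double}
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def make_handler(op, type_name):
    ctype = C_TYPES[type_name]
    fn = OPS[op]

    def handler(stack):
        rhs = stack.pop()
        lhs = stack.pop()
        # Converting through the C type truncates the result to that type's width.
        stack.append(ctype(fn(lhs, rhs)).value)

    return handler


# One handler per (operator, type) combination, generated up front.
HANDLERS = {(op, t): make_handler(op, t) for op in OPS for t in C_TYPES}

stack32 = [2147483647, 1]
HANDLERS[("add", "i32")](stack32)

stack64 = [2147483647, 1]
HANDLERS[("add", "i64")](stack64)
```

The same addition wraps around as a 32-bit integer but not as a 64-bit one, which is why the operand type has to be part of the instruction's parameters.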

2.8 Conversion of left and right values
  • The instruction set has an lvalue-to-rvalue conversion instruction whose parameter is the size of the rvalue. It pops the address (addr) stored at the top of the stack and then pushes the data of that size stored at addr onto the stack.

  • In the AST generated by Clang, every VarDeclRef actually denotes the address of the variable, and in code that reads the variable's content (its rvalue) the parent node of the VarDeclRef is an lvalue-to-rvalue conversion node. Accordingly, a push instruction of the form push seg index pushes the address at index in segment seg onto the operation stack, and fetching the content at that address is left to the lvalue-to-rvalue conversion instruction.

Note: In the initial implementation, OCPack's push instruction pushed the rvalue of the variable at index in segment seg directly onto the operation stack, ignoring the lvalue-to-rvalue conversion node. It later turned out that the AST contains no such conversion node where a variable is used as an lvalue, for example on the left side of an assignment. Handling these cases specially would have complicated the logic and made complete coverage hard to guarantee. We therefore changed push to operate on the lvalue of the variable, following the arrangement of nodes in the AST, trading some performance for reliability.
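
The split between pushing an address and loading its content can be illustrated with a byte array standing in for memory. The segment layout, the 8-byte slot size, and the `store` operation are all assumptions introduced for this sketch.

```python
memory = bytearray(64)
SEGMENT_BASE = {"local": 0, "static": 32}   # hypothetical layout
SLOT = 8                                    # assumed slot size in bytes


def push_addr(stack, segment, index):
    # push seg index: pushes the address of the variable (its lvalue).
    stack.append(SEGMENT_BASE[segment] + index * SLOT)


def lvalue_to_rvalue(stack, size):
    # Pop an address, push the value of the given size stored there.
    addr = stack.pop()
    stack.append(int.from_bytes(memory[addr:addr + size], "little", signed=True))


def store(stack, size):
    # Hypothetical assignment: pop the value, then the destination address.
    value = stack.pop()
    addr = stack.pop()
    memory[addr:addr + size] = value.to_bytes(size, "little", signed=True)


# Model "b = a" for 4-byte ints, with a at local 0 and b at local 1.
memory[0:4] = (42).to_bytes(4, "little", signed=True)
stack = []
push_addr(stack, "local", 1)   # b is an assignment target, so it stays an address
push_addr(stack, "local", 0)   # a is read, so its address ...
lvalue_to_rvalue(stack, 4)     # ... is converted to its value
store(stack, 4)
b_value = int.from_bytes(memory[8:12], "little", signed=True)
```

Only the right-hand side goes through the conversion, mirroring the AST shape described in the note above: the assignment target has no conversion node, so its address is used as is.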

2.9 Objective-C method Call instructions
  • The instruction set has the instructions OBJC_MSG_CLASS and OBJC_MSG_INST, dedicated to calling Objective-C methods.

  • The virtual machine interprets the parameters stored on the stack and constructs an NSInvocation to make the call through the Objective-C runtime.

  • Inside the instruction's implementation, the target and the parameters are referenced as __unsafe_unretained, so their life cycles are not changed. Return values are always autoreleased, which ensures the return value is valid when it reaches the direct caller.

Note: During implementation, the memory layout of the input data for the Objective-C call instructions was also changed. For an Objective-C method, the number of arguments is unknown until the selector is available, so the original design pushed the argument list in reverse order: the first argument ended up at the top of the stack and the second argument below it. That way the instruction could always pop twice to obtain the selector's declaration, and then pop the remaining arguments according to the count and sizes declared by the selector. However, this approach is awkward when an argument is larger than 64 bits (for example a struct): to obtain the correct struct data, the program has to pop the corresponding number of 64-bit values and concatenate them, which is cumbersome and error prone. The arguments of OBJC_MSG_CLASS/OBJC_MSG_INST are therefore now pushed in order (the first parameter is pushed first, and the last parameter is at the top of the stack). The instruction computes the starting position of the first parameter directly from sp, using the parameter list length supplied as an instruction parameter, so every parameter can be accessed through a pointer regardless of its size. The multiple pop operations needed before are replaced by moving sp back by the length of the parameter list just before the instruction finishes.

Assembler module

Function: Assembler (.s) -> binary patch (.bin)

The assembler parses the entire .s text and converts the text tokens into the corresponding binary data. Its main tasks are:

  • Converting immediate values from text to binary data

  • Converting labels to addresses

  • Converting constant strings. Strings were originally stored directly as text in the .s file, but because escape characters such as '\n' and '\t' were poorly supported, they are now stored as raw bytes

  • Generating an export function table that records the mapping between the names of methods defined in the VM and their addresses

  • Generating an import function table that contains every C function declaration and its index. Calls to C functions in the code segment refer to these indexes directly

  • Recording the size of the static data segment and the total size of the global area. This size is needed because the global area is placed in shared anonymous mmap memory during VM initialization

  • Storing the GUID value

  • Storing the target arch, which is used to distinguish 32-bit from 64-bit and ensure that the platform and the .bin file match

  • Converting text instructions to binary instructions

After conversion, each piece of data is stored in its data segment in memory, and the whole memory image is then dumped to a binary file.

Note: Most of the data the process needs at run time is laid out exactly as in the binary file, which makes it easy to load the .bin file through memory mapping and thus reduces the amount of private memory used.

Load module

Function: binary patch (.bin) loading

When load_image is called, the virtual machine first maps the .bin file into memory with mmap and checks whether the magic number, bin version, and arch match.

It then requests shared anonymous mmap memory sized according to the global area.

Next it loads each data segment to build the necessary run-time data. The main data segments are:

  • Constant string segment: memory of the corresponding size in the global area is assigned to the constant segment, and each index is pointed at the start address of its string

  • Static data segment: memory of the corresponding size in the global area is assigned to the static segment

  • Export symbol table

  • Import symbol table

  • Code segment

  • GUID data
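
The first two loading steps, mapping the file and validating its header before allocating the global area, might look like the sketch below. The header layout (a 4-byte magic, a version, an arch field, and the global area size) and the constants are invented for the example; the actual .bin format is not specified in this text.

```python
import mmap
import struct

# Hypothetical header: magic, bin version, arch (32 or 64), global area size.
HEADER = struct.Struct("<4sHHI")
MAGIC, VERSION = b"OCPK", 1


def load_image(path, arch_bits):
    """Map a patch file read-only, validate its header, and allocate the global area."""
    with open(path, "rb") as f:
        image = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    magic, version, arch, global_size = HEADER.unpack_from(image, 0)
    if magic != MAGIC or version != VERSION or arch != arch_bits:
        image.close()
        raise ValueError("patch file does not match this virtual machine")
    # Anonymous mapping for the global area (constant and static segments).
    globals_area = mmap.mmap(-1, global_size)
    return image, globals_area
```

Because the file is mapped rather than read into a buffer, data such as constant strings can be referenced in place, which is the memory saving the note in the assembler module describes.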

Note: The .bin file format is as follows:

Execution module

Function: binary patch (.bin) execution

1) Basic VM information

  • Stacks and segments are measured in bits

  • Before calling a method, the corresponding parameters need to be pushed onto the operation stack

  • After the call is complete, the return value is placed at the top of the stack

  • Run time context (thread_context) and stack frame

    • Thread_context maintains a linked list of stack frames to hold call relationships

    • Stack frames are used to store runtime information about the current method, including:

      • Stack frame base address
      • The return address
      • Pointer to the base address of the caller stack frame
      • Segment table base address
      • Operation stack base address
      • Operation stack pointer sp
      • Program counter IP

Note: The specific memory layout of VM function stack frame is as follows:

2) Objective-C calls virtual machine methods

  • Objective-C code interacts with the virtual machine by passing the name of the function to call and its parameters (here, the addresses of the actual arguments) to the virtual machine's callFunctionWithArgs: method and receiving the return value.

  • callFunctionWithArgs: looks up the function name in the export table to find its method signature and address. Then, according to the parameter sizes given in the method signature, it pushes data of the corresponding size from each passed argument address onto the operation stack and jumps to the starting address of the called method to begin executing its assembly instructions.

  • After all the assembly instructions have executed, the virtual machine returns to callFunctionWithArgs:, which copies the return value at the top of the stack to the return value address passed in by the caller.

  • After the call completes, the virtual machine's stack top address should be exactly the same as before the call.

  • If the return value of a VM method is an NSObject *, OCPack decides whether to apply __bridge_retained to the returned object based on whether the variable that stores the return value is a strong reference, in order to balance the release performed for the strong variable. In all other cases the returned object is autoreleased and is returned without special treatment (see section 2.9, Objective-C method call instructions, in the compiling module).

3) Virtual machine calls OC method (f1), f1 calls virtual machine method (f2)

To support this flow, the sp of the VM's current stack frame must be exactly the same after f2 is invoked as it was before, so that the execution of f1 is not affected.

4) Virtual machine methods call each other

When an OC method is about to be called, the VM first checks whether the target method is in the export function table. If it is, the call goes through this flow. This requires the parameter layout of a VM method to be the same as that of an OC method call; otherwise parameters would have to be copied, which would degrade VM performance.

5) Multithreading support

  • The pointer to the run-time context (thread_context) is kept in thread-local storage, and each thread works on its own data by reading and writing through its context. This keeps the running state of each thread isolated from the others and so supports multi-threaded invocation.

  • OCPack registers a callback for thread exit. When a thread exits, OCPack removes all virtual machine context data belonging to that thread.
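
The isolation provided by thread-local storage can be demonstrated with Python's `threading.local`. The `ThreadContext` class and `current_context` helper are hypothetical stand-ins for thread_context; in Python the per-thread data is discarded automatically when a thread ends, whereas the text describes an explicit exit callback.

```python
import threading

_tls = threading.local()


class ThreadContext:
    """Per-thread VM state (hypothetical stand-in for thread_context)."""

    def __init__(self):
        self.stack = []
        self.frames = []


def current_context():
    # Create the context lazily the first time a thread asks for it.
    ctx = getattr(_tls, "context", None)
    if ctx is None:
        ctx = ThreadContext()
        _tls.context = ctx
    return ctx


def worker(value, results):
    ctx = current_context()
    ctx.stack.append(value)
    results[value] = list(ctx.stack)


results = {}
threads = [threading.Thread(target=worker, args=(i, results)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each worker sees only the value it pushed itself, and the main thread's context remains empty, because no two threads ever share a context object.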

6) Memory usage

  • Binary files are loaded with mmap, the global data area uses shared anonymous mmap, and constant string data points directly at addresses inside the .bin mapping.

  • There is one thread_context per thread, allocated with shared anonymous mmap.

  • The run-time virtual machine classes themselves hold only the import and export function data and a small amount of pointer data.

7) Stack backtracking on crashes

  • The runtime context (thread_context) maintains a linked list of stack frames (thread_frame) that correspond to the method invocation relationships in the virtual machine.

  • When a crash occurs, the thread_context of every virtual machine in every thread is traversed, along with its linked list of stack frames. The ip (instruction pointer) stored in each stack frame is written to the crash log.

  • The crash log organizes crash stacks by thread. It also records the address of each VM and the GUID of the.bin file loaded by the VM to distinguish threads, VM instances, and bin files.
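
Walking the linked list of frames to produce such a log could look like the sketch below. The `Frame` class, the shape of the `contexts` mapping, and the log line format are all invented for illustration.

```python
class Frame:
    def __init__(self, ip, caller=None):
        self.ip = ip            # instruction pointer saved in this frame
        self.caller = caller    # link to the calling frame


def backtrace(frame):
    """Collect instruction pointers from the innermost frame outwards."""
    ips = []
    while frame is not None:
        ips.append(frame.ip)
        frame = frame.caller
    return ips


def crash_report(contexts):
    """contexts maps a thread name to (bin GUID, innermost frame); hypothetical shape."""
    lines = []
    for thread_name, (guid, top_frame) in contexts.items():
        lines.append(f"thread {thread_name} guid {guid}")
        for depth, ip in enumerate(backtrace(top_frame)):
            lines.append(f"  #{depth} ip=0x{ip:x}")
    return lines


outer = Frame(0x10)
inner = Frame(0x48, caller=outer)
report = crash_report({"t1": ("abc", inner)})
```

Recording the GUID next to each stack is what later allows the symbol file for exactly that .bin build to be found.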

8) Crash symbol parsing

A GUID is generated or specified by the OCPack compiler and appears in every related file generated afterwards, including the .s, .sym, and .bin files and the crash logs produced at run time. When an online crash log is sent back, the crash analysis server uses the GUID in the log to find the matching symbol file for symbol resolution. In addition, the build server's database stores the source information of everything the .bin file with that GUID depends on (the corresponding .s file, the source code version the .bin was built from, the OCPack tool chain version, and so on), which helps developers reproduce and locate problems.

9) Hook Objective-C methods

  • The principle is similar to JSPatch. The implementation of the target method in the target class is replaced with _objc_msgForward, and forwardInvocation: is replaced with a custom forwarding method, so that invoking the target method lands in the custom forwardInvocation: implementation. There, the NSInvocation is handed to the built-in virtual machine, which extracts its arguments and calls the corresponding virtual machine method. The return value is then set back on the NSInvocation, which completes the hook.

  • Compared with JSPatch, OCPack method calls avoid a great deal of string parsing, and most parameters can be passed to the virtual machine directly, so the overall cost of a method call is lower than with JSPatch.

Performance optimization

  1. Size optimization of binary programs
  • In the early implementation of OCPack, template classes were used to support the different operand and return value types of the binary operators, which made debugging convenient. However, this caused the code size to explode: the templates generated a large number of methods, one per combination of input and output types, most of them only a few instructions long, and analysis of the final binary showed that the method names alone took up a lot of space.

    • Once the functionality was basically stable, and with unit tests as a safety net, the templates were replaced with macros, which greatly reduced the size of the code and data sections. The framework file shrank from 3.5 MB to less than 150 KB.
  2. Performance optimization
  • Assembly code optimization

    • Assembly code generated directly from the syntax tree contains many useless push operations, mainly because the return values of some expressions are never used. Such pushes are redundant, and inside large loops they can make the stack grow, affecting performance and stability.

    • The optimization removes as many useless push instructions as possible. It defines a set of rules for deciding whether the return value of an instruction is used, and does not push the return value of an expression when that value is unused.

  • Optimize VM performance

    • Cache frequently used data: fetching the run-time context and reading or writing stack frames are frequent operations that involve reading thread-local storage. Since a pointer to a run-time context structure is only ever accessed from one thread, the code was refactored to cache these pointers in the Executor class, which improves run-time efficiency.

    • Minimize memory accesses in the core loop: some data from the stack frame (such as ip) was moved into the Executor class to cut the unnecessary memory operations caused by frequent ip reads and writes.

    • Maximize code efficiency in the core loop: an array is used instead of a map for the mapping from instructions to instruction handlers, which speeds up lookups at run time.

    • After these optimizations, performance was nearly twice what it was before
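
The last two ideas, array-based dispatch and keeping ip in the executor object, can be sketched as follows. The opcode numbering and the `VM` class are made up for the example; the point is only that the opcode is used directly as an index, so no hashing or key comparison is needed in the core loop.

```python
OP_PUSH, OP_ADD, OP_HALT = 0, 1, 2   # hypothetical opcode numbering


class VM:
    def __init__(self, code):
        self.code = code
        self.ip = 0          # kept on the executor object rather than in the frame
        self.stack = []
        self.running = True


def op_push(vm):
    # The operand follows the opcode in the instruction stream.
    vm.stack.append(vm.code[vm.ip])
    vm.ip += 1


def op_add(vm):
    rhs = vm.stack.pop()
    lhs = vm.stack.pop()
    vm.stack.append(lhs + rhs)


def op_halt(vm):
    vm.running = False


# The position in this list is the opcode, so dispatch is a plain index operation.
HANDLERS = [op_push, op_add, op_halt]


def run(code):
    vm = VM(code)
    while vm.running:
        opcode = vm.code[vm.ip]
        vm.ip += 1
        HANDLERS[opcode](vm)
    return vm.stack


result = run([OP_PUSH, 1, OP_PUSH, 2, OP_ADD, OP_HALT])
```

This design requires opcodes to be small consecutive integers, which a custom instruction set can guarantee by construction.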

III. Future plans

The linker

  • Link the multiple .bin files that correspond to multiple .m files into a single .bin file, and allow the .bin files to call one another

Other syntax support

  • Supports syntax for declaring blocks

Performance optimization

  • Remove virtual function calls
  • Instruction length alignment
  • Ensure that sp, ip, and other frequently manipulated data are kept in registers, either by writing the relevant core methods in assembly or by changing the code structure
  • Optimization of assembly code generation (peephole optimization, etc.)
  • other

The Debug tool

  • Because the virtual machine executes instruction by instruction, it is hard to map execution directly back to the Objective-C code, which makes debugging somewhat troublesome. In the future we plan to add features that display the correspondence between instruction addresses and Objective-C source code during debugging.
