I had long wanted to write an article about using assembly on iOS, but never got around to it. My only earlier experience was a startup-time optimization that intercepted objc_msgSend and inserted assembly instructions to measure how long each method call took. Recently our project has been going through security hardening, which requires writing more assembly, and that finally prompted this article. All assembly in this article uses the ARM64 instruction set.

Inline assembly format

__asm__ [keywords] (instructions : [output operand list] : [input operand list] : [clobbered register list]);

For example, given three variables a, b and c, the statement a = b + c can be written in assembly as follows:

__asm__ volatile(
    "mov x0, %[b]\n"
    "mov x1, %[c]\n"
    "add x2, x0, x1\n"
    "mov %[a], x2\n"
    : [a]"=r"(a)
    : [b]"r"(b), [c]"r"(c)
    : "x0", "x1", "x2"  // registers overwritten by the template
);

volatile

The volatile keyword tells the compiler not to optimize the assembly statement away, for example by deleting it when its outputs appear unused. In practice, the compiled instructions often look the same whether or not it is declared.

Operands

An operand is written as “[modifier]constraint”: the modifier describes the access permission and the constraint describes where the value lives. For example, “=r” means the operand is write-only and is held in a general-purpose register.

  • modifiers

    Keyword   Meaning
    =         Write-only; used for output operands
    +         Read/write; can only be used on output operands
    &         Early clobber: the output register is written before all inputs have been read, so it must not share a register with any input
  • constraints

    Keyword   Meaning
    f         Floating-point registers f0 ~ f7
    G/H       Floating-point immediate constant
    I/L/K     Immediate value used in data-processing instructions
    J         Immediate value in the range -4095 to 4095
    l/r       General-purpose registers r0 ~ r15
    M         Constant in the range 0 to 32, or a power of 2
    m         Memory address
    w         Vector registers s0 ~ s31
    X         Operand of any type

    The register ranges in this table follow the 32-bit ARM naming. On ARM64, r selects a general-purpose register (x0 ~ x30) and w selects a floating-point/SIMD register (v0 ~ v31).
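
To make the modifiers more concrete, here is a minimal sketch of "+r" and "=&r" in use. The helper names add_in_place and double_then_add are made up for illustration, and the #else branches are only a portable fallback so the sketch also builds on non-ARM64 machines.

```c
#include <stdint.h>

/* Hypothetical helper: "+r" marks acc as both read and written. */
static inline uint64_t add_in_place(uint64_t acc, uint64_t delta) {
#if defined(__aarch64__)
    __asm__("add %[acc], %[acc], %[delta]"
            : [acc] "+r"(acc)
            : [delta] "r"(delta));
#else
    acc += delta; /* portable fallback so the sketch builds elsewhere */
#endif
    return acc;
}

/* Hypothetical helper: out is written by the first instruction, before
 * c has been read, so it is declared early clobber with "=&r". */
static inline uint64_t double_then_add(uint64_t b, uint64_t c) {
    uint64_t out;
#if defined(__aarch64__)
    __asm__("add %[out], %[b], %[b]\n"
            "add %[out], %[out], %[c]"
            : [out] "=&r"(out)
            : [b] "r"(b), [c] "r"(c));
#else
    out = b + b + c; /* portable fallback */
#endif
    return out;
}
```

Without the &, the compiler would be free to place out and c in the same register, and the first add would then destroy c before the second add reads it.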

Instructions

ARM64 has a large instruction set; see the references at the end of this article for details. Only a few pieces of syntax used inside the instructions are covered here:

  • %0~%N / %[param]

    The % prefix refers to a C operand from inside the assembly. An operand can be referenced by name with %[param], or by position with the anonymous form %N, where operands are numbered in declaration order (a, b and c below map to %0, %1 and %2):

      __asm__ volatile(
          "mov x0, %1\n"
          "mov x1, %2\n"
          "add x2, x0, x1\n"
          "mov %0, x2\n"
          : "=r"(a)
          : "r"(b), "r"(c)
          : "x0", "x1", "x2"  // registers overwritten by the template
      );

    In practice, the anonymous %N format is not guaranteed to work in every environment. Using %[param] is recommended, and it is also more readable.

  • [reg]

    In most cases a register holds the address of the data rather than the data itself. Wrapping the register in [] means that its value is used as the address for a memory access. The following instructions load the data stored at address 0x10086 into x1 and then store it to memory at address 0x100086 (neither address fits in a single mov immediate, so each one is built with movz/movk):

      "movz x0, #0x0086\n"
      "movk x0, #0x1, lsl #16\n"
      "ldr x1, [x0]\n"
      "movz x2, #0x0086\n"
      "movk x2, #0x10, lsl #16\n"
      "str x1, [x2]\n"
  • #1 / #0x1

    Use # to mark an immediate value (a constant). Hexadecimal notation is recommended.
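
The [reg] addressing form can also be combined with named operands, so the addresses come from C pointers instead of hard-coded constants. A minimal sketch, assuming a made-up helper name copy_word; the #else branch is a portable fallback for non-ARM64 builds.

```c
#include <stdint.h>

/* Hypothetical helper: loads one 64-bit word through src and stores it
 * through dst, using a register as the address in both instructions. */
static inline uint64_t copy_word(const uint64_t *src, uint64_t *dst) {
    uint64_t tmp;
#if defined(__aarch64__)
    __asm__ volatile(
        "ldr %[tmp], [%[src]]\n"   /* tmp = *src */
        "str %[tmp], [%[dst]]\n"   /* *dst = tmp */
        : [tmp] "=&r"(tmp)
        : [src] "r"(src), [dst] "r"(dst)
        : "memory");               /* the template reads and writes memory */
#else
    tmp = *src;                    /* portable fallback */
    *dst = tmp;
#endif
    return tmp;
}
```

The "memory" clobber tells the compiler that the template touches memory it cannot see through the operands, so values cached in registers are written back first.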

Calling convention

ARM64 uses the AAPCS64 calling convention. The first eight arguments are passed in registers x0 ~ x7 from left to right; when there are more than eight, the remaining arguments are passed on the stack, pushed from right to left. Depending on its size, the return value is either returned in x0 or written to memory whose address is passed in x8. The register rules are as follows:

Register   Special name   Rule
r31        SP             Holds the address of the top of the stack
r30        LR             Holds the return address of the function
r29        FP             Holds the stack frame address of the current function
r19~r28                   Callee-saved registers; the called function must preserve them
r18                       Platform register; not recommended as a temporary register
r17        IP1            Intra-procedure-call register; not recommended as a temporary register
r16        IP0            Same as r17; also holds the system call number for the svc software interrupt
r9~r15                    Temporary registers (used to hold a function address when one is passed into an assembly block)
r8                        Indirect result register, holding the address of a return value passed through memory (otherwise the same as r9~r15)
r0~r7                     Argument registers; r0 also serves as the return value register
NZCV                      Status flags register

In practice

Debugger detection

In iOS application security hardening, the sysctl + kinfo_proc approach can be used to check whether the application is being debugged:

#include <stdbool.h>
#include <string.h>
#include <sys/sysctl.h>
#include <unistd.h>

// Returns true if the current process has the P_TRACED flag set.
__attribute__((always_inline)) static inline bool checkTracing(void) {
    size_t size = sizeof(struct kinfo_proc);
    struct kinfo_proc proc;
    memset(&proc, 0, size);

    int name[4];
    name[0] = CTL_KERN;
    name[1] = KERN_PROC;
    name[2] = KERN_PROC_PID;
    name[3] = getpid();

    sysctl(name, 4, &proc, &size, NULL, 0);
    return (proc.kp_proc.p_flag & P_TRACED) != 0;
}

However, because tools such as fishhook can rebind lazily bound symbol addresses, calling sysctl directly is not safe, so most developers replace the call with inline assembly:

size_t size = sizeof(struct kinfo_proc);
struct kinfo_proc proc;
memset(&proc, 0, size);

int name[4];
name[0] = CTL_KERN;
name[1] = KERN_PROC;
name[2] = KERN_PROC_PID;
name[3] = getpid();

__asm__(
    "mov x0, %[name_ptr]\n"
    "mov x1, #4\n"
    "mov x2, %[proc_ptr]\n"
    "mov x3, %[size_ptr]\n"
    "mov x4, #0x0\n"
    "mov x5, #0x0\n"
    "mov w16, #202\n"
    "svc #0x80\n"
    :
    :[name_ptr]"r"(&name), [proc_ptr]"r"(&proc), [size_ptr]"r"(&size)
);

return proc.kp_proc.p_flag & P_TRACED;

The pitfall

A fatal problem with mixing inline assembly into C code is that the compiler keeps local variables in registers of its own choosing once the function is entered. When the mixed code above is actually run, the following instructions are generated:

add x1, sp, #0x34       // x1 = proc
add x2, sp, #0x20       // x2 = size
......
mov x0, x0
mov x1, #4
mov x2, x1
mov x3, x2
mov x4, #0x0
mov x5, #0x0
mov w16, #0xca
svc #0x80

Because of the order in which the local variables were assigned to registers, some registers are overwritten before they are read. The svc interrupt therefore invokes sysctl with the wrong arguments, and the application eventually freezes.

Fixes

Insert a placeholder variable

The compiled instructions give the following mapping:

Variable   Register   Argument register
name       x0         x0
proc       x1         x2
size       x2         x3

If each variable is already held in the register that must carry it as an argument when the svc interrupt is issued, nothing gets overwritten.

Under the ARM64 calling convention, arguments are pushed from right to left.

Declaring an extra placeholder variable before name shifts the register allocation so that proc and size end up in x2 and x3:

size_t size = sizeof(struct kinfo_proc);
struct kinfo_proc proc;
memset(&proc, 0, size);

int placeholder;
int name[4];
name[0] = CTL_KERN;
name[1] = KERN_PROC;
name[2] = KERN_PROC_PID;
name[3] = getpid();

The compiled instruction becomes:

add x2, sp, #0x34       // x2 = proc
add x3, sp, #0x20       // x3 = size
......
mov x0, x0
mov x1, #4
mov x2, x2
mov x3, x3
mov x4, #0x0
mov x5, #0x0
mov w16, #0xca
svc #0x80

Reorder the instructions

An instruction that sets up an argument destroys the value already held in its destination register, so make sure each register has been read before it is overwritten:

__asm__(
    "mov x0, %[name_ptr]\n"
    "mov x3, %[size_ptr]\n"
    "mov x2, %[proc_ptr]\n"
    "mov x1, #4\n"
    "mov x4, #0x0\n"
    "mov x5, #0x0\n"
    "mov w16, #202\n"
    "svc #0x80\n"
    :
    :[name_ptr]"r"(&name), [proc_ptr]"r"(&proc), [size_ptr]"r"(&size)
);

The compiled instructions are as follows:

// inline assembly
mov x0, x0      // x0 = name
mov x3, x2      // x3 = size
mov x2, x1      // x2 = proc
mov x1, #4
mov x4, #0x0
mov x5, #0x0
mov w16, #0xca
svc #0x80
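
Both fixes above depend on which registers the compiler happened to pick. A further option, which is not part of the original write-up, is to list every hard-coded register in the clobber list so the compiler keeps the operands out of them. The sketch below shows the idea on a harmless computation instead of the real system call; sum_with_clobbers is a made-up name and the #else branch is a portable fallback.

```c
#include <stdint.h>

/* Hypothetical helper: the template writes x0 ~ x3 directly, in the same
 * order as the failing example, and declares them as clobbered. */
static inline uint64_t sum_with_clobbers(uint64_t a, uint64_t b, uint64_t c) {
    uint64_t out;
#if defined(__aarch64__)
    __asm__ volatile(
        "mov x0, %[a]\n"
        "mov x1, #4\n"       /* would destroy b if b lived in x1 */
        "mov x2, %[b]\n"
        "mov x3, %[c]\n"
        "add x0, x0, x1\n"
        "add x0, x0, x2\n"
        "add x0, x0, x3\n"
        "mov %[out], x0\n"
        : [out] "=r"(out)
        : [a] "r"(a), [b] "r"(b), [c] "r"(c)
        : "x0", "x1", "x2", "x3");  /* operands are kept out of these */
#else
    out = a + 4 + b + c;            /* portable fallback */
#endif
    return out;
}
```

For the sysctl call itself, the clobber list would presumably need to cover x0 ~ x5 and x16 as well as "memory"; that part is an assumption and has not been verified here.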

Full assembly implementation

When assembly is mixed with C code, there is no guarantee about which registers will be overwritten, so implementing the whole routine directly in assembly is a good choice. There are three things to note:

  1. To make sure the compiler generates no extra instructions at function entry or exit, mark the function with __attribute__((naked))
  2. All variables live on the stack, so stack usage has to be managed by hand
  3. Use the callee-saved registers (r19~r28)

To determine how much stack space is required, look at the arguments of sysctl(name, 4, &proc, &size, NULL, 0):

  • name occupies 4 * sizeof(int) bytes, i.e. 0x10
  • proc: on arm64, sizeof() gives a length of 0x288
  • the value that &size points to has a length of 0x8
  • In total: 0x2a0
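
The arithmetic above can be written down as compile-time constants. This is only a sketch: the constant names are made up, and 0x288 is taken from the article rather than computed, so the snippet does not need the Darwin headers.

```c
#include <stddef.h>

enum {
    NAME_BYTES  = 4 * sizeof(int),  /* int name[4], assuming 4-byte int */
    PROC_BYTES  = 0x288,            /* sizeof(struct kinfo_proc) on arm64, per the article */
    SIZE_BYTES  = 8,                /* size_t on arm64 */
    LOCAL_BYTES = NAME_BYTES + PROC_BYTES + SIZE_BYTES
};

/* On ARM64, sp has to stay 16-byte aligned when it is used to access memory. */
_Static_assert(LOCAL_BYTES % 16 == 0, "stack adjustment must be a multiple of 16");
```

The total happens to be a multiple of 16, so no extra padding is needed when sp is moved by this amount.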

On function entry, FP and LR must be pushed so that the function can return correctly. The ten registers r19~r28 must also be pushed to preserve them, which gives the following stack layout while the function runs:

----------
|   LR   |
---------- sp + 0x2f8
|   FP   |
---------- sp + 0x2f0
|  r20   |
---------- sp + 0x2e8
|  r19   |
---------- sp + 0x2e0
|  r22   |
---------- sp + 0x2d8
|  r21   |
---------- sp + 0x2d0
|  r24   |
---------- sp + 0x2c8
|  r23   |
---------- sp + 0x2c0
|  r26   |
---------- sp + 0x2b8
|  r25   |
---------- sp + 0x2b0
|  r28   |
---------- sp + 0x2a8
|  r27   |
---------- sp + 0x2a0
| p_size |
---------- sp + 0x298
|  proc  |
---------- sp + 0x10
|  name  |
---------- sp
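
One way to double-check the offsets in this layout is to mirror the frame as a C struct, lowest address first, and let the compiler compute them. The struct frame_sketch below is purely illustrative and is not used by the assembly itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the stack frame, starting at sp. */
struct frame_sketch {
    int32_t  name[4];      /* sp + 0x0                     */
    uint8_t  proc[0x288];  /* sp + 0x10                    */
    uint64_t size;         /* sp + 0x298 (p_size)          */
    uint64_t saved[10];    /* sp + 0x2a0: r19 ~ r28        */
    uint64_t fp_lr[2];     /* sp + 0x2f0: the FP/LR pair   */
};
```

The whole frame comes to 0x300 bytes: 0x2a0 for the local variables plus 0x60 for the twelve saved registers.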

After saving r19~r28, five of them are used to hold parameters, with r24 as a temporary:

------------------------
| parameter | register |
------------------------
| name      | r19      |
------------------------
| proc      | r20      |
------------------------
| p_size    | r21      |
------------------------
| size      | r22      |
------------------------
| sp        | r23      |
------------------------
| temp      | r24      |
------------------------

With the stack usage settled, the implementation can proceed step by step:

Function entry and exit

Function entry and exit are responsible for saving and restoring two things: FP/LR and r19 ~ r28

__asm__ volatile(
    "stp x29, x30, [sp, #-0x10]!\n"
    "stp x19, x20, [sp, #-0x10]!\n"
    "stp x21, x22, [sp, #-0x10]!\n"
    "stp x23, x24, [sp, #-0x10]!\n"
    "stp x25, x26, [sp, #-0x10]!\n"
    "stp x27, x28, [sp, #-0x10]!\n"
    ......
    "ldp x27, x28, [sp], #0x10\n"   // restore in the reverse order of the pushes
    "ldp x25, x26, [sp], #0x10\n"
    "ldp x23, x24, [sp], #0x10\n"
    "ldp x21, x22, [sp], #0x10\n"
    "ldp x19, x20, [sp], #0x10\n"
    "ldp x29, x30, [sp], #0x10\n"
    "ret\n"                         // a naked function gets no compiler-generated return
);

Stack space

A total of 0x2a0 bytes is used for temporary variables, and five registers are needed to hold the variables

__asm__ volatile(
    ......
    "sub sp, sp, #0x2a0\n"
    "mov x19, sp\n"             // x19 = name
    "add x20, sp, #0x10\n"      // x20 = proc
    "add x21, sp, #0x298\n"     // x21 = p_size
    "mov x22, #0x288\n"         // x22 = size
    "mov x23, sp\n"             // x23 = sp
    "str x22, [x21]\n"          // *p_size = size
    "add sp, sp, #0x2a0\n"
    ......
);

kinfo_proc

Once the memory for proc has been determined, it needs to be initialized, which corresponds to:

size_t size = sizeof(struct kinfo_proc);
struct kinfo_proc proc;
memset(&proc, 0, size);

proc is held in x20 and size in x22, and memset takes three arguments:

__asm__ volatile(
    ......
    
    "mov x24, %[memset_ptr]\n"
    "mov x0, x20\n"             // x0 = proc
    "mov x1, #0x0\n"
    "mov x2, x22\n"             // x2 = size
    "blr x24\n"
    
    ......
    :
    :[memset_ptr]"r"(memset)
);

name

Since name is an int array, once its storage location is fixed, four 4-byte values need to be stored at the corresponding memory locations, laid out as follows:

-------------
|  name[3]  |
------------- sp + 0xc
|  name[2]  |
------------- sp + 0x8
|  name[1]  |
------------- sp + 0x4
|  name[0]  |
------------- sp

In addition, name[3] must be set to the result of getpid(), which can be obtained through an svc interrupt (for the svc system call numbers, see Kernel_Syscalls in Further reading).

#define CTL_KERN 1
#define KERN_PROC 14
#define KERN_PROC_PID 1

__asm__ volatile(
    ......
    "mov x0, #0\n"
    "mov w16, #20\n"            // system call number of getpid
    "svc #0x80\n"
    "mov x3, x0\n"              // name[3] = getpid()
    "mov x0, #0x1\n"            // name[0] = CTL_KERN
    "mov x1, #0xe\n"            // name[1] = KERN_PROC
    "mov x2, #0x1\n"            // name[2] = KERN_PROC_PID
    "str w0, [x23, #0x0]\n"
    "str w1, [x23, #0x4]\n"
    "str w2, [x23, #0x8]\n"
    "str w3, [x23, #0xc]\n"
    ......
);
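
For comparison, the same step can be sketched as a small C helper. Everything named SKETCH_*, getpid_sketch and fill_name is made up for illustration. The svc path is only compiled on Apple ARM64 and assumes the convention used in this article (system call number in x16, result in x0); elsewhere the sketch falls back to the libc getpid().

```c
#include <stdint.h>
#include <unistd.h>

#define SKETCH_CTL_KERN      1
#define SKETCH_KERN_PROC     14
#define SKETCH_KERN_PROC_PID 1

/* Hypothetical helper: obtains the pid without going through a symbol
 * that could be rebound, where that is possible. */
static inline int32_t getpid_sketch(void) {
#if defined(__APPLE__) && defined(__aarch64__)
    register uint64_t x0 __asm__("x0") = 0;   /* pinned to x0 */
    __asm__ volatile(
        "mov w16, #20\n"     /* getpid system call number, as in the article */
        "svc #0x80\n"
        : "+r"(x0)
        :
        : "x1", "x16", "x17", "memory", "cc");
    return (int32_t)x0;
#else
    return (int32_t)getpid();                 /* portable fallback */
#endif
}

/* Fills the four 4-byte slots in the same order as the str instructions. */
static inline void fill_name(int32_t name[4]) {
    name[0] = SKETCH_CTL_KERN;
    name[1] = SKETCH_KERN_PROC;
    name[2] = SKETCH_KERN_PROC_PID;
    name[3] = getpid_sketch();
}
```

Pinning the variable with register ... __asm__("x0") lets the compiler know exactly which register carries the result, instead of relying on how it happens to allocate registers.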

sysctl

With all the arguments prepared, issue the sysctl system call through svc:

__asm__ volatile(
    ......
    "mov x0, x19\n"             // x0 = name
    "mov x1, #0x4\n"
    "mov x2, x20\n"             // x2 = proc
    "mov x3, x21\n"             // x3 = p_size
    "mov x4, #0x0\n"
    "mov x5, #0x0\n"
    "mov w16, #202\n"           // system call number of sysctl
    "svc #0x80\n"
    ......
);

Flag detection

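To test the P_TRACED flag, the offset of p_flag inside kinfo_proc is needed. kp_proc, the first member of kinfo_proc, is a struct extern_proc, declared as follows: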

struct extern_proc {
    union {
        struct {
            struct  proc *__p_forw; /* Doubly-linked run/sleep queue. */
            struct  proc *__p_back;
        } p_st1;
        struct timeval __p_starttime;   /* process start time */
    } p_un;
    
    #define p_forw p_un.p_st1.__p_forw
    #define p_back p_un.p_st1.__p_back
    #define p_starttime p_un.__p_starttime
    
    struct  vmspace *p_vmspace;     /* Address space. */
    struct  sigacts *p_sigacts;     /* Signal actions, state (PROC ONLY). */
    int     p_flag;                 /* P_* flags. */
    char    p_stat;                 /* S* process status. */
    pid_t   p_pid;                  /* Process identifier. */
    pid_t   p_oppid;         /* Save parent pid during ptrace. XXX */
    int     p_dupfd;         /* Sideways return value from fdopen. XXX */
    /* Mach related  */
    caddr_t user_stack;     /* where user stack was allocated */
    void    *exit_thread;   /* XXX Which thread is exiting? */
    int             p_debugger;             /* allow to debug */
    boolean_t       sigwait;        /* indication to suspend */
    /* scheduling */
    u_int   p_estcpu;        /* Time averaged value of p_cpticks. */
    int     p_cpticks;       /* Ticks of cpu time. */
    fixpt_t p_pctcpu;        /* %cpu for this process during p_swtime */
    void    *p_wchan;        /* Sleep address. */
    char    *p_wmesg;        /* Reason for sleep. */
    u_int   p_swtime;        /* Time swapped in or out. */
    u_int   p_slptime;       /* Time since last blocked. */
    struct  itimerval p_realtimer;  /* Alarm timer. */
    struct  timeval p_rtime;        /* Real time. */
    u_quad_t p_uticks;              /* Statclock hits in user mode. */
    u_quad_t p_sticks;              /* Statclock hits in system mode. */
    u_quad_t p_iticks;              /* Statclock hits processing intr. */
    int     p_traceflag;            /* Kernel trace points. */
    struct  vnode *p_tracep;        /* Trace to vnode. */
    int     p_siglist;              /* DEPRECATED. */
    struct  vnode *p_textvp;        /* Vnode of executable. */
    int     p_holdcnt;              /* If non-zero, don't swap. */
    sigset_t p_sigmask;     /* DEPRECATED. */
    sigset_t p_sigignore;   /* Signals being ignored. */
    sigset_t p_sigcatch;    /* Signals being caught by user. */
    u_char  p_priority;     /* Process priority. */
    u_char  p_usrpri;       /* User-priority based on p_cpu and p_nice. */
    char    p_nice;         /* Process "nice" value. */
    char    p_comm[MAXCOMLEN + 1];
    struct  pgrp *p_pgrp;   /* Pointer to process group. */
    struct  user *p_addr;   /* Kernel virtual addr of u-area (PROC ONLY). */
    u_short p_xstat;        /* Exit status for wait; also stop signal. */
    u_short p_acflag;       /* Accounting flags. */
    struct  rusage *p_ru;   /* Exit information. XXX */
};

The size of union p_un is 0x10, and the two pointers before p_flag occupy 0x8 each, so p_flag sits at offset 0x20:

-------------------
|     p_flag      |
------------------- kinfo_proc + 0x20
|    p_sigacts    |
------------------- kinfo_proc + 0x18
|    p_vmspace    |
------------------- kinfo_proc + 0x10
|   union p_un    |
------------------- kinfo_proc
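
The 0x20 offset can be confirmed with offsetof on a cut-down copy of the struct. The type extern_proc_head below is a made-up stand-in that keeps only the leading members and replaces struct timeval with a 16-byte placeholder; it assumes a 64-bit target with 8-byte pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the first members of struct extern_proc. */
struct extern_proc_head {
    union {
        struct { void *forw; void *back; } p_st1;      /* two pointers: 0x10 */
        struct { int64_t sec; int64_t rest; } start;   /* 16-byte stand-in for struct timeval */
    } p_un;
    void   *p_vmspace;   /* offset 0x10 */
    void   *p_sigacts;   /* offset 0x18 */
    int32_t p_flag;      /* offset 0x20 */
};
```

On a real Darwin build the same check could be made directly with offsetof(struct kinfo_proc, kp_proc.p_flag).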

Test the flag and return the result in x0:

#define P_TRACED 0x00000800

__asm__ volatile(
    ......
    "ldr w24, [x20, #0x20]\n"   // w24 = proc.kp_proc.p_flag (a 4-byte int)
    "mov x25, #0x800\n"         // x25 = P_TRACED
    "and x0, x24, x25\n"        // x0 = x24 & x25
    ......
);
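
The flag test can be tried in isolation on a plain byte buffer. The sketch below uses made-up names (traced_bit, SKETCH_*) and scratch registers x9/x10 instead of the callee-saved ones, declares them as clobbered, and falls back to plain C on non-ARM64 targets. It assumes a little-endian machine.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_P_TRACED 0x00000800u
#define SKETCH_FLAG_OFF 0x20          /* offset of p_flag, from the layout above */

/* Hypothetical helper: returns p_flag & P_TRACED for a kinfo_proc-sized buffer. */
static inline uint64_t traced_bit(const void *kinfo) {
    uint64_t result;
#if defined(__aarch64__)
    __asm__ volatile(
        "ldr w9, [%[p], #0x20]\n"     /* w9 = 4-byte p_flag */
        "mov x10, #0x800\n"           /* x10 = P_TRACED */
        "and %[r], x9, x10\n"         /* r = p_flag & P_TRACED */
        : [r] "=r"(result)
        : [p] "r"(kinfo)
        : "x9", "x10", "memory");
#else
    uint32_t flag;                    /* portable fallback */
    memcpy(&flag, (const uint8_t *)kinfo + SKETCH_FLAG_OFF, sizeof flag);
    result = flag & SKETCH_P_TRACED;
#endif
    return result;
}
```

A non-zero result means the process is being traced; any other bits in p_flag are masked out.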

Further reading

Kernel_Syscalls

ARM64 architecture on/off stack operations

Deep into the CPU register at the bottom of iOS

Procedure Call Standard for the ARM 64-bit Architecture