I have already written several articles about hooking system calls under Linux; parts 1, 2, and 3 were based on the x86_64 platform. This article introduces how to hook system calls on the arm64 platform, and finishes with a simple hands-on example.

The experimental environment for this article is a Kylin Linux Advanced Server V10 system:

[root@localhost ~]# cat /etc/os-release
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Tercel)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Tercel)"
ANSI_COLOR="0;31"

The kernel version is 4.19.90-17.ky10.aarch64, on the aarch64 platform:

[root@localhost ~]# cat /proc/version
Linux version 4.19.90-17.ky10.aarch64 ([email protected]) (gcc version 7.3.0 (GCC)) #1 SMP Sun Jun 28 14:27:40 CST 2020
[root@localhost ~]# uname -a
Linux localhost.localdomain 4.19.90-17.ky10.aarch64 #1 SMP Sun Jun 28 14:27:40 CST 2020 aarch64 aarch64 aarch64 GNU/Linux
[root@localhost ~]#

The hardware environment is:

[root@localhost ~]# lscpu
Architecture:        aarch64
CPU op-mode(s):      64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
NUMA node(s):        1
Vendor ID:           HiSilicon
Model:               0
Model name:          Kunpeng-920
Stepping:            0x1
CPU max MHz:         2600.0000
CPU min MHz:         200.0000
BogoMIPS:            200.00
L1d cache:           256 KiB
L1i cache:           256 KiB
L2 cache:            2 MiB
L3 cache:            32 MiB
NUMA node0 CPU(s):   0-3
...

Let's first recall the general steps for hooking a system call on x86_64:

1. Obtain the first address of the system call table;

2. Replace the address of the system call to hook in the system call table with a custom function;

3. Implement custom functions.

The difficulty, and the difference between platforms, lies in the second step: the system call table sits in the kernel's read-only memory area and cannot be modified directly.

On x86, write protection can be toggled by changing the WP bit of the cr0 register, and some kernel versions even provide helper functions to do so. We can temporarily change the read-only area to writable, and then restore the read-only attribute after the replacement.

But arm64 has no cr0 register, so how do we make this read-only area writable?

One, determine the memory region where the system call table resides

1. The entire kernel read-only memory area

Because the system call table is in the read-only area of the kernel, we can do this by modifying the read/write properties of the entire read-only area.

That is, the area between the kernel's __start_rodata and __end_rodata symbols:

[root@localhost ~]# grep -nr rodata /boot/System.map-4.19.90-17.ky10.aarch64 
...
39608:ffff000008a30000 D __start_rodata
...
65911:ffff000008f30000 R __end_rodata
...
[root@localhost ~]# 

2. Specify the memory area where the system call address resides

Alternatively, modify the read/write attributes of only the region that holds the pointer for the specific system call we want to hook.

For example, change the read and write properties of only the page where sys_call_table_ptr+__NR_openat is located.

Two, modify the read and write attributes of the memory region

1. Try using set_memory_ro/rw to modify read-only properties

Let’s start with the kernel source code implementation and see what the code does, as follows:

/arch/arm64/mm/pageattr.c

int set_memory_ro(unsigned long addr, int numpages)
{
	return change_memory_common(addr, numpages,
					__pgprot(PTE_RDONLY),
					__pgprot(PTE_WRITE));
}

int set_memory_rw(unsigned long addr, int numpages)
{
	return change_memory_common(addr, numpages,
					__pgprot(PTE_WRITE),
					__pgprot(PTE_RDONLY));
}

Both set_memory_ro and set_memory_rw simply call change_memory_common. In the same source file, change_memory_common is implemented as follows:

/*
 * This function assumes that the range is mapped with PAGE_SIZE pages.
 */
static int __change_memory_common(unsigned long start, unsigned long size,
				pgprot_t set_mask, pgprot_t clear_mask)
{
	struct page_change_data data;
	int ret;

	data.set_mask = set_mask;
	data.clear_mask = clear_mask;

	ret = apply_to_page_range(&init_mm, start, size, change_page_range,
					&data);

	flush_tlb_kernel_range(start, start + size);
	return ret;
}

static int change_memory_common(unsigned long addr, int numpages,
				pgprot_t set_mask, pgprot_t clear_mask)
{
	unsigned long start = addr;
	unsigned long size = PAGE_SIZE * numpages;
	unsigned long end = start + size;
	struct vm_struct *area;
	int i;

	if (!PAGE_ALIGNED(addr)) {
		start &= PAGE_MASK;
		end = start + size;
		WARN_ON_ONCE(1);
	}

	/*
	 * Kernel VA mappings are always live, and splitting live section
	 * mappings into page mappings may cause TLB conflicts. This means
	 * we have to ensure that changing the permission bits of the range
	 * we are operating on does not result in such splitting.
	 *
	 * Let's restrict ourselves to mappings created by vmalloc (or vmap).
	 * Those are guaranteed to consist entirely of page mappings, and
	 * splitting is never needed.
	 *
	 * So check whether the [addr, addr + size) interval is entirely
	 * covered by precisely one VM area that has the VM_ALLOC flag set.
	 */
	area = find_vm_area((void *)addr);
	if (!area || end > (unsigned long)area->addr + area->size ||
	    !(area->flags & VM_ALLOC))
		return -EINVAL;

	if (!numpages)
		return 0;

	/*
	 * If we are manipulating read-only permissions, apply the same
	 * change to the linear mapping of the pages that back this VM area.
	 */
	if (rodata_full && (pgprot_val(set_mask) == PTE_RDONLY ||
			    pgprot_val(clear_mask) == PTE_RDONLY)) {
		for (i = 0; i < area->nr_pages; i++) {
			__change_memory_common((u64)page_address(area->pages[i]),
					       PAGE_SIZE, set_mask, clear_mask);
		}
	}

	/*
	 * Get rid of potentially aliasing lazily unmapped vm areas that may
	 * have permissions set that deviate from the ones we are setting here.
	 */
	vm_unmap_aliases();

	return __change_memory_common(start, size, set_mask, clear_mask);
}

We see the following comment:

	/*
	 * Kernel VA mappings are always live, and splitting live section
	 * mappings into page mappings may cause TLB conflicts. This means
	 * we have to ensure that changing the permission bits of the range
	 * we are operating on does not result in such splitting.
	 *
	 * Let's restrict ourselves to mappings created by vmalloc (or vmap).
	 * Those are guaranteed to consist entirely of page mappings, and
	 * splitting is never needed.
	 *
	 * So check whether the [addr, addr + size) interval is entirely
	 * covered by precisely one VM area that has the VM_ALLOC flag set.
	 */

And this code:

	if (!area || end > (unsigned long)area->addr + area->size ||
	    !(area->flags & VM_ALLOC))
		return -EINVAL;

As the comment and this check show, set_memory_ro/rw only works on virtual memory areas created by vmalloc or vmap. The system call table, however, lives in the kernel's read-only data section, which is not vmalloc/vmap memory, so set_memory_ro/rw cannot be used here.

2. Try using update_mapping_prot

/arch/arm64/mm/mmu.c

static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
				phys_addr_t size, pgprot_t prot)
{
	if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
		pr_warn("BUG: not updating mapping for %pa at 0x%016lx - outside kernel range\n",
			&phys, virt);
		return;
	}

	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
			     NO_CONT_MAPPINGS);

	/* flush the TLBs after updating live kernel mappings */
	flush_tlb_kernel_range(virt, virt + size);
}

In experimental tests, update_mapping_prot successfully modified the read/write attributes of the memory region where the system call table resides.

Three, code examples

1. Define relevant variables

void (*update_mapping_prot)(phys_addr_t phys, unsigned long virt, phys_addr_t size, pgprot_t prot);
unsigned long start_rodata, end_rodata;
#define section_size  (end_rodata - start_rodata)

2. Obtain the addresses via kallsyms_lookup_name

update_mapping_prot = (void *)kallsyms_lookup_name("update_mapping_prot");
start_rodata = (unsigned long)kallsyms_lookup_name("__start_rodata");
end_rodata = (unsigned long)kallsyms_lookup_name("__end_rodata");
printk("%s. update_mapping_prot:%lx, start_rodata:%lx, end_rodata:%lx.\n", __FUNCTION__, (unsigned long)update_mapping_prot, start_rodata, end_rodata);

Note: if __start_rodata and __end_rodata cannot be obtained with kallsyms_lookup_name, they can be read from the System.map file, as shown earlier in this article.

3. Use update_mapping_prot to clear the read-only attribute of the system call table region

static void disable_wirte_protection(void)
{
    update_mapping_prot(__pa_symbol(start_rodata), (unsigned long)start_rodata, section_size, PAGE_KERNEL);
    return ;
}

static void enable_wirte_protection(void)
{
    update_mapping_prot(__pa_symbol(start_rodata), (unsigned long)start_rodata, section_size, PAGE_KERNEL_RO);
    return ;
}

4. Save the system's original openat system call entry address

old_openat_func = (openat_t)sys_call_table_ptr[__NR_openat];

5. Replace system call entry with custom function

preempt_disable();
disable_wirte_protection();

sys_call_table_ptr[__NR_openat] = (openat_t)my_stub_openat;

enable_wirte_protection();
preempt_enable();

6. Custom system call implementation

asmlinkage long my_stub_openat(const struct pt_regs *pt_regs)
{
	long value = -1;
	char kfilename[80] = {0};
	int dfd = (int)pt_regs->regs[0];
	char __user *filename = (char __user *)pt_regs->regs[1];
	int flags = (int)pt_regs->regs[2];
	int mode = (int)pt_regs->regs[3];

	atomic_inc(&ref_count);

	value = old_openat_func(pt_regs);

	if (copy_from_user(kfilename, filename, sizeof(kfilename) - 1))
		goto openat_return;

	printk("%s. process:[%d:%s] open file:%s.\n\t-----> open flags:0x%0x, open %s, fd:%ld.\n",
	       __FUNCTION__, current->tgid, current->group_leader->comm,
	       kfilename, flags, value >= 0 ? "success" : "fail", value);

openat_return:
	atomic_dec(&ref_count);
	return value;
}

Here, after hooking successfully, we simply print information about the process executing openat and the file being opened.

7. When the module is unloaded, restore the original system call address; otherwise the system will crash

static void patch_cleanup(void)
{
        preempt_disable();
        disable_wirte_protection();

        if (sys_call_table_ptr[__NR_openat] == (openat_t)my_stub_openat)
                sys_call_table_ptr[__NR_openat] = old_openat_func;

        enable_wirte_protection();
        preempt_enable();

        return ;
}

8. After compiling, load the module

(Screenshot of the running results omitted here.)

9. Postscript

In a recent test on UOS 20 for arm64 (Desktop Professional Edition), disabling write protection failed, and the kernel reported the following error:

[Tue May 18 14:54:08 2021] kernel BUG at arch/arm64/mm/mmu.c:152!
[Tue May 18 14:54:08 2021] Modules linked in: lkm4arm64(O+) bluetooth ecdh_generic fuse st cfg80211 rfkill firmware_class nls_iso8859_1 nls_cp437 aes_ce_blk crypto_simd cryptd aes_ce_cipher crc32_ce crct10dif_ce ghash_ce aes_arm64 sha2_ce sha256_arm64 sha1_ce virtio_balloon qemu_fw_cfg binder_linux(O) ashmem_linux(O) efivarfs virtio_rng ip_tables x_tables btrfs xor raid6_pq hid_generic usbkbd usbmouse usbhid rtc_efi virtio_blk virtio_scsi virtio_net net_failover failover button virtio_mmio [last unloaded: lkm4arm64]
[Tue May 18 14:54:08 2021] Process insmod (pid: 1535, stack limit = 0x000000005ac3dde5)
[Tue May 18 14:54:08 2021] CPU: 1 PID: 1535 Comm: insmod Tainted: G O 4.19.0-arm64-desktop #3100
[Tue May 18 14:54:08 2021] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[Tue May 18 14:54:08 2021] pstate: 60400005 (nZCv daif +PAN -UAO)
[Tue May 18 14:54:08 2021] pc : alloc_init_pud+0x518/0x550
[Tue May 18 14:54:08 2021] lr : alloc_init_pud+0x494/0x550
[Tue May 18 14:54:08 2021] sp : ffffad7ce3fefa80
[Tue May 18 14:54:08 2021] x29: ffffad7ce3fefa80 x28: ffff7dfffe637fa8
... (remaining register dump omitted) ...
[Tue May 18 14:54:08 2021] Call trace:
[Tue May 18 14:54:08 2021]  alloc_init_pud+0x518/0x550
[Tue May 18 14:54:08 2021]  __create_pgd_mapping+0x98/0xe8
[Tue May 18 14:54:08 2021]  update_mapping_prot+0x48/0xd0
[Tue May 18 14:54:08 2021]  disable_wirte_protection+0x54/0x88 [lkm4arm64]
[Tue May 18 14:54:08 2021]  test_replace+0xe0/0x104 [lkm4arm64]
[Tue May 18 14:54:08 2021]  lkm_init+0x20/0xd5c [lkm4arm64]
[Tue May 18 14:54:08 2021]  do_one_initcall+0x30/0x19c
[Tue May 18 14:54:08 2021]  do_init_module+0x58/0x1c8
[Tue May 18 14:54:08 2021]  load_module+0x128c/0x1490
[Tue May 18 14:54:08 2021]  __se_sys_finit_module+0x84/0xc8
[Tue May 18 14:54:08 2021]  __arm64_sys_finit_module+0x18/0x20
[Tue May 18 14:54:08 2021]  el0_svc_common+0x90/0x160
[Tue May 18 14:54:08 2021]  el0_svc+0x8/0xa8
[Tue May 18 14:54:08 2021] Code: a9025bf5 a90363f7 d4210000 d4210000 (d4210000)
[Tue May 18 14:54:08 2021] ---[ end trace a452da0642349ffd ]---

The solution

In mmu.c, the mark_rodata_ro function calls update_mapping_prot to mark the kernel's rodata section read-only:

void mark_rodata_ro(void)
{
	unsigned long section_size;

	/*
	 * mark .rodata as read only. Use __init_begin rather than __end_rodata
	 * to cover NOTES and EXCEPTION_TABLE.
	 */
	section_size = (unsigned long)__init_begin - (unsigned long)__start_rodata;
	update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
			    section_size, PAGE_KERNEL_RO);

	debug_checkwx();
}

The region it marks read-only runs from __start_rodata to __init_begin, so we change our test program to use __init_begin instead of __end_rodata as well:

void (*update_mapping_prot)(phys_addr_t phys, unsigned long virt, phys_addr_t size, pgprot_t prot);
unsigned long start_rodata, init_begin;
#define section_size  (init_begin - start_rodata)

update_mapping_prot = (void *)kallsyms_lookup_name("update_mapping_prot");
start_rodata = (unsigned long)kallsyms_lookup_name("__start_rodata");
init_begin = (unsigned long)kallsyms_lookup_name("__init_begin");
printk("%s. update_mapping_prot:%lx, start_rodata:%lx, init_begin:%lx.\n", __FUNCTION__, (unsigned long)update_mapping_prot, start_rodata, init_begin);

After this change, the module recompiles and loads properly.

Let's take a closer look at how the kernel maps its own segments:

/*
 * Create fine-grained mappings for the kernel.
 */
static void __init map_kernel(pgd_t *pgdp)
{
	static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
				vmlinux_initdata, vmlinux_data;

	/*
	 * External debuggers may need to write directly to the text
	 * mapping to install SW breakpoints. Allow this (only) when
	 * explicitly requested with rodata=off.
	 */
	pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;

	/*
	 * Only rodata will be remapped with different permissions later on,
	 * all other segments are allowed to use contiguous mappings.
	 */
	map_kernel_segment(pgdp, _text, _etext, text_prot, &vmlinux_text, 0,
			   VM_NO_GUARD);
	map_kernel_segment(pgdp, __start_rodata, __inittext_begin, PAGE_KERNEL,
			   &vmlinux_rodata, NO_CONT_MAPPINGS, VM_NO_GUARD);
	map_kernel_segment(pgdp, __inittext_begin, __inittext_end, text_prot,
			   &vmlinux_inittext, 0, VM_NO_GUARD);
	map_kernel_segment(pgdp, __initdata_begin, __initdata_end, PAGE_KERNEL,
			   &vmlinux_initdata, 0, VM_NO_GUARD);
	map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);

	if (!READ_ONCE(pgd_val(*pgd_offset_raw(pgdp, FIXADDR_START)))) {
		/*
		 * The fixmap falls in a separate pgd to the kernel, and doesn't
		 * live in the carveout for the swapper_pg_dir. We can simply
		 * re-use the existing dir for the fixmap.
		 */
		set_pgd(pgd_offset_raw(pgdp, FIXADDR_START),
			READ_ONCE(*pgd_offset_k(FIXADDR_START)));
	} else if (CONFIG_PGTABLE_LEVELS > 3) {
		/*
		 * The fixmap shares its top level pgd entry with the kernel
		 * mapping. This can really only occur when we are running
		 * with 16k/4 levels, so we can simply reuse the pud level
		 * entry instead.
		 */
		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
		pud_populate(&init_mm,
			     pud_set_fixmap_offset(pgdp, FIXADDR_START),
			     lm_alias(bm_pmd));
		pud_clear_fixmap();
	} else {
		BUG();
	}

	kasan_copy_shadow(pgdp);
}

You can see: map_kernel_segment(pgdp, __start_rodata, __inittext_begin, PAGE_KERNEL, &vmlinux_rodata, NO_CONT_MAPPINGS, VM_NO_GUARD);

The kernel's read-only segment is actually mapped from __start_rodata to __inittext_begin (which coincides with __init_begin in the arm64 linker script), so our module is safe as long as it mirrors what mark_rodata_ro does.

In fact, when writing a kernel module, many of the solutions and functions we need already exist in the kernel source; the safest approach is simply to port them over.

That's all for this article.

If you are interested, you can follow my WeChat official account [Big Fat Chat Programming], where I share more articles; you can also contact me there to exchange ideas and learn together.