Write the OS kernel from scratch - system call

Series contents

Introduction
The preparatory work
Booting from the BIOS into real mode
GDT and protected mode
A preliminary study of virtual memory
Loading and entering the kernel
Display and printing
The global descriptor table (GDT)
Interrupt handling
Improving virtual memory
Implementing the heap and malloc
The first kernel thread
Running and switching multiple threads
Locks and multithread synchronization
Entering user mode
Process implementation
System calls
A simple file system
Loading executables
The keyboard driver
Running a shell

System calls

Building on the process implementation from the previous article, we now want to actually create processes with the familiar fork system call, so first we need a framework for system calls.

The concept of a system call needs little introduction. It is the interface the kernel exposes to user programs, and it is the main way for user code to actively request kernel services. A system call crosses from user mode into kernel mode, so it has to be triggered by an interrupt. Following the classic 32-bit Linux approach, we use the int 0x80 software interrupt as the syscall entry point.

Since syscalls are meant for user programs, the implementation has two parts:

User side: a uniform set of function interfaces, which at the lowest level raise the interrupt with int 0x80;
Kernel side: handling that closely resembles ordinary interrupt handling.

The user-side interface

First, the user side. Note that this code is compiled and linked into user programs, not into the kernel. It will be packaged as a standard library, which we will link against later when we write user programs.

From the top of the call chain down, this part consists mainly of these files:

syscall.h and syscall.c, which contain the user-level interfaces;
syscall_trigger.S, which raises the interrupt and passes the parameters.

syscall.c defines the syscall functions that user programs call directly. They look much like the ones we normally use on Linux:

int32 fork();
int32 exec(char* path, uint32 argc, char* argv[]);

Underneath, each of them calls a trigger function provided by syscall_trigger.S. That is where the syscall interrupt is actually raised and the parameters are passed.
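Below is a minimal hosted sketch of that layering. It assumes the wrappers in syscall.c simply forward to the trigger functions. The trigger bodies here are hypothetical C stand-ins, so the sketch runs on an ordinary host. The real triggers are the assembly routines in syscall_trigger.S, which load eax and execute int 0x80. The wrappers are named user_fork and user_exec only to avoid clashing with the host C library.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t int32;
typedef uint32_t uint32;

/* Must match the SYSCALL_*_NUM constants in syscall_trigger.S. */
#define SYSCALL_FORK_NUM 1
#define SYSCALL_EXEC_NUM 2

/* Hypothetical host stand-ins for the assembly triggers: instead of
 * executing int 0x80 they record the number that would go into eax. */
static uint32 fake_eax;

static int32 trigger_syscall_fork(void) {
  fake_eax = SYSCALL_FORK_NUM;
  return 0;
}

/* Returns argc only so a caller can check the arguments were forwarded. */
static int32 trigger_syscall_exec(char* path, uint32 argc, char* argv[]) {
  (void)path;
  (void)argv;
  fake_eax = SYSCALL_EXEC_NUM;
  return (int32)argc;
}

/* syscall.c: thin wrappers that user programs call. */
int32 user_fork(void) { return trigger_syscall_fork(); }

int32 user_exec(char* path, uint32 argc, char* argv[]) {
  return trigger_syscall_exec(path, argc, argv);
}
```

In the real library, the wrappers would be declared in syscall.h. The triggers would be declared extern, with their bodies supplied by the assembler rather than by C.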

All syscalls go through the same int 0x80 interrupt. Since there are many of them, each syscall is assigned a number, for example:

SYSCALL_FORK_NUM   equ  1
SYSCALL_EXEC_NUM   equ  2

Unlike ordinary interrupts, a syscall also needs to pass parameters. For this reason, syscall_trigger.S defines several macro templates for different numbers of arguments, such as:

%macro DEFINE_SYSCALL_TRIGGER_0_PARAM 2
  [GLOBAL trigger_syscall_%1]
  trigger_syscall_%1:
    mov eax, %2
    int 0x80
    ret
%endmacro
        
DEFINE_SYSCALL_TRIGGER_0_PARAM   fork,   SYSCALL_FORK_NUM

Expanding the macro gives the underlying trigger implementation for fork:

[GLOBAL trigger_syscall_fork]
trigger_syscall_fork:
  mov eax, SYSCALL_FORK_NUM
  int 0x80
  ret

So every syscall takes at least one parameter: eax always holds the syscall number. If the syscall itself has arguments, other registers such as ecx, edx, and ebx carry them. Which registers are used is simply a convention we choose.

For example, exec takes three arguments:

%macro DEFINE_SYSCALL_TRIGGER_3_PARAM 2
  [GLOBAL trigger_syscall_%1]
  trigger_syscall_%1:
    push ebx

    mov eax, %2
    mov ecx, [esp + 8]
    mov edx, [esp + 12]
    mov ebx, [esp + 16]
    int 0x80

    pop ebx
    ret
%endmacro

DEFINE_SYSCALL_TRIGGER_3_PARAM   exec,  SYSCALL_EXEC_NUM

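trigger_syscall_exec passes its three arguments in ecx, edx, and ebx. Note that ebx is pushed first. Under the x86 calling convention ebx is a callee-saved register, so a function that modifies it must save and restore it. That push, together with the return address pushed by call, is why the first argument is read from [esp + 8] rather than [esp + 4].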
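The offsets [esp + 8], [esp + 12], and [esp + 16] follow from what sits on the stack when the stub runs. From the top down, that is the ebx saved by the stub, the return address pushed by call, and then the caller's cdecl arguments, which were pushed right to left. A small C model of a downward-growing stack of 4-byte slots makes this arithmetic concrete. The model and its names are illustrative only and are not part of the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hosted model of a 32-bit stack: mem[0] is the lowest address, esp is
 * a slot index, and the stack grows toward index 0. */
#define SLOTS 16

typedef struct {
  uint32_t mem[SLOTS];
  size_t esp;             /* slot index of the current stack top */
} stack_model_t;

static void push(stack_model_t* s, uint32_t v) { s->mem[--s->esp] = v; }

/* Read [esp + byte_offset], like the NASM operand [esp + N]. */
static uint32_t load(const stack_model_t* s, size_t byte_offset) {
  return s->mem[s->esp + byte_offset / 4];
}

/* Caller pushes argv, argc, path (cdecl: right to left), `call` pushes
 * the return address, then the stub executes `push ebx`. */
static stack_model_t exec_call_frame(uint32_t path, uint32_t argc,
                                     uint32_t argv, uint32_t ret_addr,
                                     uint32_t saved_ebx) {
  stack_model_t s = { {0}, SLOTS };
  push(&s, argv);
  push(&s, argc);
  push(&s, path);
  push(&s, ret_addr);   /* pushed by call trigger_syscall_exec */
  push(&s, saved_ebx);  /* pushed by push ebx inside the stub */
  return s;
}
```

With this model, [esp + 8] yields path (loaded into ecx), [esp + 12] yields argc (edx), and [esp + 16] yields argv (ebx). This matches the order in which syscall_handler later passes them to syscall_exec_impl.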

With the registers loaded, the trigger function executes int 0x80, the single entry point for all system calls, and control passes into the kernel's handling path.

Syscall handling in the kernel

The main code for this part is in the following files:

syscall_wrapper.S, the unified entry point for syscall handling;
syscall_imp.h and syscall_imp.c, which contain the actual implementations of the individual syscalls.

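Before any of that, since a syscall is an interrupt, we must register a handler for interrupt 0x80. This is done in src/interrupt/interrupt.c, and the entry point is the syscall_entry function. The gate gets DPL 3 because the CPU allows a software interrupt only when the caller's CPL is no greater than the gate's DPL. With a DPL 0 gate, int 0x80 from user mode would raise a general-protection fault: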

set_idt_gate(SYSCALL_INT_NUM,
             (uint32)syscall_entry,
             SELECTOR_K_CODE,
             IDT_GATE_ATTR_DPL3);
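The DPL 3 attribute is what lets user code execute int 0x80. The sketch below shows the standard 32-bit interrupt-gate encoding from the Intel manuals, plus the privilege check the CPU applies to software interrupts. It makes two assumptions that the article does not state: IDT_GATE_ATTR_DPL3 corresponds to the attribute byte 0xEE, and SELECTOR_K_CODE is 0x08. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-byte protected-mode IDT entry (layout from the Intel SDM). */
typedef struct {
  uint16_t offset_low;   /* handler address, bits 0..15 */
  uint16_t selector;     /* code segment selector for the handler */
  uint8_t  zero;         /* reserved, always 0 */
  uint8_t  attr;         /* P (bit 7) | DPL (bits 5..6) | type (bits 0..3) */
  uint16_t offset_high;  /* handler address, bits 16..31 */
} idt_gate_t;

#define GATE_TYPE_INT32 0x0E  /* 32-bit interrupt gate: clears IF on entry */

static uint8_t gate_attr(unsigned dpl) {
  return (uint8_t)(0x80 | ((dpl & 3u) << 5) | GATE_TYPE_INT32);
}

static idt_gate_t make_gate(uint32_t handler, uint16_t selector, unsigned dpl) {
  idt_gate_t g;
  g.offset_low = (uint16_t)(handler & 0xFFFFu);
  g.selector = selector;
  g.zero = 0;
  g.attr = gate_attr(dpl);
  g.offset_high = (uint16_t)(handler >> 16);
  return g;
}

/* For int n, the CPU raises #GP unless CPL <= gate DPL. */
static int int_n_allowed(const idt_gate_t* g, unsigned cpl) {
  unsigned dpl = (g->attr >> 5) & 3u;
  return cpl <= dpl;
}
```

The explicit sti in syscall_entry fits an interrupt-gate choice. An interrupt gate clears IF on entry, so the handler has to re-enable interrupts itself.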

syscall_entry is essentially the same as an ordinary interrupt entry function and is likewise split into two halves.

The first half saves the user context, including all general-purpose registers and the data segment register. It then calls syscall_handler, which dispatches to the actual syscall handling.

syscall_entry:
  ; push dummy to match struct isr_params_t
  push byte 0
  push byte 0
  ; save common registers
  pusha
  ; save original data segment
  mov cx, ds
  push ecx
  ; load the kernel data segment descriptor
  mov cx, 0x10
  mov ds, cx
  mov es, cx
  mov fs, cx
  mov gs, cx

  sti  ; allow interrupt during syscall
  call syscall_handler

The second half is the return. Like an ordinary interrupt return, it restores all the registers saved above, with one exception: eax is not popped. eax holds the value returned by syscall_handler, which is the syscall's return value:

syscall_exit:
  ; recover the original data segment.
  ; Do NOT use eax because it's the syscall ret value!
  pop ecx
  mov ds, cx
  mov es, cx
  mov fs, cx
  mov gs, cx

  pop  edi
  pop  esi
  pop  ebp
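  ; skip the saved esp: it holds the stack top from before pusha, so
  ; loading it would jump past ebx/edx/ecx (popa ignores it too)
  add  esp, 4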
  pop  ebx
  pop  edx
  pop  ecx
  ; skip eax because it is used as return value
  ; for syscall_handler
  add  esp, 4

  ; pop dummy values
  add esp, 8

  ; pop cs, eip, eflags, user_ss, and user_esp by processor
  iret

syscall_handler is the actual dispatcher. It reads the syscall number from the saved eax and calls the corresponding implementation:

int32 syscall_handler(isr_params_t isr_params) {
  // syscall num saved in eax.
  // args list: ecx, edx, ebx, esi, edi
  uint32 syscall_num = isr_params.eax;

  switch (syscall_num) {
    case SYSCALL_FORK_NUM:
      return syscall_fork_impl();
    case SYSCALL_EXEC_NUM:
      return syscall_exec_impl((char*)isr_params.ecx,
                               isr_params.edx,
                               (char**)isr_params.ebx);
    default:
      PANIC();
  }
}

Note that, just like an ordinary interrupt handler, syscall_handler takes the whole interrupt stack frame as its isr_params_t argument.

For ordinary interrupts, the general-purpose register values saved on the stack serve only to save and restore the context from before the interrupt. For a syscall their role changes: some of them carry the syscall's arguments. syscall_handler above pulls them out and passes them to the individual syscall implementations.
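The sketch below shows the frame layout that syscall_entry builds, derived from its push order. Because a syscall always comes from ring 3, the CPU first pushes user_ss, user_esp, eflags, cs, and eip. After those come the two dummy zeros, the eight pusha registers, and finally ds. The field names int_num/err_code and the helper syscall_arg are hypothetical, and the real isr_params_t may name fields differently. The argument order follows the comment in syscall_handler (ecx, edx, ebx, esi, edi).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t uint32;

/* Lowest address first, i.e. the reverse of the push order. */
typedef struct {
  uint32 ds;                                       /* push ecx (saved ds) */
  uint32 edi, esi, ebp, esp, ebx, edx, ecx, eax;   /* pusha */
  uint32 int_num, err_code;                        /* the two dummy zeros */
  uint32 eip, cs, eflags, user_esp, user_ss;       /* pushed by the CPU */
} isr_params_t;

/* n-th syscall argument (0-based), in the order ecx, edx, ebx, esi, edi.
 * This convention has only five argument registers. */
static uint32 syscall_arg(const isr_params_t* p, int n) {
  switch (n) {
    case 0: return p->ecx;
    case 1: return p->edx;
    case 2: return p->ebx;
    case 3: return p->esi;
    case 4: return p->edi;
    default: return 0;
  }
}
```

The eax offset (32 bytes) also explains the add esp, 4 in syscall_exit. After ds and seven more register slots have been consumed, the next slot is the saved eax. That slot is skipped so the handler's return value survives.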

Where do these argument values come from? They come from the user-side trigger_syscall_xxx functions. Before raising the interrupt, these load the arguments of the user's syscall call into registers:

trigger_syscall_exec:
  push ebx

  mov eax, SYSCALL_EXEC_NUM
  mov ecx, [esp + 8]
  mov edx, [esp + 12]
  mov ebx, [esp + 16]
  
  int 0x80

  pop ebx
  ret

So the complete parameter-passing chain for a syscall is:

In the user-side trigger, the parameters are loaded into general-purpose registers;
When the interrupt fires, syscall_entry pushes these registers onto the kernel stack. There they form the isr_params_t structure that is finally passed to syscall_handler.

Also note what happens when an argument is passed in a callee-saved register, such as ebx above. The trigger first saves that register's old value on the user stack. In other words, part of the user context is saved and restored by trigger_syscall_xxx on the user stack rather than by the interrupt entry. By the time the interrupt fires, the register already holds the syscall argument, so the copy saved on the interrupt stack is the argument, not the caller's original value. The original must therefore be saved on the user stack beforehand. This is another way syscalls differ from ordinary interrupts.

The underlying reason is that a syscall is initiated deliberately, whereas an ordinary interrupt can arrive at any unpredictable point. A syscall therefore behaves much like an ordinary function call. As long as the user side follows the x86 calling convention and saves any callee-saved registers on its own stack, it is free to use those registers to pass arguments. Finally, int 0x80 raises the interrupt and processing continues on the kernel stack.

Implementing fork

With all that said, let’s implement the first syscall: fork.

syscall_handler dispatches fork to syscall_fork_impl, which in turn calls process_fork, defined in src/task/process.c.

You are probably familiar with how fork is used on Linux:

int pid = fork();
if (pid > 0) {
  // parent process
} else if (pid == 0) {
  // child process
} else {
  // fork failed
}

Our very first syscall, fork, happens to be a fairly involved one. fork creates a new child process that is a copy of the parent. Both processes return from fork and continue executing. The only difference is the return value: the parent gets the PID of the new child, and the child gets 0.

First, create_process is called to create a new process structure and initialize its fields. Note that the child's page directory is cloned from the parent's, so the child starts out with the same virtual address space mappings as the parent:

pcb_t* create_process(char* name, uint8 is_kernel_process) {
  pcb_t* process = (pcb_t*)kmalloc(sizeof(pcb_t));
  memset(process, 0, sizeof(pcb_t));
  // ...
  process->page_dir = clone_crt_page_dir();
  // ...
}

The most important step is fork_crt_thread. It copies the current thread's kernel stack and arranges it to look like the stack of a newly created thread, so the child can later be started like any new thread. This is the child's first run, yet to the child it looks as if it is returning from fork, just like the parent.

Recall what a kernel thread's stack looks like when the thread starts for the first time.

Execution begins with esp at kernel_esp. The saved general-purpose registers are popped moving up the stack, and then control jumps to the thread's entry eip. For the child thread we set that entry eip to syscall_fork_exit:

thread->kernel_esp = kernel_stack + KERNEL_STACK_SIZE
    - (sizeof(interrupt_stack_t) + sizeof(switch_stack_t));

switch_stack_t* switch_stack = (switch_stack_t*)thread->kernel_esp;
switch_stack->thread_entry_eip = (uint32)syscall_fork_exit;
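A hosted sketch can check this address arithmetic. The copied interrupt frame occupies the very top of the child's kernel stack, and the switch frame sits directly below it, so kernel_esp points at the switch frame. The struct contents, the helper setup_child_stack, and the KERNEL_STACK_SIZE value are placeholders. Only the placement mirrors the snippet above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KERNEL_STACK_SIZE 4096  /* placeholder size */

/* Placeholder interrupt frame: 16 saved words, like isr_params_t. */
typedef struct { uint32_t words[16]; } interrupt_stack_t;

/* Hypothetical switch frame: registers popped on the first switch,
 * followed by the address the thread "returns" to. */
typedef struct {
  uint32_t edi, esi, ebp, ebx;
  uint32_t thread_entry_eip;
} switch_stack_t;

/* Lay out a child's kernel stack: the parent's interrupt frame at the
 * top, the switch frame right below it. Returns the initial kernel_esp. */
static uintptr_t setup_child_stack(uint8_t* kernel_stack,
                                   const interrupt_stack_t* parent_frame,
                                   uint32_t entry_eip) {
  uintptr_t kernel_esp = (uintptr_t)kernel_stack + KERNEL_STACK_SIZE
      - (sizeof(interrupt_stack_t) + sizeof(switch_stack_t));

  switch_stack_t* sw = (switch_stack_t*)kernel_esp;
  memset(sw, 0, sizeof *sw);
  sw->thread_entry_eip = entry_eip;

  /* The copied frame is what iret will eventually consume. */
  memcpy((void*)(kernel_esp + sizeof(switch_stack_t)), parent_frame,
         sizeof *parent_frame);
  return kernel_esp;
}
```

The design keeps both frames contiguous. Once the switch frame has been popped and control reaches the entry eip, esp already points at the copied interrupt frame, ready for the exit path.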

syscall_fork_exit, or more precisely syscall_fork_child_exit, is the path a child takes when it returns from fork. It differs from the normal syscall return only in how the general-purpose registers are restored:

  pop edi
  pop esi
  pop ebp
  ; Do NOT pop old esp!
  ; Child process is its own stack, not parent's.
  add esp, 4
  pop ebx
  pop edx
  pop ecx
  ; child process returns 0.
  mov eax, 0
  add esp, 4

esp and eax get special treatment:

The esp value saved on the stack is the parent's. The child already has its own kernel stack, so that value is skipped.
eax is fork's return value, which must be 0 in the child.
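Both return paths read the same saved registers, because the child's frame is a copy of the parent's. They differ only in what ends up in eax. The hosted model below captures that difference, with hypothetical struct and function names. The saved esp slot is omitted, since the user-mode stack pointer comes back from user_esp via iret.

```c
#include <assert.h>
#include <stdint.h>

/* General-purpose registers as saved by pusha (esp slot omitted). */
typedef struct {
  uint32_t edi, esi, ebp, ebx, edx, ecx, eax;
} gp_frame_t;

/* Parent path (syscall_exit): eax is never popped, so it keeps
 * syscall_handler's return value, which for fork is the child pid. */
static gp_frame_t parent_return(gp_frame_t saved, uint32_t handler_ret) {
  saved.eax = handler_ret;
  return saved;
}

/* Child path (syscall_fork_child_exit): same copied frame, eax forced to 0. */
static gp_frame_t child_return(gp_frame_t saved) {
  saved.eax = 0;
  return saved;
}
```

Every other register comes back identical in both processes. This is exactly why the parent and child resume at the same instruction and can tell each other apart only by fork's return value.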

iret then restores the user thread's state from before the syscall, that is, its code and stack:

Code: saved in cs + eip;
Stack: saved in user_ss + user_esp.

This is the same information as on the parent's stack, because the child's kernel stack was copied from the parent's. That is why the child, when it returns to user mode, continues running right after the fork call just like the parent, as if the parent had made a mirror image of itself. Their virtual memory spaces are still isolated, though, through the copy-on-write mechanism described in the previous article.

Meanwhile, once fork_crt_thread has created the child thread, the parent binds it to the new process, hands it to the scheduler, and returns the new child's pid:

// Create new process and fork current thread.
pcb_t* process = create_process(nullptr);  
tcb_t* thread = fork_crt_thread();
if (thread == nullptr) {
  return -1;
}

// Bind child thread to child process.
add_process_thread(process, thread);

// Add child thread to scheduler to run.
add_thread_to_schedule(thread);

// Parent should return the new pid.
return process->id;

The parent returns from the syscall normally. The child's kernel stack has been prepared so that it runs as if the thread were starting for the first time. Two points are essential:

Its interrupt stack must match the parent's, so that when the interrupt returns, the child's user thread runs in exactly the same environment as the parent. Back in user mode, the child then looks like a task that simply continues running just like the parent, which is the whole point of fork.
Its return value must be 0.

Conclusion

This article covered a lot of ground. We first built the syscall framework, separating the responsibilities of the user side and the kernel side. Along the way we looked at how syscalls resemble and differ from ordinary interrupts. On top of that we implemented fork, the most challenging of the syscalls, which hopefully gives you a clearer picture of how a syscall is entered and how it returns.
