Using setjmp and longjmp in C to implement exception handling and coroutines



I. Foreword

In the C standard library, there are two powerful functions: setjmp and longjmp. Have you ever used them in your code? I asked several colleagues around me: some had never heard of these two functions, and others knew of them but had never used them.

As a piece of knowledge, these two functions are fairly simple, and a short example is enough to make their behavior clear. However, it pays to think more broadly from this starting point: relate and compare them with similar features in the same language, compare them with similar concepts in other programming languages, and then consider where they can be used and how others use them.

Today, we’re going to take these two functions apart. They are rarely needed in ordinary applications, but they may work wonders someday when you have to deal with some unusual control flow.

For example, we will compare setjmp/longjmp with the goto statement, compare the return value of setjmp with that of fork, and compare their usage scenarios with coroutines in Python/Lua.

II. Function syntax

1. Simplest example

Let’s take a look at the simplest example code:

    #include <stdio.h>
    #include <setjmp.h>

    int main()
    {
        // A buffer to hold the saved execution context
        jmp_buf buf;

        printf("line1 \n");

        // Save the context; a direct call returns 0
        int ret = setjmp(buf);
        printf("ret = %d \n", ret);

        // Check the return value
        if (0 == ret)
        {
            // Return value 0: setjmp was called directly
            printf("line2 \n");

            // Jump back to the context saved in buf
            longjmp(buf, 1);

            // This line is never reached
            printf("line3 \n");
        }

        printf("line4 \n");
        return 0;
    }

Execution result:

    line1
    ret = 0
    line2
    ret = 1
    line4

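The order of execution is as follows (don’t worry if this isn’t clear yet; come back to it after the explanation below): main prints line1; setjmp is called directly and returns 0, so ret = 0 and line2 are printed; longjmp jumps back to the setjmp call, which this time returns 1; ret = 1 is printed, the if block is skipped, and line4 is printed.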

2. Function description

Let’s look at the signatures of the two functions:

int setjmp(jmp_buf env);
void longjmp(jmp_buf env, int value);

Both are declared in the setjmp.h header, and Wikipedia describes them as follows:

setjmp: Sets up the local jmp_buf buffer and initializes it for the jump. This routine saves the program’s calling environment in the environment buffer specified by the env argument for later use by longjmp. If the return is from a direct invocation, setjmp returns 0. If the return is from a call to longjmp, setjmp returns a nonzero value.

longjmp: Restores the context of the environment buffer env that was saved by invocation of the setjmp routine in the same invocation of the program. Invoking longjmp from a nested signal handler is undefined. The value specified by value is passed from longjmp to setjmp. After longjmp is completed, program execution continues as if the corresponding invocation of setjmp had just returned. If the value passed to longjmp is 0, setjmp will behave as if it had returned 1; otherwise, it will behave as if it had returned value.

Now let me explain the passage above in my own words:

setjmp function

Function: saves the context information at the moment the function is executed, mainly the values of certain registers.

Parameter: env is a buffer used to store the context information; saving it is equivalent to taking a snapshot of the current context.

Return value: there are two cases. If setjmp is called directly, it returns 0. If it returns because of a call to longjmp, it returns a non-zero value. This is analogous to the process creation function fork.

longjmp function

Function: jumps to the context (snapshot) saved in the env buffer and continues execution from there.

Parameters: env specifies which context (snapshot) to jump to; value determines the return value of setjmp.

Return value: none. Once you call this function, execution jumps directly to code somewhere else and never comes back.

Summary: these two functions are used together to implement jumps in a program.
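To make the return-value rule concrete, here is a minimal sketch. The helper name roundtrip is made up for illustration; it saves a context, immediately jumps back to it with a chosen value, and reports what setjmp returned the second time. Like the example at the start of this article, it stores the result of setjmp in a variable.

```c
#include <setjmp.h>

/* Hypothetical helper (not part of any library): jump back to our own
   snapshot with `value` and return what setjmp reports afterwards. */
int roundtrip(int value)
{
    jmp_buf env;
    int ret = setjmp(env);    /* 0 on the direct call */

    if (ret == 0)
        longjmp(env, value);  /* never returns; setjmp "returns" a second time */

    return ret;               /* the value given to longjmp, or 1 if that was 0 */
}
```

With this helper, roundtrip(5) yields 5, while roundtrip(0) yields 1, because a value of 0 passed to longjmp is replaced by 1.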

3. setjmp: saving context information

As we know, C code is compiled into a binary file and loaded into memory at execution time, and the CPU executes the instructions in the code segment one after another. The CPU has many registers that hold the current execution environment, such as the code segment register CS, the instruction offset register IP, and of course many others. We call this execution environment the context.

When the CPU fetches the next instruction, it locates it through the CS and IP registers, as shown in the figure below:

A few things to add:

In the figure above, the code segment register CS is treated as a base address, that is, CS holds the starting address of the code segment in memory, and the IP register holds the offset of the next instruction from this base address. Therefore, to fetch an instruction, you only need to add the values of the two registers to get its address.

In fact, on x86 platforms, the code segment register CS is not a base address but a selector. Somewhere in the operating system there is a table that stores the actual starting address of the code segment, and the CS register stores only an index that points to an entry in that table. This is related to virtual memory.

After an instruction is fetched, the IP register automatically advances to the beginning of the next instruction. The number of bytes it advances depends on how many bytes the current instruction occupies.

The CPU is simple-minded. It has no ideas of its own and does whatever we tell it to do. Take instruction fetching as an example: as long as we set the CS and IP registers, the CPU will use the values in these two registers to fetch an instruction. If you set them to wrong values, the CPU will just as obediently fetch instructions from there, and the program will crash during execution.

We can simply interpret this register information as context information, and the CPU executes based on this context information. Therefore, C provides us with the setjmp library function to store the current context information temporarily in a buffer.

Why save it? So that execution can later resume from the current location.

Here’s a simpler example: a snapshot in a server. What does a snapshot do? When a server error occurs, you can revert to a snapshot!

4. longjmp: performing the jump

When it comes to jumps, the concept that immediately comes to mind is the goto statement. Many tutorials take issue with goto and say you should avoid it in your code. The reasoning is sound: if goto is used too much, it becomes hard to follow the order in which the code executes.

But if you look at the Linux kernel code, you’ll see a lot of goto statements. Once again, the point is to find a balance between code maintainability and execution efficiency.

A jump changes the execution sequence of a program. A goto statement can only jump within a function, but not across it.
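The kind of goto found in kernel-style code is typically a jump to a cleanup label inside the same function. Here is a minimal sketch of that pattern; the helper sum_of_squares is made up for illustration and is not taken from any real code base.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: stores the sum of squares of src[0..n-1] in *out.
   Returns 0 on success, -1 on failure. Every exit goes through one label. */
int sum_of_squares(const int *src, size_t n, long *out)
{
    int ret = -1;
    long total = 0;
    long *squares = NULL;
    size_t i;

    if (src == NULL || out == NULL || n == 0)
        goto done;                       /* invalid input: nothing allocated yet */

    squares = malloc(n * sizeof *squares);
    if (squares == NULL)
        goto done;                       /* allocation failed */

    for (i = 0; i < n; ++i) {
        squares[i] = (long)src[i] * src[i];
        total += squares[i];
    }

    *out = total;
    ret = 0;

done:
    free(squares);                       /* free(NULL) is a no-op */
    return ret;
}
```

The label done is visible only inside sum_of_squares, so no other function can jump to it.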

Therefore, the C language provides the longjmp function for long-distance jumps. As its name suggests, it can jump across functions.

From the CPU’s point of view, a jump simply means setting the registers that make up the context back to the values they had at the moment of a snapshot. The setjmp function described above has already stored that context (snapshot) in a temporary buffer, so to jump back there we only need to tell the CPU.

How do we tell the CPU? By overwriting the registers used by the CPU with the values saved in the temporary buffer.
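A minimal sketch of such a cross-function jump (all names here are made up for illustration): the snapshot is taken in run_nested, and a function three calls deeper jumps straight back to it, skipping the normal returns of the functions in between.

```c
#include <setjmp.h>

static jmp_buf g_env;       /* hypothetical global snapshot */
int g_returns_seen = 0;     /* counts normal returns from inner() and middle() */

static void inner(void)
{
    longjmp(g_env, 42);     /* jump straight back to run_nested() */
}

static void middle(void)
{
    inner();
    g_returns_seen++;       /* skipped: inner() never returns normally */
}

static void outer(void)
{
    middle();
    g_returns_seen++;       /* skipped as well */
}

/* Returns the value delivered by longjmp, or 0 if no jump happened. */
int run_nested(void)
{
    g_returns_seen = 0;

    int ret = setjmp(g_env);
    if (ret == 0) {
        outer();            /* run_nested -> outer -> middle -> inner */
        return 0;           /* not reached in this sketch */
    }
    return ret;
}
```

Here run_nested() returns 42 and g_returns_seen stays at 0, which shows that neither middle nor outer resumed after the jump.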

5. setjmp: return type and return value

In programs that need multiple processes, we often use fork to “hatch” a new process from the current one. The new process starts executing from the statement after fork.

The main process also continues with the next statement after fork returns, so how do we tell the main process from the new one? The return value of fork lets us distinguish them:

fork returns 0: this is the new process.
fork returns a non-zero value: this is the original main process, and the value is the process ID of the new process.
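The fork half of this analogy can be sketched as follows, assuming a POSIX system. The function name fork_demo and the exit status 7 are arbitrary choices for illustration. Unlike the simplified description above, the sketch also checks for a negative return value, which fork uses to report failure.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the child's exit status as seen by the parent,
   or -1 if fork or waitpid fails. */
int fork_demo(void)
{
    pid_t pid = fork();      /* one call, two returns */

    if (pid < 0)
        return -1;           /* fork failed: no child was created */

    if (pid == 0)
        _exit(7);            /* child: fork returned 0 */

    /* parent: fork returned the child's process ID */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent and the child both continue from the same fork call and differ only in the value it returned, which is the same idea as the two ways of returning from setjmp.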

Similarly, setjmp can return in different ways. There are two scenarios for returning from setjmp:

Direct call to setjmp: returns 0. The purpose of a direct call is to save the context and create a snapshot.

Return through a longjmp jump: returns a non-zero value specified by the second argument to longjmp.

Based on these two kinds of value, we can branch differently. When returning through a longjmp jump, different non-zero values can be returned depending on the actual scenario. Those of you with Python or Lua experience may be reminded of yield/resume, which look much the same in how they pass parameters and return values.

Summary: that essentially covers how to use setjmp and longjmp; I hope the description was clear enough. If you now go back to the sample code at the beginning of this article, it should be easy to follow.

III. Using setjmp/longjmp to implement exception handling

Since the C library provides us with this tool, there must be some usage scenarios. Exception catching is supported directly at the syntactic level in some high-level languages (Java/C++), usually in try-catch statements, but in C you have to implement it yourself.

Let’s demonstrate a simple exception capture model with 56 lines of code:

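    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <setjmp.h>

    typedef int BOOL;
    #define TRUE  1
    #define FALSE 0

    // Error codes
    typedef enum _ErrorCode_ {
        ERR_OK = 100,         // no error
        ERR_DIV_BY_ZERO = -1  // divisor is 0
    } ErrorCode;

    // Buffer that holds the context to return to when an exception occurs
    jmp_buf gExcptBuf;

    // Type of the functions that try() can call
    typedef int (*pf)(int, int);

    int my_div(int a, int b)
    {
        if (0 == b)
        {
            // An exception occurred: jump back.
            // The second argument is the exception code.
            longjmp(gExcptBuf, ERR_DIV_BY_ZERO);
        }
        // No exception: return the correct result
        return a / b;
    }

    // Run func inside a "try" block
    int try(pf func, int a, int b)
    {
        // Save the context; if an exception occurs, execution jumps back here
        int ret = setjmp(gExcptBuf);
        if (0 == ret)
        {
            // Call the function; it may raise an exception
            func(a, b);
            // No exception occurred
            return ERR_OK;
        }
        else
        {
            // An exception occurred; ret is the exception code
            return ret;
        }
    }

    int main()
    {
        int ret = try(my_div, 8, 0);     // raises an exception
        // int ret = try(my_div, 8, 2);  // does not raise an exception
        if (ERR_OK == ret)
        {
            printf("try ok! \n");
        }
        else
        {
            printf("try exception. error = %d \n", ret);
        }
        return 0;
    }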

The code does not need to be explained in detail, just look at the comments in the code. This code is only illustrative and would certainly need a better wrapper to use in production code.
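One possible shape for such a wrapper is sketched below. The macro names TRY, CATCH, END_TRY and THROW, the buffer g_exc_env and the helper functions are all invented here for illustration; they are not part of the C library. The sketch uses a single global jmp_buf, so it does not support nested TRY blocks, and an exception code without a matching CATCH is silently ignored.

```c
#include <setjmp.h>

static jmp_buf g_exc_env;                 /* hypothetical single global buffer */

enum { EXC_DIV_BY_ZERO = 1 };             /* exception codes must be non-zero */

/* Illustrative macros: a switch on the return value of setjmp. */
#define TRY        switch (setjmp(g_exc_env)) { case 0:
#define CATCH(c)   break; case (c):
#define END_TRY    break; default: break; }
#define THROW(c)   longjmp(g_exc_env, (c))

static int checked_div(int a, int b)
{
    if (b == 0)
        THROW(EXC_DIV_BY_ZERO);           /* jumps to the matching CATCH */
    return a / b;
}

/* Returns a / b, or `fallback` if the division raised an exception. */
int div_or_default(int a, int b, int fallback)
{
    volatile int result = fallback;       /* volatile: stays well defined across the jump */

    TRY {
        result = checked_div(a, b);
    }
    CATCH(EXC_DIV_BY_ZERO) {
        result = fallback;
    }
    END_TRY

    return result;
}
```

With these definitions, div_or_default(8, 2, -1) evaluates to 4 and div_or_default(8, 0, -1) evaluates to -1.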

It is important to note that setjmp/longjmp only changes the order of execution of the application, and some of the application’s own data needs to be manually handled if it needs to be rolled back.
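The following sketch illustrates that point with made-up names (g_balance, withdraw, try_withdraw). The callee changes a global variable before it jumps out, and after the jump the variable still holds the changed value, so the caller restores it from a backup taken beforehand.

```c
#include <setjmp.h>

static jmp_buf g_env;          /* hypothetical global snapshot */
static int g_balance = 100;    /* application data; longjmp does not restore it */

/* Modifies the data first, then detects the error and jumps out. */
static void withdraw(int amount)
{
    g_balance -= amount;
    if (g_balance < 0)
        longjmp(g_env, 1);
}

/* Returns 0 on success, -1 if the withdrawal was rejected and undone. */
int try_withdraw(int amount)
{
    const int saved = g_balance;   /* manual backup taken before the risky call */

    if (setjmp(g_env) == 0) {
        withdraw(amount);
        return 0;
    }

    /* Back from longjmp: g_balance still holds the modified (negative) value. */
    g_balance = saved;             /* roll back by hand */
    return -1;
}

int current_balance(void)
{
    return g_balance;
}
```

Starting from a balance of 100, try_withdraw(30) succeeds and leaves 70. A following try_withdraw(500) returns -1, and the balance is 70 again only because of the manual rollback.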

IV. Using setjmp/longjmp to implement coroutines

1. What is coroutine

In C programs, sequences that need to execute concurrently are usually implemented with threads. So what is a coroutine? Wikipedia explains coroutines as follows:

More detailed information on coroutines can be found on this page, which describes coroutines compared to threads, generators, and implementation mechanisms in various languages.

Let’s take a quick look at the difference between coroutines and threads in terms of producers and consumers:

2. Producers and consumers in threads

The producer and the consumer are two parallel execution sequences, usually run in two threads.

While the producer produces goods, the consumer is in a waiting (blocked) state. When production is finished, the producer notifies the consumer through a semaphore to consume the goods.

While the consumer consumes goods, the producer is in a waiting (blocked) state. When consumption is finished, the consumer notifies the producer through a semaphore to continue producing goods.

3. Producers and consumers in coroutines

The producer and the consumer run in the same execution sequence and take turns by jumping within that sequence.

After producing goods, the producer gives up the CPU and lets the consumer run.

After consuming goods, the consumer gives up the CPU and lets the producer run.

4. Coroutine implementation in C language

Here is the simplest possible model, which uses setjmp/longjmp to implement the coroutine mechanism. Its main purpose is to show the execution sequence of a coroutine; it does not address passing parameters or return values. Note that resume() jumps back into a function that was already left through longjmp, which the C standard leaves undefined, so treat this purely as an illustration.

If you want to delve deeper into coroutine implementation in C, take a look at Duff’s device, which combines switch statements and jumps to branch, using unusual but legal syntax.

    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <setjmp.h>

    typedef int BOOL;
    #define TRUE  1
    #define FALSE 0

    // Contexts used to switch between main and the coroutine
    typedef struct _Context_ {
        jmp_buf mainBuf;
        jmp_buf coBuf;
    } Context;

    // Global context variable
    Context gCtx;

    // Resume the coroutine
    #define resume() \
        if (0 == setjmp(gCtx.mainBuf)) \
        { \
            longjmp(gCtx.coBuf, 1); \
        }

    // Suspend the coroutine
    #define yield() \
        if (0 == setjmp(gCtx.coBuf)) \
        { \
            longjmp(gCtx.mainBuf, 1); \
        }

    // Function executed in the coroutine
    void coroutine_function(void *arg)
    {
        while (TRUE)  // infinite loop
        {
            printf("\n*** coroutine: working \n");
            // Simulate a time-consuming operation
            for (int i = 0; i < 10; ++i)
            {
                fprintf(stderr, ".");
                usleep(1000 * 200);
            }
            printf("\n*** coroutine: suspend \n");

            // Give up the CPU and let main run
            yield();
        }
    }

    // Start a coroutine
    // Argument 1: func, the function executed in the coroutine
    // Argument 2: arg, the argument required by func
    typedef void (*pf)(void *);
    BOOL start_coroutine(pf func, void *arg)
    {
        // Save main's context
        if (0 == setjmp(gCtx.mainBuf))
        {
            func(arg);  // call the function
            return TRUE;
        }
        return FALSE;
    }

    int main()
    {
        // Start a coroutine
        start_coroutine(coroutine_function, NULL);

        while (TRUE)  // infinite loop
        {
            printf("\n=== main: working \n");
            // Simulate a time-consuming operation
            for (int i = 0; i < 10; ++i)
            {
                fprintf(stderr, ".");
                usleep(1000 * 200);
            }
            printf("\n=== main: suspend \n");

            // Give up the CPU and let the coroutine run
            resume();
        }
        return 0;
    }

When run, the program prints the “working” and “suspend” messages of the coroutine and of main alternately, starting with the coroutine.
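For comparison, the switch-based approach mentioned earlier (in the style of Duff’s device) can be sketched as follows. This is not the setjmp/longjmp model shown in this section but a different technique, and the names produce and consume_all are made up for illustration. The producer remembers where it stopped in a static variable and uses a case label inside its loop to continue from that point on the next call.

```c
/* Hypothetical producer coroutine: each call resumes after the last "yield".
   Produces 1, 2, 3 and then -1 to signal that production is finished. */
int produce(void)
{
    static int state = 0;   /* where to resume on the next call */
    static int i;           /* must be static to survive between calls */

    switch (state) {
    case 0:                 /* first call: start from the top */
        for (i = 1; i <= 3; ++i) {
            state = 1;
            return i;       /* "yield" one item to the consumer */
    case 1:;                /* later calls continue here, inside the loop */
        }
    }

    state = 0;              /* reset so that the sequence can start over */
    return -1;
}

/* Consumer: alternates with the producer until production is finished. */
int consume_all(void)
{
    int sum = 0;
    int item;

    while ((item = produce()) != -1)
        sum += item;        /* "consume" the item */

    return sum;
}
```

Each call to produce hands one item to the consumer and then gives control back, so the two sides alternate inside a single thread. In this sketch consume_all() returns 6, the sum of the three produced items.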

V. Summary

The focus of this article is to introduce the syntax and usage scenarios of setjmp/longjmp. In some scenarios, they let you achieve twice the result with half the effort.

Of course, you can also use your imagination to build fancier features by jumping between execution sequences. Anything is possible!


Finally, I wish you code without bugs and a life full of spring blossoms!

Author: Doggo (WeChat public account: IOT Town); Zhihu: Doggo; Bilibili: Doggo share; Juejin: Doggo share; CSDN: Doggo share
