Author: RuihangXia


Background

This story takes place in an internal project that contains both Go and Rust: a Rust repository serves as the storage component, and the Go side builds the business logic on top of it. For various reasons, the two parts have to run in the same process, with a layer of C FFI between them as the bridge between the ideal and the real. It looks something like this

The Rust side starts by wrapping the interfaces and structures of the library it uses in a small cdylib [1] shim project, which compiles the Rust library into a dynamically linked object plus a C interface definition. A Go SDK is then written on top of that C definition for the upper layers to use. Neither the pure Go project on the left nor the pure Rust project on the right is exciting enough on its own; this article focuses on the tangled part in the middle, which is also where I lost a lot of hair.

This article walks through several incidents, grouped into three areas: threads, memory, and signals.

Thread

The shim and the SDK started out very simple: each request from the SDK goes through CGO into the shim, the shim does some structural transformation and calls the library to process it, waits for the request to complete, and returns the result to the SDK. It was an intuitive design, but things went wrong during testing and the whole program misbehaved. The business component and the library/shim had each been tested individually at the start, and of course everything was fine when tested separately, so the scene looked something like this…

The SDK and shim were quickly suspected, but it was not clear exactly what was wrong. I only realized what was happening when I noticed that a log line in the library, which should only appear once at startup, was being printed multiple times. For various reasons, TLS [2] (thread-local storage) structures exist in both the shim and the library; Rust's std provides an implementation [3]. A TLS value is initialized the first time a thread uses it; afterwards the same thread keeps accessing the same instance, while different threads access different instances. The repeated log line was the output produced when one of these TLS structures was initialized, meaning the TLS was being initialized over and over again, contrary to expectation.

Let's take a look at what happens when Go makes a CGO call. Goroutines in Go do not map one-to-one onto system threads. When a non-Go call is made, the current goroutine is pinned to some system thread, and that thread is used to carry out the call. So if the shim does nothing special, each request may run on a different thread, invalidating whatever state was previously stored in TLS. To make matters worse, the shim's TLS held some of the library's thread handles, so part of the library's state got corrupted and the whole process behaved strangely.
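A minimal sketch of the effect (illustrative code, not from the project): every OS thread that touches a thread_local! value runs the initializer again, which is exactly the repeated startup log described above.

use std::cell::Cell;

thread_local! {
    // initialized lazily, once per OS thread that first touches it
    static COUNTER: Cell<u64> = {
        println!("initializing TLS on {:?}", std::thread::current().id());
        Cell::new(0)
    };
}

fn handle_request() {
    COUNTER.with(|c| c.set(c.get() + 1));
}

fn main() {
    handle_request(); // prints the init line once for the main thread
    handle_request(); // same thread, no new initialization

    // a different thread re-runs the initializer, just like CGO calls
    // landing on a fresh thread each time
    std::thread::spawn(handle_request).join().unwrap();
}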

To solve this problem, we need to anchor the shim to a fixed thread when it talks to the library: on receiving a request, the shim interacts with the library through thread messaging, both for sending requests and for receiving results. The code looks like this:

struct Handle {
    runtime: Runtime, // this state used to live in TLS
}

impl Handle {
    fn new() -> Self {
        // a dedicated worker thread owns the library; requests and results
        // travel over channels instead of relying on thread-local state
        let _worker_thread = std::thread::spawn(|| { /* ... */ });
        let runtime = Runtime::new(); // threads for running callbacks
        Self { runtime }
    }
}

We also expose the library's asynchronous interface to the upper layer: instead of a single CGO call completing the whole request, one CGO call only submits the request, and the result comes back asynchronously through a separate C->Go call. This also keeps large requests from blocking the thread borrowed from the Go runtime for too long. The interaction flow now looks like the following:
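As a rough sketch of what the submit/callback pair on the shim's C boundary could look like (the names and signatures here are illustrative assumptions, not the project's actual interface):

use std::os::raw::c_void;

// mirrors the Bytes type shown in the memory section below
#[repr(C)]
pub struct CBytes {
    ptr: *mut c_void,
    len: usize,
}

/// Invoked from a shim/library thread once the request completes; on the Go
/// side this becomes a C->Go call, and user_data identifies the waiting caller.
pub type RequestCallback = extern "C" fn(user_data: *mut c_void, result: CBytes);

/// Called from Go via CGO: it only hands the request over and returns
/// immediately, instead of blocking until the result is ready.
#[no_mangle]
pub extern "C" fn submit_request(request: CBytes, cb: RequestCallback, user_data: *mut c_void) {
    // in the real shim this would go through a channel to the worker thread;
    // calling the callback inline keeps the sketch self-contained
    cb(user_data, request);
}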

However, this is only a very basic scheme. CGO, the "transformer" between the two worlds, is simple in principle, but there is still plenty of room for optimization to reduce the overhead while solving the problem [5, 6, 7].

Memory

Passing data across the C interface also requires some transformation of the data structures, which is mostly done in the shim and the SDK. In a nutshell, every type that crosses the boundary has to be expressible in C. Reference [13] provides some examples, and we also have one below:

use std::mem;

#[repr(C)]
pub struct Bytes {
    ptr: *mut libc::c_void,
    len: libc::size_t,
}

impl Bytes {
    pub(crate) fn make_bytes(mut bytes: Vec<u8>) -> Self {
        bytes.shrink_to_fit();
        // Vec::into_raw_parts is nightly-only (feature `vec_into_raw_parts`)
        let (ptr, len, cap) = bytes.into_raw_parts();
        debug_assert_eq!(len, cap);

        Bytes {
            ptr: unsafe { mem::transmute::<*mut u8, *mut libc::c_void>(ptr) },
            len,
        }
    }
}

Away from the FFI boundary, both Go and Rust handle memory well (in general); at the boundary, the illegal pointers and addresses are basically all handed over by the other side. In this scenario we traded a fair amount of copying for safety, which greatly reduces the number of places where raw pointers appear. What remains is concentrated in two spots: receiving the memory the other side passed in, and telling the other side to reclaim the memory we passed out. Besides sticking to the old "whoever allocates, frees" principle, we also need to be able to locate problems when they do occur.
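For instance, the buffers produced by make_bytes above eventually have to come back to Rust so they are released by the allocator that created them. A minimal sketch of such a reclaim function (the name and exact shape are illustrative, not necessarily the project's real interface):

/// Called by the Go SDK through CGO once it no longer needs the payload.
/// The Vec is rebuilt from its raw parts and dropped on the Rust side.
#[no_mangle]
pub extern "C" fn free_bytes(bytes: Bytes) {
    if bytes.ptr.is_null() {
        return;
    }
    unsafe {
        // make_bytes guaranteed len == cap via shrink_to_fit, so rebuilding
        // the Vec with capacity == len is sound here
        drop(Vec::from_raw_parts(bytes.ptr as *mut u8, bytes.len, bytes.len));
    }
}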

The first thing to look at is the stack trace printed when the error occurs. In my experience, the first goroutine backtrace following the error message is usually where the problem lies, so you can walk the call stack and check whether any pointer is being used incorrectly in your code. But only the Go part of the stack will be there; when the problem lies outside Go, all we learn is which CGO function is suspect. From here on it is similar to ordinary memory debugging, and we have a few common methods:

Print logs

If the problem reproduces easily, you can add more logging along the path, printing the suspicious pointer addresses and their dereferenced values, and sometimes watch a pointer gradually becoming invalid. Note that it is best to log the dereference on a separate line from the address: every dereference is a walk along the edge of illegality, and if the program dies on that dereference, the other useful information in the same log line disappears with it.
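As a small illustration of that habit (purely illustrative, not code from the project), the address goes out on its own line before the risky dereference:

fn inspect(ptr: *const u8, len: usize) {
    // line 1: the address alone, so it reaches the log even if the next step faults
    eprintln!("bytes ptr = {:p}, len = {}", ptr, len);

    // line 2: the dereference, deliberately kept separate
    if !ptr.is_null() {
        let first = unsafe { *ptr };
        eprintln!("first byte = {:#04x}", first);
    }
}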

Use tools

Tools such as the sanitizers [8] and Valgrind [9] can also help monitor memory usage. Not every memory problem crashes the program; some illegal operations may go unnoticed, such as slightly out-of-bounds array accesses, accidental double frees, or forgotten frees. These tools can detect such problems promptly, but they all carry some performance cost, so they are not applicable in every situation.

# valgrind
$ cargo build
$ valgrind --leak-check=full --show-leak-kinds=all --tool=memcheck $(BIN)

# address sanitizer
$ export RUSTFLAGS=-Zsanitizer=address RUSTDOCFLAGS=-Zsanitizer=address
$ cargo run -Zbuild-std --target x86_64-unknown-linux-gnu

Special values

Some environments fill memory with special marker values before and after memory operations so that its state is easy to recognize, such as jemalloc's --enable-fill option [10] or MariaDB's TRASH_ALLOC() macro [11].
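The same idea can be imitated in Rust with a wrapper around the global allocator. This is a toy sketch, not something from the project, and the fill byte 0x5F is an arbitrary choice:

use std::alloc::{GlobalAlloc, Layout, System};

// poison freed memory with a recognizable byte so stale pointers are easy to
// spot in a debugger or a core dump
struct TrashingAllocator;

unsafe impl GlobalAlloc for TrashingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe {
            // overwrite the block before actually freeing it
            std::ptr::write_bytes(ptr, 0x5F, layout.size());
            System.dealloc(ptr, layout);
        }
    }
}

#[global_allocator]
static ALLOC: TrashingAllocator = TrashingAllocator;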

Narrow the scope

Narrowing works in two directions. If possible, comment out some code to narrow down the search, for example skip the free operations or disable part of the path and observe; or shorten the invocation path and test individual components against mocks.

There are other small things worth checking, such as whether a pointer is aligned, and how a bad pointer compares with the pointers around it. For example, with Go and Rust running in the same process, you can observe that Go's pointers and Rust's pointers fall into two distinct address ranges. In practice, though, memory problems that look identical on the surface can have completely different causes underneath. A program with memory corruption can misbehave in all kinds of strange ways, so you need to be able to tell the different problems apart. Rather than following a fixed manual, adjust your methods as new information comes in. After all, it is rare to have no options at all; far more often there are many options and it is unclear which one will yield the information you need. Every technique has its cast time, and being haunted by an invalid pointer for so long that you dream of panics is very painful.

While writing the shim, the binding, and their C demos, I got to re-live the progress of civilization by writing some C again. Looking back, though, was there a more advanced way to avoid all this unsafe shuttling across the FFI? That is something to think about later.

Signal

This one took a long time. It started with a SIGSEGV, but as a hint of where the plot is going: it is essentially another memory problem.

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x2 addr=0x18589091b1 pc=0x7f9205f814ba]

runtime stack:
runtime.throw({0xace954, 0xefc4e055})
    /home/go/src/runtime/panic.go:1198 +0x71 fp=0x7f91dce8afe8 sp=0x7f91dce8afb8 pc=0x448451
runtime.sigpanic()
    /home/go/src/runtime/signal_unix.go:719 +0x396 fp=0x7f91dce8b038 sp=0x7f91dce8afe8 pc=0x45fdf6

Reproducing this takes time: in the original environment it usually shows up only after half a day to a day, mixed with other unsolved problems, across several different system environments and workloads, all of which makes the investigation harder from every angle.

After spending some time ruling out the other factors, we focused on the core dump generated by the SIGSEGV. But viewed through GDB, the cores all look more or less like this:

Thread (Thread 0x7f7c45fef700 (LWP 47715)):
#0  0x0000000000895b93 in runtime.futex ()
#1  0x0000000000860a20 in runtime.futexsleep ()
#2  0x0000000001dea010 in runtime.sched ()
#3  0x0000000000000080 in ?? ()
#4  0x00007f7c45feccb0 in ?? ()
#5  0x0000000000000000 in ?? ()

Thread (Thread 0x7f7c457ee700 (LWP 47716)):
#0  0x0000000000895b93 in runtime.futex ()
#1  0x00000000008609ab in runtime.futexsleep ()
#2  0x000000c00011e840 in ?? ()
#3  0x0000000000000080 in ?? ()
#4  0x0000000000000000 in ?? ()

Thread (Thread 0x7f7c4468c700 (LWP 52927)):
#0  0x00007f7c49301489 in syscall () from /lib64/libc.so.6
#1  0x00007f7c49834da9 in futex_wait (self=0x7f7c4468a640, ts=...)
    at parking_lot_core-0.8.5/src/thread_parker/linux.rs:112

The question marks are probably an artifact of Go's M:N threading, and all the other, non-Go threads are parked in syscalls. To see what is going on inside Go, we used delve [12], GDB's counterpart for Go, to inspect the goroutine stacks. This was a somewhat painful process: it dumped thousands of goroutine backtraces, and after spending some time skimming the tops of the stacks, most of them turned out to be ordinary business goroutines, nothing suspicious.

(dlv) goroutines
  Goroutine 1 - User: main.go:207 main.main (0xfa6f0e) [chan receive (nil chan) 451190h16m34.324364485s]
  Goroutine 2 - User: /root/go/src/runtime/proc.go:367 runtime.gopark (0x7f8076) [force gc (idle) 451190h16m34.619109482s]
  Goroutine 3 - User: /root/go/src/runtime/proc.go:367 runtime.gopark (0x7f8076) [GC sweep wait]
  Goroutine 4 - User: /root/go/src/runtime/proc.go:367 runtime.gopark (0x7f8076) [GC scavenge wait]
  Goroutine 5 - User: /root/go/src/runtime/proc.go:367 runtime.gopark (0x7f8076) [finalizer wait 451190h14m6.860088484s]
* Goroutine 17 - User: /root/go/src/runtime/sys_linux_amd64.s:165 runtime.raise (0x82acc1) (thread 71636) [GC assist marking]
  Goroutine 47 - User: /root/go/src/runtime/proc.go:367 runtime.gopark (0x7f8076) [select 451190h16m34.324667846s]
  Goroutine 48 - User: /root/go/src/runtime/sigqueue.go:169 os/signal.signal_recv (0x825998) (thread 67111)

The only real clue is where the SIGSEGV came from (goroutine 17):

runtime.throw({0xace954, 0xefc4e055})
    /home/go/src/runtime/panic.go:1198 +0x71 fp=0x7f91dce8afe8 sp=0x7f91dce8afb8 pc=0x448451
runtime.sigpanic()
    /home/go/src/runtime/signal_unix.go:719 +0x396 fp=0x7f91dce8b038 sp=0x7f91dce8afe8 pc=0x45fdf6

It looks familiar; it is the same top of the stack that keeps appearing in the error messages. That information is useful, but also mostly useless: it shows the Go runtime's signal handling, not the code that triggered the fault.

After some more struggling, I finally began to suspect that the stacks in the core dump above are not the primary stack at all. By running the process under GDB, we could at last see where the illegal operation actually happened, because GDB catches the signal when it is first raised, before the Go runtime's handler takes over. If only we had known earlier that the signal would change hands through the Go runtime…

Once the scene of the original crime was visible, the rest of the process was routine, and the fix was done within the afternoon.

Finally

The most time-consuming part of fixing any bug is locating it. That includes the preparation this article did not cover, such as avoiding bugs in the first place, having tools ready in advance, and knowing various ways to reproduce issues, as well as the step-by-step, information-driven troubleshooting once a bug appears. Finally, I wish you (and myself) code that runs correctly on the first try 🤤.

References

  • 1: https://doc.rust-lang.org/reference/linkage.html
  • 2: https://en.wikipedia.org/wiki/Thread-local_storage
  • 3: https://doc.rust-lang.org/std/macro.thread_local.html
  • 5: https://mp.weixin.qq.com/s/PdeLX4loFaXr4E74hsrHCw
  • 6: https://about.sourcegraph.com/go/gophercon-2018-adventures-in-cgo-performance/
  • 7: https://www.cockroachlabs.com/blog/the-cost-and-complexity-of-cgo/
  • 8: https://doc.rust-lang.org/beta/unstable-book/compiler-flags/sanitizer.html
  • 9: https://valgrind.org/
  • 10: https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Find-a-memory-corruption-bug
  • 11: https://github.com/MariaDB/server/blob/c9fcea14e9e1f34a97451706eac51276c85bbea7/include/my_valgrind.h#L97-L100
  • 12: https://github.com/go-delve/delve
  • 13: http://jakegoulding.com/rust-ffi-omnibus/