Paper Guide | Rudra: Finding Memory Safety Bugs in the Rust Ecosystem

Author: Zhang Handong (张汉东)

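Introduction

The Systems Software & Security Lab at the Georgia Institute of Technology has open-sourced Rudra, a tool that analyzes and reports potential memory safety bugs in Unsafe Rust code. The accompanying paper will appear in the proceedings of the 28th ACM Symposium on Operating Systems Principles (SOSP 2021) and can currently be downloaded from the Rudra source repository.

Note: this article is not a translation of the paper but my own walkthrough and summary of it.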

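Overview

Rust focuses on memory safety and performance. It is already widely used in traditional systems software, such as operating systems, embedded systems, network frameworks, and browsers, where both safety and performance are indispensable.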

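Rust's approach to memory safety is to validate memory ownership at compile time; specifically, it validates the access to and the lifetime of memory-allocated objects. Through borrow checking, the Rust compiler provides two guarantees for shared and exclusive references to a value:

A reference cannot outlive the variable that owns the value. This prevents use-after-free (UAF).
Shared and exclusive references cannot exist at the same time, which rules out concurrent reads and writes of the same value.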
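
As a minimal sketch of these two guarantees (the function names `longest` and `append_suffix` are made up for this illustration and do not come from the paper), the following program compiles only because the shared borrows are no longer used once the exclusive borrow starts:

```rust
// Returns the longest string. The returned reference borrows from `items`,
// so it cannot outlive the slice it came from (guarantee 1).
fn longest(items: &[String]) -> &str {
    items
        .iter()
        .max_by_key(|s| s.len())
        .map(|s| s.as_str())
        .unwrap_or("")
}

// Appends through an exclusive reference. While `buf` is borrowed mutably,
// no shared reference to it may be alive (guarantee 2).
fn append_suffix(buf: &mut String, suffix: &str) {
    buf.push_str(suffix);
}

fn main() {
    let mut names = vec![String::from("rudra"), String::from("miri")];

    // Two shared borrows may coexist.
    let a = longest(&names);
    let b = &names[1];
    println!("{a} {b}");

    // `a` and `b` are not used past this point, so an exclusive borrow
    // is accepted here.
    append_suffix(&mut names[0], "-checker");
    println!("{}", names[0]);

    // Uncommenting the next line makes the borrow checker reject the program,
    // because `a` would then be alive across the exclusive borrow:
    // println!("{a}");
}
```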

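Unfortunately, these safety rules are too restrictive. Code that talks to low-level hardware or that needs better performance sometimes has to bypass them temporarily. Safe Rust cannot meet these needs, yet they are essential for systems development, so Unsafe Rust was introduced. Unsafe Rust means that the compiler's responsibility for safety checks is temporarily delegated to the programmer.

The soundness of Unsafe Rust code is critical to the memory safety of the whole program, because most systems software, such as operating systems or the standard library, cannot do without it.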

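Some people may naively assume that the risks of Unsafe Rust can be ruled out simply by reviewing the source code. The crux, however, is that reasoning about soundness is subtle and error-prone, for three reasons:

A soundness bug breaks Rust's safety boundary, which means that all external code, including the standard library, must be sound.
Safe and Unsafe code depend on each other.
Every invisible code path inserted by the compiler has to be reasoned about correctly by the programmer.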

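Many research projects have worked to give Rust a sound foundation, for example by formalizing its type system and operational semantics, proving their correctness, and building models for checking. These efforts are important but not yet practical, because they do not cover the whole ecosystem. There are also dynamic approaches, such as Miri and fuzzing, but they are hard to apply at scale because they require large amounts of computing resources.

Rust is becoming popular, and the number of packages that use Unsafe Rust keeps growing. A practical algorithm for detecting memory safety bugs is therefore important.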

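The paper describes three important bug patterns in Unsafe code and presents Rudra, a tool that detects them. The authors list three contributions:

They identify three bug patterns in Unsafe Rust and design two new algorithms that can find them.
Using Rudra, they find 263 new memory safety bugs in the Rust ecosystem, which represents 41.4% of all bugs reported to RustSec since 2016.
Open source. Rudra is open source, and the authors plan to contribute its core algorithms to the official Rust linter.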

Rudra

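Rudra analyzes and reports potential memory safety bugs in Unsafe Rust code. Because bugs in Unsafe code threaten the foundation of Rust's safety guarantees, Rudra's main focus is on scaling the analysis to all programs and libraries hosted in a Rust package registry (such as crates.io). Rudra can scan the entire registry (43k packages) in 6.5 hours and identified 263 previously unknown memory safety bugs, which led to 98 RustSec advisories and 74 CVEs and account for 41.4% of all bugs reported to RustSec since 2016.

The new bugs Rudra found are subtle and live in libraries written by Rust experts: two in the std library, one in the official futures library, and one in the Rust compiler, rustc. Rudra is open source, and its authors plan to integrate its algorithms into the official Rust linter.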

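The name Rudra comes from Sanskrit. Rudra (鲁特罗 or 楼陀罗 in Chinese) is the god of storms, hunting, death, and the natural world in Indian mythology. When enraged he harms people and livestock indiscriminately, yet he is also skilled at healing with herbs. The name means "the howler" or "the roarer" (possibly referring to a hurricane or storm).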

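Differences between Rudra and Miri:

Rudra is a static analyzer: it analyzes source code without executing it. Miri is an interpreter and has to execute the code.

The two can be used together.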

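About Unsafe Rust

The unsafe keyword opens up an interesting area of API design: how to communicate the safety of an API.

There are usually two approaches:

Expose the internal Unsafe API directly to users, but mark it with the unsafe keyword to declare that it is unsafe, and add comments that document its safety boundary.
Wrap the API in a safe abstraction: use assertions internally so that crossing the safety boundary causes a panic instead of undefined behavior (UB).

The second approach, hiding the Unsafe parts behind a safe API, has become an established convention in the Rust community.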
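
The two approaches can be sketched side by side. The functions `byte_at_unchecked` and `byte_at` below are hypothetical names chosen for this illustration; only `get_unchecked` and `catch_unwind` are real standard library APIs.

```rust
/// Approach 1: expose the unsafe operation and document its contract.
///
/// # Safety
/// `idx` must be less than `data.len()`.
unsafe fn byte_at_unchecked(data: &[u8], idx: usize) -> u8 {
    // SAFETY: the caller promises `idx < data.len()`.
    unsafe { *data.get_unchecked(idx) }
}

/// Approach 2: a safe abstraction that checks the contract itself and
/// panics instead of triggering undefined behavior.
fn byte_at(data: &[u8], idx: usize) -> u8 {
    assert!(idx < data.len(), "index {idx} out of bounds");
    // SAFETY: the assertion above guarantees `idx < data.len()`.
    unsafe { byte_at_unchecked(data, idx) }
}

fn main() {
    let data = [10u8, 20, 30];
    println!("{}", byte_at(&data, 1));

    // An out-of-range index panics rather than reading out of bounds.
    let result = std::panic::catch_unwind(|| byte_at(&[1, 2, 3], 7));
    println!("out-of-range call panicked: {}", result.is_err());
}
```

With the safe wrapper, a bad index is the wrapper author's problem to handle; with the unsafe function, it is the caller's.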

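Separating Safe from Unsafe lets us tell who is responsible for a safety bug. Safe Rust means that undefined behavior cannot occur no matter what. In other words, a safe API is responsible for ensuring that no valid input can break the expectations of the Unsafe code it encapsulates.

This is in sharp contrast to C or C++, where it is the user's responsibility to follow the intended usage of an API.

For example, nobody blames printf() in libc when it dereferences an invalid pointer and causes a segmentation fault. Yet this division of responsibility gave rise to a whole class of memory safety problems: format-string vulnerabilities. Remember the recent bug that could brick an iPhone after it joined a Wi-Fi network with a specially crafted name?

In Rust, println!() should not, and cannot, cause a segmentation fault. Moreover, if some input does cause one, that is considered a bug on the part of the API developer.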

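Defining memory safety bugs in Rust

Rust has two kinds of unsafe definitions: unsafe functions and unsafe traits.

An unsafe function expects its caller to ensure safety when calling it.

An unsafe trait expects its implementor to provide additional semantic guarantees. For example, the standard library's pub unsafe trait TrustedLen: Iterator { } requires the implementor to guarantee that the upper bound reported by Iterator::size_hint() is accurate; only then does the "trusted length" that TrustedLen expresses hold.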
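
Because TrustedLen is not available on stable Rust, the sketch below uses a hypothetical stand-in trait, `ExactLen`, to show how unsafe code can lean on the promise that an unsafe trait makes. The trait, its method, and `collect_trusted` are assumptions made for illustration only.

```rust
/// Hypothetical stand-in for `TrustedLen`: implementors promise that
/// `exact_len()` is exactly the number of items the iterator will yield.
///
/// # Safety
/// Unsafe code may rely on `exact_len()` being accurate.
unsafe trait ExactLen: Iterator {
    fn exact_len(&self) -> usize;
}

// SAFETY: a `Range<usize>` yields exactly `end - start` items (0 if empty).
unsafe impl ExactLen for std::ops::Range<usize> {
    fn exact_len(&self) -> usize {
        self.end.saturating_sub(self.start)
    }
}

/// Collects into a `Vec` without bounds checks, trusting the unsafe trait.
fn collect_trusted<I: ExactLen<Item = usize>>(iter: I) -> Vec<usize> {
    let len = iter.exact_len();
    let mut out: Vec<usize> = Vec::with_capacity(len);
    let base = out.as_mut_ptr();
    for (i, item) in iter.enumerate() {
        // SAFETY: `ExactLen` guarantees `i < len`, so the write is in bounds.
        unsafe { base.add(i).write(item) };
    }
    // SAFETY: exactly `len` elements were initialized above.
    unsafe { out.set_len(len) };
    out
}

fn main() {
    println!("{:?}", collect_trusted(2..5));
}
```

If the trait were safe, a wrong `exact_len()` written in safe code could cause an out-of-bounds write here; marking the trait unsafe moves that responsibility to the implementor.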

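The paper gives a clear and consistent definition of memory safety bugs that does not rely on Rust's operational semantics:

Definition 1: Types and values are defined in the conventional way. A type is a set of values.

Definition 2: For a type T, safe-value(T) is defined as the set of values that can be created safely. For example, a Rust string is internally represented as an array of bytes, but a string created through safe APIs can only contain valid UTF-8.

Definition 3: A function F takes a value of type arg(F) and returns a value of type ret(F). Multiple arguments are treated as a tuple.

Definition 4: A function F has a memory safety bug if there exists a value 𝑣 in safe-value(arg(F)) (written ∃𝑣 ∈ safe-value(𝑎𝑟𝑔(𝐹))) such that calling F(𝑣) violates memory safety or returns a value 𝑣𝑟𝑒𝑡 that is not in safe-value(𝑟𝑒𝑡(𝐹)) (written 𝑣𝑟𝑒𝑡 ∉ safe-value(𝑟𝑒𝑡(𝐹))).

Definition 5: For a generic function Λ, pred(Λ) is defined as the set of types that satisfy the type predicates (that is, the trait bounds) of Λ. Given a type 𝑇 ∈ pred(Λ), resolve(Λ, 𝑇) instantiates the generic function into a concrete function 𝐹.

Definition 6: A generic function Λ has a memory safety bug if it can be instantiated into a function with a memory safety bug, that is, if ∃𝑇 ∈ pred(Λ) such that 𝐹 = resolve(Λ, 𝑇) has a memory safety bug.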

Definition 7: A type has a memory safety bug if it implements Send and cannot be transferred across thread boundaries safely.

Definition 8: A type has a memory safety bug if it implements Sync and cannot be accessed concurrently through an aliased pointer safely. That is, it defines a non-thread-safe method that takes &self.
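
To make Definitions 2 and 4 concrete, here is a small sketch built around String, whose safe values are exactly the valid UTF-8 byte sequences. The helpers `make_string` and `first_char` are hypothetical names for this example; `String::from_utf8` is the real safe constructor.

```rust
/// Safe constructor: it only returns values in safe-value(String),
/// that is, valid UTF-8. Invalid bytes are rejected instead.
fn make_string(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// A function without a memory safety bug in the sense of Definition 4:
/// for every safe input it returns a safe value, because it only cuts
/// the string at a char boundary.
fn first_char(s: &str) -> Option<&str> {
    let c = s.chars().next()?;
    Some(&s[..c.len_utf8()])
}

fn main() {
    // 0xC3 0xA8 encodes 'è'; a lone 0xA8 is not valid UTF-8.
    println!("{:?}", make_string(vec![0xC3, 0xA8]));
    println!("{:?}", make_string(vec![0xA8]));
    println!("{:?}", first_char("è0"));
}
```

A safe function that could hand back a String containing the lone byte 0xA8 would return a value outside safe-value(String), and would therefore count as buggy under Definition 4 even if nothing crashed immediately.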

Three important bug patterns in Unsafe Rust

From a qualitative study of Unsafe Rust, the paper distills three important bug patterns:

Panic Safety: memory safety bugs caused by a panic.
Higher-order Safety Invariant: bugs caused by assuming safety guarantees that a higher-order type does not provide.
Propagating Send/Sync in Generic Types: bugs caused by a manual Send/Sync implementation that places incorrect bounds on the inner types of a generic type, which makes the generic type's own Send/Sync incorrect.

Panic Safety

This is similar to exception safety in other programming languages such as C++. Rust's counterpart to exceptions is called a panic. A panic is usually raised when a program reaches an unrecoverable state, but it can also be caught with std::panic::catch_unwind, for closures that implement the UnwindSafe trait.

When a panic occurs, the stack is unwound: the destructors of stack-allocated objects are called and control is transferred to the panic handler. Because the destructors of all variables that are still alive run during unwinding, a panic can cause memory safety problems, such as freeing memory that has already been freed.

Reasoning correctly about panic safety in Unsafe code is difficult and error-prone. Typically, encapsulated Unsafe code temporarily bypasses ownership checks, and the safe API around it re-establishes the safety invariants, according to its safety boundary conditions, before it returns. If the encapsulated code panics in between, those later steps may never run. The result can be memory safety bugs similar to uninitialized memory use or double free in C/C++.

The paper gives the following definition:

A function 𝐹 has a panic safety bug if it drops a value 𝑣 of type 𝑇 such that 𝑣 ∉ safe-value(𝑇) during unwinding, and this causes a memory safety violation.
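
A common way to stay panic-safe is to put the container into a trivially valid state before running caller-supplied code. The sketch below applies that idea to a hypothetical `map_in_place` helper (not taken from the paper): the length is set to 0 while elements are temporarily duplicated with `ptr::read`, so an unwinding destructor cannot drop a value twice.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::ptr;

/// Applies `f` to every element in place.
fn map_in_place<T, F: FnMut(T) -> T>(v: &mut Vec<T>, mut f: F) {
    let len = v.len();
    // SAFETY: 0 is always a valid length. From here on the Vec owns no
    // elements, so if `f` panics they are leaked rather than dropped twice.
    unsafe { v.set_len(0) };
    let base = v.as_mut_ptr();
    for i in 0..len {
        // SAFETY: `i < len`, and slot `i` still holds an initialized value.
        unsafe {
            let old = ptr::read(base.add(i)); // bitwise copy bypasses ownership
            let new = f(old); // may panic; `old` has been moved into `f`
            ptr::write(base.add(i), new);
        }
    }
    // SAFETY: all `len` slots hold initialized values again.
    unsafe { v.set_len(len) };
}

fn main() {
    let mut v = vec![String::from("a"), String::from("b"), String::from("c")];
    map_in_place(&mut v, |s| s.to_uppercase());
    println!("{v:?}");

    // With a panicking closure the vector ends up empty (its elements are
    // leaked), which is safe, instead of holding an already dropped value.
    let mut calls = 0;
    let result = catch_unwind(AssertUnwindSafe(|| {
        map_in_place(&mut v, |s| {
            calls += 1;
            if calls == 2 {
                panic!("boom");
            }
            s
        })
    }));
    println!("panicked: {}, len after unwind: {}", result.is_err(), v.len());
}
```

Without the first `set_len(0)`, a panic inside `f` would leave the Vec owning a slot whose value had already been moved out and dropped, which is the double-free situation described above.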

String::retain(): CVE-2020-36317, a panic safety bug

pub fn retain<F>(&mut self, mut f: F)
where
    F: FnMut(char) -> bool,
{
    let len = self.len();
    let mut del_bytes = 0;
    let mut idx = 0;

    // Fix: treat the string as empty while it is being edited in place.
    unsafe { self.vec.set_len(0); }
    while idx < len {
        let ch = unsafe {
            self.get_unchecked(idx..len).chars().next().unwrap()
        };
        let ch_len = ch.len_utf8();

        // Before the fix, self was left in an inconsistent state
        // (possibly invalid UTF-8) if f() panicked here.
        if !f(ch) {
            del_bytes += ch_len;
        } else if del_bytes > 0 {
            unsafe {
                ptr::copy(self.vec.as_ptr().add(idx),
                          self.vec.as_mut_ptr().add(idx - del_bytes),
                          ch_len);
            }
        }
        idx += ch_len; // point idx to the next char
    }
    // Fix: restore the length only after the loop has finished;
    // if f() panicked above, the length stays 0.
    unsafe { self.vec.set_len(len - del_bytes); }
}

fn main() {
    // PoC: creates a non-UTF-8 string in the unwinding path.
    // the_number_of_invocation() is a helper that the listing leaves undefined.
    "0è0".to_string().retain(|_| {
        match the_number_of_invocation() {
            1 => false,
            2 => true,
            _ => panic!(),
        }
    });
}

Higher-order Safety Invariant

A function should execute safely for all safe inputs, including argument data types, generic type parameters, and externally supplied closures.

In other words, a safe function should not assume more than the safety invariants that the Rust compiler provides. Here, a safety invariant means that a safe function in Rust must not exhibit undefined behavior for any valid input.

For example, Rust's sort function must not trigger undefined behavior, such as a segmentation fault, even if the user-supplied comparator does not define a total order. In C++, by contrast, the sort function can cause a segmentation fault when the user supplies an incompatible comparator.

The only safety invariant that Rust provides for higher-order types is the correctness of the type signature. A common mistake is to make incorrect assumptions about caller-provided functions, such as:

Logical consistency: for example, that the comparator passed to sort defines a total order.
Purity: that the function always returns the same output for the same input.
Semantic restrictions: for example, that the function only writes to its argument, because the argument may contain uninitialized bytes.

Unsafe code must either check these properties itself or specify the correct constraints (for example, with an unsafe trait) so that checking them becomes the caller's obligation.

Enforcing higher-order safety invariants within Rust's type system is difficult. One example is passing an uninitialized buffer to a caller-supplied Read implementation.

Unfortunately, many Rust programmers pass an uninitialized buffer to a caller-provided function to improve performance, unaware that doing so is inherently unsound. Because this pattern is so common and so subtle, the Rust standard library documentation now states explicitly that calling read() with an uninitialized buffer is unsound.
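
One conservative way to avoid the problem is simply to initialize the buffer before handing it to the caller-supplied reader. The helper `read_up_to` below is a hypothetical sketch of that approach, not code from the paper; it trades a small amount of performance for soundness.

```rust
use std::io::{self, Read};

/// Reads up to `max` bytes from any caller-supplied `Read` implementation.
/// The buffer is zero-initialized first, so even a `read()` that inspects
/// the buffer or reports a wrong byte count cannot expose uninitialized
/// memory.
fn read_up_to<R: Read>(reader: &mut R, max: usize) -> io::Result<Vec<u8>> {
    // Initialized, unlike `Vec::with_capacity(max)` followed by `set_len(max)`.
    let mut buf = vec![0u8; max];
    let n = reader.read(&mut buf)?;
    // `truncate` never grows the vector, so a bogus `n` larger than `max`
    // cannot extend it past the initialized bytes.
    buf.truncate(n);
    Ok(buf)
}

fn main() {
    let mut input: &[u8] = b"rudra";
    let bytes = read_up_to(&mut input, 3).expect("reading from a slice cannot fail");
    println!("{}", String::from_utf8_lossy(&bytes));
}
```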

The paper gives the following definition:

A higher-order invariant bug is a memory safety bug in a function that is caused by assuming a higher-order invariant that Rust's type system does not guarantee for caller-provided code.

// CVE-2020-36323: a higher-order invariant bug in join()
fn join_generic_copy<B, T, S>(slice: &[S], sep: &[T]) -> Vec<T>
where T: Copy, B: AsRef<[T]> + ?Sized, S: Borrow<B>
{
    let mut iter = slice.iter();

    // `slice` is converted for the first time
    // during the buffer size calculation.
    let len = ...;
    let mut result = Vec::with_capacity(len);

    unsafe {
        let pos = result.len();
        let target = result.get_unchecked_mut(pos..len);

        // `slice` is converted for the second time in the macro
        // while copying the rest of the components.
        spezialize_for_lengths!(sep, target, iter; 0, 1, 2, 3, 4);

        // Indicate that the vector is initialized
        result.set_len(len);
    }
    result
}

// PoC: a benign join() can trigger a memory safety issue
impl Borrow<str> for InconsistentBorrow {
    fn borrow(&self) -> &str {
        if self.is_first_time() {
            "123456"
        } else {
            "0"
        }
    }
}

let arr: [InconsistentBorrow; 3] = Default::default();
arr.join("-");

This code shows join_generic_copy, the function that the join() method uses internally for slices whose elements implement Borrow<str>. join_generic_copy converts slice twice: once while computing the buffer length, and a second time inside the spezialize_for_lengths! macro, which calls the .borrow() method. If the second conversion returns something different from the first, the returned string can contain uninitialized bytes.

Here Borrow is a higher-order type, and nothing guarantees that its borrow() implementation is consistent across calls. It may return a different slice each time, and if this is not handled, uninitialized bytes may be exposed to the caller.
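
The sketch below reproduces the inconsistent borrow() in safe code and shows one defensive way to handle it: treat the first pass as a capacity hint only and let the result length follow the bytes that are actually copied. `InconsistentBorrow` is modeled on the PoC, with a `Cell<bool>` standing in for its unspecified is_first_time(); `as_str` and `concat_defensive` are hypothetical helpers, and this is not the fix that was applied to the standard library.

```rust
use std::borrow::Borrow;
use std::cell::Cell;

/// Modeled on the PoC: `borrow()` returns a long string on the first call
/// and a short one on every later call.
#[derive(Default)]
struct InconsistentBorrow {
    called: Cell<bool>,
}

impl Borrow<str> for InconsistentBorrow {
    fn borrow(&self) -> &str {
        if self.called.replace(true) { "0" } else { "123456" }
    }
}

/// Small helper that pins the borrowed type to `str`.
fn as_str<S: Borrow<str>>(item: &S) -> &str {
    item.borrow()
}

/// Concatenates the items. The first pass is only a capacity hint; the
/// final length is determined by what the second pass actually copies.
fn concat_defensive<S: Borrow<str>>(items: &[S]) -> String {
    let hint: usize = items.iter().map(|s| as_str(s).len()).sum();
    let mut out = String::with_capacity(hint);
    for item in items {
        out.push_str(as_str(item)); // second conversion, may differ from the first
    }
    out
}

fn main() {
    let arr: [InconsistentBorrow; 3] = Default::default();
    println!("{:?}", concat_defensive(&arr));
}
```

The first pass sees three 6-byte strings, the second pass copies three 1-byte strings, and the result simply contains the 3 bytes that were written; no length computed from the first pass is ever trusted.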

Propagating Send/Sync in Generic Types

The rules for Send/Sync become complicated once generics are involved.

Usually the compiler implements Send/Sync automatically, but developers may need to implement the two traits by hand when working with Unsafe code. A manual Send/Sync implementation is hard to get right, and developers who do not fully understand Send/Sync can easily introduce bugs this way.

The paper gives the following definition:

If a generic type implements Send/Sync and specifies incorrect Send/Sync bounds on its inner types, the generic type's own Send/Sync becomes incorrect. This is the memory safety bug caused by Send/Sync propagation in generic types.

// CVE-2020-35905: incorrect uses of Send/Sync on Rust's futures
pub struct MappedMutexGuard<'a, T: ?Sized, U: ?Sized> {
    mutex: &'a Mutex<T>,
    value: *mut U,
    _marker: PhantomData<&'a mut U>, // added by the fix
}

impl<'a, T: ?Sized> MutexGuard<'a, T> {
    pub fn map<U: ?Sized, F>(this: Self, f: F)
        -> MappedMutexGuard<'a, T, U>
        where F: FnOnce(&mut T) -> &mut U {
            let mutex = this.mutex;
            let value = f(unsafe { &mut *this.mutex.value.get() });
            mem::forget(this);
            // Before the fix: MappedMutexGuard { mutex, value }
            MappedMutexGuard { mutex, value, _marker: PhantomData } // fixed
    }
}

// Before the fix: unsafe impl<T: ?Sized + Send, U: ?Sized> Send
unsafe impl<T: ?Sized + Send, U: ?Sized + Send> Send // fixed
for MappedMutexGuard<'_, T, U> {}
// Before the fix: unsafe impl<T: ?Sized + Sync, U: ?Sized> Sync
unsafe impl<T: ?Sized + Sync, U: ?Sized + Sync> Sync // fixed
for MappedMutexGuard<'_, T, U> {}

// PoC: this safe Rust code allows race on reference counter
*MutexGuard::map(guard, |_| Box::leak(Box::new(Rc::new(true))));

This bug, found in Rust's futures library, is an incorrect manual Send/Sync implementation that breaks the thread-safety guarantee.

In the affected versions, MappedMutexGuard's Send/Sync implementations only placed bounds on T, even though MappedMutexGuard dereferences to U.

When the closure passed to MutexGuard::map() returns a U that is unrelated to T, this can lead to data races in safe Rust code.

The bug was fixed by correcting the Send/Sync implementations and by adding a PhantomData<&'a mut U> marker to the MappedMutexGuard type, which tells the compiler that the guard also holds a U.
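
The same rule can be shown on a much smaller type. `Owned<T>` below is a hypothetical Box-like wrapper written for this illustration (it is not part of futures); the point is that the manual Send/Sync implementations carry bounds on every inner type that is reachable through the wrapper.

```rust
use std::marker::PhantomData;
use std::thread;

/// Hypothetical owner of a heap value behind a raw pointer. Raw pointers are
/// neither `Send` nor `Sync`, so the compiler derives neither for `Owned<T>`.
struct Owned<T> {
    ptr: *mut T,
    _marker: PhantomData<T>, // tells the compiler that we own a `T`
}

impl<T> Owned<T> {
    fn new(value: T) -> Self {
        Owned { ptr: Box::into_raw(Box::new(value)), _marker: PhantomData }
    }

    fn get(&self) -> &T {
        // SAFETY: `ptr` came from `Box::into_raw` and is freed only in `drop`.
        unsafe { &*self.ptr }
    }
}

impl<T> Drop for Owned<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `Box::into_raw` and is freed exactly once.
        unsafe { drop(Box::from_raw(self.ptr)) };
    }
}

// The bounds must mention the inner type. An unbounded
// `unsafe impl<T> Send for Owned<T> {}` would let `Owned<Rc<_>>` cross
// threads, which is the kind of bug described above.
// SAFETY: `Owned<T>` behaves like `Box<T>`.
unsafe impl<T: Send> Send for Owned<T> {}
unsafe impl<T: Sync> Sync for Owned<T> {}

/// Moves an `Owned<Vec<u32>>` to another thread; allowed because
/// `Vec<u32>: Send`.
fn sum_on_thread(values: Vec<u32>) -> u32 {
    let owned = Owned::new(values);
    thread::spawn(move || owned.get().iter().sum::<u32>())
        .join()
        .unwrap()
}

fn main() {
    println!("{}", sum_on_thread(vec![1, 2, 3]));
}
```

With these bounds, trying to send an `Owned<Rc<bool>>` to another thread is rejected at compile time, because `Rc<bool>` is not Send.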

The design of Rudra

The overall design is as follows:

Rudra uses HIR to obtain a crate's code structure (trait definitions, function signatures, unsafe blocks, and so on) and MIR to obtain its code semantics (data flow, control-flow graph, call dependencies, and so on). Why not use LLVM IR? Because at that level Rust's abstractions have already disappeared.

The Unsafe Dataflow Checker (UD) then checks for panic safety bugs and higher-order invariant bugs, and the Send/Sync Variance Checker (SV) checks for Send/Sync variance bugs. Finally, the results are aggregated and the reports are output in priority order.

The Unsafe Dataflow Checker (UD) and the Send/Sync Variance Checker (SV) correspond to two algorithms; see the paper and the source code for details.

A note on the English terms security and safety

English has more than one word for what Chinese expresses with the single word 安全, notably security and safety. To clarify:

Security usually refers to information security, network security, and the like.
Safety usually refers to functional safety.

Information security problems are often caused by functional safety flaws.

Summary

The final part of the paper presents extensive data demonstrating Rudra's effectiveness, along with comparisons between Rudra and fuzzing, Miri, and other Rust static analysis tools.

The authors also used Rudra to examine several operating systems implemented in Rust; see the paper for details.

The paper is well worth reading if you want to truly understand Rust's safety philosophy. It also offers a new perspective on the state of safety in the Rust ecosystem, together with a static checker that deserves our attention.
