Phala Network | Supporting Rust’s native std in the Intel SGX environment

By Kevin Wang

Introduction: Intel SGX is a trusted execution environment that isolates applications from the OS, so applications cannot directly access resources provided by the OS. The Teaclave-SGX-SDK we use provides only a no_std environment, which makes a large number of libraries in the crates ecosystem unusable. By adding libc functions we simulate the features of a Linux platform, so that the std-dependent Rust ecosystem can be used in an SGX environment without modification. To keep the attack surface as small as possible, we apply permission controls to each added libc function. We also introduce binary analysis to ensure that the program contains no instructions that are illegal in SGX.

Background

Phala Network’s privacy-preserving cloud computing service is built on Teaclave-SGX-SDK. Because the SGX execution environment of an Intel CPU is effectively a bare machine with no operating system, Rust programs built on Teaclave-SGX-SDK naturally have to be developed as no_std.

However, as a project grows more complex, we want to take advantage of Rust’s crate ecosystem, and most crates in it depend on Rust’s std. To use a std crate in our no_std environment, we have to port it.

Porting each std crate directly to no_std would be a heavy workload. For portability, Teaclave-SGX-SDK provides sgx_tstd, a replica of std for the SGX environment. sgx_tstd retains most of the functionality of Rust’s std, so porting a simple crate to sgx_tstd takes only a few lines of code, such as adding extern crate sgx_tstd as std; to the crate root and adding use std::prelude::v1::*; where needed. The Teaclave-SGX-SDK team has even ported some commonly used crates, available at github.com/mesalock-li…

The problem with sgx_tstd

Porting one crate may not seem like much work, but most of the time we are bringing in not a single crate but a dependency tree of dozens of crates. Where the normal Rust ecosystem needs only one added line in Cargo.toml, we have to port a whole pile of crates to the SGX environment.

This development pattern has in effect split the Rust ecosystem in two: crates.io and mesalock-linux. The split even affects some no_std crates and makes the two sides incompatible. For example, mixing in a no_std crate that depends on log or serde does not compile properly, and the crate has to be modified to use log-sgx and serde-sgx. If one day someone builds a similar Rust SDK for an ARM or AMD TEE, are we going to fork crates.io again?

Forking also delays code updates in the ported ecosystem, and security scanning tools that target crates.io and github.com may miss vulnerabilities in the mesalock-linux ecosystem, which affects downstream development or poses security threats.

Making SGX support Rust’s native std

The current standard practice for developing applications with Teaclave-SGX-SDK is to enable #![no_std] and compile for the target x86_64-unknown-linux-gnu.

Since the target is x86_64-unknown-linux-gnu, what goes wrong if we simply compile without #![no_std]?

A simple attempt fails with link errors: undefined references to libc functions.

Rust’s std relies on libc to interact with the OS, and the Intel SGX SDK ships a partially implemented SGX libc. However, the libc functions that Rust needs for interacting with the system are mostly ones SGX does not trust, so the SGX libc does not provide them directly. Instead, for most of them an equivalent is provided under the ocall module as a Rust-ABI function, to remind developers that these are untrusted operations.

We therefore wanted to provide a relay layer that fills in the missing libc functions and proxies them to their SGX SDK equivalents, so that code using native std compiles normally. For example, if the linker reports that the write function is missing, we supply one:

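// Exported under the C symbol name `write` so the linker can resolve std's reference to it.
#[no_mangle]
pub extern "C" fn write(fd: c_int, buf: *const c_void, count: size_t) -> ssize_t {
    // Forward the call to the SGX SDK's ocall equivalent.
    unsafe { ocall::write(fd, buf, count) }
}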

Our relay layer thus emulates the behavior of Linux glibc on its upper side and sits on the SGX-specific implementation on its lower side, which allows Rust applications compiled for Linux to run inside SGX.

Indeed, after adding the corresponding libc functions and removing some sgx_tstd-specific code, our enclave program was up and running.

Making std and sgx_tstd coexist

As mentioned above, part of the code was removed. It was mainly code that relied on sgx_tstd-specific features such as sgx_tstd::sgxfs::SgxFile. With native std enabled, sgx_tstd fails to compile because of Rust lang_item conflicts. To get these features back, we could either re-implement them ourselves or make sgx_tstd coexist with std. The latter is clearly more sustainable. We therefore patched sgx_tstd to put its lang_item definitions behind a feature; with that feature turned off, it can coexist with native std.

Security considerations

Our enclave program is security-sensitive, and it would be reckless to simply delegate every libc function to an ocall. We therefore consider the security of each proxied function and adjust its behavior to our business needs.

getrandom

Random number security is particularly important because it directly affects the security of our keys. Rust’s rand crate calls the getrandom function to obtain entropy. We proxy getrandom to sgx_read_rand, which in HW mode obtains true random numbers from the CPU hardware. The implementation is as follows:

#[no_mangle]
pub extern "C" fn getrandom(buf: *mut c_void, buflen: size_t, flags: c_uint) -> ssize_t {
    if buflen == 0 {
        return 0;
    }
    let rv = unsafe { sgx_read_rand(buf as _, buflen) };
    match rv {
        sgx_status_t::SGX_SUCCESS => buflen as _,
        _ => {
            if flags & libc::GRND_NONBLOCK != 0 {
                set_errno(libc::EAGAIN);
            } else {
                set_errno(libc::EINTR);
            }
            -1
        }
    }
}

Permission control

With std supported, we need to tightly control what code inside the enclave is permitted to do in order to minimize security risk. Code that exceeds its permissions is best blocked at build time or, failing that, restricted at run time.

The permission control policies of the relay layer are as follows:

Operation	Handling
Opening files	Forbidden; fails at run time. The open symbol must still exist because the getrandom crate needs open("/dev/urandom") as a fallback for platforms that lack the getrandom function. Although execution never reaches the open path here, the code dependency cannot be removed without patching the getrandom crate.
read/write	Restricted at run time; only the stdio file descriptors may be operated on.
Network operations	Forbidden at build time (link error).
syscall	Only SYS_getrandom is allowed, proxied to sgx_read_rand. All other calls fail at run time.
Creating threads	Forbidden at build time (link error).
Getting the working directory	Returns "(unreachable)", the value Linux glibc defines for an unreachable working directory.
Getting environment variables	Disabled; always returns null.
Getting the time	Proxied to ocall::clock_gettime; business code must take care not to trust the returned time.
dlsym (dynamic function loading)	Only getrandom is allowed; null is returned for anything else.
mmap	Only its memory-allocation use is allowed, proxied to SGX’s malloc. All other operations return an error.
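As an illustration of the run-time rows in the table, here is a minimal sketch of two of the checks, written as plain Rust functions so that it runs outside an enclave. The names Denied, guarded_write and guarded_syscall, and the closure standing in for the ocall, are hypothetical; only the policy itself (stdio descriptors only, SYS_getrandom only) comes from the table.

```rust
/// Reasons a relayed call can be rejected (hypothetical type).
#[derive(Debug, PartialEq)]
enum Denied {
    /// The file descriptor is not stdin, stdout or stderr.
    BadFd(i32),
    /// The syscall number is not on the allow-list.
    Syscall(i64),
}

/// Syscall number of getrandom on x86_64 Linux.
const SYS_GETRANDOM: i64 = 318;

/// Only the stdio descriptors 0, 1 and 2 may be read or written.
fn is_stdio(fd: i32) -> bool {
    (0..=2).contains(&fd)
}

/// Check the descriptor first, then forward to the untrusted side.
/// `ocall_write` stands in for the real ocall.
fn guarded_write<F>(fd: i32, buf: &[u8], ocall_write: F) -> Result<usize, Denied>
where
    F: FnOnce(i32, &[u8]) -> usize,
{
    if !is_stdio(fd) {
        return Err(Denied::BadFd(fd));
    }
    Ok(ocall_write(fd, buf))
}

/// Allow SYS_getrandom only; every other number fails at run time.
fn guarded_syscall(number: i64) -> Result<(), Denied> {
    if number == SYS_GETRANDOM {
        Ok(())
    } else {
        Err(Denied::Syscall(number))
    }
}

fn main() {
    // Writing to stdout (fd 1) is forwarded; the closure fakes the ocall.
    println!("{:?}", guarded_write(1, b"hello", |_, b| b.len()));
    // Any other descriptor is rejected before reaching the untrusted side.
    println!("{:?}", guarded_write(7, b"hello", |_, b| b.len()));
    // 39 is getpid on x86_64 Linux, which is not on the allow-list.
    println!("{:?}", guarded_syscall(39));
}
```

In a real relay layer the rejected cases would set errno and return -1 through the C ABI rather than return a Rust Result.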

Tracking down link errors

Because our native std support still lacks some functions, introducing a new dependency into the enclave occasionally produces a link error. For example, if the dependency performs network operations, the linker reports an undefined reference to connect.

In simple cases we can read the source code to see which piece of functionality introduced the dependency. Often, however, only a fraction of the code we read actually makes it into the binary, because the compiler and linker discard any code that is statically unreachable. It can therefore be hard to determine from the source alone which dependencies are actually in effect.

We therefore start from the final output binary to see how the reference to connect was introduced. We wrote callerfinder.py to help analyze such problems.

The first step is to fill in the undefined reference with an empty placeholder so that the build links.

Then use the CallerFinder library to find what depends on the undefined symbol:

In [1]: from callerfinder import CallerFinder
In [2]: finder = CallerFinder("./enclave/enclave.so")
In [3]: finder.print_callers('std::net::tcp::TcpStream::connect', 14)
std::net::tcp::TcpStream::connect_timeout::h79c6c1fec8ad56c5
http_req::request::Request::send::h8ea00de7a9d4e562
enclaveapp::create_attestation_report::h08c59df2ec69ab65
enclaveapp::prpc_service::get_runtime_info::h5ee8ea7c8422d583
phala_enclave_api::... ::dispatch_request::hd1bf94703ec9513e
ecall_prpc_request
sgx_ecall_prpc_request

In this way, we can clearly identify the source of the network operation and act accordingly. For example, we can replace http_req with the http_req-sgx port used originally.
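The internals of callerfinder.py are not shown in this article. The underlying idea, walking from a symbol upwards through its callers to a limited depth, can be sketched in Rust as follows. This is not the tool’s implementation: the function names reverse_edges and caller_chains are hypothetical, the edges are shortened from the session output for illustration, and a real tool would first have to extract the call edges from the disassembled binary.

```rust
use std::collections::{HashMap, HashSet};

/// Maps each function to the functions that call it (a reversed call graph).
type Callers<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Build the reversed graph from (caller, callee) edges.
fn reverse_edges<'a>(edges: &[(&'a str, &'a str)]) -> Callers<'a> {
    let mut callers: Callers<'a> = HashMap::new();
    for &(caller, callee) in edges {
        callers.entry(callee).or_default().push(caller);
    }
    callers
}

/// Collect every call chain leading to `target`, walking upwards through
/// at most `max_depth` callers. Each chain starts at `target`.
fn caller_chains<'a>(
    callers: &Callers<'a>,
    target: &'a str,
    max_depth: usize,
) -> Vec<Vec<&'a str>> {
    let mut chains = Vec::new();
    let mut path = vec![target];
    let mut on_path = HashSet::new();
    on_path.insert(target);
    walk(callers, max_depth, &mut path, &mut on_path, &mut chains);
    chains
}

fn walk<'a>(
    callers: &Callers<'a>,
    depth_left: usize,
    path: &mut Vec<&'a str>,
    on_path: &mut HashSet<&'a str>,
    chains: &mut Vec<Vec<&'a str>>,
) {
    let current = *path.last().expect("path is never empty");
    // Skip callers already on the path so that recursion cannot loop forever.
    let next: Vec<&'a str> = callers
        .get(current)
        .map(|v| v.iter().copied().filter(|c| !on_path.contains(c)).collect())
        .unwrap_or_default();
    if next.is_empty() || depth_left == 0 {
        // Reached an entry point or the depth limit: record the chain.
        chains.push(path.clone());
        return;
    }
    for caller in next {
        path.push(caller);
        on_path.insert(caller);
        walk(callers, depth_left - 1, path, on_path, chains);
        on_path.remove(caller);
        path.pop();
    }
}

fn main() {
    // Illustrative (caller, callee) edges, shortened from the session output.
    let edges = [
        ("Request::send", "TcpStream::connect_timeout"),
        ("create_attestation_report", "Request::send"),
        ("get_runtime_info", "create_attestation_report"),
        ("ecall_prpc_request", "get_runtime_info"),
    ];
    let callers = reverse_edges(&edges);
    for chain in caller_chains(&callers, "TcpStream::connect_timeout", 14) {
        println!("{}", chain.join(" <- "));
    }
}
```

The depth limit plays the same role as the second argument of print_callers in the session: it bounds how far up the call graph the search goes.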

The CPUID instruction problem

After migrating the enclave code to std, it ran smoothly with SGX_MODE=SW but crashed in many places with SGX_MODE=HW. The crashes all pointed to the same function, rand::thread_rng(), whose internal implementation uses the std::is_x86_feature_detected macro to detect CPU support for SIMD. The macro executes the CPUID instruction, which is prohibited in the SGX environment, so the program crashes.
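For readers unfamiliar with the macro, the following is a minimal sketch of the kind of run-time SIMD dispatch described above. The function simd_level is hypothetical and only reports which code path would be chosen; a real library would call differently optimized implementations in each branch.

```rust
/// Pick a code path based on what the CPU supports at run time.
/// On x86_64 this goes through std's feature-detection macro, which,
/// as described above, relies on the CPUID instruction.
fn simd_level() -> &'static str {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if is_x86_feature_detected!("sse2") {
            return "sse2";
        }
    }
    // Fallback for other architectures or CPUs without these features.
    "scalar"
}

fn main() {
    println!("dispatching to the {} code path", simd_level());
}
```

Outside an enclave this is harmless; inside one running in HW mode, the detection step is where the crash would occur.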

SGX prohibits the CPUID instruction for security reasons, yet using CPUID to check CPU support for SIMD is common, legitimate behavior for applications. A crash triggered by CPUID involves no leak, no unauthorized access and no unsoundness; it is a run-time safety guard rather than a security problem. But a run-time crash triggered by such legitimate behavior is clearly unacceptable: if our code relies on this kind of detection logic, our service is at risk of going down at any time. We therefore have to either adapt std::is_x86_feature_detected to the SGX environment or make sure that none of our code touches CPUID.

Teaclave’s sgx_tstd re-implements the is_x86_feature_detected macro so that it avoids CPUID. With the native std approach the problem is slightly more complicated: we cannot re-implement is_x86_feature_detected through libc as we did for the functions above (unless we patch std). Moreover, fixing is_x86_feature_detected alone obviously does not stop code from embedding CPUID assembly instructions directly. We therefore chose, for now, to keep CPUID out of the code altogether.

To do this, we add a post-compilation step that disassembles the output enclave.so and checks for CPUID instructions. If any are found, we make the build fail and print the names of the functions that contain them.
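The check script itself is not reproduced here. As a rough sketch of the idea, the following Rust function scans disassembly text in the layout printed by objdump -d and returns the names of functions that contain a given mnemonic. The function name functions_using and the sample disassembly are hypothetical; a real build step would read the disassembly of enclave.so and exit with a non-zero status so that make stops.

```rust
/// Return the names of functions whose disassembly contains `mnemonic`.
/// `disasm` is text in the layout printed by `objdump -d`.
fn functions_using(disasm: &str, mnemonic: &str) -> Vec<String> {
    let mut hits: Vec<String> = Vec::new();
    let mut current: Option<String> = None;
    for line in disasm.lines() {
        // Function headers look like `0000000000001130 <name>:`.
        if let (Some(start), true) = (line.find('<'), line.ends_with(">:")) {
            current = Some(line[start + 1..line.len() - 2].to_string());
            continue;
        }
        // Instruction lines are `address:<TAB>bytes<TAB>mnemonic operands`.
        let instr = line.split('\t').nth(2).unwrap_or("");
        // Compare the whole mnemonic token, not a substring.
        if instr.split_whitespace().next() == Some(mnemonic) {
            if let Some(name) = &current {
                // Report each function once even if it has several hits.
                if hits.last() != Some(name) {
                    hits.push(name.clone());
                }
            }
        }
    }
    hits
}

fn main() {
    // Hypothetical excerpt standing in for `objdump -d enclave.so`.
    let disasm = "\
0000000000001130 <detect_simd>:
    1130:\t31 c9                \txor    %ecx,%ecx
    1132:\t0f a2                \tcpuid
    1134:\tc3                   \tret

0000000000001140 <add_one>:
    1140:\t8d 47 01             \tlea    0x1(%rdi),%eax
    1143:\tc3                   \tret
";
    for name in functions_using(disasm, "cpuid") {
        // A real build step would also exit with a non-zero status here.
        eprintln!("forbidden instruction cpuid in {}", name);
    }
}
```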

We can then use callerfinder.py from the previous section to find out which functions introduced the CPUID dependency.

We can then follow the trail to the corresponding code. Where necessary, we patch the crate so that it uses the is_x86_feature_detected provided by Teaclave, as with rand::thread_rng(). Where the functionality can be dropped, we drop it, as with env_logger, whose CPUID dependency we eliminate by turning off its regex feature.

Other instructions that are illegal in SGX

Since CPUID has this problem, might we run into other instructions that SGX specifically forbids? In theory, of course. Intel’s guide lists which special instructions these are.

The guide lists the instructions that are illegal in an SGX environment, grouped into numbered categories.

Apart from system-call instructions such as INT/SYSCALL/SYSENTER, the instructions in its categories 2 and 3 should appear only in OS kernel code. Whether in the Rust or the C/C++ ecosystem, programs should make system calls through libc’s wrapper functions or its syscall function rather than by embedding assembly instructions directly, unless the program is a very unusual one.

The instructions in category 1 are the ones we need to focus on:

CPUID: used to check the CPU’s support for SIMD so that different SIMD instruction sets can be chosen for computationally intensive tasks.
GETSEC: the main entry point of a set of leaf functions, all for special purposes. General programs do not use it.
RDPMC: reads performance counters. Special purpose, used by tools such as perf; no need to worry about it.
RDTSC/RDTSCP: read the CPU timestamp counter. Applications may use these, so we add them to the instruction detection script.
SGDT (Store Global Descriptor Table Register): used only by the operating system.
SIDT (Store Interrupt Descriptor Table Register): used only by the operating system.
SLDT (Store Local Descriptor Table Register): used only by the operating system.
STR (Store Task Register): used only by the operating system.
VMCALL/VMFUNC: virtualization instructions; no need to worry about them.

To be on the safe side, we put all of these instructions in the post-processing check script and forbid them.
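A sketch of how a check could cover the whole list, assuming each disassembled instruction is available as text such as "rdtsc" or "str    %ax". The names FORBIDDEN and forbidden_in are hypothetical. The mnemonic is compared as a whole token, because a plain substring search for a short name like str would also match unrelated text. Assembler suffix variants of these mnemonics are not handled here.

```rust
/// Mnemonics rejected by the check, taken from the list above.
const FORBIDDEN: [&str; 11] = [
    "cpuid", "getsec", "rdpmc", "rdtsc", "rdtscp", "sgdt", "sidt", "sldt", "str",
    "vmcall", "vmfunc",
];

/// Return the forbidden mnemonic used by one disassembled instruction, if any.
fn forbidden_in(instruction: &str) -> Option<&'static str> {
    // The mnemonic is the first whitespace-separated token.
    let mnemonic = instruction.split_whitespace().next()?;
    let lower = mnemonic.to_ascii_lowercase();
    FORBIDDEN.iter().copied().find(|f| *f == lower.as_str())
}

fn main() {
    for instr in ["rdtsc", "str    %ax", "mov    %rsp,%rbp"].iter() {
        println!("{:<20} -> {:?}", instr, forbidden_in(instr));
    }
}
```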

Summary

With std support added, developing SGX applications in Rust is not much different from developing ordinary Rust applications, and we can also use facilities such as unit testing directly from the standard Rust toolchain to improve development efficiency.