Author: Wood color/Post editor: Zhang Handong


As part of adapting Feishu to WASM, this post shares how we ported the relational database SQLite to WASM.

SQLite is a cross-platform relational database widely used in client development, and Feishu uses it for persistent data storage. For the convenience of the upper layers, we use diesel as the ORM to interact with SQLite. The overall usage pattern is:

rust code -> diesel orm -> sqlite ffi

The call is shown as follows:

To port SQLite to the web, we need to do two things:

  1. Compile SQLite for the WASM platform
  2. Wrap the WASM platform's cross-module call interface so that diesel can use it

Because persistent storage mechanisms on the web are fragile, and given the shape of our business, we do not need persistence on the web for now; an in-memory relational database is enough. With these requirements settled, the WASM port of SQLite began. Let's go!

WASM working mode

WASM currently works in three modes, which in Rust correspond to three targets: Emscripten mode (wasm32-unknown-emscripten), WASI mode (wasm32-wasi), and a pure mode with no dependencies at all (wasm32-unknown-unknown). Artifacts built in the first two modes require the host to provide POSIX and WASI interfaces respectively; the last mode has no external dependencies.
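To make the mapping between modes and targets concrete, here is a minimal sketch. The function names `wasm_mode` and `current_mode` are hypothetical helpers written for this illustration; only the target triples and the `target_arch` / `target_os` cfg values come from Rust itself.

```rust
/// Names the WASM working mode for a given Rust target triple.
/// Returns `None` for triples that are not one of the three WASM modes.
fn wasm_mode(target: &str) -> Option<&'static str> {
    match target {
        "wasm32-unknown-emscripten" => Some("emscripten"),
        "wasm32-wasi" => Some("wasi"),
        "wasm32-unknown-unknown" => Some("unknown"),
        _ => None,
    }
}

/// Compile-time counterpart: reports which mode the current build targets.
/// Code that must differ per mode can be gated with the same cfg predicates.
fn current_mode() -> &'static str {
    if cfg!(all(target_arch = "wasm32", target_os = "emscripten")) {
        "emscripten"
    } else if cfg!(all(target_arch = "wasm32", target_os = "wasi")) {
        "wasi"
    } else if cfg!(all(target_arch = "wasm32", target_os = "unknown")) {
        "unknown"
    } else {
        "native"
    }
}
```

The same `cfg` predicate for the unknown mode appears later in this post, where it gates the custom allocator module.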

In terms of friendliness to C/C++ code, the three modes rank as follows: Emscripten > WASI >> unknown

The Rust community's ecosystem is built mainly around wasm32-unknown-unknown and wasm32-wasi (wasm-bindgen, for example). Because the unknown environment depends least on the outside world, we decided that the Rust code in the SDK would prefer wasm32-unknown-unknown, with wasm32-wasi as the second choice. For SQLite, we tried all three WASM modes:

Emscripten mode adaptation

Emscripten is a tool chain that helps compile C/C++ code into the WASM target format and provides emulation of POSIX-related calls.

Compiling the Emscripten artifact

SQLite is a C library, so compiling it to WASM with Emscripten is straightforward: just use emcc directly; see github.com/sql-js/sql.

So the first step is easy: we compile SQLite into a WASM instance for the Emscripten target, and the front end loads it. On the SDK side, the interfaces exported by the SQLite WASM instance are then called through the WASM ABI.

SQLite interface call

But when it came to step 2, providing a WASM FFI to diesel, we ran into trouble. By default, diesel uses libsqlite3-sys to provide the FFI over the C ABI. In a native environment, SQLite and the FFI caller share the same memory space, so many things are easy to handle, such as memory allocation and direct manipulation of pointers. However, if SQLite is compiled as a separate WASM instance and the FFI caller lives in another WASM instance, the two instances have different memory spaces and cannot directly use operations, such as pointers, that depend on a shared memory space. As a result, the FFI call flow has to be implemented from scratch for the Emscripten target.
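The memory-space problem can be illustrated with a small simulation. This is only a sketch: `Instance` is a hypothetical stand-in for a WASM instance, its linear memory is modeled as a plain byte buffer, and a "pointer" is an offset into that buffer. It is not how the SDK or SQLite is actually structured.

```rust
/// Stand-in for one WASM instance: linear memory plus a bump allocator.
struct Instance {
    memory: Vec<u8>,
    next_free: usize,
}

impl Instance {
    fn new(size: usize) -> Self {
        // Offset 0 is kept unused so that it can play the role of NULL.
        Instance { memory: vec![0; size], next_free: 8 }
    }

    /// Bump-allocates `len` bytes and returns the offset ("pointer").
    fn alloc(&mut self, len: usize) -> usize {
        let ptr = self.next_free;
        assert!(ptr + len <= self.memory.len(), "out of memory");
        self.next_free += len;
        ptr
    }

    /// Writes a NUL-terminated string into this instance's memory.
    fn write_cstr(&mut self, s: &str) -> usize {
        let ptr = self.alloc(s.len() + 1);
        self.memory[ptr..ptr + s.len()].copy_from_slice(s.as_bytes());
        self.memory[ptr + s.len()] = 0;
        ptr
    }

    /// Reads the NUL-terminated string that starts at `ptr`.
    fn read_cstr(&self, ptr: usize) -> String {
        let len = self.memory[ptr..].iter().position(|&b| b == 0).unwrap();
        String::from_utf8_lossy(&self.memory[ptr..ptr + len]).into_owned()
    }
}

/// Passing a string across instances needs an allocation in the callee's
/// memory plus a byte copy; the caller's pointer means nothing over there.
fn pass_string(caller: &Instance, ptr: usize, callee: &mut Instance) -> usize {
    let s = caller.read_cstr(ptr);
    callee.write_cstr(&s)
}
```

Dereferencing the SDK's pointer inside the SQLite instance reads unrelated bytes, which is why every argument that is passed by pointer has to be re-allocated and copied on the other side.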

Calling it like a dynamic library

We start the SQLite WASM instance asynchronously, attach its exported interfaces to the JS global object window, and bind those JS interfaces in Rust with wasm-bindgen. For example, when creating a DB connection, the WASM version has to call the SQLite instance's memory allocation functions to allocate memory and write the data:

// Native version
pub fn establish(raw_database_url: &str) -> ConnectionResult<Self> {
    let mut conn_pointer = ptr::null_mut();
    let database_url = CString::new(raw_database_url.trim_start_matches("sqlite://"))?;
    let connection_status = unsafe { ffi::sqlite3_open(database_url.as_ptr(), &mut conn_pointer) };
    // ...
}

// WASM version
#[wasm_bindgen]
extern "C" {
    // sqliteBindings is a global object attached to window.
    // allocateUTF8 and stackAlloc are the string and stack memory allocation
    // interfaces exported by the Emscripten WASM instance.
    #[wasm_bindgen(js_namespace = sqliteBindings, js_name = allocateUTF8)]
    pub fn allocate_utf8(s: &str) -> *const i8;

    #[wasm_bindgen(js_namespace = sqliteBindings, js_name = stackAlloc)]
    pub fn stack_alloc_sqlite3(size: usize) -> *mut *mut ffi::sqlite3;
}

pub fn establish(raw_database_url: &str) -> ConnectionResult<Self> {
    let conn_pointer = stack_alloc_sqlite3(0);
    let database_url_ptr = allocate_utf8(raw_database_url.trim_start_matches("sqlite://"));
    let connection_status = unsafe { ffi::sqlite3_open(database_url_ptr, conn_pointer) };
    // ...
}

After adapting every place where SQLite is used, we had diesel + SQLite running on WASM in Emscripten mode:

In this mode, SQLite is one independent WASM instance and the rest of the Lark SDK code is another. At run time, the JS code loads the SQLite WASM instance first and then the SDK WASM instance. The diesel code in the SDK then calls into the SQLite instance through the wrapped interface.

In this working mode, every SQLite call involves copying data between the two WASM instances (each instance has its own memory space), which is too expensive for a high-frequency call scenario like a database.

So we wondered: could the SQLite code and the rest of the SDK code be combined into a single WASM instance? If SQLite is built in Emscripten mode, the rest of the SDK code must be built in Emscripten mode too, but as mentioned earlier, the core of Rust's WASM ecosystem is wasm32-unknown-unknown and wasm32-wasi. So an instance that contains both the SDK code and SQLite cannot use wasm32-unknown-emscripten. In addition, in wasm32-wasi and wasm32-unknown-unknown modes we can use the C ABI: without the WASM interface wrapping that Emscripten mode required, Rust can call SQLite much as it does on native platforms.

WASI mode adaptation

In working toward a single instance containing both the SDK and SQLite, we ruled out Emscripten mode. Of the remaining two, WASI is friendlier to C/C++ code than unknown mode, and the interfaces in the WASI standard are closer to POSIX.

However, WASI is currently implemented mainly on non-web platforms; running on the web requires emulating the functionality that WASI needs. Fortunately, the community already provides this: github.com/wasmerio/wa…

With the host environment in place, let's look at SQLite itself. SQLite has no official WASI support at present, but it has a very flexible architecture:

SQLite encapsulates all platform-specific operations in its OS module and abstracts the use of platform functionality as a VFS, so we only need to implement a VFS that works in WASI mode.

For the implementation we followed the official example directly: www.sqlite.org/src/doc/tru… We turned on the SQLITE_OS_OTHER option at compile time, linked in our own VFS implemented in C, and paired it with wasmer-js's WASI emulation. In the end, SQLite and the rest of the SDK code ran in a single WASM instance in wasm32-wasi mode.

But…

wasm-bindgen stopped working after we upgraded the Rust version… Details: github.com/rustwasm/wa… The cause is that on January 13, 2021, Rust merged a commit that changed the ABI of the WASI mode. Before this commit, WASI mode and unknown mode shared the same ABI; after it, the two diverged. And wasm-bindgen has no official plans to support WASI…

So now we are left with only one path: wasm32-unknown-unknown

Unknown mode adaptation

Unknown mode is the least C/C++ friendly mode: no header declarations, no string manipulation functions, no fd-related functions, not even malloc… But it is the only path left, so we clear each obstacle as we meet it.

Three things need to be provided to work in unknown mode: a memory allocator, the C functions that SQLite uses, and a VFS implementation.

Memory allocator adaptation

In wasm32-unknown-unknown mode there is no C malloc available, but Rust does provide memory allocation in this mode, so we can implement the malloc functions in Rust for SQLite to call:

// The malloc calls were renamed to minimize the impact
#[cfg(all(target_arch = "wasm32", target_os = "unknown"))]
mod allocator {
    use std::alloc::{alloc, dealloc, realloc as rs_realloc, Layout};

    #[no_mangle]
    pub unsafe extern "C" fn sqlite_malloc(len: usize) -> *mut u8 {
        let align = std::mem::align_of::<usize>();
        let layout = Layout::from_size_align_unchecked(len, align);
        alloc(layout)
    }

    const SQLITE_PTR_SIZE: usize = 8;

    #[no_mangle]
    pub unsafe extern "C" fn sqlite_free(ptr: *mut u8) -> i32 {
        // Recover the block size from the first 8 bytes of the block
        let mut size_a = [0; SQLITE_PTR_SIZE];
        size_a.as_mut_ptr().copy_from(ptr, SQLITE_PTR_SIZE);
        let ptr_size: u64 = u64::from_le_bytes(size_a);

        let align = std::mem::align_of::<usize>();
        let layout = Layout::from_size_align_unchecked(ptr_size as usize, align);
        dealloc(ptr, layout);
        0
    }

    #[no_mangle]
    pub unsafe extern "C" fn sqlite_realloc(ptr: *mut u8, size: usize) -> *mut u8 {
        let align = std::mem::align_of::<usize>();
        let layout = Layout::from_size_align_unchecked(size, align);
        rs_realloc(ptr, layout, size)
    }
}
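Rust's allocator API needs the size of a block when freeing or resizing it, while C's free and realloc only pass a pointer. One common way to bridge this gap, shown here as a sketch rather than as the SDK's actual code, is to let the allocator keep its own size header in front of every block. The `demo_*` names, `HEADER` and `ALIGN` are hypothetical and chosen for this illustration.

```rust
use std::alloc::{alloc, dealloc, realloc, Layout};

/// Bytes reserved in front of every block to remember its usable size.
const HEADER: usize = 8;
/// Alignment of every block; also keeps the header write aligned.
const ALIGN: usize = 8;

/// Layout of a block with `len` usable bytes, or None on overflow.
fn layout_for(len: usize) -> Option<Layout> {
    Layout::from_size_align(len.checked_add(HEADER)?, ALIGN).ok()
}

/// malloc-like: returns `len` usable bytes, recording `len` in the header.
pub unsafe extern "C" fn demo_malloc(len: usize) -> *mut u8 {
    let layout = match layout_for(len) {
        Some(l) => l,
        None => return std::ptr::null_mut(),
    };
    let base = alloc(layout);
    if base.is_null() {
        return base;
    }
    (base as *mut u64).write(len as u64);
    base.add(HEADER)
}

/// Reads back the usable size stored by demo_malloc / demo_realloc.
pub unsafe extern "C" fn demo_usable_size(p: *mut u8) -> usize {
    (p.sub(HEADER) as *const u64).read() as usize
}

/// free-like: rebuilds the original layout from the header.
pub unsafe extern "C" fn demo_free(p: *mut u8) {
    if p.is_null() {
        return;
    }
    let layout = Layout::from_size_align_unchecked(demo_usable_size(p) + HEADER, ALIGN);
    dealloc(p.sub(HEADER), layout);
}

/// realloc-like: passes the OLD layout and the NEW total size to realloc.
pub unsafe extern "C" fn demo_realloc(p: *mut u8, new_len: usize) -> *mut u8 {
    if p.is_null() {
        return demo_malloc(new_len);
    }
    let new_layout = match layout_for(new_len) {
        Some(l) => l,
        None => return std::ptr::null_mut(),
    };
    let old_layout = Layout::from_size_align_unchecked(demo_usable_size(p) + HEADER, ALIGN);
    let base = realloc(p.sub(HEADER), old_layout, new_layout.size());
    if base.is_null() {
        return base;
    }
    (base as *mut u64).write(new_len as u64);
    base.add(HEADER)
}
```

With a header owned by the allocator itself, the free and realloc paths do not depend on what the caller has stored in the block, at the cost of 8 extra bytes per allocation.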

Providing libc functions

With the SQLITE_OS_OTHER switch turned on, the system interfaces are no longer used, so the libc dependency shrinks considerably, but a few basic non-system functions are still required:

strcspn
strcmp/strncmp
strlen
strchr/strrchr
qsort

The string functions are simple enough to implement directly; for qsort, we copied a permissively licensed third-party implementation.
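As an illustration of how small these string functions are, here is a sketch of a few of them written in Rust. The post does not say which language the SDK used for them, so this is an assumption, and the `wasm_` prefix is hypothetical: it only keeps the sketch from clashing with the host libc when it is built natively. In a wasm32-unknown-unknown build they would be exported under their C names.

```rust
use std::os::raw::{c_char, c_int};

/// strlen: number of bytes before the terminating NUL.
pub unsafe extern "C" fn wasm_strlen(s: *const c_char) -> usize {
    let mut n = 0;
    while *s.add(n) != 0 {
        n += 1;
    }
    n
}

/// strncmp: compares at most `n` bytes as unsigned chars.
pub unsafe extern "C" fn wasm_strncmp(a: *const c_char, b: *const c_char, n: usize) -> c_int {
    for i in 0..n {
        let (ca, cb) = (*a.add(i) as u8, *b.add(i) as u8);
        if ca != cb {
            return ca as c_int - cb as c_int;
        }
        if ca == 0 {
            break;
        }
    }
    0
}

/// strcmp: strncmp without a length limit.
pub unsafe extern "C" fn wasm_strcmp(a: *const c_char, b: *const c_char) -> c_int {
    wasm_strncmp(a, b, usize::MAX)
}

/// strchr: first occurrence of `c` (the terminator counts), or NULL.
pub unsafe extern "C" fn wasm_strchr(s: *const c_char, c: c_int) -> *mut c_char {
    let target = c as u8;
    let mut p = s;
    loop {
        if *p as u8 == target {
            return p as *mut c_char;
        }
        if *p == 0 {
            return std::ptr::null_mut();
        }
        p = p.add(1);
    }
}

/// strcspn: length of the prefix of `s` with no byte from `reject`.
pub unsafe extern "C" fn wasm_strcspn(s: *const c_char, reject: *const c_char) -> usize {
    let mut n = 0;
    while *s.add(n) != 0 && wasm_strchr(reject, *s.add(n) as c_int).is_null() {
        n += 1;
    }
    n
}
```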

VFS implementation

Both Emscripten and WASI rely on a virtual file system provided by the host. In unknown mode, to avoid external dependencies, we provide an in-memory VFS directly inside the SDK code for SQLite to use.

At the heart of a VFS implementation are two structures:

typedef struct sqlite3_vfs sqlite3_vfs;
typedef void (*sqlite3_syscall_ptr)(void);
struct sqlite3_vfs {
    int iVersion;            /* Structure version number (currently 3) */
    int szOsFile;            /* Size of subclassed sqlite3_file */
    int mxPathname;          /* Maximum file pathname length */
    sqlite3_vfs *pNext;      /* Next registered VFS */
    const char *zName;       /* Name of this virtual file system */
    void *pAppData;          /* Pointer to application-specific data */
    int (*xOpen)(sqlite3_vfs*, const char *zName, sqlite3_file*,
                 int flags, int *pOutFlags);
    int (*xDelete)(sqlite3_vfs*, const char *zName, int syncDir);
    int (*xAccess)(sqlite3_vfs*, const char *zName, int flags, int *pResOut);
    int (*xFullPathname)(sqlite3_vfs*, const char *zName, int nOut, char *zOut);
    void *(*xDlOpen)(sqlite3_vfs*, const char *zFilename);
    void (*xDlError)(sqlite3_vfs*, int nByte, char *zErrMsg);
    void (*(*xDlSym)(sqlite3_vfs*,void*, const char *zSymbol))(void);
    void (*xDlClose)(sqlite3_vfs*, void*);
    int (*xRandomness)(sqlite3_vfs*, int nByte, char *zOut);
    int (*xSleep)(sqlite3_vfs*, int microseconds);
    int (*xCurrentTime)(sqlite3_vfs*, double*);
    int (*xGetLastError)(sqlite3_vfs*, int, char *);
    /*
    ** The methods above are in version 1 of the sqlite_vfs object
    ** definition. Those that follow are added in version 2 or later
    */
    int (*xCurrentTimeInt64)(sqlite3_vfs*, sqlite3_int64*);
    /*
    ** The methods above are in versions 1 and 2 of the sqlite_vfs object.
    ** Those below are for version 3 and greater.
    */
    int (*xSetSystemCall)(sqlite3_vfs*, const char *zName, sqlite3_syscall_ptr);
    sqlite3_syscall_ptr (*xGetSystemCall)(sqlite3_vfs*, const char *zName);
    const char *(*xNextSystemCall)(sqlite3_vfs*, const char *zName);
    /*
    ** The methods above are in versions 1 through 3 of the sqlite_vfs object.
    ** New fields may be appended in future versions. The iVersion
    ** value will increment whenever this happens.
    */
};

typedef struct sqlite3_io_methods sqlite3_io_methods;
struct sqlite3_io_methods {
    int iVersion;
    int (*xClose)(sqlite3_file*);
    int (*xRead)(sqlite3_file*, void*, int iAmt, sqlite3_int64 iOfst);
    int (*xWrite)(sqlite3_file*, const void*, int iAmt, sqlite3_int64 iOfst);
    int (*xTruncate)(sqlite3_file*, sqlite3_int64 size);
    int (*xSync)(sqlite3_file*, int flags);
    int (*xFileSize)(sqlite3_file*, sqlite3_int64 *pSize);
    int (*xLock)(sqlite3_file*, int);
    int (*xUnlock)(sqlite3_file*, int);
    int (*xCheckReservedLock)(sqlite3_file*, int *pResOut);
    int (*xFileControl)(sqlite3_file*, int op, void *pArg);
    int (*xSectorSize)(sqlite3_file*);
    int (*xDeviceCharacteristics)(sqlite3_file*);
    /* Methods above are valid for version 1 */
    int (*xShmMap)(sqlite3_file*, int iPg, int pgsz, int, void volatile**);
    int (*xShmLock)(sqlite3_file*, int offset, int n, int flags);
    void (*xShmBarrier)(sqlite3_file*);
    int (*xShmUnmap)(sqlite3_file*, int deleteFlag);
    /* Methods above are valid for version 2 */
    int (*xFetch)(sqlite3_file*, sqlite3_int64 iOfst, int iAmt, void **pp);
    int (*xUnfetch)(sqlite3_file*, sqlite3_int64 iOfst, void *p);
    /* Methods above are valid for version 3 */
    /* Additional methods may be added in future releases */
};

Implementing the VFS in Rust

An in-memory VFS needs at least one dynamically resizable container. Implementing memvfs in C would mean writing something like a HashMap or LinkedList by hand, which is rather tedious, so this logic is implemented in Rust as well.

VFS binding

In the Rust code we provide a sqlite3_os_init function, which is picked up automatically when linking against SQLite:

#[no_mangle]
pub unsafe extern "C" fn sqlite3_os_init() -> std::os::raw::c_int {
    // The VFS must outlive this call, so it is boxed and then leaked
    let mut mem_vfs = Box::new(super::memvfs::get_mem_vfs());
    let mem_vfs_ptr: *mut crate::sqlite3_vfs = mem_vfs.as_mut();
    let rc = crate::sqlite3_vfs_register(mem_vfs_ptr, 1);
    debug!("sqlite3 vfs register result: {}", rc);
    std::mem::forget(mem_vfs);
    rc
}

In-memory data storage container

Since multiple paths are supported, the simplest implementation is to provide a HashMap, using the path as the key:

struct Node {
    size: usize,
    data: Vec<u8>,
}

lazy_static! {
    static ref FS: RwLock<HashMap<String, Arc<RwLock<Node>>>> = RwLock::new(HashMap::new());
}
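How such a path-keyed map might be used can be sketched as follows. This is an illustration under stated assumptions: `open_node` and `delete_node` are hypothetical helpers, and the sketch uses the standard library's `OnceLock` instead of `lazy_static` so that it needs no external crate.

```rust
use std::collections::HashMap;
use std::sync::{Arc, OnceLock, RwLock};

/// One in-memory file: logical size plus backing bytes.
#[derive(Default)]
struct Node {
    size: usize,
    data: Vec<u8>,
}

type SharedNode = Arc<RwLock<Node>>;

/// Global path -> file map, created on first use.
fn fs() -> &'static RwLock<HashMap<String, SharedNode>> {
    static FS: OnceLock<RwLock<HashMap<String, SharedNode>>> = OnceLock::new();
    FS.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Looks up `path`; when `create` is set, a missing file is created empty.
fn open_node(path: &str, create: bool) -> Option<SharedNode> {
    {
        // Fast path: a shared read lock is enough for existing files.
        let map = fs().read().unwrap();
        if let Some(node) = map.get(path) {
            return Some(Arc::clone(node));
        }
    }
    if !create {
        return None;
    }
    // Slow path: take the write lock and insert if still absent.
    let mut map = fs().write().unwrap();
    Some(Arc::clone(map.entry(path.to_owned()).or_default()))
}

/// Removes `path`; returns whether a file was actually deleted.
fn delete_node(path: &str) -> bool {
    fs().write().unwrap().remove(path).is_some()
}
```

Wrapping each node in its own `Arc<RwLock<..>>` means an open file handle can keep reading and writing its node without holding the lock on the whole map.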

Data read and write interface:

impl Node {
    fn copy_out(&self, dst: *mut raw::c_void, offset: isize, count: usize) -> Option<()> {
        if self.size < offset as usize + count {
            log::trace!("handle invalid input offset");
            return None;
        }

        let ptr = self.data.as_ptr();
        let dst = dst as *mut u8;

        unsafe {
            let ptr = ptr.offset(offset);
            ptr.copy_to(dst, count);
        }

        Some(())
    }

    fn write_in(&mut self, src: *const raw::c_void, offset: isize, count: usize) {
        let new_end = offset as usize + count;

        // Grow the buffer to cover the offset passed in
        if new_end > self.data.len() {
            self.data.resize(new_end, 0);
        }

        if new_end > self.size {
            self.size = new_end;
        }

        let ptr = self.data.as_mut_ptr();

        unsafe {
            let ptr = ptr.offset(offset);
            ptr.copy_from(src as *const u8, count);
        }
    }
}
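The same read and write logic can also be expressed over slices, which keeps the bounds handling in safe code and leaves only the raw-pointer conversion at the FFI boundary. The following is a sketch, not the SDK's code: `MemFile`, `read_at` and `write_at` are hypothetical names. The zero-filling of a short read follows what SQLite expects from xRead.

```rust
/// Hypothetical in-memory file backed by a growable byte buffer.
#[derive(Default)]
struct MemFile {
    data: Vec<u8>,
}

impl MemFile {
    /// Fills `dst` from `offset`. On a short read the unread tail of `dst`
    /// is zero-filled and `false` is returned.
    fn read_at(&self, dst: &mut [u8], offset: usize) -> bool {
        let start = offset.min(self.data.len());
        let n = (self.data.len() - start).min(dst.len());
        dst[..n].copy_from_slice(&self.data[start..start + n]);
        dst[n..].fill(0);
        n == dst.len()
    }

    /// Writes `src` at `offset`, growing the file with zeros if needed.
    fn write_at(&mut self, src: &[u8], offset: usize) {
        let end = offset + src.len();
        if end > self.data.len() {
            self.data.resize(end, 0);
        }
        self.data[offset..end].copy_from_slice(src);
    }
}
```

An xRead or xWrite callback would then build the slice from the pointer and length that SQLite passes in (for example with `std::slice::from_raw_parts_mut`) and delegate to these methods.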

VFS implementation

In the xOpen method of the sqlite3_vfs implementation, register the corresponding custom sqlite3_io_methods.
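The registration step can be sketched with simplified stand-in types. Everything here is an assumption made for illustration: `File` and `IoMethods` are cut-down versions of sqlite3_file and sqlite3_io_methods with only two methods, and `mem_open` takes fewer arguments than the real xOpen. What it does show is the mechanism: SQLite hands xOpen an uninitialized file object, and xOpen fills in the method table pointer so that later I/O calls dispatch to the custom implementation.

```rust
use std::os::raw::{c_char, c_int};

const SQLITE_OK: c_int = 0;

/// Cut-down stand-in for sqlite3_io_methods (the real one has more entries).
#[repr(C)]
struct IoMethods {
    i_version: c_int,
    x_close: unsafe extern "C" fn(*mut File) -> c_int,
    x_file_size: unsafe extern "C" fn(*mut File, *mut i64) -> c_int,
}

/// Stand-in for a subclassed sqlite3_file: the method table pointer comes
/// first, followed by implementation-specific fields.
#[repr(C)]
struct File {
    p_methods: *const IoMethods,
    len: i64,
}

unsafe extern "C" fn mem_close(_file: *mut File) -> c_int {
    SQLITE_OK
}

unsafe extern "C" fn mem_file_size(file: *mut File, out: *mut i64) -> c_int {
    *out = (*file).len;
    SQLITE_OK
}

/// The method table shared by every file opened through this VFS.
static MEM_IO: IoMethods = IoMethods {
    i_version: 1,
    x_close: mem_close,
    x_file_size: mem_file_size,
};

/// Simplified xOpen: initializes the file object and installs the table.
unsafe extern "C" fn mem_open(_name: *const c_char, file: *mut File, _flags: c_int) -> c_int {
    (*file).len = 0;
    (*file).p_methods = &MEM_IO;
    SQLITE_OK
}
```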

With all of the above in place, we finally compiled SQLite into a WASM file in wasm32-unknown-unknown mode. The upper layer can reuse diesel directly, so the business code does not change at all.

At this point, the Lark SDK works on the web as follows:

The overall working mode is once again aligned with the native platforms: there are no external dependencies, and no data is copied between WASM instances during queries.