SQLite is a cross-platform relational database, widely used in client development, flying book also uses SQLite as persistent data storage; At the same time, in order to facilitate the use of the upper layer, diesel is used as an ORM to interact with SQLite. The overall use mode is as follows:

rust code -> diesel orm -> sqlite ffi
Copy the code

The call situation is as follows:

To port SQLite to the WEB, we need to do two things:

  1. Compile SQLite to the WASM platform
  2. Encapsulates the wASM platform’s cross-module call interface for diesel

Considering the fragility of persistent storage mechanism on the WEB, and the consideration of business form, there is no need to do persistence on the WEB, only a relational database in memory for the time being; With these requirements in mind, we are on our way to porting SQLite to WASM. Let’s Go!

WASM working mode

Currently, WASM actually works in three modes: Emscripten mode, WASI mode and pure mode without any dependence correspond to WASM32-unknown-emscripten, WASM32-WASI and WASM32-unknown-unknown respectively in Rust language. The wASM artifacts of the first two modes require the host to provide POSIX and WASI interface capabilities, respectively, and the last mode has no external dependencies at all

For these three modes, the friendliness to C/C++ code is: Emscripten>Wasi>>Unknown

The ecology of Rust community is basically built around WASM32-Unknown-Unknown and WASM32-WASI, such as wasm-BindGen tools. However, considering that unknown environment has little external dependence, we have determined the RUST code in SDK first. Wasm32-unknown-unknown mode is used first, followed by WASM32-WASI mode. For the SQLite part, we tried all three wASM working modes:

Emscripten mode adaptation

Emscripten is a toolchain that helps compile C/C++ code into the WASM target format and provides the ability to simulate POSIX-related calls.

Compile the Emscripten artifacts

SQLite is a C library and it is easy to compile SQLite to WASM using Emscripten. The process is simple and can be compiled directly using emcc (see github.com/sql-js/sql….).

The first step to compile SQLite into WASM is easy: we compile sqLite into an Emscripten target instance of WASM, which is loaded by the front end. Then, on the SDK side, invoke the interface provided by the SQLite WASM instance through the ABI interface of WASM.

SQLite interface call

But when it came to the second step, providing WASM’s FFI to Diesel, we ran into a problem: By default, diesel uses libsqLite-sys to provide FFI with the C ABI. In native environments, SQLite libraries and FFI users share the same memory space, so many things are easier to handle, such as memory allocation or direct pointer manipulation. However, if SQLite is compiled into a separate instance of WASM, ffI part of sqLite is called as a separate instance of WASM, and the two WASM instances are in different memory Spaces, you cannot directly use Pointers or other operations that depend on the same memory space. This results in the new implementation of all THE FFI invocation processes under emscripten Target.

Use a dynamic library-like invocation

Start the SQLite WASM instance asynchronously and hang the exported interface on the JS global object window. Then use the wasm-bindgen to bind these JS interfaces in rust. For example, when sqLite connects to DB and creates a DB connection, wASM needs to call wASM’s memory allocation function to allocate memory and write data:

// Native version operation

pub fn establish(raw_database_url: &str) -> ConnectionResult<Self> {

    let mut conn_pointer = ptr::null_mut();

    let database_url = CString::new(raw_database_url.trim_start_matches("sqlite://"))? ;let connection_status = unsafe { ffi::sqlite3_open(database_url.as_ptr(), &mutconn_pointer) }; . }// WASM version operation

#[wasm_bindgen]

extern "C" {

    // sqliteBindings is a global object hung in the window

    // allocateUTF8 and stackAlloc are the string and stack memory allocation interfaces exported by Emscripten Wasm

    #[wasm_bindgen(js_namespace = sqliteBindings, js_name = allocateUTF8)]

    pub fn allocate_utf8(s: &str) - > *const i8;

    #[wasm_bindgen(js_namespace = sqliteBindings, js_name = stackAlloc)]

    pub fn stack_alloc_sqlite3(size: usize) - > *mut *mut ffi::sqlite3;

}

pub fn establish(raw_database_url: &str) -> ConnectionResult<Self> {

    let conn_pointer = stack_alloc_sqlite3(0);

    let database_url_ptr = allocate_utf8(raw_database_url.trim_start_matches("sqlite://"));

    let connection_status = unsafe{ ffi::sqlite3_open(database_url_ptr, conn_pointer) }; . }Copy the code

We implemented diesel+ SQLite to work in emscripten mode. The data flow mode is as follows:

In this mode, sqLite is a separate wASM instance, and the other LARK SDK code is an instance. In the actual running mode, the JS code first loads the SQLite WASM instance, and then loads the SDK wASM instance. The DIESEL code in the SDK then calls the functionality of the SQLite instance through the encapsulated interactive interface.

In this mode, each call to SQLite involves a copy of data between two WASM instances (the memory space of each wASM instance is separate), which is too expensive for the high frequency data call scenario in DB.

So we asked: Is it possible to merge an SQLite instance with other SDK instance code to generate a WASM instance? If SQLite is an Emscripten mode wASM, the rest of the SDK code must be emscripten mode as well, but as mentioned earlier, the core of rust’s WASM ecology is WASM32-unknown-Unknown and WASm32-WASi. So if you want an instance to contain SDK code and SQLite, you can’t use wasm32-unknown-emscripten mode. In addition, in WASM32-WASI and WASM32-unknown-Unknown modes, we can use THE ABI of C, that is, there is no need for wASM interface encapsulation like emscripten mode, and sqLite can be called from rust in a manner similar to that on native platform.

WASI mode adaptation

In the practice of optimizing SDK and SQLite as one instance, we excluded the use of Emscripten mode; In WASI and Unknown mode, WASI is a more friendly platform for C/C++ code, and the interface in WASI standard is closer to POSIX.

However, WASI is currently generally executed on non-Web platforms. To run on the WEB, you need to provide a simulation of the corresponding functionality that WASI needs. Fortunately, the community already has the corresponding functionality: github.com/wasmerio/wa…

Now that the host environment is done, let’s look at SQLite itself; Currently SQLite does not provide official support for WASI, but SQLite has a very flexible architecture:

SQLite encapsulates all platform-related operations in os-related modules, and abstracts the use of platform functionality in the form of a VFS, so long as we implement a VFS that works in WASI mode

The realization of the direct reference to the official www.sqlite.org/src/doc/tru… With the SQLITE_OS_OTHER option turned on at compile time and linked to our CORRESPONDING C implementation of VFS and wasI simulation of WASmer-JS, we finally have sqLite and the rest of the SDK code as a WASM instance of WASM32-WASI mode.

But…

After upgrading rust, wasm-Bindgen no longer works… See: github.com/rustwasm/wa… The problem is that on January 13, 2021, Rust merged a commit that changed the ABI format of WASI mode. Previously, the ABI of WASI mode and Unknown mode were the same under Rust, but after this commit, the two forked. And wasm-Bindgen has no official plans for wasi…

So that leaves us with only one path: WASm32-unknown-unknown

Unknown Mode adaptation

Unknown mode is the least C/C++ friendly mode: no header declarations, no string manipulation methods, no fD-related methods and even no malloc… But this is the only way, see mountains open mountains, meet sea reclamation

There are three features that need to be provided to work in Unknown mode: the memory allocator, the C functions used, and the VFS implementation

Memory allocator ADAPTS

C does not provide malloc encapsulation in WASm32-unknown-unknown mode, but rust has memory-related encapsulation, so we can implement malloc method in rust for SQLite to call after linking:

// Change the name of the call to malloc to minimize the impact

#[cfg(all(target_arch = "wasm32", target_os = "unknown")))

mod allocator {
    use std::alloc::{alloc, dealloc, realloc as rs_realloc, Layout};

    #[no_mangle]
    pub unsafe fn sqlite_malloc(len: usize) - > *mut u8 {
        let align = std::mem::align_of::<usize> ();let layout = Layout::from_size_align_unchecked(len, align);
        let ptr = alloc(layout);

        ptr
    }

    const SQLITE_PTR_SIZE: usize = 8;

    #[no_mangle]
    pub unsafe fn sqlite_free(ptr: *mut u8) - >i32 {
        let mut size_a = [0; SQLITE_PTR_SIZE];
        size_a.as_mut_ptr().copy_from(ptr, SQLITE_PTR_SIZE);
        let ptr_size: u64 = u64::from_le_bytes(size_a);
        let align = std::mem::align_of::<usize> ();let layout = Layout::from_size_align_unchecked(ptr_size as usize, align);
        dealloc(ptr, layout);

        0
    }

    #[no_mangle]
    pub unsafe fn sqlite_realloc(ptr: *mut u8, size: usize) - > *mut u8 {
        let align = std::mem::align_of::<usize> ();let layout = Layout::from_size_align_unchecked(size, align);
        rs_realloc(ptr, layout, size)
    }

}
Copy the code

The LIBC function is provided

When SQLITE_OS_OTHER is turned on, the dependency on LIBC is reduced because the system interface is no longer used, but there are a few basic non-system function dependencies:

strcspn
strcmp/strncmp
strlen
strchr/strrchr
qsort
Copy the code

String several functions are very simple, directly their own implementation of the line; For the last qsort function, copy the loosely licensed three-party implementation

VFS implementation

Both Emscripten and WASi use the virtual file system provided by the host for operation. In unknown mode, in order not to increase external dependency, we can directly provide a memory VFS inside the SDK code for SQLite to use.

At the heart of implementing VFS is the provision of two structural implementations:

typedef struct sqlite3_vfs sqlite3_vfs;

typedef void (*sqlite3_syscall_ptr)(void);

struct sqlite3_vfs {

    int iVersion; /* Structure version number (currently 3) */

    int szOsFile; /* Size of subclassed sqlite3_file */

    int mxPathname; /* Maximum file pathname length */

    sqlite3_vfs *pNext; /* Next registered VFS */

    const char *zName; /* Name of this virtual file system */

    void *pAppData; /* Pointer to application-specific data */

    int (*xOpen)(sqlite3_vfs*, const char *zName, sqlite3_file*,

    int flags, int *pOutFlags);

    int (*xDelete)(sqlite3_vfs*, const char *zName, int syncDir);

    int (*xAccess)(sqlite3_vfs*, const char *zName, int flags, int *pResOut);

    int (*xFullPathname)(sqlite3_vfs*, const char *zName, int nOut, char *zOut);

    void *(*xDlOpen)(sqlite3_vfs*, const char *zFilename);

    void (*xDlError)(sqlite3_vfs*, int nByte, char *zErrMsg);

    void (*(*xDlSym)(sqlite3_vfs*,void*, const char *zSymbol))(void);

    void (*xDlClose)(sqlite3_vfs*, void*);

    int (*xRandomness)(sqlite3_vfs*, int nByte, char *zOut);

    int (*xSleep)(sqlite3_vfs*, int microseconds);

    int (*xCurrentTime)(sqlite3_vfs*, double*);

    int (*xGetLastError)(sqlite3_vfs*, int.char *);

    /* ** The methods above are in version 1 of the sqlite_vfs object ** definition. Those that follow are added in version 2 or later */

    int (*xCurrentTimeInt64)(sqlite3_vfs*, sqlite3_int64*);

    /* ** The methods above are in versions 1 and 2 of the sqlite_vfs object. ** Those below are for version 3 and greater. * /

    int (*xSetSystemCall)(sqlite3_vfs*, const char *zName, sqlite3_syscall_ptr);

    sqlite3_syscall_ptr (*xGetSystemCall)(sqlite3_vfs*, const char *zName);

    const char *(*xNextSystemCall)(sqlite3_vfs*, const char *zName);

    /* ** The methods above are in versions 1 through 3 of the sqlite_vfs object. ** New fields may be appended in future versions. The iVersion ** value will increment whenever this happens. */

};

typedef struct sqlite3_io_methods sqlite3_io_methods;

struct sqlite3_io_methods {

    int iVersion;

    int (*xClose)(sqlite3_file*);

    int (*xRead)(sqlite3_file*, void*, int iAmt, sqlite3_int64 iOfst);

    int (*xWrite)(sqlite3_file*, const void*, int iAmt, sqlite3_int64 iOfst);

    int (*xTruncate)(sqlite3_file*, sqlite3_int64 size);

    int (*xSync)(sqlite3_file*, int flags);

    int (*xFileSize)(sqlite3_file*, sqlite3_int64 *pSize);

    int (*xLock)(sqlite3_file*, int);

    int (*xUnlock)(sqlite3_file*, int);

    int (*xCheckReservedLock)(sqlite3_file*, int *pResOut);

    int (*xFileControl)(sqlite3_file*, int op, void *pArg);

    int (*xSectorSize)(sqlite3_file*);

    int (*xDeviceCharacteristics)(sqlite3_file*);

    /* Methods above are valid for version 1 */

    int (*xShmMap)(sqlite3_file*, int iPg, int pgsz, int.void volatile* *);int (*xShmLock)(sqlite3_file*, int offset, int n, int flags);

    void (*xShmBarrier)(sqlite3_file*);

    int (*xShmUnmap)(sqlite3_file*, int deleteFlag);

    /* Methods above are valid for version 2 */

    int (*xFetch)(sqlite3_file*, sqlite3_int64 iOfst, int iAmt, void **pp);

    int (*xUnfetch)(sqlite3_file*, sqlite3_int64 iOfst, void *p);

    /* Methods above are valid for version 3 */

    /* Additional methods may be added in future releases */

};
Copy the code

Use RUST to implement VFS

To implement a MEMVFS, you need at least one dynamically adjustable container; The C language has no such container officially. If you want to use C language to implement memVFS, you can only implement a similar HashMap or LinkedList, which is a bit troublesome. So this logic is implemented in Rust as well.

VFS binding

In rust code, provide an sqlite3_OS_INIT method that is automatically linked to this function when you link to SQLite

#[no_mangle]
pub unsafe fn sqlite3_os_init() -> std::os::raw::c_int {
    let mut mem_vfs = Box::new(super::memvfs::get_mem_vfs());
    let mem_vfs_ptr: *mut crate::sqlite3_vfs = mem_vfs.as_mut();
    let rc = crate::sqlite3_vfs_register(mem_vfs_ptr, 1); debug! ("sqlite3 vfs register result: {}", rc);

    std::mem::forget(mem_vfs);

    rc
}
Copy the code

In-memory data storage container

Since multiple paths are supported, the simplest implementation is to provide a HashMap with the path as the key:

struct Node {
    size: usize,
    data: Vec<u8>,
}

lazy_static! {
    static ref FS: RwLock<HashMap<String, Arc<RwLock<Node>>>> = RwLock::new(HashMap::new());
}
Copy the code

Data read and write interface:

fn copy_out(&self, dst: *mut raw::c_void, offset: isize, count: usize) - >Option< > () {if self.size < offset as usize+ count { log::trace! ("handle invalid input offset");
            return None;
        }

        let ptr = self.data.as_ptr();

        let dst = dst as *mut u8;

        unsafe {
            let ptr = ptr.offset(offset);
            ptr.copy_to(dst, count);
        }

        Some(())}fn write_in(&mut self, src: *const raw::c_void, offset: isize, count: usize) {
        let new_end = offset as usize + count;
        
        // We need to expand the capacity based on the offset passed in
        let count_extend: isize = new_end as isize - self.data.len() as isize;
        if count_extend > 0 {
            self.data.extend(vec![0; count_extend as usize]);
        }

        if new_end > self.size {
            self.size = new_end;
        }

        let ptr = self.data.as_mut_ptr();

        unsafe {
            let ptr = ptr.offset(offset);
            ptr.copy_from(src as *const u8, count); }}Copy the code

VFS implementation

Register the corresponding custom SQlite3_io_methods in the xOpen method implementation of sqlite3_VFS

With the help of the above work, we finally compiled SQLite into WASM files in WASM32-unknown-unknown mode. Meanwhile, the upper layer can directly reuse Diesel, so that the business code is not changed.

At this point, the Lark SDK works on the Web as follows:

The overall working mode is once again aligned with the Native platform, with no external dependencies and no need to copy data between WASM instances when querying.

Join us

  • Febook-bytedance enterprise collaboration platform is a one-stop enterprise communication and collaboration platform integrating video conference, online documents, mobile office and collaboration software. At present, feishu’s business is developing rapidly, and there are R&D centers in Beijing, Shenzhen and other cities. There are enough HC for front-end, mobile terminal, Rust, server, test, product and other positions. We look forward to your joining us to do challenging things with us. Please poke the link: future.feishu.cn/recruit

  • We also welcome the students and flying book technical issues together to exchange, interested students please click the flying book technical exchange group into the group exchange