FTS5 extension Notes

1. FTS5 extension

Extending FTS5

These notes are my own translation and reading of part 5, “Extending FTS5”, of the official FTS5 documentation, and they assume a basic understanding of how full-text indexing works. If you spot any mistakes, please point them out. The Tencent community has a Chinese translation of the document, but its rendering of this section is somewhat odd and the typography is unfriendly, so I decided to translate it myself, which also helps deepen my understanding.

(Note: a “tokenizer” is the component that splits text into tokens, i.e. the word segmenter.)

The original address

FTS5 provides APIs that allow it to be extended in two ways:

  • Add new auxiliary functions (implemented in C)
  • Add new tokenizers (also implemented in C)

All of the built-in tokenizers and auxiliary functions described in the official documentation are implemented using the publicly available APIs described below.

Before a new auxiliary function or tokenizer implementation can be registered with FTS5, the application must first obtain a pointer to the “fts5_api” structure.

There is one fts5_api structure for each database connection that the FTS5 extension is registered with. To obtain the pointer, the application invokes the SQL user-defined function fts5() with a single argument. That argument must be set to a pointer to a pointer to an fts5_api object using the sqlite3_bind_pointer() interface. Here is a code demo.

/*
** Return a pointer to the fts5_api pointer for database connection db.
** If an error occurs, return NULL and leave an error in the database
** handle (accessible using sqlite3_errcode()/errmsg()).
*/
fts5_api *fts5_api_from_db(sqlite3 *db){
  fts5_api *pRet = 0;
  sqlite3_stmt *pStmt = 0;

  if( SQLITE_OK==sqlite3_prepare(db, "SELECT fts5(?1)", -1, &pStmt, 0) ){
    sqlite3_bind_pointer(pStmt, 1, (void*)&pRet, "fts5_api_ptr", NULL);
    // ph: bind the address of pRet so that fts5() can store the fts5_api pointer in it.

    sqlite3_step(pStmt);
  }
  sqlite3_finalize(pStmt);
  return pRet;
}

The fts5_api structure is defined as follows. It exposes three methods: one each for registering new auxiliary functions and tokenizers, and one for retrieving an existing tokenizer. The latter is intended to facilitate the implementation of “tokenizer wrappers”, similar to the built-in “porter” tokenizer.

typedef struct fts5_api fts5_api;
struct fts5_api {
  int iVersion;                   /* Currently always set to 2 */

  /* Create a new tokenizer */
  int (*xCreateTokenizer)(
    fts5_api *pApi,
    const char *zName,
    void *pContext,
    fts5_tokenizer *pTokenizer,
    void (*xDestroy)(void*)
  );

  /* Find an existing tokenizer */
  int (*xFindTokenizer)(
    fts5_api *pApi,
    const char *zName,
    void **ppContext,
    fts5_tokenizer *pTokenizer
  );

  /* Create a new auxiliary function */
  int (*xCreateFunction)(
    fts5_api *pApi,
    const char *zName,
    void *pContext,
    fts5_extension_function xFunction,
    void (*xDestroy)(void*)
  );
};

To invoke a method of the fts5_api object, the fts5_api pointer itself should be passed as the method's first argument, followed by the other, method-specific arguments.

For example:

rc = pFts5Api->xCreateTokenizer(pFts5Api, ... other args ...);

The methods of the fts5_api structure are described separately in the following sections.

1.1 Custom tokenizers

To create a custom tokenizer, an application must implement three functions: a tokenizer constructor (xCreate), a destructor (xDelete), and a function that does the actual tokenization (xTokenize). The type of each function is the same as that of the corresponding member variable of the fts5_tokenizer structure:

typedef struct Fts5Tokenizer Fts5Tokenizer;
typedef struct fts5_tokenizer fts5_tokenizer;
struct fts5_tokenizer {
  int (*xCreate)(void*, const char **azArg, int nArg, Fts5Tokenizer **ppOut);
  // ph: function pointer xCreate
  void (*xDelete)(Fts5Tokenizer*);
  // ph: function pointer xDelete
  int (*xTokenize)(Fts5Tokenizer*, 
      void *pCtx,
      int flags,            /* Mask of FTS5_TOKENIZE_* flags */
      const char *pText, int nText, 
      int (*xToken)(
        void *pCtx,         /* Copy of 2nd argument to xTokenize() */
        int tflags,         /* Mask of FTS5_TOKEN_* flags */
        const char *pToken, /* Pointer to buffer containing token */
        int nToken,         /* Size of token in bytes */
        int iStart,         /* Byte offset of token within input text */
        int iEnd            /* Byte offset of end of token within input text */
      )
  );
  // ph: function pointer xTokenize
};


/* Flags that may be passed as the third argument to xTokenize() */
#define FTS5_TOKENIZE_QUERY     0x0001
#define FTS5_TOKENIZE_PREFIX    0x0002
#define FTS5_TOKENIZE_DOCUMENT  0x0004
#define FTS5_TOKENIZE_AUX       0x0008

/* Flags that may be passed by the tokenizer implementation back to FTS5
** as the third argument to the supplied xToken callback. */
#define FTS5_TOKEN_COLOCATED    0x0001      /* Same position as prev. token */

The implementation is registered with the FTS5 module by calling the xCreateTokenizer() method of the fts5_api object. If a tokenizer with the same name already exists, it is replaced. If a non-NULL xDestroy parameter is passed to xCreateTokenizer(), then xDestroy is invoked (with a copy of the pContext pointer passed as its only argument) when the database handle is closed or when the tokenizer is replaced.

If successful, xCreateTokenizer() returns SQLITE_OK. Otherwise, it returns an SQLite error code. In this case the xDestroy function is not invoked.

When an FTS5 table uses a custom tokenizer, the FTS5 core calls xCreate() once to create a tokenizer instance, then xTokenize() zero or more times to tokenize strings, and finally xDelete() to free any resources allocated by xCreate().

To be more specific:

  • xCreate

This function allocates and initializes a tokenizer instance. A tokenizer instance is required to actually tokenize text.

The first argument passed to this function is a copy of the (void*) pointer provided by the application when the fts5_tokenizer object was registered with FTS5.

(The descriptions of the remaining arguments are omitted here.)

  • xDelete

This method is called to delete a tokenizer handle previously allocated by xCreate(). FTS5 guarantees that it is called exactly once for each successful call to xCreate().

  • xTokenize

(We’ll focus on this one.)

This function is expected to tokenize the nText-byte string indicated by the pText argument (see the declaration above). pText may or may not be nul-terminated.

The first argument passed to this function is a pointer to the Fts5Tokenizer object returned by an earlier call to xCreate().

The flags argument indicates the reason that FTS5 is requesting tokenization of the supplied text. It is always one of the following four values.

FTS5_TOKENIZE_DOCUMENT

A document is being inserted into (or removed from) the FTS table. The tokenizer is being invoked to determine the set of tokens to add to (or delete from) the FTS index.

FTS5_TOKENIZE_QUERY

A MATCH query is being executed against the FTS index. The tokenizer is being called to tokenize a bareword or quoted string specified as part of the query.

(FTS5_TOKENIZE_QUERY | FTS5_TOKENIZE_PREFIX)

Same as FTS5_TOKENIZE_QUERY, except that the bareword or quoted string is followed by a “*” character, indicating that the last token returned by the tokenizer will be treated as a token prefix.

FTS5_TOKENIZE_AUX

The tokenizer is being invoked to satisfy an xTokenize() request made by an auxiliary function, or an xColumnSize() request made by an auxiliary function on a columnsize=0 database.

The supplied xToken() callback must be invoked once for each token in the input string. Its first argument should be a copy of the pointer passed as the second argument to xTokenize() (pCtx).

The third and fourth arguments (pToken and nToken) are a pointer to a buffer containing the token text and the size of the token in bytes.

The last two arguments (iStart and iEnd) are byte offsets within the input text: the offset of the first byte of the token and the offset of the first byte immediately following it.

The second argument (tflags) passed to xToken() should normally be set to 0. The exception is when the tokenizer supports synonyms; see the detailed discussion below.

FTS5 assumes that the xToken() callback is invoked for each token in the order in which the tokens occur in the input text.

If an xToken() callback returns any value other than SQLITE_OK, tokenization should be abandoned and xTokenize() should immediately return a copy of the xToken() return value. Otherwise, once the input buffer is exhausted, xTokenize() should return SQLITE_OK. Finally, if an error occurs within xTokenize() itself, it may abandon tokenization and return any error code other than SQLITE_OK or SQLITE_DONE.
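The return-value rules above can be sketched with a small whitespace splitter. This is only an illustration under stated assumptions: split_on_space(), token_cb, Counter and count_token() are hypothetical names, and the sketch defines SQLITE_OK itself so that it compiles without the SQLite headers. A real xTokenize() would additionally take the Fts5Tokenizer handle and the flags argument.

```c
#include <assert.h>
#include <ctype.h>

#ifndef SQLITE_OK
# define SQLITE_OK 0   /* same value as in sqlite3.h */
#endif

/* Callback type with the same shape as the xToken() argument of xTokenize() */
typedef int (*token_cb)(void*, int, const char*, int, int, int);

/* Hypothetical tokenizer body: split on ASCII whitespace and report each
** token with its byte offsets. Stops as soon as xToken() fails. */
static int split_on_space(void *pCtx, const char *pText, int nText,
                          token_cb xToken){
  int i = 0;
  assert( pText!=0 || nText==0 );
  while( i<nText ){
    int iStart;
    /* Skip whitespace, then scan to the end of the next token */
    while( i<nText && isspace((unsigned char)pText[i]) ) i++;
    iStart = i;
    while( i<nText && !isspace((unsigned char)pText[i]) ) i++;
    if( i>iStart ){
      int rc = xToken(pCtx, 0, &pText[iStart], i-iStart, iStart, i);
      if( rc!=SQLITE_OK ) return rc;  /* abandon and pass the code through */
    }
  }
  return SQLITE_OK;                   /* input buffer exhausted */
}

/* Hypothetical callback: counts tokens and fails once nLimit is reached */
typedef struct Counter { int nSeen; int nLimit; } Counter;

static int count_token(void *pCtx, int tflags, const char *pToken,
                       int nToken, int iStart, int iEnd){
  Counter *p = (Counter*)pCtx;
  (void)tflags; (void)pToken; (void)nToken; (void)iStart; (void)iEnd;
  if( p->nSeen>=p->nLimit ) return 1; /* any value other than SQLITE_OK */
  p->nSeen++;
  return SQLITE_OK;
}
```

Calling split_on_space() with a Counter whose nLimit is 2 on the text “I won first place” stops at the third token and returns the callback's error value unchanged. Note that the token buffer handed to the callback is not nul-terminated, which is why the length is passed separately.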

1.1.1 Synonym support

Custom tokenizers may also support synonyms. Consider a case in which a user wishes to query for a phrase such as “first place”. Using the built-in tokenizers, the FTS5 query “first + place” will match instances of “first place” within the document set, but not alternative forms such as “1st place”. In some applications, it would be better to match all of these forms, regardless of which one the user specified in the query text.

In FTS5, there are several ways to do this:

  1. By mapping all synonyms to a single token. In the example above, this means the tokenizer returns the same token for the inputs “first” and “1st”. Say that token is in fact “first”: when the user inserts the document “I won 1st place”, entries are added to the index for “i”, “won”, “first” and “place”. If the user then queries for “1st + place”, the tokenizer substitutes “first” for “1st” and the query works as expected.

  2. By querying the index for all synonyms of each query term separately. In this case, when tokenizing query text, the tokenizer may provide multiple synonyms for a single term. FTS5 then queries the index for each synonym individually. For example, faced with the query:

    ... MATCH 'first place'

    the tokenizer offers both “1st” and “first” as synonyms for the first token in the MATCH query, and FTS5 effectively runs a query similar to:

    ... MATCH '(first OR 1st) place'

    except that, for the purposes of auxiliary functions, the query still appears to contain just two phrases: “(first OR 1st)” is treated as a single phrase.

  3. By adding multiple synonyms for a single term to the FTS index. With this method, when tokenizing document text, the tokenizer provides multiple synonyms for each token. So when a document such as “I won first place” is tokenized, entries are added to the FTS index for “i”, “won”, “first”, “1st” and “place” (one more entry, “1st”, than with method 1). This way, even if the tokenizer does not provide synonyms when tokenizing query text (and it should not), it doesn’t matter whether the user queries for “first + place” or “1st + place”, because the FTS index contains entries for both forms of the first token.

(ph’s simple understanding: there are two tokenization passes, one over the document and one over the query text (user input). Document tokens are added to the FTS index; query tokens are used to run the query. Method 1 maps all synonyms to a single term when tokenizing the document, and applies the same substitution to the query text at lookup time. Method 2 looks up every synonym in the FTS index at query time. Method 3 adds the synonyms directly to the FTS index.)
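Method 1 boils down to a lookup that is applied to every token, in documents and in queries alike. A minimal sketch, assuming a small hard-coded table; canonical_token() and aCanon are hypothetical names, and the “2nd” row is only there for illustration:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table for method 1: { variant, canonical token } */
static const char *const aCanon[][2] = {
  { "1st", "first"  },
  { "2nd", "second" },
};

/* Return the canonical form of a nul-terminated, already lower-cased
** token. Tokens without an entry in the table are returned unchanged. */
static const char *canonical_token(const char *zToken){
  size_t i;
  assert( zToken!=0 );
  for(i=0; i<sizeof(aCanon)/sizeof(aCanon[0]); i++){
    if( strcmp(zToken, aCanon[i][0])==0 ) return aCanon[i][1];
  }
  return zToken;
}
```

A tokenizer following method 1 would pass the result of canonical_token() to xToken() instead of the raw token, so that “1st” and “first” end up as the same index entry and the same query term.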

Whether it is parsing document or query text, any call to xToken() that specifies a tflags argument with the FTS5_TOKEN_COLOCATED bit set is considered to supply a synonym for the preceding token.

For example, when parsing the document “I won first place”, a synonym-supporting tokenizer calls the xToken() function five times, like this:

xToken(pCtx, 0, "i",                      1,  0,  1);
xToken(pCtx, 0, "won",                    3,  2,  5);
xToken(pCtx, 0, "first",                  5,  6, 11);
xToken(pCtx, FTS5_TOKEN_COLOCATED, "1st", 3,  6, 11);
xToken(pCtx, 0, "place",                  5, 12, 17);

(This also helps us understand the parameters of xToken(): the fourth parameter is the token length, followed by the start and end offsets.)

It is an error to specify the FTS5_TOKEN_COLOCATED flag the first time xToken() is called. Multiple synonyms may be specified for a single token by making several calls to xToken(FTS5_TOKEN_COLOCATED) in sequence. There is no limit to the number of synonyms that may be provided for a single token.
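As a sketch of how a tokenizer might produce that call sequence, the helper below reports a token and then any synonym of it at the same position. The names emit_with_synonyms(), aSynonym, Log and log_token() are hypothetical, and the two constants are defined locally (with the values shown earlier) so the sketch compiles without the SQLite headers.

```c
#include <assert.h>
#include <string.h>

#ifndef SQLITE_OK
# define SQLITE_OK 0
#endif
#ifndef FTS5_TOKEN_COLOCATED
# define FTS5_TOKEN_COLOCATED 0x0001
#endif

typedef int (*token_cb)(void*, int, const char*, int, int, int);

/* Hypothetical synonym table: each row lists two equivalent spellings */
static const char *const aSynonym[][2] = { { "first", "1st" } };

/* Report one token, then its synonym (if any) at the same position */
static int emit_with_synonyms(void *pCtx, token_cb xToken,
                              const char *pToken, int nToken,
                              int iStart, int iEnd){
  size_t i;
  int rc;
  assert( pToken!=0 && nToken>0 );
  rc = xToken(pCtx, 0, pToken, nToken, iStart, iEnd);
  for(i=0; rc==SQLITE_OK && i<sizeof(aSynonym)/sizeof(aSynonym[0]); i++){
    int j;
    for(j=0; j<2; j++){
      const char *z = aSynonym[i][j];
      if( (int)strlen(z)==nToken && memcmp(z, pToken, (size_t)nToken)==0 ){
        const char *zOther = aSynonym[i][1-j];
        /* The synonym shares the offsets of the token it stands for */
        rc = xToken(pCtx, FTS5_TOKEN_COLOCATED, zOther,
                    (int)strlen(zOther), iStart, iEnd);
        break;
      }
    }
  }
  return rc;
}

/* Hypothetical callback that records what the tokenizer reported */
typedef struct Log { int n; int aFlags[8]; char aText[8][16]; } Log;

static int log_token(void *pCtx, int tflags, const char *pToken,
                     int nToken, int iStart, int iEnd){
  Log *p = (Log*)pCtx;
  (void)iStart; (void)iEnd;
  if( p->n>=8 || nToken>=16 ) return 1;
  p->aFlags[p->n] = tflags;
  memcpy(p->aText[p->n], pToken, (size_t)nToken);
  p->aText[p->n][nToken] = '\0';
  p->n++;
  return SQLITE_OK;
}
```

Because the plain token is always reported first, the colocated call can never be the first call to xToken(), which respects the rule stated above.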

In many cases, method 1 above is the best approach. It does not add extra data to the FTS index or require FTS5 to query for multiple terms, so it is efficient in terms of disk space and query speed. However, it does not support prefix queries very well. If, as suggested above, the tokenizer replaces “1st” with “first”, then the query:

... MATCH '1s*'

will not match documents that contain the token “1st”, because the tokenizer will probably not map “1s” to any prefix of “first”.

For full prefix support, method 3 may be preferred. In this case, because the index contains entries for both “first” and “1st”, prefix queries such as 'fi*' or '1s*' match correctly. However, because extra entries are added to the FTS index, this method uses more space in the database.

Method 2 offers a midpoint between methods 1 and 3. With this method, a query such as '1s*' matches documents that contain the literal token “1st”, but not “first” (assuming the tokenizer is not able to provide synonyms for prefixes). However, a non-prefix query like '1st' matches both “1st” and “first”. This method does not require extra disk space, as no extra entries are added to the FTS index. On the other hand, it may require more CPU cycles to run MATCH queries, as a separate query of the FTS index is required for each synonym.
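The prefix behaviour can be checked with a toy model. This is not how FTS5 stores or searches its index; it only compares the sets of entries that methods 1 and 3 would produce for the document “I won 1st place”. prefix_matches(), aMethod1 and aMethod3 are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if any entry of the toy index starts with zPrefix */
static int prefix_matches(const char *const *azIndex, int nIndex,
                          const char *zPrefix){
  size_t nPrefix;
  int i;
  assert( azIndex!=0 && zPrefix!=0 );
  nPrefix = strlen(zPrefix);
  for(i=0; i<nIndex; i++){
    if( strncmp(azIndex[i], zPrefix, nPrefix)==0 ) return 1;
  }
  return 0;
}

/* Entries for "I won 1st place": method 1 stores only the canonical
** token, method 3 stores the token together with its synonym. */
static const char *const aMethod1[] = { "i", "won", "first", "place" };
static const char *const aMethod3[] = { "i", "won", "first", "1st", "place" };
```

With these tables, the prefix “1s” finds nothing under method 1 but finds “1st” under method 3, while “fi” matches in both.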

When using method 2 or 3, it is important that the tokenizer only provides synonyms when tokenizing document text (method 3) or query text (method 2), not both.

Doing both will not cause any errors, but it is inefficient.
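One way a tokenizer could enforce this is to look at the flags passed to xTokenize(): method 2 expands synonyms at query time, method 3 at indexing time. The sketch below assumes that policy; should_emit_synonyms() and the SynonymMethod enum are hypothetical, and the flag values are repeated from the listing earlier so the block compiles on its own.

```c
#include <assert.h>

#ifndef FTS5_TOKENIZE_QUERY
# define FTS5_TOKENIZE_QUERY     0x0001
# define FTS5_TOKENIZE_PREFIX    0x0002
# define FTS5_TOKENIZE_DOCUMENT  0x0004
# define FTS5_TOKENIZE_AUX       0x0008
#endif

/* Hypothetical names for the two synonym strategies */
enum SynonymMethod {
  SYN_AT_QUERY = 2,   /* method 2: expand synonyms in query text    */
  SYN_AT_INDEX = 3    /* method 3: expand synonyms in document text */
};

/* Return 1 if synonyms should be reported for this xTokenize() call */
static int should_emit_synonyms(enum SynonymMethod eMethod, int flags){
  assert( eMethod==SYN_AT_QUERY || eMethod==SYN_AT_INDEX );
  if( eMethod==SYN_AT_QUERY ){
    return (flags & FTS5_TOKENIZE_QUERY)!=0;
  }
  return (flags & FTS5_TOKENIZE_DOCUMENT)!=0;
}
```

This sketch does not treat the final token of a prefix query specially, and it reports no synonyms for FTS5_TOKENIZE_AUX calls; whether that is appropriate depends on the application.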

1.2 Custom auxiliary functions

Implementing a custom helper function is similar to implementing a scalar SQL function. The implementation should be a C function of type fts5_extension_function. The definition is as follows:

typedef struct Fts5ExtensionApi Fts5ExtensionApi;
typedef struct Fts5Context Fts5Context;
typedef struct Fts5PhraseIter Fts5PhraseIter;

typedef void (*fts5_extension_function)(
  const Fts5ExtensionApi *pApi,   /* API offered by current FTS version */
  Fts5Context *pFts,              /* First arg to pass to pApi functions */
  sqlite3_context *pCtx,          /* Context for returning result/error */
  int nVal,                       /* Number of values in apVal[] array */
  sqlite3_value **apVal           /* Array of trailing arguments */
);

The implementation is registered with the FTS5 module by calling the xCreateFunction() method of the fts5_api object. If there is already an auxiliary function with the same name, it is replaced by the new function. If a non-NULL xDestroy argument is passed to xCreateFunction(), it is invoked (with a copy of the pContext pointer as its only argument) when the database handle is closed or when the registered auxiliary function is replaced.

If successful, xCreateFunction() returns SQLITE_OK. Otherwise, it returns an SQLite error code. In this case xDestroy() is not invoked.

The final three arguments passed to the auxiliary function callback are similar to the three arguments passed to the implementation of a scalar SQL function. All SQL arguments except the first passed to the auxiliary function are available to the implementation in the apVal[] array. The implementation should return a result or error via the context handle pCtx.

The first argument passed to the auxiliary function callback is a pointer to a structure containing methods that may be invoked to obtain information about the current query or row. The second argument is an opaque handle that should be passed as the first argument to any such method call. For example, the following auxiliary function definition returns the total number of tokens in all columns of the current row.

/*
** Implementation of an auxiliary function that returns the number
** of tokens in the current row (including all columns).
*/
static void column_size_imp(
  const Fts5ExtensionApi *pApi,
  Fts5Context *pFts,
  sqlite3_context *pCtx,
  int nVal,
  sqlite3_value **apVal
){
  int rc;
  int nToken;
  rc = pApi->xColumnSize(pFts, -1, &nToken);
  if( rc==SQLITE_OK ){
    sqlite3_result_int(pCtx, nToken);
  }else{
    sqlite3_result_error_code(pCtx, rc);
  }
}

The following section describes the API offered to auxiliary function implementations in detail. More examples can be found in the source file “fts5_aux.c”.

1.2.1 Custom auxiliary functions API reference

struct Fts5ExtensionApi {
  int iVersion;                   /* Currently always set to 3 */

  void *(*xUserData)(Fts5Context*);

  int (*xColumnCount)(Fts5Context*);
  int (*xRowCount)(Fts5Context*, sqlite3_int64 *pnRow);
  int (*xColumnTotalSize)(Fts5Context*, int iCol, sqlite3_int64 *pnToken);

  int (*xTokenize)(Fts5Context*, 
    const char *pText, int nText, /* Text to tokenize */
    void *pCtx,                   /* Context passed to xToken() */
    int (*xToken)(void*, int, const char*, int, int, int)       /* Callback */
  );

  int (*xPhraseCount)(Fts5Context*);
  int (*xPhraseSize)(Fts5Context*, int iPhrase);

  int (*xInstCount)(Fts5Context*, int *pnInst);
  int (*xInst)(Fts5Context*, int iIdx, int *piPhrase, int *piCol, int *piOff);

  sqlite3_int64 (*xRowid)(Fts5Context*);
  int (*xColumnText)(Fts5Context*, int iCol, const char **pz, int *pn);
  int (*xColumnSize)(Fts5Context*, int iCol, int *pnToken);

  int (*xQueryPhrase)(Fts5Context*, int iPhrase, void *pUserData,
    int(*)(const Fts5ExtensionApi*,Fts5Context*,void*)
  );
  int (*xSetAuxdata)(Fts5Context*, void *pAux, void(*xDelete)(void*));
  void *(*xGetAuxdata)(Fts5Context*, int bClear);

  int (*xPhraseFirst)(Fts5Context*, int iPhrase, Fts5PhraseIter*, int*, int*);
  void (*xPhraseNext)(Fts5Context*, Fts5PhraseIter*, int *piCol, int *piOff);

  int (*xPhraseFirstColumn)(Fts5Context*, int iPhrase, Fts5PhraseIter*, int*);
  void (*xPhraseNextColumn)(Fts5Context*, Fts5PhraseIter*, int *piCol);
};

There are a whole bunch of functions mentioned here, and we won’t translate every one of them in detail, just the important ones.

  • int (*xInst)(Fts5Context*, int iIdx, int *piPhrase, int *piCol, int *piOff);

Query for the details of phrase match iIdx within the current row. Phrase matches are numbered starting from zero, so the iIdx argument should be greater than or equal to zero and less than the value output by xInstCount().

Usually, the output parameter *piPhrase is set to the phrase number, *piCol to the column in which the match occurs, and *piOff to the token offset of the first token of the phrase. SQLITE_OK is returned on success, or an error code (e.g. SQLITE_NOMEM) if an error occurs.

This API can be quite slow if used with an FTS5 table created with the “detail=none” or “detail=column” option.

  • int (*xPhraseSize)(Fts5Context*, int iPhrase);

Return the number of tokens in iPhrase. Phrases are numbered from zero.

  • int (*xInstCount)(Fts5Context*, int *pnInst);

Set *pnInst to the total number of occurrences of all phrases of the query within the current row. SQLITE_OK is returned on success, or an error code (e.g. SQLITE_NOMEM) if an error occurs.

This API can be quite slow if used with an FTS5 table created with the “detail=none” or “detail=column” option. If the FTS5 table is created with either “detail=none” or “detail=column” and the “content=” option (i.e. it is a contentless table), then this API always returns 0.

  • int (*xColumnText)(Fts5Context*, int iCol, const char **pz, int *pn);

This function attempts to retrieve the text of the iCol column of the current document. If successful, set (*pz) to point to a buffer containing UTF-8 encoded text, set (*pn) to the buffer’s byte (not character) size, and return SQLITE_OK. Otherwise, if an error occurs, the SQLite error code is returned, and the final values of (*pz) and (*pn) are undefined.

  • int (*xTokenize)(Fts5Context*, const char *pText, int nText, /* Text to tokenize */ void *pCtx, /* Context passed to xToken() */ int (*xToken)(void*, int, const char*, int, int, int) /* Callback */ );

Tokenize text using the tokenizer belonging to the FTS5 table.

To write a loadable extension, you also need to read the “Run-Time Loadable Extensions” page in the documentation section of the official website.

(It explains how the entry point function is named.)

Run-Time Loadable Extensions

1. Overview

SQLite has the ability to load extensions (including application-defined SQL functions, etc.) at run time. This feature allows the extension code to be developed and tested separately from the application and then loaded as needed.

Extensions can also be statically linked with the application. The code template shown below works just as well for a statically linked extension as for a run-time loadable one, except that you should give the entry point function (sqlite3_extension_init by default) a different name to avoid name collisions if your application contains two or more extensions.

2. Load an extension

An SQLite extension is a shared library or DLL. To load it, you need to give SQLite the name of the file containing the shared library or DLL and the entry point used to initialize the extension. In C code, this information is supplied through the sqlite3_load_extension() API. See the documentation of that function for additional information.

Note that different operating systems use different filename suffixes for their shared libraries. If you want your code to be portable, you can omit the suffix from the shared library filename and the appropriate suffix will be added automatically by the sqlite3_load_extension() interface.

There is also an SQL function that can be used to load extensions: load_extension(X,Y). It works just like the sqlite3_load_extension() C interface.

Both methods for loading an extension allow you to specify the name of the entry point. You can leave this argument blank, by passing a NULL pointer to the sqlite3_load_extension() C interface or by omitting the second argument of the load_extension(X,Y) SQL function, and the extension loader will try to work out the entry point on its own. It first tries the generic name sqlite3_extension_init. If that does not work, it constructs an entry point name of the form “sqlite3_X_init”, where X is derived from the filename.

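X is the lowercase equivalent of every ASCII alphabetic character in the filename after the last “/” and before the first following “.”, omitting the first three characters if they happen to be “lib”.

For example, if the filename is “/usr/lib/libmathfunc-4.8.so”, the entry point name will be “sqlite3_mathfunc_init”.

(ph: Why is “-4.8” dropped? Only ASCII letters are copied, so “-” and “4” are skipped, and copying stops at the first “.”.)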
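The naming rule is easy to get wrong by hand, so here is a sketch that derives the fallback name from a path. entry_point_name() is a hypothetical helper written for these notes, not an SQLite function, and it only handles “/” as the path separator.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "sqlite3_X_init" for library path zFile into
** zOut. Returns 0 on success, -1 if the name does not fit. */
static int entry_point_name(const char *zFile, char *zOut, size_t nOut){
  const char *zSlash;
  const char *zBase;
  char zX[128];
  size_t n = 0;
  int nWritten;

  assert( zFile!=0 && zOut!=0 );

  /* Keep only the part after the last "/" */
  zSlash = strrchr(zFile, '/');
  zBase = zSlash ? zSlash+1 : zFile;

  /* Omit a leading "lib" */
  if( strncmp(zBase, "lib", 3)==0 ) zBase += 3;

  /* Copy ASCII letters, lower-cased, up to the first "." */
  for( ; *zBase && *zBase!='.'; zBase++ ){
    unsigned char c = (unsigned char)*zBase;
    if( c<0x80 && isalpha(c) ){
      if( n+1>=sizeof(zX) ) return -1;
      zX[n++] = (char)tolower(c);
    }
  }
  zX[n] = '\0';

  nWritten = snprintf(zOut, nOut, "sqlite3_%s_init", zX);
  return (nWritten<0 || (size_t)nWritten>=nOut) ? -1 : 0;
}
```

For “/usr/lib/libmathfunc-4.8.so” this yields “sqlite3_mathfunc_init”: “lib” is skipped, “-” and “4” are not letters, and the scan ends at the first “.”.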

Extension loading is turned off by default for security reasons. To use either the C interface or the SQL function to load extensions, you must first enable extension loading in your application with the C API sqlite3_db_config(db, SQLITE_DBCONFIG_ENABLE_LOAD_EXTENSION, 1, NULL).

From the command line shell, extensions can be loaded like this:

.load ./YourCode

Note that the command-line shell already has extension loading enabled (by calling the sqlite3_enable_load_extension() interface as part of its setup), so the command above works without any special switches, settings, or other complications.

The .load command with one argument invokes sqlite3_load_extension() with the zProc parameter set to NULL, causing SQLite to look first for an entry point named sqlite3_extension_init and then for sqlite3_X_init, where X is derived from the filename. If your extension has an entry point with a different name, simply supply that name as the second argument. For example:

.load ./YourCode nonstandard_entry_point

3. Compile a loadable extension

Loadable extensions are C code. To compile them on most Unix-like operating systems, the usual command looks like this:

gcc -g -fPIC -shared YourCode.c -o YourCode.so

(This builds a shared library.)

Macs are Unix-like too, but they do not follow the usual shared library conventions. To compile a shared library on a Mac, use a command like this:

gcc -g -fPIC -dynamiclib YourCode.c -o YourCode.dylib

The Windows commands are not listed here; they are in the documentation.

4. Programming loadable extensions ★

A template loadable extension contains the following three elements.

  1. Use “#include <sqlite3ext.h>” at the top of your source code files instead of “#include <sqlite3.h>”.

  2. Put the macro SQLITE_EXTENSION_INIT1 on a line by itself right after the “#include <sqlite3ext.h>” line.

  3. Add an extension loading entry point routine that looks something like the following:

    #ifdef _WIN32
    __declspec(dllexport)
    #endif
    int sqlite3_extension_init( /* <== Change this name, maybe */
      sqlite3 *db, 
      char **pzErrMsg, 
      const sqlite3_api_routines *pApi
    ){
      int rc = SQLITE_OK;
      SQLITE_EXTENSION_INIT2(pApi);
      /* insert code to initialize your extension here */
      return rc;
    }

    You will do well to customize the name of your entry point to correspond to the name of the shared library you will be generating, rather than using the generic “sqlite3_extension_init” name. Giving your extension a custom entry point name allows you to statically link two or more extensions into the same program without a linker conflict, should you later decide to use static linking rather than run-time linking. If your shared library ends up being named “YourCode.so”, “YourCode.dll” or “YourCode.dylib” as in the compiler examples above, then the correct entry point name is “sqlite3_yourcode_init”.

Here is a complete template extension that you can copy/paste to try.

/* Add your header comment here */
#include <sqlite3ext.h> /* Do not use <sqlite3.h>! */
SQLITE_EXTENSION_INIT1

/* Insert your extension code here */

#ifdef _WIN32
__declspec(dllexport)
#endif
/* TODO: Change the entry point name so that "extension" is replaced by
** text derived from the shared library filename as follows:  Copy every
** ASCII alphabetic character from the filename after the last "/" through
** the next following ".", converting each character to lowercase, and
** discarding the first three characters if they are "lib".
*/
int sqlite3_extension_init(
  sqlite3 *db, 
  char **pzErrMsg, 
  const sqlite3_api_routines *pApi
){
  int rc = SQLITE_OK;
  SQLITE_EXTENSION_INIT2(pApi);
  /* Insert here calls to
  **     sqlite3_create_function_v2(),
  **     sqlite3_create_collation_v2(),
  **     sqlite3_create_module_v2(), and/or
  **     sqlite3_vfs_register()
  ** to register the new features that your extension adds.
  */
  return rc;
}

4.1 Some examples of extensions

Examples can be found on the official website; there appears to be an FTS5 extension among them.

5. Persistent loadable extensions

The default behavior of a loadable extension is that it is unloaded from process memory when the database connection that originally invoked sqlite3_load_extension() closes.

However, if the initialization procedure returns SQLITE_OK_LOAD_PERMANENTLY instead of SQLITE_OK, the extension is not unloaded and remains in process memory indefinitely.

To clarify: an extension whose initialization function returns SQLITE_OK_LOAD_PERMANENTLY continues to exist in memory after the database connection closes. However, the extension is not automatically registered with subsequent database connections. This makes it possible to load extensions that implement new VFSes. To persistently load and register an extension that implements new SQL functions, collating sequences, and/or virtual tables, so that those added capabilities are available to all subsequent database connections, the initialization routine should also invoke sqlite3_auto_extension() on a subfunction that registers those services.

For details, see vfsstat.c, an example file referenced by the official documentation.

Custom SQL functions (Brief)

Applications using SQLite can define custom SQL functions that call back to the application code to evaluate the results. Custom SQL function implementations can be embedded in the application code itself, or they can be loadable extensions.

The sqlite3_create_function() family of interfaces is used to create new custom SQL functions.

Let’s go straight to the function:

int sqlite3_create_function(
  sqlite3 *db,
  const char *zFunctionName,
  int nArg,
  int eTextRep,
  void *pApp,
  void (*xFunc)(sqlite3_context*,int,sqlite3_value**),
  void (*xStep)(sqlite3_context*,int,sqlite3_value**),
  void (*xFinal)(sqlite3_context*)
);

The first argument, db, is the sqlite3 database connection handle.

The second argument is a string: the name of the function we want to register, such as “phslicer”.

The third argument, nArg, is the number of arguments the function takes. The value must be an integer between -1 and SQLITE_MAX_FUNCTION_ARG (127 by default). A value of -1 means that the SQL function is variadic and can take any number of arguments between 0 and SQLITE_MAX_FUNCTION_ARG.

The fourth argument, eTextRep, conveys various properties of the new function, such as the text encoding it prefers.

The fifth argument, pApp, is an arbitrary pointer that is passed through to the callback routines. SQLite does nothing with it itself; it only makes the pointer available to the callbacks and passes it to the destructor when the function is unregistered.

xFunc is the implementation function we pass in. It takes three arguments.

The first, context, is a pointer to an opaque object that describes the context from which the SQL function was invoked. This context pointer becomes the first argument to many of the routines that the function implementation may call.

The second and third arguments, argc and argv, are the number of arguments to the SQL function itself and the value of each argument (just like argc and argv in main). Argument values can be of any datatype and are therefore stored in instances of the sqlite3_value object. Specific C-language values can be extracted from this object using the sqlite3_value_*() family of interfaces.

static void mysqlfunc(
  sqlite3_context *context,
  int argc,
  sqlite3_value **argv
){
  assert( argc==1 );
  sqlite3_result_value(context, argv[0]);
}

demo

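// Assumes the SQLite and FTS5 declarations are already included.
#include <algorithm>
#include <cctype>
#include <string>

int demoTokenize(Fts5Tokenizer*, void *pCtx, int flags,
                 const char *pText, int nText,
                 int (*xToken)(void *, int, const char *, int, int, int)) {
  int rc = SQLITE_OK;
  int start = 0;
  std::string res;
  // Iterate one position past the end so the final token is flushed too.
  for (int index = 0; index <= nText && rc == SQLITE_OK; index++) {
    if (index == nText || isspace((unsigned char)pText[index])) {
      if (index > start) {  // skip empty tokens caused by repeated whitespace
        // Copy the token and fold it to lower case.
        res.assign(pText + start, pText + index);
        std::transform(res.begin(), res.end(), res.begin(),
                       [](unsigned char c){ return std::tolower(c); });
        rc = xToken(pCtx, 0, res.c_str(), (int)res.length(), start, index);
      }
      // The next token starts after this whitespace character.
      start = index + 1;
    }
  }
  // Pass on any error reported by xToken().
  return rc;
}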