Back in September, word started getting around the programming world about a new file in the SQLite repository named json1.c. At the time, I wrote up notes on how to compile pysqlite with the new json1 extension. With the release of SQLite 3.9.0, however, that extra effort is no longer necessary.

SQLite 3.9.0 is a big upgrade: in addition to the much-anticipated JSON1 extension, it includes a new version of the full-text search module, FTS5. FTS5 improves performance on complex search queries and provides an out-of-the-box BM25 ranking algorithm for ordering search results by relevance. You can see all the new features in the release notes.

This article focuses on compiling SQLite with the JSON1 and FTS5 extensions enabled, then compiling the Python drivers against the new SQLite library so the new features can be used from Python. Since I personally like both pysqlite and apsw, the steps below cover setting up both. Finally, we'll query the JSON1 and FTS5 extensions through the Peewee ORM.

Getting started

Start by getting the new version of the SQLite source code, either through SQLite's source control system, Fossil, or by downloading a source snapshot. SQLite uses TCL and AWK to generate the source amalgamation, so you need to install the following tools before you begin:

  • tcl

  • Awk (available on most UNIX systems)

  • Fossil (optional)

This process involves several steps, described here in as much detail as possible. First, create a directory to hold the new library. I put mine in ~/bin/jqlite, but choose whatever suits you:

export JQLITE="$HOME/bin/jqlite"
mkdir -p $JQLITE
cd $JQLITE

To retrieve the source code from Fossil, run the following command:

fossil clone http://www.sqlite.org/cgi/src sqlite.fossil
fossil open sqlite.fossil

To get the snapshot file, run the following command:

curl 'https://www.sqlite.org/src/tarball/sqlite.tar.gz?ci=trunk' | tar xz
mv sqlite/* .

If you prefer to use the official version, you can download the autoconf zip from the SQLite download page and unzip the contents into the $JQLITE directory.

Compile SQLite with JSON1 and FTS5

Once the code is downloaded, you should be working from the directory containing the SQLite source tree. SQLite supports a large number of compile-time options; besides JSON1 and FTS5, several other useful options are enabled below.

Then it's the standard configure -> make -> make install:

export CFLAGS="-DSQLITE_ENABLE_COLUMN_METADATA=1 \
-DSQLITE_ENABLE_DBSTAT_VTAB=1 \
-DSQLITE_ENABLE_FTS3=1 \
-DSQLITE_ENABLE_FTS3_PARENTHESIS=1 \
-DSQLITE_ENABLE_FTS5=1 \
-DSQLITE_ENABLE_JSON1=1 \
-DSQLITE_ENABLE_RTREE=1 \
-DSQLITE_ENABLE_UNLOCK_NOTIFY \
-DSQLITE_ENABLE_UPDATE_DELETE_LIMIT \
-DSQLITE_SECURE_DELETE \
-DSQLITE_SOUNDEX \
-DSQLITE_TEMP_STORE=3 \
-fPIC"
LIBS="-lm" ./configure --prefix=$JQLITE --enable-static --enable-shared
make
make install

At this point there should be a lib/libsqlite3.a file in the SQLite source checkout. If it isn't there, check the console output for errors. I've had success with this process on Arch and Ubuntu, but I can't vouch for fapple or windoze.
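
A quick way to sanity-check the build from Python, before compiling any drivers, is to load the freshly-built shared library with ctypes and ask it for its version. This is just a sketch of my own, assuming the default install layout under ~/bin/jqlite/lib and a Linux-style .so filename:

import ctypes, os

# Load the shared library produced by `make install` and query its version string.
lib = ctypes.CDLL(os.path.expanduser('~/bin/jqlite/lib/libsqlite3.so'))
lib.sqlite3_libversion.restype = ctypes.c_char_p
print lib.sqlite3_libversion()  # expect '3.9.0' or newer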

Compile pysqlite

Most Python developers are familiar with pysqlite, which is essentially the same as the sqlite3 module in the Python standard library. To build pysqlite against the new libsqlite3, all you need to do is modify setup.cfg so that it points to the include and lib directories you just created.

git clone https://github.com/ghaering/pysqlite
cd pysqlite/
cp ../sqlite3.c .
echo -e "library_dirs=$JQLITE/lib" >> setup.cfg
echo -e "include_dirs=$JQLITE/include" >> setup.cfg
LIBS="-lm" python setup.py build_static

To test the installation, go to the build/lib.linux-xfoobar/ directory, start the Python interpreter, and run the following command:

>>> from pysqlite2 import dbapi2 as sqlite
>>> conn = sqlite.connect(':memory:')
>>> conn.execute('CREATE VIRTUAL TABLE testing USING fts5(data);')
<pysqlite2.dbapi2.Cursor object at 0x7ff7d0a2dc60>
>>> conn.execute('SELECT json(?)', (1337,)).fetchone()
(u'1337',)

Depending on your preference, you can either run python setup.py install, or symlink the newly-built pysqlite2 directory (found in build/lib.linux…/) somewhere on your $PYTHONPATH. If you want to use the new pysqlite with a virtualenv, activate the virtualenv first and then run setup.py install from the pysqlite directory.
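
To confirm that the new driver really picked up the freshly-built library (a quick check of my own, not part of the original instructions), you can inspect the SQLite version and compile-time options it reports. The output shown assumes the build above used SQLite 3.9.0 with the flags listed earlier:

>>> from pysqlite2 import dbapi2 as sqlite
>>> sqlite.sqlite_version  # version of the linked SQLite library
'3.9.0'
>>> conn = sqlite.connect(':memory:')
>>> opts = [row[0] for row in conn.execute('PRAGMA compile_options')]
>>> 'ENABLE_JSON1' in opts, 'ENABLE_FTS5' in opts
(True, True)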

Compile apsw

The steps for building apsw are nearly identical to those for pysqlite.

cd $JQLITE
git clone https://github.com/rogerbinns/apsw
cd apsw
cp ../sqlite3{ext.h,.h,.c} .
echo -e "library_dirs=$JQLITE/lib" >> setup.cfg
echo -e "include_dirs=$JQLITE/include" >> setup.cfg
LIBS="-lm" python setup.py build

To test the new apsw library, cd into the build/lib.linux…/ directory, start the Python interpreter, and run the following commands:

>>> import apsw
>>> conn = apsw.Connection(':memory:')
>>> cursor = conn.cursor()
>>> cursor.execute('CREATE VIRTUAL TABLE testing USING fts5(data);')
<apsw.Cursor at 0x7fcf6b17fa80>
>>> cursor.execute('SELECT json(?)', (1337,)).fetchone()
(u'1337',)

You can install the new apsw system-wide by running python setup.py install, or symlink the apsw.so library (found in build/lib.linux…/) somewhere on your $PYTHONPATH. To use apsw with a virtualenv, activate the virtualenv first and then run setup.py install from the apsw directory.
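
The same kind of sanity check works for apsw (again, a sketch of my own; the output assumes the build above succeeded). apsw exposes the SQLite library version directly:

>>> import apsw
>>> apsw.sqlitelibversion()
'3.9.0'
>>> conn = apsw.Connection(':memory:')
>>> opts = [row[0] for row in conn.cursor().execute('PRAGMA compile_options')]
>>> 'ENABLE_JSON1' in opts, 'ENABLE_FTS5' in opts
(True, True)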

Using the JSON1 extension

There are some neat features in the JSON1 extension, in particular the json_each and json_tree table-valued functions (details). To demonstrate them, this article uses Peewee (a small Python ORM) to write some JSON data and then query it.

I had planned to pull test data from the GitHub API, but in the interest of brevity I wrote a small JSON file (details) instead. It has the following structure:

[{
   "title": "My List of Python and SQLite Resources",
   "url": "http://charlesleifer.com/blog/my-list-of-python-and-sqlite-resources/", 
   "metadata": {"tags": ["python", "sqlite"]}
 }, 
 {
   "title": "Using SQLite4's LSM Storage Engine as a Stand-alone NoSQL Database with Python"
   "url": "http://charlesleifer.com/blog/using-sqlite4-s-lsm-storage-engine-as-a-stand-alone-nosql-database-with-python/", 
   "metadata": {"tags": ["nosql", "python", "sqlite", "cython"]}
  },
  ...]

If you prefer to view your code in IPython format, see here.

Populating the database

Get the JSON data file and decode it:

>>> import json, urllib2
>>> fh = urllib2.urlopen('http://media.charlesleifer.com/downloads/misc/blogs.json')
>>> data = json.loads(fh.read())
>>> data[0]
{u'metadata': {u'tags': [u'python', u'sqlite']},
 u'title': u'My List of Python and SQLite Resources',
 u'url': u'http://charlesleifer.com/blog/my-list-of-python-and-sqlite-resources/'}

Now we need to tell Peewee how to connect to the database using the custom pysqlite driver, so we can store the JSON data in SQLite. We'll use the newly-compiled pysqlite2, importing it under the alias jqlite to avoid confusing it with the standard library driver. After defining the database class, we create an in-memory database. (Note: the upcoming Peewee 2.6.5 will automatically prefer pysqlite2 when it is compiled against a newer SQLite than the standard library's sqlite3 module.)

>>> from pysqlite2 import dbapi2 as jqlite
>>> from peewee import *
>>> from playhouse.sqlite_ext import *
>>> class JQLiteDatabase(SqliteExtDatabase):
...     def _connect(self, database, **kwargs):
...         conn = jqlite.connect(database, **kwargs)
...         conn.isolation_level = None
...         self._add_conn_hooks(conn)
...         return conn
...
>>> db = JQLiteDatabase(':memory:')

Populating the database with the JSON data is straightforward. Start by creating a generic table with a single TEXT field. SQLite does not expose JSON data as separate columns or a distinct data type, so a plain TextField works fine:

>>> class Entry(Model):
...     data = TextField()
...     class Meta:
...         database = db
... 
>>> Entry.create_table()
>>> with db.atomic():
...     for entry_json in data:
...         Entry.create(data=json.dumps(entry_json))
...

JSON functions

Let's start with json_extract(). It takes a dotted/bracketed path describing the element to look up (Postgres uses [] notation for this). Each Entry in the database has a single data column containing a JSON object, and each JSON object has a title, a url, and a top-level metadata key. Here's how to extract the titles of the entries:

>>> title = fn.json_extract(Entry.data, '$.title')
>>> query = (Entry
...          .select(title.alias('title'))
...          .order_by(title)
...          .limit(5))
...
>>> [row for row in query.dicts()]
[{'title': u'A Tour of Tagging Schemas: Many-to-many, Bitmaps and More'},
 {'title': u'Alternative Redis-Like Databases with Python'},
 {'title': u'Building the SQLite FTS5 Search Extension'},
 {'title': u'Connor Thomas Leifer'},
 {'title': u'Extending SQLite with Python'}]

This query corresponds to the following SQL:

SELECT json_extract("t1"."data", '$.title') AS title 
FROM "entry" AS t1 
ORDER BY json_extract("t1"."data", '$.title')
LIMIT 5

In the next example, we'll extract the entries that contain a particular tag. To search the list of tags, we use the json_each() function. This function behaves like a table, returning one row for each element at the specified JSON path. Here's how to retrieve the titles of entries tagged with "sqlite":

>>> from peewee import Entity
>>> tags_src = fn.json_each(Entry.data, '$.metadata.tags').alias('tags')
>>> tags_ref = Entity('tags')

>>> query = (Entry
...          .select(title.alias('title'))
...          .from_(Entry, tags_src)
...          .where(tags_ref.value == 'sqlite')
...          .order_by(title))
... 
>>> [row for row, in query.tuples()]
[u'Building the SQLite FTS5 Search Extension',
 u'Extending SQLite with Python',
 u'Meet Scout, a Search Server Powered by SQLite',
 u'My List of Python and SQLite Resources',
 u'Querying Tree Structures in SQLite using Python and the Transitive Closure Extension',
 u"Using SQLite4's LSM Storage Engine as a Stand-alone NoSQL Database with Python",
 u'Web-based SQLite Database Browser, powered by Flask and Peewee']

The SQL of the above query helps clarify the process:

SELECT json_extract("t1"."data", '$.title') AS title 
FROM
    "entry" AS t1, 
    json_each("t1"."data", '$.metadata.tags') AS tags 
WHERE ("tags"."value" = 'sqlite') 
ORDER BY json_extract("t1"."data", '$.title')

As queries become more complex, being able to encapsulate pieces of them with Peewee objects keeps the code readable and reusable.
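
As a small example of what that might look like (my own sketch, not from the original post), the tag lookup from above can be wrapped in a helper function built on the same json_each() pattern:

def entries_with_tag(tag):
    # Expose the entry's tag list as a table-valued function and filter on its values.
    tags_src = fn.json_each(Entry.data, '$.metadata.tags').alias('tags')
    tags_ref = Entity('tags')
    title = fn.json_extract(Entry.data, '$.title')
    return (Entry
            .select(title.alias('title'))
            .from_(Entry, tags_src)
            .where(tags_ref.value == tag)
            .order_by(title))

# Usage: [title for title, in entries_with_tag('python').tuples()]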

Here is another example of json_each(). This time we select the title of each entry along with a comma-separated string of its tags. The tags_src and tags_ref defined above are used again.

>>> query = (Entry
...          .select(
...              title.alias('title'),
...              fn.group_concat(tags_ref.value, ', ').alias('tags'))
...          .from_(Entry, tags_src)
...          .group_by(title)
...          .limit(5))
...
>>> [row for row in query.tuples()]
[(u'A Tour of Tagging Schemas: Many-to-many, Bitmaps and More',
  u'peewee, sql, python'),
 (u'Alternative Redis-Like Databases with Python',
  u'python, walrus, redis, nosql'),
 (u'Building the SQLite FTS5 Search Extension',
  u'sqlite, search, python, peewee'),
 (u'Connor Thomas Leifer', u'thoughts'),
 (u'Extending SQLite with Python', u'peewee, python, sqlite')]

For clarity, here is the corresponding SQL query:

SELECT 
    json_extract("t1"."data", '$.title') AS title, 
    group_concat("tags"."value", ', ') AS tags 
FROM 
    "entry" AS t1, 
    json_each("t1"."data", '$.metadata.tags') AS tags 
GROUP BY json_extract("t1"."data", '$.title') 
LIMIT 5

The last feature I’ll introduce is json_tree(). Like json_each(), json_tree() is also a multi-valued function, similar to a table. Unlike json_each(), which returns only the children of a particular path, json_tree() recursively traverses all objects and returns all children.

Here's how to match entries with a given tag, even if the tags key were nested at an arbitrary depth inside the entry:

>>> tree = fn.json_tree(Entry.data, '$').alias('tree')
>>> parent = fn.json_tree(Entry.data, '$').alias('parent')

>>> tree_ref = Entity('tree')
>>> parent_ref = Entity('parent')

>>> query = (Entry
...          .select(title.alias('title'))
...          .from_(Entry, tree, parent)
...          .where(
...              (tree_ref.parent == parent_ref.id) &
...              (parent_ref.key == 'tags') &
...              (tree_ref.value == 'sqlite'))
...          .order_by(title))
...
>>> [title for title, in query.tuples()]
[u'Building the SQLite FTS5 Search Extension',
 u'Extending SQLite with Python',
 u'Meet Scout, a Search Server Powered by SQLite',
 u'My List of Python and SQLite Resources',
 u'Querying Tree Structures in SQLite using Python and the Transitive Closure Extension',
 u"Using SQLite4's LSM Storage Engine as a Stand-alone NoSQL Database with Python",
 u'Web-based SQLite Database Browser, powered by Flask and Peewee']

In the code above, the Entry itself is selected along with two json_tree() traversals of its data: one representing candidate child nodes and one representing their parents. Because each tree node carries a reference to its parent, we can simply search for a parent node keyed "tags" that has a child node with the value "sqlite". Here is the SQL:

SELECT json_extract("t1"."data", '$.title') AS title 
FROM 
    "entry" AS t1, 
    json_tree("t1"."data", '$') AS tree, 
    json_tree("t1"."data", '$') AS parent 
WHERE (
    ("tree"."parent" = "parent"."id") AND 
    ("parent"."key" = 'tags') AND 
    ("tree"."value" = 'sqlite')) 
ORDER BY json_extract("t1"."data", '$.title')

This is just a taste of the JSON1 extension's functionality; I'll be experimenting with it more in the coming weeks. Leave me a comment here, or email the sqlite-users list, if you have specific questions about the extension.

FTS5 with Python

The code in this section builds on the preceding JSON example: we'll take the titles and URLs from the Entry data and use them to populate a search index. Peewee 2.6.5 will include the FTS5Model class used below, which is currently available on the master branch on GitHub.

Continuing from the JSON example, let's create another table to serve as a search index for the Entry data.

The FTS5 extension requires that columns be declared without any types or constraints. The only extra column attribute supported is UNINDEXED, which indicates that the column's data is stored but not searchable.

We'll define a search index over the Entry model so that we can query by title and retrieve the associated URL. To do this, the url field is declared as unindexed:

class EntryIndex(FTS5Model):
    title = SearchField()
    url = SearchField(unindexed=True)

    class Meta:
        database = db
        options = {'tokenize': 'porter', 'prefix': '2,3'}

EntryIndex.create_table()

The options dictionary passes extra metadata through to the FTS5 extension: the porter tokenizer is used for stemming, and prefix indexes of length 2 and 3 are stored for fast prefix lookups. The table is created with the following SQL:

CREATE VIRTUAL TABLE "entryindex" USING fts5 ("title", "url" UNINDEXED, prefix=2,3, tokenize=porter)

To populate the index, a pair of JSON functions will be used to copy data from the Entry model:

title = fn.json_extract(Entry.data, '$.title').alias('title')
url = fn.json_extract(Entry.data, '$.url').alias('url')
query = Entry.select(title, url).dicts()
with db.atomic():
    for entry in query:
        EntryIndex.create(**entry)
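
FTS5 also accepts a handful of special commands issued as INSERTs on the index table itself. After a bulk load like the one above, for example, the 'optimize' command merges the index's internal b-trees. A sketch of my own, using Peewee's raw-SQL escape hatch and the entryindex table created earlier:

# Merge the FTS5 index structures after the bulk insert.
db.execute_sql("INSERT INTO entryindex(entryindex) VALUES ('optimize')")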

After the index is populated, do some queries:

>>> query = EntryIndex.search('sqlite').limit(3)
>>> for result in query:
...     print result.title

Extending SQLite with Python
Building the SQLite FTS5 Search Extension
My List of Python and SQLite Resources

The SQL statement that implements the above query is:

SELECT "t1"."title", "t1"."url" 
FROM "entryindex" AS t1 
WHERE ("entryindex" MATCH 'sqlite') 
ORDER BY rank

You can also retrieve a relevance score along with the results:

>>> query = EntryIndex.search('sqlite AND python', with_score=True)
>>> for result in query:
...     print round(result.score, 3), result.title
...

The results look quite accurate. The SQL used for the above query is as follows:

SELECT "t1"."title", "t1"."url", rank AS score 
FROM "entryindex" AS t1 
WHERE ("entryindex" MATCH 'sqlite AND python') 
ORDER BY rank

This article has only scratched the surface of what the FTS5 extension can do; the documentation describes much more powerful functionality. Here are some examples (a small sketch follows the list):

  • Multi-column indexes, with per-column weights used when ranking results

  • Prefix queries, phrase queries, and NEAR (proximity) queries

  • Boolean operators for combining the above query types

  • A default unicode61 tokenizer, with a Porter stemming tokenizer also available

  • A new C API for defining ranking functions and tokenizers

  • Vocabulary tables, for querying term counts and inspecting the index
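
To give a flavor of these query forms, here is a small sketch of my own, run directly through the pysqlite2 driver built earlier; the docs table and its contents are made up for illustration:

from pysqlite2 import dbapi2 as sqlite

conn = sqlite.connect(':memory:')
conn.execute('CREATE VIRTUAL TABLE docs USING fts5(body)')
conn.executemany('INSERT INTO docs (body) VALUES (?)', [
    ('sqlite json1 extension',),
    ('python and sqlite full-text search',),
])

for q in ('sqli*',                # prefix query
          '"full-text search"',   # phrase query
          'NEAR(python sqlite)',  # proximity (NEAR) query
          'sqlite NOT json1'):    # boolean operators
    print q, '->', conn.execute(
        'SELECT body FROM docs WHERE docs MATCH ?', (q,)).fetchall()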

Thank you for reading

Adding the JSON extension to SQLite is good news for both the project and its users. PostgreSQL and MySQL already support JSON data types, and I'm glad to see SQLite following suit. That said, a JSON data type is not always the right choice; in some cases a dedicated embedded document store such as UnQLite may be a better fit.

The json1.c implementation is also worth watching. Dr. Hipp has described json1.c as only a first step, with room for further development, so whatever the shortcomings of the current release, I'm confident that both performance and the APIs will improve in future releases. I also believe he is considering a more efficient binary storage format.

It's also good to see SQLite continuing to improve its full-text search module, giving users a solid built-in ranking algorithm along with an API for adding their own.

Original article: http://charlesleifer.com/blog/using-the-sqlite-json1-and-fts5-extensions-with-python/


This post is taken from the OneAPM official blog