Whether you’re a developer, data analyst, QA engineer, DevOps person, or product manager, SQLite is the perfect tool for you.

There are a few well-known facts to begin with.

  • SQLite is the most common DBMS in the world, supported by all popular operating systems.
  • SQLite is serverless.
  • For developers, SQLite can be embedded directly into applications.
  • For others, there is a handy database console (REPL) that comes in the form of a single file (sqlite3.exe on Windows, sqlite3 on Linux/macOS).
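Embedding really is that simple. Here is a minimal sketch in Python, whose standard library ships with the sqlite3 module (the table and data are made up for illustration):

```python
import sqlite3

# open (or create) a database; a file path works the same way as ":memory:"
conn = sqlite3.connect(":memory:")
conn.execute("create table city(name text, founded integer)")
conn.execute("insert into city values ('Amsterdam', 1300)")
year = conn.execute(
    "select founded from city where name = 'Amsterdam'"
).fetchone()[0]
print(year)  # 1300
```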

Console, import and export

The console is SQLite’s killer feature for data analysis: more powerful than Excel and simpler than pandas. You can import CSV data with a single command, and the table is created automatically.

> .import --csv city.csv city
> select count(*) from city;
1117
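The same import is easy to script. A sketch using Python’s csv module, with made-up rows standing in for city.csv:

```python
import csv
import io
import sqlite3

# a stand-in for city.csv (hypothetical sample rows)
data = io.StringIO("city,foundation_year\nAmsterdam,1300\nBerlin,1237\n")

conn = sqlite3.connect(":memory:")
reader = csv.reader(data)
header = next(reader)
# like the console, create the table from the CSV header
conn.execute("create table city({})".format(", ".join(header)))
conn.executemany("insert into city values (?, ?)", reader)
count = conn.execute("select count(*) from city").fetchone()[0]
print(count)  # 2
```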

The console supports basic SQL functionality and displays query results in a nice ASCII table. Advanced SQL functionality is also supported, but more on that later.

select
  century || ' century' as dates,
  count(*) as city_count
from history
group by century
order by century desc;
┌────────────┬────────────┐
│   dates    │ city_count │
├────────────┼────────────┤
│ 21 century │ 1          │
│ 20 century │ 263        │
│ 19 century │ 189        │
│ 18 century │ 191        │
│ 17 century │ 137        │
│ ...        │ ...        │
└────────────┴────────────┘

Data can be exported as SQL, CSV, JSON, or even Markdown or HTML. All it takes is a few commands.

.mode json
.output city.json
select city, foundation_year, timezone from city limit 10;
.shell cat city.json
[
    { "city": "Amsterdam", "foundation_year": 1300, "timezone": "UTC+1" },
    { "city": "Berlin", "foundation_year": 1237, "timezone": "UTC+1" },
    { "city": "Helsinki", "foundation_year": 1548, "timezone": "UTC+2" },
    { "city": "Monaco", "foundation_year": 1215, "timezone": "UTC+1" },
    { "city": "Moscow", "foundation_year": 1147, "timezone": "UTC+3" },
    { "city": "Reykjavik", "foundation_year": 874, "timezone": "UTC" },
    { "city": "Sarajevo", "foundation_year": 1461, "timezone": "UTC+1" },
    { "city": "Stockholm", "foundation_year": 1252, "timezone": "UTC+1" },
    { "city": "Tallinn", "foundation_year": 1219, "timezone": "UTC+2" },
    { "city": "Zagreb", "foundation_year": 1094, "timezone": "UTC+1" }
]
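Outside the console, the same JSON export takes only a few lines. A sketch in Python (the table and rows are made up):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table city(city text, foundation_year integer, timezone text)")
conn.execute("insert into city values ('Amsterdam', 1300, 'UTC+1')")

# sqlite3.Row lets us turn each row into a dict keyed by column name
conn.row_factory = sqlite3.Row
rows = conn.execute(
    "select city, foundation_year, timezone from city limit 10"
).fetchall()
payload = json.dumps([dict(row) for row in rows])
print(payload)
```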

If you prefer BI tools to the console: popular data exploration tools such as Metabase, Redash, and Superset all support SQLite.

Local JSON

Nothing is more convenient than SQLite for analyzing and transforming JSON. You can select data directly from the file as if it were a normal table. Or import the data into a table and select from it.

select
  json_extract(value, '$.iso.code') as code,
  json_extract(value, '$.iso.number') as num,
  json_extract(value, '$.name') as name,
  json_extract(value, '$.units.major.name') as unit
from
  json_each(readfile('currency.sample.json'))
;
┌──────┬─────┬─────────────────┬──────────┐
│ code │ num │      name       │   unit   │
├──────┼─────┼─────────────────┼──────────┤
│ ARS  │ 032 │ Argentine peso  │ peso     │
│ CHF  │ 756 │ Swiss Franc     │ Franc    │
│ EUR  │ 978 │ Euro            │ Euro     │
│ GBP  │ 826 │ British Pound   │ Pound    │
│ INR  │ 356 │ Indian Rupee    │ Rupee    │
│ JPY  │ 392 │ Japanese yen    │ yen      │
│ MAD  │ 504 │ Moroccan Dirham │ Dirham   │
│ RUR  │ 643 │ Russian Rouble  │ Rouble   │
│ SOS  │ 706 │ Somali Shilling │ Shilling │
│ USD  │ 840 │ US Dollar       │ Dollar   │
└──────┴─────┴─────────────────┴──────────┘
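Note that readfile() is a console-only helper; from application code you can pass the JSON text as a query parameter instead. A sketch in Python with made-up sample data, assuming the JSON1 functions are compiled in (they are in most builds):

```python
import json
import sqlite3

# hypothetical stand-in for currency.sample.json
doc = json.dumps([
    {"iso": {"code": "EUR", "number": "978"}, "name": "Euro"},
    {"iso": {"code": "USD", "number": "840"}, "name": "US Dollar"},
])

conn = sqlite3.connect(":memory:")
rows = conn.execute(
    """
    select
      json_extract(value, '$.iso.code') as code,
      json_extract(value, '$.name') as name
    from json_each(?)
    """,
    (doc,),
).fetchall()
print(rows)  # [('EUR', 'Euro'), ('USD', 'US Dollar')]
```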

No matter how deeply the JSON is nested, you can extract any object.

select
  json_extract(value, '$.id') as id,
  json_extract(value, '$.name') as name
from
  json_tree(readfile('industry.sample.json'))
where
  path like '$[%].industries'
;
┌────────┬───────────────────────────┐
│   id   │           name            │
├────────┼───────────────────────────┤
│ 7.538  │ Internet provider         │
│ 7.539  │ IT consulting             │
│ 7.540  │ Software development      │
│ 9.399  │ Mobile communication      │
│ 9.400  │ Fixed communication       │
│ 9.401  │ Fiber-optic communication │
│ 43.641 │ Audit                     │
│ 43.646 │ Insurance                 │
│ 43.647 │ Bank                      │
└────────┴───────────────────────────┘

CTEs and set operations

Of course, SQLite supports common table expressions (the WITH clause) and JOINs, which I won’t illustrate here. If the data is hierarchical (tables refer to themselves via columns like parent_id), WITH RECURSIVE comes in handy. Any hierarchy, no matter how deep, can be “unrolled” with a single query.

with recursive tmp(id, name, level) as (
  select id, name, 1 as level
  from area
  where parent_id is null

  union all

  select
    area.id,
    tmp.name || ', ' || area.name as name,
    tmp.level + 1 as level
  from area
    join tmp on area.parent_id = tmp.id
)

select * from tmp;
┌──────┬──────────────────────────┬───────┐
│  id  │           name           │ level │
├──────┼──────────────────────────┼───────┤
│ 93   │ US                       │ 1     │
│ 768  │ US, Washington DC        │ 2     │
│ 1833 │ US, Washington           │ 2     │
│ 2987 │ US, Washington, Bellevue │ 3     │
│ 3021 │ US, Washington, Everett  │ 3     │
│ 3039 │ US, Washington, Kent     │ 3     │
│ ...  │ ...                      │ ...   │
└──────┴──────────────────────────┴───────┘
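The same unrolling works from application code. A self-contained sketch in Python with a tiny made-up area table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table area(id integer primary key, parent_id integer, name text);
    insert into area values (1, null, 'US');
    insert into area values (2, 1, 'Washington');
    insert into area values (3, 2, 'Bellevue');
""")

rows = conn.execute("""
    with recursive tmp(id, name, level) as (
      select id, name, 1 from area where parent_id is null
      union all
      select area.id, tmp.name || ', ' || area.name, tmp.level + 1
      from area join tmp on area.parent_id = tmp.id
    )
    select name, level from tmp order by level
""").fetchall()
print(rows)
# [('US', 1), ('US, Washington', 2), ('US, Washington, Bellevue', 3)]
```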

Sets? No problem: UNION, INTERSECT, EXCEPT.

select employer_id
from employer_area
where area_id = 1

except

select employer_id
from employer_area
where area_id = 2;
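A quick way to see EXCEPT at work: a sketch with a tiny made-up employer_area table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employer_area(employer_id integer, area_id integer);
    insert into employer_area values (10, 1), (20, 1), (20, 2), (30, 2);
""")

# employers present in area 1 but absent from area 2
rows = conn.execute("""
    select employer_id from employer_area where area_id = 1
    except
    select employer_id from employer_area where area_id = 2
""").fetchall()
print(rows)  # [(10,)]
```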

Want to calculate one column based on the others? Enter generated columns.

alter table vacancy
add column salary_net integer as (
  case when salary_gross = true
  then round(salary_from/1.04)
  else salary_from
  end
);

Generated columns can be queried just like “normal” columns.

select
  substr(name, 1, 40) as name,
  salary_net
from vacancy
where
  salary_currency = 'JPY'
  and salary_net is not null
limit 10;
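End to end, generated columns look like this. A sketch in Python with made-up salaries (generated columns require SQLite 3.31+, released in 2020):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table vacancy(
      salary_from integer,
      salary_gross boolean,
      salary_net integer as (
        case when salary_gross = true
        then round(salary_from/1.04)
        else salary_from
        end
      )
    );
    -- gross salary: tax is deducted; net salary: stored as-is
    insert into vacancy(salary_from, salary_gross) values (104000, true);
    insert into vacancy(salary_from, salary_gross) values (100000, false);
""")
nets = [row[0] for row in conn.execute("select salary_net from vacancy")]
print(nets)  # both rows work out to 100000
```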

Mathematical statistics

Descriptive statistics? Easy: mean, median, percentiles, standard deviation, whatever you want. You do have to load an extension, but that is also a single command (and a single file).

.load sqlite3-stats

select
  count(*) as book_count,
  cast(avg(num_pages) as integer) as mean,
  cast(median(num_pages) as integer) as median,
  mode(num_pages) as mode,
  percentile_90(num_pages) as p90,
  percentile_95(num_pages) as p95,
  percentile_99(num_pages) as p99
from books;
┌────────────┬──────┬────────┬──────┬─────┬─────┬──────┐
│ book_count │ mean │ median │ mode │ p90 │ p95 │ p99  │
├────────────┼──────┼────────┼──────┼─────┼─────┼──────┤
│ 1483       │ 349  │ 295    │ 256  │ 640 │ 817 │ 1199 │
└────────────┴──────┴────────┴──────┴─────┴─────┴──────┘

A note on extensions. SQLite lacks a lot of functionality compared to other DBMSs such as PostgreSQL. But missing features are easy to add, and that’s what people do, so the result is quite a mess.

Therefore, I decided to make a consistent set of extensions, grouped by domain area and compiled for the major operating systems. There are only a few of them so far, but more are on the way:

sqlean @ GitHub

More on the joys of statistics. You can plot the data distribution right in the console. Look how cute it is.

with slots as (
  select
    num_pages/100 as slot,
    count(*) as book_count
  from books
  group by slot
),
max as (
  select max(book_count) as value
  from slots
)
select
  slot,
  book_count,
  printf('%.' || (book_count * 30 / max.value) || 'c', '*') as bar
from slots, max
order by slot;
┌──────┬────────────┬────────────────────────────────┐
│ slot │ book_count │              bar               │
├──────┼────────────┼────────────────────────────────┤
│ 0    │ 116        │ *********                      │
│ 1    │ 254        │ ********************           │
│ 2    │ 376        │ ****************************** │
│ 3    │ 285        │ **********************         │
│ 4    │ 184        │ **************                 │
│ 5    │ 90         │ *******                        │
│ 6    │ 54         │ ****                           │
│ 7    │ 41         │ ***                            │
│ 8    │ 31         │ **                             │
│ 9    │ 15         │ *                              │
│ 10   │ 12         │ *                              │
│ 11   │ 11         │ *                              │
│ 12   │ 2          │ *                              │
└──────┴────────────┴────────────────────────────────┘
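The bar column relies on a handy printf() quirk in SQLite: the precision of %c repeats the character that many times. A minimal check in Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# '%.5c' prints the character argument five times
bar = conn.execute("select printf('%.' || 5 || 'c', '*')").fetchone()[0]
print(bar)  # *****
```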

Performance

SQLite can handle billions of records just fine. A plain INSERT shows about 240,000 records per second on my laptop. And if you connect a CSV file as a virtual table (there is an extension for that), inserts become twice as fast.

.load sqlite3-vsv

create virtual table temp.blocks_csv using vsv(
    filename="ipblocks.csv",
    schema="create table x(network text, geoname_id integer, registered_country_geoname_id integer, represented_country_geoname_id integer, is_anonymous_proxy integer, is_satellite_provider integer, postal_code text, latitude real, longitude real, accuracy_radius integer)",
    columns=10,
    header=on,
    nulls=on
);
.timer on
insert into blocks
select * from blocks_csv;

Run Time: real 5.176 user 4.716420 sys 0.403866
select count(*) from blocks;
3386629

Run Time: real 0.095 user 0.021972 sys 0.063716

There is a popular belief among developers that SQLite is not suitable for the web because it doesn’t support concurrent access. It’s a myth. In write-ahead logging mode (WAL, which has been around for a long time), there can be as many concurrent readers as you want. There can be only one concurrent writer, but often one is enough.
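Switching to WAL is a single pragma. A sketch in Python (WAL needs a file-backed database, so a temporary file stands in for a real one):

```python
import os
import sqlite3
import tempfile

# WAL requires a file on disk; ":memory:" databases can't use it
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)
mode = conn.execute("pragma journal_mode=wal").fetchone()[0]
print(mode)  # wal

# from now on, readers don't block the (single) writer and vice versa
```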

SQLite is a perfect choice for small websites and applications. sqlite.org uses SQLite as its database and doesn’t bother with optimization (≈200 requests per page). It handles 700,000 visits a month and serves pages faster than 95% of the sites I’ve seen.

Documents, graphs, and search

SQLite supports partial indexes and indexes on expressions, just like the “big” DBMSs. You can build indexes on generated columns and even turn SQLite into a document database: simply store raw JSON and build indexes on json_extract()-ed columns.

create table currency(
  body text,
  code text as (json_extract(body, '$.code')),
  name text as (json_extract(body, '$.name'))
);

create index currency_code_idx on currency(code);

insert into currency
select value
from json_each(readfile('currency.sample.json'));
explain query plan
select name from currency where code = 'EUR';

QUERY PLAN
`--SEARCH TABLE currency USING INDEX currency_code_idx (code=?)
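The whole document-database trick fits in a few lines of Python. A sketch with a made-up currency document (generated columns need SQLite 3.31+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table currency(
      body text,
      code text as (json_extract(body, '$.code'))
    );
    create index currency_code_idx on currency(code);
    insert into currency(body) values ('{"code": "EUR", "name": "Euro"}');
""")

# the detail column of the plan should mention the index
plan = conn.execute("""
    explain query plan
    select * from currency where code = 'EUR'
""").fetchone()
detail = plan[-1]
print(detail)
```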

You can also use SQLite as a graph database. A bunch of complex WITH RECURSIVE queries will do, or maybe you’d prefer to add a little Python:

simple-graph @ GitHub

Full-text search out of the box.

create virtual table books_fts
using fts5(title, author, publisher);

insert into books_fts
select title, author, publisher from books;

select
  author,
  substr(title, 1, 30) as title,
  substr(publisher, 1, 10) as publisher
from books_fts
where books_fts match 'ann'
limit 5;
┌─────────────────────┬────────────────────────────────┬────────────┐
│       author        │             title              │ publisher  │
├─────────────────────┼────────────────────────────────┼────────────┤
│ Ruby Ann Boxcar     │ Ruby Ann's Down Home Trailer P │ Citadel    │
│ Ruby Ann Boxcar     │ Ruby Ann's Down Home Trailer P │ Citadel    │
│ Lynne Ann DeSpelder │ The Last Dance: Encountering D │ McGraw-Hil │
│ Daniel Defoe        │ Robinson Crusoe                │ Ann Arbor  │
│ Ann Thwaite         │ Waiting for the Party: The Lif │ David R. G │
└─────────────────────┴────────────────────────────────┴────────────┘
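From application code the setup is the same. A sketch in Python with two made-up books (FTS5 is compiled into most sqlite3 builds, including CPython’s):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create virtual table books_fts using fts5(title, author);
    insert into books_fts values ('Robinson Crusoe', 'Daniel Defoe');
    insert into books_fts values ('Moll Flanders', 'Daniel Defoe');
""")
rows = conn.execute(
    "select title from books_fts where books_fts match 'crusoe'"
).fetchall()
print(rows)  # [('Robinson Crusoe',)]
```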

Perhaps you need an in-memory database for intermediate calculations? One line of Python code.

db = sqlite3.connect(":memory:")

You can even access it from multiple connections.

db = sqlite3.connect("file::memory:?cache=shared", uri=True)
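Two connections really do see the same data. A quick check (note the uri=True flag, which Python needs for the file: syntax):

```python
import sqlite3

uri = "file::memory:?cache=shared"
db1 = sqlite3.connect(uri, uri=True)
db2 = sqlite3.connect(uri, uri=True)

db1.execute("create table t(x integer)")
db1.execute("insert into t values (42)")
db1.commit()  # make the row visible to the other connection
val = db2.execute("select x from t").fetchone()[0]
print(val)  # 42
```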

And so much more

There are fancy window functions (as in PostgreSQL). UPSERT, UPDATE FROM, and generate_series(). R-Tree indexes. Regular expressions, fuzzy search, and geography. In terms of functionality, SQLite can compete with any “big” DBMS.

There are also great tools around SQLite. I particularly like Datasette, an open source tool for exploring and publishing SQLite datasets. DBeaver is an excellent open source database IDE that supports the latest version of SQLite.

I hope this article inspires you to try SQLite. Thanks for reading!

Follow @ohmypy on Twitter to hear about new articles 🚀.


antonz.org/sqlite-is-n…