In a time series workload, applications such as real-time dashboards query recent data while archiving old data.

  • Docs.citusdata.com/en/v10.2/sh…

To handle this workload, single-node PostgreSQL databases typically use table partitioning to split a large time-ordered table into multiple inherited tables, each covering a different time range.

  • www.postgresql.org/docs/curren…

Storing data in multiple physical tables speeds up data expiration. In a single large table, deleting rows incurs the cost of scanning for the rows to delete and then vacuuming the freed space. Deleting a partition, on the other hand, is a fast operation whose cost is independent of data size: it amounts to simply deleting the files on disk that contain the data.

  • www.postgresql.org/docs/curren…

Partitioning the table also makes the index for each time range smaller and faster. A query on the most recent data is likely to hit a hot index that fits in memory, which speeds up reads.

Inserts also have smaller indexes to update, so they are faster too.

Time-based partitioning makes most sense when:

  1. Most queries access only a very small subset of the most recent data
  2. Old data expires periodically (delete/discard)

Keep in mind that, in the wrong circumstances, the overhead of reading across all these partitions can outweigh the benefits. But in the right circumstances it is very helpful: for example, keeping a year of time series data and routinely querying only the most recent week.
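As a sketch of this access pattern (the events table here is hypothetical, not part of the example that follows), a WHERE clause on the partition key lets the planner prune all but the newest partitions:

```sql
-- assumes a table partitioned by range on created_at;
-- the time filter lets Postgres prune old partitions
SELECT count(*)
FROM events
WHERE created_at >= now() - interval '1 week';

-- EXPLAIN shows that only the most recent partition(s) are scanned
EXPLAIN
SELECT count(*)
FROM events
WHERE created_at >= now() - interval '1 week';
```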

Scaling time series data on Citus

We can combine the single-node table partitioning technique with Citus’s distributed sharding to form a scalable time series database. It’s the best of both worlds. It is particularly elegant on top of Postgres’ declarative table partitioning.

For example, let’s distribute and partition a table that contains historical GitHub event data.

  • GitHub event data
    • examples.citusdata.com/events.csv

Each record in this GitHub dataset represents an event created on GitHub, along with key information about it, such as the event type, the time it was created, and the user who created it.

The first step is to create and partition a table by time, just as we would in a single-node PostgreSQL database:

-- declaratively partitioned table
CREATE TABLE github_events (
  event_id bigint,
  event_type text,
  event_public boolean,
  repo_id bigint,
  payload jsonb,
  repo jsonb,
  actor jsonb,
  org jsonb,
  created_at timestamp
) PARTITION BY RANGE (created_at);

Note the PARTITION BY RANGE (created_at) clause. This tells Postgres that the table will be range-partitioned by the created_at column. However, we have not yet created any partitions for specific ranges.

Before creating specific partitions, let’s distribute the table in Citus. We will shard by repo_id, which means events will be grouped into shards per repository.

SELECT create_distributed_table('github_events', 'repo_id');

Citus has now created shards for this table across the worker nodes. Internally, each shard is a table named github_events_N, where N is the shard identifier. In addition, Citus propagates the partitioning information, so each shard declares Partition key: RANGE (created_at).
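If you want to see the shards themselves, recent Citus versions expose a citus_shards view (the exact column set follows the Citus docs, so treat this query as a sketch):

```sql
-- list the shards created for github_events and their worker nodes
SELECT shardid, nodename, nodeport
FROM citus_shards
WHERE table_name = 'github_events'::regclass;
```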

A partitioned table cannot hold data directly; it is more like a view across its partitions. The shards are therefore not yet ready to hold data either. We need to create partitions and specify their time ranges; then we can insert data that falls within those ranges.
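To illustrate, a single partition could be created by hand with standard Postgres DDL (the month chosen here is arbitrary; the helper function in the next section automates this):

```sql
-- a partition covering January 2015; rows whose created_at falls
-- in this range will be routed here
CREATE TABLE github_events_2015_01
  PARTITION OF github_events
  FOR VALUES FROM ('2015-01-01') TO ('2015-02-01');
```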

Automatic Partition Creation

Citus provides helper functions for partition management. We can create a batch of monthly partitions using create_time_partitions():

SELECT create_time_partitions(
  table_name         := 'github_events',
  partition_interval := '1 month',
  end_at             := now() + '12 months'
);

Citus also includes a view, time_partitions, to easily investigate partitions it creates.
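A quick look at that view might go as follows (the from_value and to_value columns follow the Citus documentation):

```sql
-- inspect the partitions Citus created, with their covered time ranges
SELECT partition, from_value, to_value
FROM time_partitions
WHERE parent_table = 'github_events'::regclass;
```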

Over time you will need to do some maintenance: creating new partitions and dropping old ones. It is best to set up a periodic job to run these maintenance functions with an extension such as pg_cron:

  • pg_cron
    • Github.com/citusdata/p…
-- set two monthly cron jobs:

-- 1. ensure we have partitions for the next 12 months

SELECT cron.schedule('create-partitions', '0 0 1 * *', $$
    SELECT create_time_partitions(
        table_name         := 'github_events',
        partition_interval := '1 month',
        end_at             := now() + '12 months'
    )
$$);

-- 2. (optional) ensure we never have more than one year of data

SELECT cron.schedule('drop-partitions', '0 0 1 * *', $$
    CALL drop_old_time_partitions(
        'github_events',
        now() - interval '12 months' /* older_than */
    );
$$);

Once regular maintenance is in place, you no longer have to think about the partitions; they just work.

Note that native partitioning in Postgres is still relatively new and has some quirks. Maintenance operations on partitioned tables acquire aggressive locks that can briefly block queries. There is a lot of work going on in the Postgres community to resolve these issues, so time partitioning in Postgres can be expected to keep improving.

Use column storage for archives

Some applications’ data is logically divided into a small updatable portion and a larger “frozen” portion. Examples include logs, clickstreams, and sales records. In this case, we can combine partitioning with columnar table storage (introduced in Citus 10) to compress historical partitions on disk. Citus columnar tables are currently append-only, meaning they do not support updates or deletes, but we can use them for the immutable historical partitions.

  • Columnar table storage
    • Docs.citusdata.com/en/v10.2/ad…
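For reference, a standalone columnar table is created by specifying the columnar table access method (the table name here is just an illustration):

```sql
-- a plain table stored with Citus columnar storage;
-- INSERT and COPY work, UPDATE and DELETE do not
CREATE TABLE archived_logs (
  logged_at timestamp,
  message   text
) USING columnar;
```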

A partitioned table can be composed of any combination of row and columnar partitions. With range partitioning on a timestamp key, we can keep the latest partition as a row-based table and periodically roll older partitions over into columnar storage.

Let’s look at an example, again using GitHub events. We will create a new table called github_columnar_events to distinguish it from the previous example. To focus entirely on the columnar storage aspect, we will not distribute this table.

Next, download the sample data:

wget http://examples.citusdata.com/github_archive/github_events-2015-01-01-{0..5}.csv.gz
gzip -c -d github_events-2015-01-01-*.gz >> github_events.csv
-- our new table, same structure as the example in
-- the previous section

CREATE TABLE github_columnar_events ( LIKE github_events )
PARTITION BY RANGE (created_at);

-- create partitions to hold two hours of data each

SELECT create_time_partitions(
  table_name         := 'github_columnar_events',
  partition_interval := '2 hours',
  start_from         := '2015-01-01 00:00:00',
  end_at             := '2015-01-01 08:00:00'
);

-- fill with sample data
-- (note that this data requires the database to have UTF8 encoding)

\COPY github_columnar_events FROM 'github_events.csv' WITH (format CSV)

-- list the partitions, and confirm they're
-- using row-based storage (heap access method)

SELECT partition, access_method
  FROM time_partitions
 WHERE parent_table = 'github_columnar_events'::regclass;

-- convert older partitions to use columnar storage

CALL alter_old_partitions_set_access_method(
  'github_columnar_events',
  '2015-01-01 06:00:00' /* older_than */,
  'columnar'
);

-- the old partitions are now columnar, while the
-- latest uses row storage and can be updated

SELECT partition, access_method
  FROM time_partitions
 WHERE parent_table = 'github_columnar_events'::regclass;

To see the compression achieved by a columnar table, use VACUUM VERBOSE. The compression ratios for our three columnar partitions are pretty good:

VACUUM VERBOSE github_columnar_events;
INFO:  statistics for "github_columnar_events_p2015_01_01_0000":
storage id: 10000000003
total file size: 4481024, total data size: 4444425
compression rate: 8.31x
total row count: 15129, stripe count: 1, average rows per stripe: 15129
chunk count: 18, containing data for dropped columns: 0, zstd compressed: 18

INFO:  statistics for "github_columnar_events_p2015_01_01_0200":
storage id: 10000000004
total file size: 3579904, total data size: 3548221
compression rate: 8.26x
total row count: 12714, stripe count: 1, average rows per stripe: 12714
chunk count: 18, containing data for dropped columns: 0, zstd compressed: 18

INFO:  statistics for "github_columnar_events_p2015_01_01_0400":
storage id: 10000000005
total file size: 2949120, total data size: 2917407
compression rate: 8.51x
total row count: 11756, stripe count: 1, average rows per stripe: 11756
chunk count: 18, containing data for dropped columns: 0, zstd compressed: 18

One of the strengths of the partitioned table github_columnar_events is that it can be queried just like a normal table.

SELECT COUNT(DISTINCT repo_id)
  FROM github_columnar_events;

Rows can still be updated or deleted, as long as the query has a WHERE clause on the partition key that filters it entirely into the row-based partition.
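For instance (the event_id value here is made up for illustration), an update whose created_at filter confines it to the newest, row-based partition is allowed:

```sql
-- the created_at filter restricts this to the row-based partition,
-- so the append-only columnar partitions are never touched
UPDATE github_columnar_events
SET event_public = true
WHERE created_at >= '2015-01-01 06:00:00'
  AND event_id = 123;
```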

Archive row partitions to column storage

When a row partition has filled its time range, you can archive it into compressed columnar storage. We can automate this with pg_cron, as follows:

-- a monthly cron job

SELECT cron.schedule('compress-partitions', '0 0 1 * *', $$
    CALL alter_old_partitions_set_access_method(
        'github_columnar_events',
        now() - interval '6 months' /* older_than */,
        'columnar'
    );
$$);

For more information, see the columnar storage documentation.

  • Columnar storage
    • Docs.citusdata.com/en/v10.2/ad…

More examples

  • Official example of a distributed PostgreSQL cluster (Citus): a multi-tenant application in action
  • Official example of a distributed PostgreSQL cluster (Citus): a real-time dashboard