This is the 14th day of my participation in the August More Text Challenge. For details, see: August More Text Challenge

About generating columns, generating columns, or calculating columns

The redundant columns from the generic processing in the previous article lead to generated, derived, or computed columns, because the generated columns meet the requirements of creating a new redundant column on top of the original column.

Of course, generating columns does more than that, but has a broader application.

The following sections refer mainly to generated/computed columns from SQL and mainstream database implementations.

The article on the generation of columns/calculation of the column is quite complete and complete, in addition to the SQL Server part of the example is not correct, other parts are very detailed, a good article, so reference to the original and made small modifications, while removing the generation of column considerations or restrictions, one can directly refer to the original; Second, many of the concepts or terms introduced in this section require an in-depth understanding of the database.

What is a generated column?

In an SQL database, a Generated Column is a Column Computed from other columns in a table. Therefore, it is also called a Computed Column.

There are two types of generated columns: stored and virtual generated columns.

Storage A generated column is similar to a normal column. When data is inserted or updated, the value of the column is automatically calculated and stored, requiring storage space. Virtually generated columns do not require storage space and are only computed when read.

The virtual generated column is like a view (a view of the field), while the stored generated column is like a materialized view (updated in real time).

⚠️ cannot directly insert or update the value of the generated column; its value is automatically generated and updated by the database.

Common uses for generating columns

  • Virtual generated columns can be used to simplify and unify queries. We can define a complex query condition as a generation column and then use it when querying the table to ensure that all queries use the same criteria.

  • Storing generated columns can serve as a materialized cache for query conditions, reducing the computational cost of queries.

  • Generated columns can mimic functional indexes: define a generated column based on a functional expression and create an index. For storage-type generated columns, this approach requires more storage.

Column generation/computed column support in various major databases

The generated column Oracle MySQL SQL Server PostgreSQL SQLite
Store generated columns ✔ ️ ✔ ️ ✔ ️ ✔ ️
Virtual generated column ✔ ️ ✔ ️ ✔ ️ ✔ ️
Constraint support Primary key constraint

NOT NULL

UNIQUE

CHECK

Foreign key constraints
Primary key constraint

NOT NULL

UNIQUE

CHECK

Foreign key constraints
Primary key constraint

NOT NULL

UNIQUE

CHECK

Foreign key constraints
Primary key constraint

NOT NULL

UNIQUE

CHECK

Foreign key constraints
NOT NULL

UNIQUE

CHECK

Foreign key constraints
Index support ✔ ️ ✔ ️ ✔ ️ ✔ ️ ✔ ️

All generated columns do not support default value constraints.

Virtual columns in Oracle

Oracle 11g began to support virtual generated columns, or virtual columns for short. The syntax is as follows:

column [ datatype [ COLLATE column_collation_name ] ]
  [ GENERATED ALWAYS ] AS (column_expression) [ VIRTUAL ]
Copy the code

The type of the virtual column is determined by the expression column_expression, which corresponds to datatype. GENERATED ALWAYS AS specifies the GENERATED column. Expressions can only contain fields in the current table, constants, and deterministic functions; VIRTUAL indicates the VIRTUAL column and can be omitted.

Here is an example of creating a virtual column:

CREATE TABLE t_circle(
   id INTEGER PRIMARY KEY,
   x NUMERIC(20.10) NOT NULL,
   y NUMERIC(20.10) NOT NULL,
   radius NUMERIC(20.10) NOT NULL,
   perimeter NUMERIC(20.10) GENERATED ALWAYS AS (2 * 3.14159265 * radius) VIRTUAL
);

ALTER TABLE t_circle ADD (area AS (3.14159265 * radius * radius));
Copy the code

First, a virtual column perimeter was created for TABLE T_circle using the CREATE TABLE statement. We then add a virtual column area to it using the ALTER TABLE statement.

Example:

INSERT INTO t_circle(id, x, y, radius) VALUES (1.2.2.5);
SELECT * FROM t_circle;
ID|X|Y|RADIUS|PERIMETER |AREA       |
--|-|-|------|----------|-----------|
 1|2|2|     5|31.4159265|78.53981625|

INSERT INTO t_circle(id, x, y, radius ,perimeter) VALUES (2.0.0.1.6.28318530);
SQL Error [54013] [99999]: ORA- 54013.: INSERT operation disallowed on virtual columns
Copy the code

Virtual columns in Oracle support indexes. Create two indexes for virtual columns in T_circle:

CREATE UNIQUE INDEX idx11 ON t_circle(perimeter);
CREATE INDEX idx12 ON t_circle(area);
Copy the code

Detailed reference: Oracle official documentation.

Generate columns in MySQL

MySQL 5.7 introduced generated columns, supporting both virtual and stored types of generated columns. The syntax for defining a generated column is as follows:

col_name data_type [GENERATED ALWAYS] AS (expr)
  [VIRTUAL | STORED] [NOT NULL | NULL]
  [UNIQUE [KEY]] [[PRIMARY] KEY]
  [COMMENT 'string']
Copy the code

Where, AS defines an expression that generates columns; VIRTUAL means creating a VIRTUAL generated column whose values are not stored but computed immediately after the BEFORE trigger when read; STORED means to store generated columns. By default, the VIRTUAL generated column is created.

Create table T_circle with generated columns as follows:

CREATE TABLE t_circle(
   id INTEGER PRIMARY KEY,
   x NUMERIC(20.10) NOT NULL,
   y NUMERIC(20.10) NOT NULL,
   radius NUMERIC(20.10) NOT NULL,
   perimeter NUMERIC(20.10) AS (2 * 3.14159265 * radius)
);

ALTER TABLE t_circle ADD area NUMERIC(20.10) AS (3.14159265 * radius * radius) STORED;
Copy the code

Perimeter is a virtual generation column; An area is a generated column for storage.

If the result type of the expression is different from the data type in the field definition, an implicit cast is performed.

The data operation is performed as follows: The first insert statement does not specify the value of the generated column, which is automatically calculated by the database; The second insert statement, which provided data for perimeter, failed to execute; Instead, use the DEFAULT keyword.

MariaDB [test]> INSERT INTO t_circle(id, x, y, radius) VALUES (1.2.2.5);
Query OK, 1 row affected (0.007 sec)

MariaDB [test]> SELECT * FROM t_circle;
+----+--------------+--------------+--------------+---------------+---------------+
| id | x            | y            | radius       | perimeter     | area          |
+----+--------------+--------------+--------------+---------------+---------------+
|  1 | 2.0000000000 | 2.0000000000 | 5.0000000000 | 31.4159265000 | 78.5398162500 |
+----+--------------+--------------+--------------+---------------+---------------+
1 row in set (0.000 sec)

MariaDB [test]> INSERT INTO t_circle(id, x, y, radius ,perimeter) VALUES (2.0.0.1.6.28318530);
ERROR 1906 (HY000): The value specified for generated column 'perimeter' in table 't_circle' ignored
Copy the code

MySQL supports storing indexes for generated columns, and InnoDB also supports secondary indexes for virtual generated columns. For details, see MySQL’s official documentation.

Create two indexes for each of the two generated columns of the T_circle table using the following statement:

CREATE UNIQUE INDEX idx1 ON t_circle(perimeter);
CREATE INDEX idx2 ON t_circle(area);
Copy the code

Reference documentation: Official MySQL documentation.

Calculated columns in SQL Server

SQL Server 2005 added support for generating columns, called computed columns. The full definition of a computed column is as follows:

Calculated columns in SQL Server do not specify data types.

<computed_column_definition>: :=
column_name AS computed_column_expression
[ PERSISTED [ NOT NULL]] [[CONSTRAINT constraint_name ]
    { PRIMARY KEY | UNIQUE }
        [ CLUSTERED | NONCLUSTERED ]
        [
            WITH FILLFACTOR = fillfactor
          | WITH ( <index_option> [ , ...n ] )
        ]
        [ ON { partition_scheme_name ( partition_column_name )
        | filegroup | "default" } ]
  
    | [ FOREIGN KEY ]
        REFERENCES referenced_table_name [ ( ref_column ) ]
        [ ON DELETE { NO ACTION | CASCADE } ]
        [ ON UPDATE { NO ACTION } ]
        [ NOT FOR REPLICATION ]
  
    | CHECK [ NOT FOR REPLICATION ] ( logical_expression )
]
Copy the code

AS means to define a computed column; PERSISTED means that the values of the column need to be stored, the stored compute column; Expressions that evaluate columns can use other non-evaluated columns, constants, functions, variables, but cannot use subqueries or alias data types.

Computed columns in SQL Server support primary key and UNIQUE constraints. Stored computed columns also support NOT NULL, foreign key, and CHECK constraints.

Create table T_circle

CREATE TABLE t_circle(
   id INTEGER PRIMARY KEY,
   x NUMERIC NOT NULL,
   y NUMERIC NOT NULL,
   radius NUMERIC NOT NULL,
   perimeter AS (2 * 3.14159265 * radius),
   area AS (3.14159265 * radius * radius) PERSISTED
);

ALTER TABLE t_circle ADD dt AS GETDATE();
Copy the code

Perimeter is a virtual computing column; An area is a stored computation column; The ALTER TABLE statement adds a third computed column, using an indeterminate function GETDATE().

Insert data and query results as follows:

INSERT INTO t_circle VALUES (1.2.2.5);

Cannot return calculated column
SELECT * FROM t_circle;
----
id	x	y	radius	perimeter	area	dt
1	2	2	5	31.41592650	78.539816	2021- 08- 13 11:42:49.733

INSERT INTO t_circle(id, x, y, radius ,perimeter) VALUES (2.0.0.1.6.28318530);
 Msg 271, Level 16, State 1, Line 1
 The column "perimeter" cannot be modified because it is either a computed column or is the result of a UNION operator.
Copy the code

The DT field will return different dates if you run it multiple times.

SQL Server supports indexes based on computed columns, but it is not possible to create indexes on computed columns with uncertain values:

create unique index idx1 on t_circle(perimeter);
create index idx2 on t_circle(area);

create index idx3 on t_circle(dt);
SQL Error [2729] [S0001]: Column 'dt' in table 't_circle' cannot be used in an index or statistics or as a partition key because it is non-deterministic.
Copy the code

The DT column does not support indexes because it contains indeterminate functions whose values may change each time it is called.

Reference documents: SQL Server official documents.

Generated columns in PostgreSQL

PostgreSQL 12 is starting to support generated columns, both virtual and stored.

PostgreSQL currently implements only stored generated columns.

column_name data_type [ COLLATE collation ]
[ CONSTRAINT constraint_name ]
GENERATED ALWAYS AS ( generation_expr ) STORED
Copy the code

GENERATED ALWAYS AS indicates that a GENERATED column is created. Generation_expr specifies the expression to generate the column; STORED represents generated columns of a STORED type and cannot be omitted.

CREATE TABLE t_circle(
   id INTEGER PRIMARY KEY,
   x NUMERIC NOT NULL,
   y NUMERIC NOT NULL,
   radius NUMERIC NOT NULL,
   perimeter NUMERIC GENERATED ALWAYS AS (2 * 3.14159265 * radius) STORED
);

ALTER TABLE t_circle ADD area NUMERIC GENERATED ALWAYS AS (3.14159265 * radius * radius) STORED;
ALTER TABLE t_circle ADD area2 NUMERIC CONSTRAINT generated_col GENERATED ALWAYS AS (3.14159265 * radius * radius) STORED;
Copy the code

The CREATE TABLE statement defines a generation column perimeter for TABLE T_circle, which represents the perimeter of the circle. Then, use the ALTER TABLE statement to add a generated column area to represent the area of the circle.

Insert data tests into the table as follows:

shop=# INSERT INTO t_circle VALUES (1.2.2.5);
INSERT 0 1
shop=# SELECT * FROM t_circle;
 id | x | y | radius |  perimeter  |    area     |    area2
----+---+---+--------+-------------+-------------+-------------
  1 | 2 | 2 |      5 | 31.41592650 | 78.53981625 | 78.53981625
(1Line record) shop=# INSERT INTO t_circle(id, x, y, radius ,perimeter) VALUES (2.0.0.1.6.28318530); Error: Unable to insert into column "perimeter"SQL Error [42601]: ERROR: cannot insert into column"Perimeter" description: Column "perimeter" is the generated column. Detail:Column "perimeter" is a generated column.
Copy the code

The first insert statement does not specify the value of the generated column and is automatically computed by the database. The second insert statement, which provided data for perimeter, failed to execute; INSERT and UPDATE statements cannot specify values for generated columns, although the DEFAULT keyword can be used.

PostgreSQL supports indexes for generated columns. Use the following statement to create two indexes for each of the generated columns of the T_circle table:

shop=# create unique index idx1 on t_circle(perimeter);
CREATE INDEX
shop=# create index idx2 on t_circle(area);
CREATE INDEX
Copy the code

PostgreSQL can generate columns for external tables. For details, see CREATE FOREIGN TABLE.

The access control for the generated column is independent of the underlying column referenced in its expression. Thus, a user may not be able to read the data in the underlying columns, but can read the data in the generated columns for specific data security access.

Reference document: PostgreSQL generates columns.

Generated columns in SQLite

SQLite 3.31.0 supports GENERATED columns. Syntax:

GENERATED ALWAYS; STORED indicates the generated columns of the storage type, VIRTUAL indicates the generated columns of the VIRTUAL type, and the default is VIRTUAL generated columns.

For example, the following example creates a table that generates columns:

CREATE TABLE t_circle(
   id INTEGER PRIMARY KEY,
   x NUMERIC NOT NULL,
   y NUMERIC NOT NULL,
   radius NUMERIC NOT NULL,
   perimeter NUMERIC GENERATED ALWAYS AS (2 * 3.14159265 * radius) VIRTUAL,
   area NUMERIC GENERATED ALWAYS AS (3.14159265 * radius * radius) STORED
);
Copy the code

The ALTER TABLE ADD COLUMN command in ⚠️SQLite can only ADD VIRTUAL generated columns, but does not support STORED generated columns.

Insert some data tests:

sqlite> INSERT INTO t_circle VALUES (1.2.2.5);
sqlite> SELECT * FROM t_circle;
1|2|2|5|31.4159265|78.53981625

sqlite> INSERT INTO t_circle(id, x, y, radius ,perimeter) VALUES (2.0.0.1.6.28318530);
Error: cannot INSERT into generated column "perimeter"
Copy the code

The first insert statement executes successfully, and the query returns the values of the two generated columns; The second insert statement attempts to specify the value of the generated column and returns an error.

Create two indexes for each of the generated columns of the T_circle table:

sqlite> create unique index idx1 on t_circle(perimeter);
sqlite> create index idx2 on t_circle(area);
Copy the code

Reference documentation: SQLite official documentation.