Background environment

As a kind of simple and easy to use unstructured data, JSON format is widely used. In the current big data environment, the need to deal with unstructured data is more and more frequent. Do we have to use NoSQL solutions like MongoDB? The PostgreSQL database provides complete support for JSON on the basis of RDBMS. JSON can be played without MongoDB.

Postgresql-9.2 introduced JSON support for numeric types. After several major versions of postgresqL-9.2, the support for numeric types is now complete. CRUD operations on JSON-formatted information and index specific nodes can be easily implemented in PG.

This time we tested the common scenario of using JSON in PG. The software environment is as follows

CentOS 7 x64

PostgreSQL 11.1

Two data types

Two different data types are provided in PG, JSON and JSONB. As the name implies, JSON is the raw format for storing strings, while JSONB is the binary encoded version. JSON needs to store raw formats, including whitespace, so there is parsing at every query. JSONB queries are more efficient because they do not require real-time parsing.

In short, JSON is fast to insert and slow to query for accurate storage; JSONB for efficient query, slow insertion and fast retrieval.

If there is no special reason, it is best to use the JSONB type.

-- Use JSONB field types (do not use JSON types without special requirements)
drop table if exists demo.j_waybill;
create table demo.j_waybill (id int primary key.data jsonb);

insert into demo.j_waybill(id.data) values(1.'{" waybill ": 2019000000," the project ":" test project ", "pay_org_name" : "ABC factory", "driver" : {" name ":" zhang ", "mobile" : 13800000000}, "line" : {" from ": {" province", "hebei province", "city" : "tangshan", "district" : "rich area"}, "to" : {" province ", "sichuan province", "city" : "mianyang city", "district", "municipal districts"}}, "payment" : {" oil_amount: "1234," cash_amount ": 5678}} '
);
Copy the code

Data query

Formatted output

Jsonb_pretty () select jsonb_pretty(w.ata) from demo.j_waybill w where w.id = 1; Jsonb_pretty -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- {+ "line" : {+ "to" : {+ "city" : "mianyang city," + "district" : "Municipal district," + "province", "sichuan province" +}, + "from" : {+ "city" : "tangshan," + "district" : "rich area," + "province" : In hebei province ", "+ +}}, +" driver ": {+" name ":" zhang SAN, "+" mobile ": 13800000000 +}, +" payment ": {+" oil_amount ": 1234, + "cash_AMOUNT ": 5678 +}, + "project":" test project", + "waybill": 2019000000, + "pay_org_name": "ABC manufacturer "+} (1 row)Copy the code

Extract object members

PG provides two types of query syntax, a -> syntax for extracting top-level members and a #> syntax for extracting nested members. To extract only text, use ->> or #>>.

-- Extracts the top-level member, noting the difference between -> and ->>, which extracts the text value
select
     w.data->'waybill' as waybill,
     w.data->'project' as project,
     w.data->>'project' as project_text
 from demo.j_waybill w where w.id = 1;

waybill   |  project   | project_text
------------+------------+--------------2019000000 | | "test project" test project (1 row)Copy the code
-- Specify the path of the node to extract the nested members, Select w.da #>'{driver}' as driver, w.da #>>'{driver, name}' as driver_name, w.da #>'{driver, w.da #>'. mobile}' as mobile from demo.j_waybill w where w.id = 1; driver | driver_name | mobile -----------------------------------------+-------------+------------- {"name": "Zhang", "mobile" : 13800000000} | 13800000000 | zhang SAN (1 row)Copy the code

Conditions for screening

PG provides special existential notation, right? . This syntax is equivalent to is not null.

Determines whether a specified top-level key exists
select count(1) from demo.j_waybill w where w.data ? 'waybill';
 count
-------
     1
(1 row)

The equivalent of the previous statement is as follows
select count(1) from demo.j_waybill w where w.data->'waybill' is not null ;


-- Determines whether a nested key exists
select count(1) from demo.j_waybill w where w.data->'driver' ? 'mobile';
 count
-------
     1
(1 row)

Copy the code

? | and? & right? Equivalent to the OR and AND operations.

-- Multiple criteria? | says the or,? & said the and
select count(1) from demo.j_waybill w where w.data->'driver'? |'{"mobile", "addr"}';
Copy the code

In addition to checking for the presence of a key, you can also check for key:value with the @> symbol.

-? It is used only to check the presence of the key, so @> can check the function of the substring
select count(1) from demo.j_waybill w where w.data @> '{"waybill":2019000000, "project":" test project"}';
 count
-------
     1
(1 row)

The equivalent of the previous statement is as follows
-- PS: numeric arguments are to_jsonb() and strings are extracted with ->>
select count(1) from demo.j_waybill w 
	where w.data->'waybill' = to_jsonb(2019000000) 
	and w.data->>'project' = 'Test items' ;
	
You can also use type conversions
select count(1) from demo.j_waybill w 
	where (w.data->'waybill') : :numeric = 2019000000 
	and w.data->>'project' = 'Test items' ;
Copy the code

Data update

Add/Merge

- merge operator | | is used to add new nodes, Demonstrate the following select jsonb_pretty (w.d ata# > '{line}' | | '{" new_line ":" add "}') as new_line, Jsonb_pretty (w.d ata | | '{" new_key ":" increases "}') as new_key from demo. J_waybill w where w.i d = 1; new_line | new_key -------------------------------+----------------------------------- { +| { + "to": { +| "line": {+ "city", "mianyang city", + | "to" : {+ "district", "municipal district," + | "city" : "mianyang city," + "province", "sichuan province", "+ |" district ": "Municipal district," +}, + | "province", "sichuan province", "+" from ": {+ |}, +" city ":" tangshan ", + | "from" : {+ "district" : "rich area," + | "city" : "Tangshan," + "province", "hebei province", "+ |" district ":" rich area, "+}, + |" province ", "hebei province", "+" new_line ": "Add" + +} | |}}, + | "driver" : {+ | "name" : "zhang", + | "mobile" : 13800000000 + |}, + | "new_key" : "+ | increase", "payment" : {+ | "oil_amount" : 1234, + | "cash_amount" : 5678 + |}, + | "project" : "test project", + | "waybill" : 2019000000, + | "pay_org_name" : "ABC factory" + |} (1 row)Copy the code
The -- operator can be used in the UPDATE syntax
update demo.j_waybill 
	set data = data || '{"new_key":" increment "}' ;
Copy the code

delete

Delete the entire top-level member
update demo.j_waybill 
	set data = data-'driver'  ;
	
-- Deletes the member in the specified path
update demo.j_waybill 
	set data = data# -'{driver, mobile}'  ;	
	
Delete multiple members simultaneously
update demo.j_waybill 
	set data = data# -'{driver, mobile}'# -'{line, to}'  ;		
Copy the code

Modify the

Jsonb_set () is designed to update single-path node values. The meanings of the parameters are as follows:

  1. The first is the JSONB datatype field you want to modify;
  2. The second is a text array that specifies the path to be modified.
  3. The third argument is to replace the value (which can be JSON);
  4. If the given path does not exist, json_set() creates it by default; If you want to disable this behavior, set the fourth parameter to false;
-- String, using double quotation marks
update demo.j_waybill set data = jsonb_set(data.'{"project"}'.'Changed' );

-- numbers, to_jsonb()
update demo.j_waybill set data = jsonb_set(data.'{"waybill"}', to_jsonb(100));-- Add simple elements
update demo.j_waybill set data = jsonb_set(data.'{"new_simple"}', to_jsonb(999));-- Add complexity
update demo.j_waybill set data = jsonb_set(data.'{"new_complex"}'.'{"foo":"bar", "foo1": 123}');
Copy the code

The index

PG comes with gin type indexes that support all JSON operations except range queries. Let’s use some examples to illustrate.

Create a sample table
drop table if exists demo.j_cargo;
create table demo.j_cargo (id int primary key.data jsonb);

insert into demo.j_cargo(id.data)
select v.waybill_id, to_jsonb(v)
from (
	select b.waybill_create_time, c.*
		from dwd_lhb.wb_cargo_info as c, dwd_lhb.wb_base_info as b 
	where c.waybill_id = b.waybill_id 
	limit 100000
) as v
;
Copy the code

The default mode

Gin has two usage modes, with no parameters by default. Create index as follows

All queries except range queries are supported
drop index if exists idx_jc_non_ops ;
create index idx_jc_non_ops on demo.j_cargo using gin (data);
Copy the code

Does the specified KEY exist? Operations, as follows

Explain select * from demo.j_cargo j where j.data? 'cargo_name'; QUERY PLAN -------------------------------------------------------------------------------- Bitmap Heap Scan on j_cargo J (cost = 16.77.. 389.25 rows=100 width=803) Recheck Cond: (data? 'Cargo_name '::text) -> Bitmap Index Scan on idx_jc_non_OPS (cost=0.00.. 16.75 rows=100 width=0) Index Cond: (data? 'cargo_name'::text) (4 rows)Copy the code

The @> operation that determines whether the specified Key:Value is equal is as follows

Explain select * from demo.j_cargo j where j.dat@ > '{"cargo_name":" cargo_name"}'; QUERY PLAN -------------------------------------------------------------------------------- Bitmap Heap Scan on j_cargo J (cost = 28.77.. 401.25 rows=100 width=803) Recheck Cond: (data @> '{"cargo_name": '::jsonb) -> Bitmap Index Scan on idx_jc_non_OPS (cost=0.00.. 28.75 rows=100 width=0) Index Cond: (data @> '{"cargo_name": "cargo_name"}'::jsonb) (4 rows)Copy the code

The value equality judgment of the OR operation

- PS: Explain select * from demo.j_cargo J where j.dat@ > '{"cargo_name":" cargo_name"}' or j.dat@ > '{" cargo_name ":" white "} "; QUERY PLAN ---------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on J_cargo J (cost=57.60.. 775.81 rows=200 width=803) Recheck Cond: ((dat@ > '{"cargo_name": "cargo_name"}'::jsonb) OR (dat@ > '{"cargo_name": }'::jsonb)) -> BitmapOr (cost=57.60.. Rows =200 width=0) -> Bitmap Index Scan on idx_jc_non_OPS (cost=0.00.. 28.75 rows=100 width=0) Index Cond: (data @> '{"cargo_name": '::jsonb) -> Bitmap Index Scan on idx_jc_non_OPS (cost=0.00.. 28.75 rows=100 width=0) Index Cond: (data @> '{"cargo_name": "jsonb "}'::jsonb) (7 rows)Copy the code

Jsonb_path_ops mode

Gin index with JSONB_PATH_OPS, more efficient than the default.

-- jsonb_PATH_ops only supports the @> operator, but is efficient
drop index if exists idx_jc_ops ;
create index idx_jc_ops on demo.j_cargo using gin (data jsonb_path_ops);
Copy the code

Look at the execution plan and make sure that the more efficient index idX_JC_OPS is used

Explain select * from demo.j_cargo J WHERE j.dat@ > '{"cargo_name":" cargo_name"}'; QUERY PLAN ---------------------------------------------------------------------------- Bitmap Heap Scan on j_cargo j (cost = 16.77.. 389.25 rows=100 width=803) Recheck Cond: (data @> '{"cargo_name": }'::jsonb) -> Bitmap Index Scan on idx_jc_OPS (cost=0.00.. 16.75 rows=100 width=0) Index Cond: (data @> '{"cargo_name": "cargo_name"}'::jsonb) (4 rows) 16.75 rows=100 width=0) Index Cond: (data @> '{"cargo_name": "cargo_name"}'::jsonb)Copy the code

Btree index – number

Since gin indexes do not support range queries, we create btree indexes for fields that require this. At creation time, an explicit conversion must be performed, as follows

Support range query, extract the type of range query, create btree expression index
drop index if exists idx_jc_btree_num ;
create index idx_jc_btree_num on demo.j_cargo ( ((data->>'price') : :numeric));Copy the code

Type conversions are also required when using indexes, as follows

explain select * from demo.j_cargo j where (j.data->>'price')::numeric between 10 and 100; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Bitmap Heap Scan on j_cargo j (cost = 13.42.. 1673.22 rows=500 width=803) Recheck Cond: ((((data ->> 'price'::text))::numeric >= '10'::numeric) AND (((data ->> 'price'::text))::numeric <= '100'::numeric)) -> Bitmap Index Scan on IDx_jC_btree_num (cost=0.00.. Rows =500 width=0) Index Cond: ((((data ->> 'price'::text))::numeric >= '10'::numeric) AND (((data ->> 'price'::text))::numeric <= '100'::numeric)) (4 rows)Copy the code

Btree index – Timestamp

Important: If you create a btree index of timestamp type, an error is reported because the default string to timestamp function does not meet the IMMUTABLE feature

-- Timestamp error!! The default string-to-timestamp function is not immutable
create index idx_jc_btree_ts on demo.j_cargo ( ((data->>'waybill_create_time') : :timestamp)); ERROR: functions in index expression must be marked IMMUTABLECopy the code

Instead, create an IMMUTABLE function for type conversions as follows

-- Customizes immutable function to handle timestamp
drop function if exists demo.to_timestamp  ;
create or replace function demo.to_timestamp(text) returns timestamp as ?  
  select $1: :timestamp;  
? language sql strict immutable;  

--
drop index if exists idx_jc_btree_ts ;
create index idx_jc_btree_ts on demo.j_cargo ( demo.to_timestamp(data->>'waybill_create_time'));Copy the code

You also need to use custom functions to use indexes in SQL, as shown below

Explain select * from demo.j_cargo j where demo.to_timestamp(j.data->>'waybill_create_time') between '2015-06-27' and '2015-06-28'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ ---------------------------------------------------------------------------------- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Bitmap Heap Scan on j_cargo j (cost = 13.42.. 1918.22 rows=500 width=803) Recheck Cond: ((demo.to_timestamp((data ->> 'waybill_create_time'::text)) >= '2015-06-27 00:00:00'::timestamp without time zone) AND (demo.to_timestamp((data ->> 'waybill_create_time'::text)) <= '201 5-06-28 00:00:00'::timestamp without time zone)) -> Bitmap Index Scan on IDx_jC_btree_TS (cost=0.00.. Rows =500 width=0) Index Cond: ((demo.to_timestamp((data ->> 'waybill_create_time'::text)) >= '2015-06-27 00:00:00'::timestamp without time zone) AND (demo.to_timestamp((data ->> 'waybill_create_time'::text)) <= '2015-06-28 00:00:00'::timestamp without time zone)) (4 rows)Copy the code
-- When not using custom functions, Explain select * from demo.j_cargo J WHERE (j.data->>'waybill_create_time')::timestamp between '2015-06-27'  and '2015-06-28'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ ---------------------------------------------------------------------------------- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Gather (cost = 1000.00.. 13167.00 ROWS =500 width=803) Workers Planned: 2 -> Parallel Seq Scan on J_cargo J (cost=0.00.. 12117.00 rows = 208 width = 803) Filter: ((((data ->> 'waybill_create_time'::text))::timestamp without time zone >= '2015-06-27 00:00:00'::timestamp without time  zone) AND (((data ->> 'waybill_create_time'::text))::timestamp w ithout time zone <= '2015-06-28 00:00:00'::timestamp without time zone)) (4 rows)Copy the code