The development of big data field is accompanied by the increasing diversification of data sources, which makes it more and more difficult for data warehouse to manage and understand the collected data, and the demand for query and management of external data sources is also more and more intense. Postgres’ solution to this problem is a Foreign Data Wrapper (FDW), and Greenplum 6.0 has excellent support for FDW.

What is a FDW

FDW is the Postgres implementation of SQL standard SQL/MED (SQL Management of External Data). FDW provides a set of unified common interfaces that make it easy for extenders to deeply integrate with Postgres in the core areas of optimization, execution, scanning, updating, and statistics, and to query and manipulate external data sources directly with SQL statements. For example, FDW for MySQL allows users to query, sort, group, filter, Join, and even insert and update MySQL data as if they were working with local tables. In addition to diy implementations, many FDW extensions already exist in the community. The official Postgres Wiki page lists some FDW extensions for common data sources.

Greenplum 6.0 support status

All Postgres 9.4 based FDW extensions are perfectly compatible with Greenplum 6.0. By default, FDW runs on the Master node, but Greenplum also has the option to run FDW on any node or on all Segment nodes. Create FDW (postgres_fdw as an example) :

gpadmin=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION

gpadmin=# CREATE SERVER s0 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '192.0.2.0', port '15432', dbname 'gpadmin');
CREATE SERVER

gpadmin=# CREATE USER MAPPING FOR gpadmin SERVER s0 OPTIONS (user 'gpadmin');
CREATE USER MAPPING
Copy the code

Create external table on Master node by default

gpadmin=# CREATE FOREIGN TABLE ft1 (c1 text) SERVER s0 OPTIONS (schema_name 'public', table_name 't1');
CREATE FOREIGN TABLE

gpadmin=# EXPLAIN VERBOSE SELECT * FROM ft1 WHERE c1 = 'foo';QUERY PLAN -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Foreign Scan on public. The ft1 (cost = 100.00.. 183.25 ROWS =6 width=32) Output: c1 Remote SQL: SELECT C1 FROM public'foo'::text))
 Optimizer: Postgres query optimizer
(4 rows)
Copy the code

Query:

gpadmin=# SELECT \* FROM ft1;  
 c1  
\-----  
 foo  
 bar  
(2 rows)  
  
gpadmin=# SELECT \* FROM ft1 WHERE c1 = 'foo';  
 c1  
\-----  
 foo  
(1 row)
Copy the code

Instead of running on all Segment nodes (note that there is a Gather Motion) :

gpadmin=# ALTER TABLE ft1 OPTIONS (mpp_execute 'all segments');
ALTER TABLE

gpadmin=# EXPLAIN VERBOSE SELECT * FROM ft1 WHERE c1 = 'foo';QUERY PLAN --------------------------------------------------------------------------------- Gather Motion 3:1 (slice1; Segments: (3) cost = 100.00.. 183.25 rows=6 width=32) Output: c1 -> Foreign Scan on public.ft1 (cost=100.00.. 183.25 ROWS =2 width=32) Output: c1 Remote SQL: SELECT C1 FROM public. T1 WHERE ((c1 = 1) rows=2 width=32) Output: c1 Remote SQL: SELECT C1 FROM public'foo'::text))
 Optimizer: Postgres query optimizer
(6 rows)
Copy the code

Differences between FDW and External Tables

  • External Tables supports a limited number of External data sources and is unique to Greenplum. FDW can easily take advantage of the community’s hundreds of FDW extensions.
  • External Tables is essentially copy-based, and its data source needs to be processed into CSV or TEXT format, which is inefficient. FDW gets the data directly.
  • The integration of External Tables and the kernel is limited, with little optimization and no push-down at all. FDW and kernel deep integration, can provide optimization, push-down and statistics functions. (Greenplum 6.0 is currently limited to Postgres kernel versions that only do conditional push-downs, and other push-downs will be supported in future versions.)
  • External Tables is increasingly implemented in Greenplum with FDW.