Introduction to Greenplum

PostgreSQL

PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions. This distribution also contains C language bindings.

Greenplum

Greenplum Database (GPDB) is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte-scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer, delivering high analytical query performance on large data volumes.

Greenplum is based on PostgreSQL and uses almost the same syntax. It is essentially a relational database cluster: a single logical database composed of multiple independent database services. Like Oracle or PostgreSQL, it lets you access data through standard SQL statements.

Compiling and installing Greenplum from source

CentOS 6.9 is used as an example. The ISO images can be downloaded from the Alibaba Cloud mirror:

CentOS-6.9-x86_64-bin-DVD1.iso
CentOS-6.9-x86_64-bin-DVD2.iso

It is strongly recommended that the user name gpadmin be used.

Preparation

Grant root (sudo) privileges to the user

Open the /etc/sudoers file with vim (as root) and find the following two lines

## Allow root to run any commands anywhere
root    ALL=(ALL)       ALL

Add the following on the next line

gpadmin ALL=(ALL)       ALL # note the alignment

Save and exit

Install gcc-c++

sudo yum install gcc-c++

Install git

sudo yum install git

Install cmake

cd /home/gpadmin
wget https://cmake.org/files/v3.5/cmake-3.5.2.tar.gz
tar -zxvf cmake-3.5.2.tar.gz
cd cmake-3.5.2
./configure --prefix=/usr/local
make
sudo make install

Install pip

sudo yum -y install epel-release
sudo yum -y install python-pip
pip install --upgrade pip  # the pip version must be at least 7.x.x

Make sure you have the necessary Python modules installed

  • psutil
  • lockfile (>= 0.9.1)
  • paramiko
  • setuptools
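
The version floors mentioned above (pip at least 7.x.x, lockfile at least 0.9.1) can be checked with a small comparison helper. The following is only a sketch: `version_ge` is a hypothetical helper name, not part of pip or Greenplum, and it assumes GNU `sort` with the `-V` (version sort) option is available.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example: compare a version string against the required minimum.
# "9.0.1" is a made-up value standing in for the installed pip version.
if version_ge "9.0.1" "7.0.0"; then
    echo "pip is new enough"
fi
```

In practice the first argument would be taken from the output of `pip --version` rather than typed by hand.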

Make the shared libraries visible to the dynamic linker

Add /usr/local/lib and /usr/local/lib64 to /etc/ld.so.conf and run the command

sudo ldconfig
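
Before running ldconfig it can help to confirm that both directories are really listed. The sketch below assumes the paths were added directly to the file being checked; `missing_lib_dirs` is a hypothetical helper name.

```shell
# missing_lib_dirs FILE: print each required directory that FILE does not list.
# Hypothetical helper; FILE would normally be /etc/ld.so.conf.
missing_lib_dirs() {
    for dir in /usr/local/lib /usr/local/lib64; do
        # -x matches the whole line, -F treats the path as a fixed string
        grep -qxF "$dir" "$1" || echo "$dir"
    done
}

# Example: report what still needs to be added (prints nothing when complete).
if [ -r /etc/ld.so.conf ]; then
    missing_lib_dirs /etc/ld.so.conf
fi
```

Note that this only looks at the one file it is given; directories configured through files pulled in by an `include` line are not seen by this check.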

C++11 support

sudo yum install -y centos-release-scl
sudo yum install -y devtoolset-6-toolchain
echo 'source scl_source enable devtoolset-6' >> ~/.bashrc
source ~/.bashrc

Install dependencies

sudo yum -y install gcc git apr bison flex readline gcc-c++ curl-devel bzip2-devel python-devel readline-devel apr-devel libevent-devel openssl-devel perl-ExtUtils-Embed libxml2-devel openldap-devel pam pam-devel

Install GPORCA

Clone GPORCA

cd /home/gpadmin
git clone https://github.com/greenplum-db/gporca.git

Install gp-xerces

git clone https://github.com/greenplum-db/gp-xerces.git
cd gp-xerces
mkdir build
cd build
../configure --prefix=/usr/local
make
sudo make install

Install ninja

cd /home/gpadmin
git clone https://github.com/ninja-build/ninja.git && cd ninja
./configure.py --bootstrap
sudo cp ninja /usr/bin

Compile and install GPORCA

cd /home/gpadmin/gporca
cmake -GNinja -H. -Bbuild
sudo ninja install -C build

Compile Greenplum

cd /home/gpadmin
git clone https://github.com/greenplum-db/gpdb.git
cd gpdb
./configure --with-perl --with-python --with-libxml --with-gssapi --prefix=/usr/local/gpdb

# Compile and install
make -j8
sudo make -j8 install

# Bring in greenplum environment into your running shell
source /usr/local/gpdb/greenplum_path.sh

If the following message appears, the installation was successful

Greenplum Database installation complete.
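
Beyond looking for the completion message, one way to sanity-check the installation is to confirm that the Greenplum utilities used later in this article resolve on the PATH once greenplum_path.sh has been sourced. This is a sketch only; `missing_cmds` is a hypothetical helper name.

```shell
# missing_cmds NAME...: print every NAME that is not found on PATH.
# Hypothetical helper for a post-install sanity check.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example: after sourcing greenplum_path.sh, these should all resolve,
# so the call below is expected to print nothing.
missing_cmds gpinitsystem gpstart gpstop psql
```

Any name that is printed points to a PATH problem or an incomplete install.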

Deployment

The example below deploys one master node and one segment node on the same host.

Configuring the hosts file

Check the IP address

ip addr show eth0
# use the IP address shown on the inet line
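
Picking the address out of the inet line can also be scripted. The sketch below is illustrative: `first_inet_addr` is a hypothetical helper that reads `ip addr show` output on standard input and assumes the usual layout, where the address is the second field of the inet line.

```shell
# first_inet_addr: read `ip addr show` output on stdin and print the first
# IPv4 address, without the /prefix length. Hypothetical helper.
first_inet_addr() {
    # $1 == "inet" matches IPv4 lines only; IPv6 lines start with "inet6"
    awk '$1 == "inet" { sub(/\/.*/, "", $2); print $2; exit }'
}

# Example (eth0 is the interface name used in this article):
# ip addr show eth0 | first_inet_addr
```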

Modifying the hosts file

sudo vim /etc/hosts

Add the following two lines

<ip> mdw
<ip> sdw1
# both entries use the same IP address

Modifying the network file

sudo vim /etc/sysconfig/network
# set HOSTNAME=mdw

Create host list file and segment file

cd /home/gpadmin
mkdir conf
cd conf
sudo vim hostlist

Add the following

mdw
sdw1

Then create the segment host file

sudo vim seg_hosts

Add the following

sdw1

Shared memory and network parameters

sudo vim /etc/sysctl.conf

Replace the file contents with the following

net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 1
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.sem = 250 64000 100 512
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.core.netdev_max_backlog = 10000
vm.overcommit_memory = 2
net.ipv4.conf.all.arp_filter = 1

Apply the settings

sudo sysctl -p
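
When a long block of settings is pasted into sysctl.conf, it is easy to list the same name twice, and the later line is then the one that takes effect. A quick way to spot this is sketched below; `duplicate_keys` is a hypothetical helper and assumes the simple "key = value" layout used above.

```shell
# duplicate_keys FILE: print each setting name that appears more than once
# in a sysctl-style "key = value" file. Hypothetical helper.
duplicate_keys() {
    awk -F= '
        /^[ \t]*(#|$)/ { next }                 # skip comments and blank lines
        {
            key = $1
            gsub(/[ \t]/, "", key)               # trim whitespace around the name
            if (seen[key]++ == 1) print key      # report on the second occurrence only
        }
    ' "$1"
}

# Example:
# duplicate_keys /etc/sysctl.conf
```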

Add resource limits

sudo vim /etc/security/limits.conf

Add the following

* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
# the asterisk must not be removed
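
Changes to limits.conf apply to new login sessions, so it is worth checking the limits from a fresh shell. The sketch below compares a value reported by `ulimit` with the configured minimum; `limit_ok` is a hypothetical helper name.

```shell
# limit_ok CURRENT REQUIRED: succeeds when CURRENT (as printed by `ulimit`)
# is "unlimited" or numerically at least REQUIRED. Hypothetical helper.
limit_ok() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Example: compare the current shell's open-file limit with the value above.
limit_ok "$(ulimit -n)" 65536 || echo "open-file limit is below 65536"
```

In bash, the process limit can be checked the same way with `ulimit -u` and 131072.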

Node deployment

Setting environment variables

source /usr/local/gpdb/greenplum_path.sh 

Configure passwordless SSH login

sudo pip install psutil
gpssh-exkeys -f /home/gpadmin/conf/hostlist 
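
After the key exchange, a non-interactive login to every host in the list should succeed without a password prompt. The sketch below is one way to check that; `read_hosts` and `check_ssh` are hypothetical helper names, and the 5-second timeout is an arbitrary choice.

```shell
# read_hosts FILE: print the host names in FILE, skipping blank lines and
# lines that start with '#'. Hypothetical helper.
read_hosts() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# check_ssh FILE: try a non-interactive SSH login to every host in FILE.
# BatchMode=yes makes ssh fail instead of prompting for a password;
# -n keeps ssh from consuming the loop's standard input.
check_ssh() {
    read_hosts "$1" | while read -r host; do
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: passwordless login failed"
        fi
    done
}

# Example:
# check_ssh /home/gpadmin/conf/hostlist
```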


Creating a data directory

cd /home/gpadmin
mkdir gpdata
cd gpdata
mkdir segmentdata1 masterdata

Environment Variable Configuration

vim ~/.bash_profile 

Add the following

source /usr/local/gpdb/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/home/gpadmin/gpdata/masterdata/gpseg-1
export PGPORT=2345
export PGDATABASE=testDB

Reload the profile

source ~/.bash_profile

Write a database startup parameter file

cd /home/gpadmin/conf
mkdir gpconfigs
cd gpconfigs
vim gpinitsystem_config

Add the following

# Contents of the startup parameter file
ARRAY_NAME="Greenplum"
SEG_PREFIX=gpseg
PORT_BASE=40000
declare -a DATA_DIRECTORY=(/home/gpadmin/gpdata/segmentdata1 /home/gpadmin/gpdata/segmentdata1)
MASTER_HOSTNAME=mdw
MASTER_DIRECTORY=/home/gpadmin/gpdata/masterdata
##### Port number for the master instance.
MASTER_PORT=2345
##### Shell utility used to connect to remote hosts.
TRUSTED_SHELL=/usr/bin/ssh
##### Maximum log file segments between automatic WAL checkpoints.
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE
#MIRROR_PORT_BASE=50000
REPLICATION_PORT_BASE=41000
#MIRROR_REPLICATION_PORT_BASE=51000
#declare -a MIRROR_DATA_DIRECTORY=(/home/gpadmin/gpdata/segmentmirror1 #/home/gpadmin/gpdata/segmentmirror)
MACHINE_LIST_FILE=/home/gpadmin/conf/seg_hosts
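
A typo or an accidentally commented-out line in this file is easy to miss, so a quick check before initialization can save a failed run. The sketch below only looks for uncommented assignments; `missing_config_keys` is a hypothetical helper, and its key list simply mirrors the settings used in the example above rather than an official list of required parameters.

```shell
# missing_config_keys FILE: print each expected setting that FILE does not
# assign on an uncommented line. The key list mirrors the example above and
# is an assumption, not an official list of required parameters.
missing_config_keys() {
    for key in ARRAY_NAME SEG_PREFIX PORT_BASE DATA_DIRECTORY \
               MASTER_HOSTNAME MASTER_DIRECTORY MASTER_PORT \
               TRUSTED_SHELL ENCODING MACHINE_LIST_FILE; do
        # array settings are written as "declare -a NAME=(...)"
        grep -Eq "^[[:space:]]*(declare -a )?${key}=" "$1" || echo "$key"
    done
}

# Example:
# missing_config_keys /home/gpadmin/conf/gpconfigs/gpinitsystem_config
```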

Initializing the database

gpinitsystem -c /home/gpadmin/conf/gpconfigs/gpinitsystem_config -a

Database usage

gpstart            # start the database service
gpstop             # stop the database service
gprecoverseg       # recover failed segments
gpstate -m         # check the node status
createdb testDB    # create a database
psql               # connect to the database
psql --help        # see more options

TPC-H test

For running the TPC-H benchmark on Greenplum, see https://yq.aliyun.com/articles/93?commentId=29