What is high availability

High availability (HA) is one of the factors that must be considered when designing the architecture of a distributed system. It generally refers to design choices that reduce the time during which the system cannot provide service. If the system is always able to serve requests, we say it is 100% available. Many companies set a high availability target of four nines, i.e. 99.99%, which allows an annual downtime of about 0.876 hours (365 × 24 h × 0.0001 ≈ 52.6 minutes).

How to be highly available

The biggest enemy of a high availability system is the single point of failure. The failure of any single component is inevitable, so if the system has a single-point architecture, that one failure makes the whole system unavailable. The essence of achieving high availability is therefore redundancy: deploy redundant copies of the service, detect the failed instance, and replace it with a backup.

Greenplum's high availability implementation

Greenplum, as a highly available database system, likewise implements high availability through redundant deployment.

A Greenplum cluster consists of a master node and several segment nodes. Greenplum achieves high availability precisely by providing redundancy for each of these nodes. In a database system, redundancy is implemented with replication technology.

Replication

Greenplum achieves redundancy through replication. A write operation generates xlog (transaction log) records. The xlog serves as the basis for data recovery when the database crashes, and it can also be shipped to standby nodes as incremental updates. Xlog files are stored in the pg_xlog directory under the data directory.
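For example, you can peek at the WAL files under a node's data directory (the path below is only illustrative):

ls /gpdata/primary/gpseg0/pg_xlog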

A walsender process runs on the primary node and a walreceiver process runs on the mirror (or standby) node. The walsender process ships the write-ahead log to the walreceiver process to provide redundancy.

The walsender and walreceiver processes on the primary and mirror nodes can be seen with the ps command.
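For example (the exact process-title strings may vary between versions):

ps -ef | grep -E 'wal (sender|receiver)' | grep -v grep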

Synchronous or asynchronous replication

Replication is simply copying the xlog to the standby nodes. Replication can be either synchronous or asynchronous.

For synchronous replication, the COMMIT operation does not return until the xlog has been synchronized to the standby node.

For asynchronous replication, COMMIT returns as soon as the xlog is flushed to the local disk, without waiting for the standby node; shipping the xlog to the standby happens asynchronously and may complete almost immediately or with a significant delay.

Synchronous replication guarantees consistency between primary and standby, but increases transaction commit latency; moreover, if the standby fails, transactions hang and cannot commit for a while. Asynchronous replication reduces latency, but primary and standby may become inconsistent.

To guarantee high availability, Greenplum currently uses synchronous replication, controlled by the GUCs synchronous_commit and synchronous_standby_names. synchronous_standby_names specifies which standby nodes to synchronize with and is usually set to '*', meaning synchronous replication to all standby nodes (currently at most one standby is supported). synchronous_commit must be set to on.
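As a quick sketch, you can verify both settings from psql (GUC names as in upstream PostgreSQL/Greenplum):

SHOW synchronous_commit;         -- expected: on
SHOW synchronous_standby_names;  -- '*' means synchronous replication to all (at most one) standbys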

Greenplum high availability deployment diagram

In summary, in a highly available Greenplum cluster the master node has a corresponding standby node as its backup, and each primary segment has a corresponding mirror segment. Greenplum currently supports only one mirror per primary; a one-primary, multiple-mirror configuration is not supported.

gp_segment_configuration (catalog table of node metadata)

Greenplum uses the catalog table gp_segment_configuration to maintain information about all nodes, including the master and standby. It is also the most intuitive way for DBAs to learn the state of the cluster.

Here is an example of a cluster with three nodes.
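The listing itself is not reproduced here, but it can be produced with a query along these lines (column names per the gp_segment_configuration catalog):

SELECT dbid, content, role, preferred_role, mode, status, port, hostname
FROM gp_segment_configuration
ORDER BY content, role;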

Primary-mirror fault recovery – FTS

The Fault Tolerance Service (FTS) is the fault detection and recovery service that Greenplum provides for segment nodes. FTS is a child process of the master (the ftsprobe process, which exists only on the master) that periodically polls each primary to obtain the state of every primary-mirror pair.

Note that FTS does not connect to the mirrors directly; it obtains the mirror state through the primary. The primary derives the mirror's liveness and synchronization status from the state of its walsender process.

FTS triggers a poll when any of the following three conditions is met:

1. The polling interval gp_fts_probe_interval has elapsed.

2. A probe is requested manually by running SELECT gp_request_fts_probe_scan(); (see the example after this list).

3. A query encounters a segment node failure.
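A minimal sketch of checking the interval and forcing a probe from psql (GUC and function names as above):

SHOW gp_fts_probe_interval;          -- how often FTS polls the primaries
SELECT gp_request_fts_probe_scan();  -- force an immediate probe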

The polling process is shown as follows:

Each primary-mirror pair is generally in one of the following states:

1. The primary and mirror are normal

2. The primary is normal, but the mirror is abnormal

3. The primary is abnormal and the mirror is normal

4. Both the primary and the mirror are abnormal

Greenplum currently supports only a one-primary, one-mirror architecture, so the fourth case cannot be handled automatically and can only be resolved by human intervention. The first case requires no action. So let's look at cases 2 and 3 in detail.

Fault 1: The primary fails

This is the most common problem that high availability solves. Once FTS finds that a primary is down, and its mirror is in sync, it promotes the corresponding mirror to primary and updates the catalog.

After the promotion, you can see the following update in the catalog.

The node whose preferred_role is mirror now has role primary, while the node whose preferred_role is primary now has role mirror with status 'd' (down); mode is also marked 'n' (not synchronized).
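As an illustration (the values below are hypothetical), the pair for one content id might look like this after the promotion:

SELECT content, role, preferred_role, mode, status
FROM gp_segment_configuration
WHERE content = 0;
--  content | role | preferred_role | mode | status
-- ---------+------+----------------+------+--------
--        0 | p    | m              | n    | u        -- former mirror, now acting as primary
--        0 | m    | p              | n    | d        -- failed primary, marked down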

Fault 2: The mirror is disconnected

If the mirror goes down, the primary can no longer synchronize with it, so commits on the primary hang until FTS notifies the primary to stop synchronizing with the mirror. Synchronous replication is turned off by setting synchronous_standby_names to empty.

As you can see, the catalog table has changed: mode has become 'n' (not synchronized) and the mirror's status is marked 'd' (down).

Note: during an FTS poll, if the primary finds that the mirror is alive but the replication mode is asynchronous, it forcibly switches back to synchronous replication, i.e. synchronous_standby_names is updated to '*'.
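One way to watch the replication and sync state of each pair from the master (assuming the gp_stat_replication view shipped with Greenplum 6) is:

SELECT gp_segment_id, state, sync_state FROM gp_stat_replication;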

FTS-related GUCs
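The parameter table is not reproduced here; assuming the standard gp_fts_* naming, you can list these settings directly from pg_settings:

SELECT name, setting, short_desc FROM pg_settings WHERE name LIKE 'gp_fts%';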

Master fault recovery

FTS provides automatic recovery from segment node failures, but how is the master recovered? In Greenplum 6 there is no automatic master failover mechanism. If the master node fails, the standby can only be promoted manually by running gpactivatestandby.

Meanwhile, on the master node the synchronous_standby_names setting is kept empty, so that requests are not blocked even if the standby fails or a network failure delays synchronization; there is no mechanism that automatically notifies the master to disable synchronous replication.
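To check whether a standby master is configured and in sync, one option is gpstate's standby report:

gpstate -f    # display standby master details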

High availability O&M tools

  • gpactivatestandby

gpactivatestandby activates the standby and promotes it to master.

For example:

gpactivatestandby -d /gpdata/standby

This command activates the standby running in the /gpdata/standby directory as master.

  • gpinitstandby

gpinitstandby initializes a new standby from the current master. After the old master goes down and the standby has been promoted to master, a new standby should be created as a backup of the new master.

For example:

gpinitstandby -s myhost -S /gpdata/standby -P 2222

This command creates a standby node listening on port 2222, with its data directory /gpdata/standby, on the host myhost.

You can also run the following command to restart the standby node after it goes down:

gpinitstandby -n

  • gprecoverseg

The gprecoverseg tool can restore a mirror that has gone down.

You can completely rebuild a downed mirror by running gprecoverseg -F (full recovery).
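A typical recovery sequence, assuming the standard gprecoverseg workflow, is an incremental recovery followed by a rebalance so that segments return to their preferred roles:

gprecoverseg        # incremental recovery of the failed segments
gprecoverseg -r     # rebalance: switch segments back to their preferred roles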