Skip to main content
Background Image
  1. PostgreSQL Posts/

Warm Standby: Using pg_receivewal

·766 words·4 mins· ·
Ruohang Feng
Author
Ruohang Feng
Pigsty Founder, @Vonng
Table of Contents

Author: Vonng (@Vonng)

Backup is the foundation of a DBA’s livelihood and one of the most critical tasks in database management. There are various types of backups, but the backups discussed here are all physical backups. Physical backups can usually be divided into the following four types:

  • Hot Standby: Identical to the primary database. When the primary fails, it takes over the primary’s work and also handles online read-only traffic.
  • Warm Standby: Similar to hot standby but doesn’t handle online traffic. Database clusters usually need a delayed standby to quickly recover when errors occur (such as accidental data deletion). In this case, because the delayed standby is inconsistent with the primary, it can’t serve online queries.
  • Cold Backup: The cold backup database exists as static files of the data directory, essentially a binary backup of the database directory. Easy to create, simple to manage, convenient for placing in other AZs for disaster recovery. It’s the ultimate insurance for databases.
  • Remote Standby: The so-called multi-site, multi-center usually refers to hot standby instances placed in other AZs.

Usually when we talk about backups, we mean cold and warm backups. Their important difference from hot standby is: they’re usually not the latest. When serving online queries, this lag is a deficiency, but for failure recovery, this is a very important feature. Synchronized standby isn’t sufficient to handle all problems. Imagine this scenario: some human error or software bug deletes an entire table or database - such changes would immediately apply to synchronous standby. This situation can only be recovered by querying from delayed warm standby or replaying logs from cold backup. Therefore, cold/warm backups are necessary regardless of whether you have standby servers.

Reference: PostgreSQL Replication Solutions

Warm Standby Solutions
#

I usually recommend using delayed log transport standby for warm backups to quickly respond to failures, and using remote cloud storage cold backups for disaster recovery.

Warm standby solutions have some significant advantages:

  • Reliable: Warm standby actually performs continuous “recovery testing” during operation. So as long as warm standby works normally without errors, you can always trust it as a usable backup. Cold backups may not be so reliable. Additionally, using synchronous commit pg_receivewal with log transport offline instances can both reduce the risk of primary failure due to single synchronous standby failures and eliminate the risk of standby activities affecting the primary.
  • Simple Management: Warm standby management is basically similar to regular standby, so if you already have master-standby configuration, deploying warm standby is simple. Additionally, the tools used are all officially provided by PostgreSQL: pg_basebackup and pg_receivewal. The warm standby delay window can be easily adjusted through parameters.
  • Quick Response: Failures occurring within the delayed standby’s delay window (database deletion) can be quickly recovered: query from the delayed standby and pour back into the primary, or directly advance the delayed standby to a specific time point and promote it as the new primary. Additionally, using warm standby means you don’t need to pull full backups from the primary daily or weekly, saving bandwidth and executing faster.

Process Overview
#

Log Archiving
#

How to archive WAL logs generated by the primary is traditionally implemented by configuring archive_command on the primary. However, recent PostgreSQL versions provide a quite practical tool: pg_receivewal (called pg_receivexlog in versions before 10). To the primary, this client application looks like a standby, and the primary continuously sends the latest WAL logs while pg_receivewal writes them to a local directory. A significant advantage of this approach over archive_command is that pg_receivewal doesn’t wait until PostgreSQL fills a complete WAL segment before archiving, so it can achieve zero data loss on failure with synchronous commit.

pg_receivewal is also very simple to use:

# create a replication slot named walarchiver
pg_receivewal --slot=walarchiver --create-slot --if-not-exists

# add replicator credential to /home/postgres/.pgpass 0600
# start archiving (with proper supervisor/init scripts)
pg_receivewal \
  -D /pg/arcwal \
  --slot=walarchiver \
  --compress=9\
  -d'postgres://replicator@master.csq.tsa.md/postgres'

Of course, in actual production environments, for more robust archiving, we usually register it as a service and save some command status. Here’s a pg_receivewal command wrapper used in production: walarchiver

Related Scripts#

Here’s a script for initializing PostgreSQL Offline Instance for reference:

pg/test/bin/offline.sh

Backup Testing
#

How to be confident when facing failures? As long as backups exist, even the biggest problems can be recovered. But how to ensure your backup solution is truly effective requires thorough testing beforehand.

Let’s imagine some failure scenarios and how to respond to these failures under this solution:

  • pg_receive process termination
  • Offline node restart
  • Primary node restart
  • Clean failover
  • Split-brain failover
  • Accidental table deletion
  • Accidental database deletion

To be continued

Related

Backup and Recovery Methods Overview
·4009 words·19 mins
Backup is the foundation of a DBA’s livelihood. With backups, there’s no need to panic.
PgBackRest2 Documentation
·3129 words·15 mins
PgBackRest is a set of PostgreSQL backup tools written in Perl
Relation Bloat Monitoring and Management
·3182 words·15 mins
PostgreSQL uses MVCC as its primary concurrency control technology. While it has many benefits, it also brings other effects, such as relation bloat.
PgAdmin Installation and Configuration
·220 words·2 mins
PgAdmin is a GUI program for managing PostgreSQL, written in Python, but it’s quite dated and requires some additional configuration.
Incident Report: Uneven Load Avalanche
·1342 words·7 mins
Recently there was a perplexing incident where a database had half its data volume and load migrated away, but ended up being overwhelmed due to increased load.
Bash and psql Tips
·1819 words·9 mins
Some tips for interacting between PostgreSQL and Bash.