Prepare a backup using the pgBackRest tool on an external volume mounted to the machine

Piotr Bracha
6 min readJun 17, 2024

Today, effectively managing data and ensuring its security is critical for any organization. Backups are the foundation of any data protection strategy, offering various levels of protection and efficiency. In the context of PostgreSQL databases, the pgBackRest tool stands out as a powerful and comprehensive backup and management solution.

pgBackRest provides advanced features such as compression, encryption, replication and retention management that are essential for maintaining data integrity and availability. One of the most efficient ways to store backups is to use external volumes mounted to the machine, which allows for flexible disk space management and makes it easy to move data between different environments.

In this article, I will take you step-by-step through the process of configuring pgBackRest to create full and differential backups on an external volume created in DigitalOcean. I will discuss the configuration of the tool, the process of mounting an external volume, and best practices related to backup management. With this guide, you will be able to effectively protect your data and ensure the continuity of your PostgreSQL database, regardless of any failures or other unpredictable events.

Copy space, that is an external volume

The default created volume, which in the panel is attached to a given instance of the machine, also automatically mounts itself in the file system. The volumes, thanks to the systemd and udev configuration, are automatically mounted when the machine is rebooted and you don’t need to modify /etc/fstab for this purpose. Additional information is available in the documentation.

Implementing the pgBackRest tool and performing backups

  1. Installation of pgBackRest:
apt install pgbackrest

2. Configuration in the /etc/pgbackrest.conf file:

[global]
process-max=2
repo1-block=y
repo1-bundle=y
repo1-path=/mnt/volume_jonas_test_purposes/pg_backup
repo1-retention-full=1
repo1-retention-diff=2
#archive-timeout=120
start-fast=y

[main]
pg1-path=/var/lib/postgresql/12/main

#
[global:archive-push]
#compress-level=3

process-max — the number of parallel processes creating a backup. The main purpose — to speed up the creation of backups. Tune to number of cpu cores.
repo1-block — backup data is stored as a series of data blocks instead of full files. This allows more efficient data management, including deduplication and restoration of only changed data blocks.
repo1-bundle — backup files are grouped and packaged together. This can improve the performance of I/O operations, especially for file systems that handle fewer, larger files better than a large number of small files.
repo1-path — the path to the directory where the backups will be stored, requires permissions 750 and postgres:postgres on the pg_backup directory
repo1-retention-full — storing of a given number of full backups
repo1-retention-diff — storing a given number of differential backups
start-fast — does not wait for checkpoint creation but forces it and starts the backup. Crucial in production environments where minimizing backup execution time is critical to maintaining high system availability.
pg1-path — path to the postgres data directory
compress-level — archive compression level
archive-timeout — set maximum time, in seconds, to wait for each WAL segment to reach the pgBackRest archive repository. Default is 60s. Modify only when you see timeout in log file related with wal archiving.

3. Configuration in /etc/postgresql/12/main/postgresql.conf:

archive_command = 'pgbackrest --stanza=main archive-push %p'
archive_mode = on
max_wal_senders = 3
wal_level = replica
max_wal_size = 4GB # default 1GB
min_wal_size = 1GB # default 80MB

archive_command — command executed by PostgreSQL to archive WAL. In this example “%p” is a special token used in PostgreSQL that denotes the full path to the Write-Ahead Log (WAL) file to be archived.
archive_mode — enable archive mode.
max_wal_senders — number of maximum WAL sending processes.
wal_level — the level of stored WAL data, i.e minimum, logical, replica.
max_wal_size and min_wal_size — maximum and minimum size of WAL segments.

4. Cluster restart:

systemctl restart postgresql@12-main

5. Initialize “stanza” from the postgres user and checking the configuration:

pgbackrest --stanza=main --log-level-console=info stanza-create
pgbackrest --stanza=main check

6. Create backups — use proper combination (in my case I chose full and differential):

pgbackrest --stanza=main --type=full backup --log-level-console=info # full backup
pgbackrest --stanza=main --type=diff backup --log-level-console=info # differential backup
pgbackrest --stanza=main --type=incr backup --log-level-console=info # incremental backup

7. Checking the status of backups:

pgbackrest --stanza=main info

8. Create a crontab entry in the postgres user account:

# Full backup on Sunday at midnight with logging
0 0 * * 0 pgbackrest --stanza=main --type=full backup > /var/log/pgbackrest/backup_full.log 2>&1

# Differential backup Monday to Saturday at midnight with logging
0 0 * * 1-6 pgbackrest --stanza=main --type=diff backup > /var/log/pgbackrest/backup_diff.log 2>&1

9. Create backup_diff and backup_full files with appropriate permissions:

touch /var/log/pgbackrest/backup_full.log
touch /var/log/pgbackrest/backup_diff.log

chmod 640 /var/log/pgbackrest/backup_diff.log
chmod 640 /var/log/pgbackrest/backup_full.log

chown postgres:postgres /var/log/pgbackrest/backup_diff.log
chown postgres:postgres /var/log/pgbackrest/backup_full.log

Restoring from backup

Here we have to stop for a moment, because there are three approaches related to data recovery. The first takes into consideration deleting data from the /var/lib/postgresql/12/main directory, and the second does not. Third is Point-in-Time Recovery or PITR. What is the difference and which option should you choose?

The first, classic way requires an empty cluster directory before running the recovery option.

The second way is to use the “delta” parameter, which does not require an empty cluster directory. It involves restoring only those files that have been changed. The sha-1 hash function for each file in the cluster is calculated, and if it differs from that for a particular file in the backup then such a file is restored. This is clearly written in the documentation in section 10.2 “Delta option”.

The third is a Point in Time Recovery. PITR allows the WAL to be played from a backup to a specified lsn, time, transaction id, or recovery point. For common recovery scenarios time-based recovery is arguably the most useful. A typical recovery scenario is to restore a table that was accidentally dropped or data that was accidentally deleted.

In addition, it is worth using the option process-max=X, where you can try to substitute the number of CPU cores for X. At the time of recovery from the backup, the postgres cluster is disabled, so the machine has additional resources. Besides, we want to restore it to full capacity as soon as possible.

First way: Delete data from the /var/lib/postgresql/12/main directory

  1. Stop the cluster:
systemctl stop postgresql@12-main

2. Delete the contents from the data directory:

find /var/lib/postgresql/12/main -mindepth 1 -delete

3. Run the command to recover the cluster:

pgbackrest --stanza=main restore --log-level-console=info

4. Start the cluster:

systemctl start postgresql@12-main

5. Cluster status verification:

pg_lsclusters
systemctl status postgresql@12-main

Second way: Using the “delta” parameter

  1. Stop the cluster:
systemctl stop postgresql@12-main

2. Run the recovery command with the “delta” parameter:

pgbackrest --stanza=main restore --log-level-console=info --delta

3. Start the cluster:

systemctl start postgresql@12-main

4. Cluster status verification:

pg_lsclusters
systemctl status postgresql@12-main

Third way: PITR

1. Check in the postgresql logs what is the time of occurrence of the failure. The more accurate the time obtained the better.

2. Stop the cluster:

systemctl stop postgresql@12-main

3. Run the command to recover the data:

pgbackrest --stanza=main --delta --type=time "--target=2024-06-17 16:13:29" --target-action=promote restore --log-level-console=info

type — specifies the type of restore. “time” means that the restore will be performed to a specific point in time (timestamp).
target — specifies the exact point in time to which the database is to be restored. Used format from postgresql log file.
target-action — defines the action to be taken when the restore point is reached. “promote” means that after restoring to a certain point in time, the database will be promoted to the primary role. This completes the restoration process and makes the database ready for use.

4. Start the cluster:

systemctl start postgresql@12-main

5. Cluster status verification:

pg_lsclusters
systemctl status postgresql@12-main

Which option to use?

Classically, “it depends.” You should certainly pay attention to several issues:

  1. Multiplicity of data changes

If the data directory contains most of the current data and only a small number of files have been changed or corrupted, using “delta” will allow you to restore the copy faster, as only the changed files will be restored.

2. Size of the database

A large database, such as several hundred GB, could make it take too long to delete and fully restore.

3. Downtime

If downtime must be minimized “delta” can reduce the time needed to restore data.

On the other hand, each case should be considered individually and tested which option is better.

Summary

From the article, you could learn how to configure and use the pgbackrest tool to be able to take care of automated efficient database backup and restore from it.

More definitely interesting/useful information in the documentation → https://pgbackrest.org/user-guide.html

List of available commands of the pgBackRest tool → https://pgbackrest.org/command.html

Configuration information in /etc/pgbackrest.conf → https://pgbackrest.org/configuration.html

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

No responses yet

Write a response