pgbackrest after a network outage unable to perform backup [fails always]

Started by KK CHN, about 1 month ago, 3 messages, general

kkchn.in@gmail.com

about 1 month ago

list,

After a network link outage, my pgbackrest backups to a remote repo server
were down for a few days. Now that the link is re-established, pgbackrest
always fails for both diff and full backups: it starts, then fails with the
error "unable to archive before 60000ms timeout."

I copied the existing archive to a safe location (another folder) on the
repo server, then stopped the stanza from the repo server and ran
stanza-delete --force there.

Then I recreated the stanza with the same stanza name and ran the check
command, but it also fails with the 60000ms timeout.

The TM_Repo-archive-push-async.log shows:

[root@db1 ~]# tail -f /var/log/pgbackrest/TM_Repo-archive-push-async.log
2026-02-24 12:29:37.826 P00 WARN: local-2 process terminated unexpectedly on signal 11
2026-02-24 12:29:37.827 P00 WARN: unable to wait on child process: [10] No child processes
2026-02-24 12:29:37.827 P00 WARN: unable to wait on child process: [10] No child processes
2026-02-24 12:29:37.827 P00 WARN: local-4 process terminated unexpectedly on signal 6
2026-02-24 12:29:37.827 P00 WARN: local-5 process terminated unexpectedly on signal 11
2026-02-24 12:29:37.827 P00 WARN: local-6 process terminated unexpectedly on signal 11

-------------------PROCESS START-------------------
2026-02-24 12:43:59.302 P00 INFO: archive-push:async command begin
2.52.1: [/data/postgres/data/pg_wal] --archive-async --compress-type=zst
--exec-id=2537881-b2a35ac0 --log-level-console=off --log-level-stderr=off
--pg1-path= /data/postgres/data --pg-version-force=16 --process-max=6
--repo1-host=10.25.0.202 --repo1-host-user=pgbackrest
--spool-path=/var/spool/pgbackrest --stanza=TM_Repo
2026-02-24 12:43:59.325 P00 INFO: push 10141 WAL file(s) to archive:
0000000100000BD9000000F9...0000000100000C0100000097

This has been running for hours now and has not finished. Is this normal
behaviour? (My bandwidth between the DB server and the repo server is only
20 Mbps.)

How can I avoid copying all the old piled-up WAL files to the repo server?
It could take many hours, maybe a day or two, and by then new transactional
WAL will have accumulated.

My goal is to initiate a full backup afresh on the repo server, so it
doesn't matter whether all the old piled-up WAL files get pushed to my repo
server, right? (I know I am going to lose database transaction consistency
by doing this. Is there any other way?)

But when I run the check before a full backup:
$ sudo -u pgbackrest pgbackrest --stanza=TM_Repo --log-level-console=info check
it does not succeed; it always fails with the 60000ms timeout error [82].

Any hints to solve this would be much appreciated.

Thank you,
Krishane

More info below.

[root@db1 data]# cat /etc/pgbackrest/pgbackrest.conf
[TM_Repo]
pg1-path=/data/postgres/data
pg1-port=5444
pg1-user=postgres
pg-version-force=16
pg1-database=postgres

[global]
repo1-host=10.25.0.202
repo1-host-user=pgbackrest
archive-async=y
spool-path=/var/spool/pgbackrest
log-level-console=info
#log-level-file=debug
log-level-stderr=info
delta=y
compress-type=zst

[global:archive-get]
process-max= 4

[global:archive-push]
process-max= 6

[root@db1 data]#

------------

pgBackRest 2.52.1
OS RHEL 9.4
Postgres 16

Greg Sabino Mullane

greg@turnstep.com

about 1 month ago

In reply to: KK CHN (#1)

Re: pgbackrest after a network outage unable to perform backup [fails always]

On Tue, Feb 24, 2026 at 5:18 AM KK CHN <kkchn.in@gmail.com> wrote:

This goes for hours now, not yet finished. Is this normal behaviour ?

Yes, if there is a lot of WAL
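
As a rough sanity check on that answer, the backlog can be sized from the numbers in the original post (10141 WAL files, a 20 Mbps link). This sketch is not from the thread. It assumes the default 16 MiB WAL segment size, ignores zstd compression and protocol overhead, and treats 1 Mbps as 1,000,000 bits/s, so the real figure could differ considerably. The function name estimate_seconds is made up for illustration.

```shell
# Rough transfer-time estimate for a WAL backlog over a slow link.
# Assumptions: every segment is a full, uncompressed file of seg_mib MiB,
# and the link sustains its nominal rate (1 Mbps = 1,000,000 bits/s).
estimate_seconds() {
  files=$1; seg_mib=$2; mbps=$3
  # total bits to send, divided by bits per second (integer arithmetic)
  echo $(( files * seg_mib * 1024 * 1024 * 8 / (mbps * 1000000) ))
}

secs=$(estimate_seconds 10141 16 20)
echo "about $(( secs / 3600 ))h $(( secs % 3600 / 60 ))m"
# prints: about 18h 54m
```

Under these assumptions the backlog alone would need most of a day. Compression would shorten that, while WAL generated in the meantime would lengthen it.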

My goal is to initiate a full backup afresh on the reposerver, so it
doesn't matter all the old piled up WAL files

You will need to (carefully!) disable pgbackrest archiving, clean up the
old WAL, then start it up again. Basic sequence:

1. Set archive_command to '/bin/true'
2. Kill any existing pgbackrest processes, empty out the spool directory
3. Wait for Postgres to clean up / recycle the WAL (speed up with a manual
CHECKPOINT)
4. Restore your archive_command to the pgbackrest version
5. Run pgbackrest check to verify WALs are being archived again
6. Run a full backup

Ideally, test these steps on a dev system, and understand why each step and
why in that order. :)
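
The sequence above could be sketched as a shell script along these lines. This is an illustration under stated assumptions, not a tested procedure and not a script from the thread. The port, user, database, stanza and spool path come from the pgbackrest.conf in the original post. It assumes the real archive_command is set in postgresql.conf (so that ALTER SYSTEM RESET falls back to it) and that psql can connect as shown. The helper names run, sql, disable_archiving and enable_archiving are hypothetical. By default it only prints the commands (DRY_RUN=1).

```shell
# Values taken from the pgbackrest.conf shown in the original post.
PG_PORT=5444; PG_USER=postgres; PG_DB=postgres
STANZA=TM_Repo; SPOOL_PATH=/var/spool/pgbackrest

# Print each command unless DRY_RUN=0 is set explicitly.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '+ %s\n' "$*"; else "$@"; fi
}
sql() { run psql -p "$PG_PORT" -U "$PG_USER" -d "$PG_DB" -Atc "$1"; }

disable_archiving() {
  sql "SHOW archive_command"                            # note the current value first
  sql "ALTER SYSTEM SET archive_command = '/bin/true'"  # step 1
  sql "SELECT pg_reload_conf()"
  run pkill -f 'pgbackrest.*archive-push' || true       # step 2: stop running pushes
  run find "$SPOOL_PATH" -type f -delete                # step 2: empty the spool
  sql "CHECKPOINT"                                      # step 3: nudge WAL recycling
  sql "SELECT count(*) FROM pg_ls_waldir()"             # step 3: watch this shrink
}

enable_archiving() {
  # step 4: assumes the original command lives in postgresql.conf
  sql "ALTER SYSTEM RESET archive_command"
  sql "SELECT pg_reload_conf()"
  run pgbackrest --stanza="$STANZA" --log-level-console=info check  # step 5
  echo "# step 6: pgbackrest --stanza=$STANZA --type=full backup"
}
```

One way to use it: call disable_archiving, repeat the CHECKPOINT and the count query until pg_wal has shrunk, then call enable_archiving. Set DRY_RUN=0 only after reviewing the printed commands on a test system. The full backup is only echoed because it should be started from wherever backups are normally run.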

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

KK CHN

kkchn.in@gmail.com

about 1 month ago

In reply to: Greg Sabino Mullane (#2)

Re: pgbackrest after a network outage unable to perform backup [fails always]

On Tue, Feb 24, 2026 at 9:20 PM Greg Sabino Mullane <htamfids@gmail.com>
wrote:

On Tue, Feb 24, 2026 at 5:18 AM KK CHN <kkchn.in@gmail.com> wrote:

This goes for hours now, not yet finished. Is this normal behaviour ?

Yes, if there is a lot of WAL

My goal is to initiate a full backup afresh on the reposerver, so it
doesn't matter all the old piled up WAL files

You will need to (carefully!) disable pgbackrest archiving, clean up the
old WAL, then start it up again. Basic sequence:

1. Set archive_command to '/bin/true'
2. Kill any existing pgbackrest processes, empty out the spool directory
3. Wait for Postgres to cleanup / recycle the WAL (speed up with a manual
CHECKPOINT)
4. Restore your archive_command to the pgbackrest version
5. Run pgbackrest check to verify WALs are being archived again
6. Run a full backup

Ideally, test these steps on a dev system, and understand why each step
and why in that order. :)

Thank you, Greg.
