pg_basebackup

Started by Matthias Apitz over 2 years ago, 3 messages, general

guru@unixarea.de

over 2 years ago

Hello,

In a customer installation (PostgreSQL 13.1 on Linux) we are facing the
following problem for the first time, and it is not reproducible:

The relevant part of our backup script is:
...
test -d ${BACKUPWAL}-${DATE}-${NUM}/ || mkdir -p ${BACKUPWAL}-${DATE}-${NUM}/

# kick to archive the current log; use a DB which will exist;
#
psql -U ${DBSUSER} -dpostgres -c "select pg_switch_wal();" > /dev/null

# backup the cluster
#
printf "%s: pg_basebackup the cluster to %s ... " "`date "+%d.%m.%Y-%H:%M:%S"`" ${BACKUPDIR}-${DATE}-${NUM}
${BINDIR}/pg_basebackup -U ${DBSUSER} -Ft -z -D ${BACKUPDIR}-${DATE}-${NUM}

...

The resulting stdout/stderr of the script:

16.11.2023-20:20:02: pg_basebackup the cluster to /Backup/postgres/sisis-20231116-1 ...
pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: child process exited with error 1

pg-error.log:

2023-11-16 20:34:13.538 CET [6250] LOG: terminating walsender process due to replication timeout

Why does the PostgreSQL server say something about "replication"
when we are only running pg_basebackup?

Some more information:

- wal_sender_timeout has its default value (60s)
- the backup target is a local file, not network storage
- the Linux SLES 15 server is well equipped
- nothing is logged in /var/log/messages

Any ideas? Thanks.

matthias

--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

Laurenz Albe

laurenz.albe@cybertec.at

over 2 years ago

In reply to: Matthias Apitz (#1)

Re: pg_basebackup

On Mon, 2023-11-20 at 07:30 +0100, Matthias Apitz wrote:

> In a customer installation (PostgreSQL 13.1 on Linux) we are facing the
> following problem for the first time, and it is not reproducible:

13.1? Your immediate reaction should be "update to the latest minor release".

> ${BINDIR}/pg_basebackup -U ${DBSUSER} -Ft -z -D ${BACKUPDIR}-${DATE}-${NUM}
>
> The resulting stdout/stderr of the script:
>
> 16.11.2023-20:20:02: pg_basebackup the cluster to /Backup/postgres/sisis-20231116-1 ...
> pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_basebackup: child process exited with error 1
>
> pg-error.log:
>
> 2023-11-16 20:34:13.538 CET [6250] LOG: terminating walsender process due to replication timeout
>
> Why does the PostgreSQL server say something about "replication"
> when we are only running pg_basebackup?

Because "pg_basebackup" uses a replication connection.

Some more information:

> - wal_sender_timeout has its default value (60s)

Increase "wal_sender_timeout", perhaps to 0 (which means "infinite").

Yours,
Laurenz Albe
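
The suggestion above could be applied along the following lines. This is only a sketch under assumptions: the function name set_wal_sender_timeout and the PSQL variable are hypothetical, DBSUSER mirrors the variable from the original script with a placeholder default, and that role is assumed to be a superuser. ALTER SYSTEM writes the value to postgresql.auto.conf and pg_reload_conf() asks the server to re-read its configuration; a restart should not be needed for this parameter.

```shell
#!/bin/sh
# Placeholder defaults; adjust to the local installation.
DBSUSER="${DBSUSER:-postgres}"
PSQL="${PSQL:-psql}"

# set_wal_sender_timeout VALUE
#   VALUE is e.g. 0 (timeout disabled) or 10min.
set_wal_sender_timeout() {
    # Persist the new value in postgresql.auto.conf ...
    "$PSQL" -U "$DBSUSER" -d postgres \
        -c "ALTER SYSTEM SET wal_sender_timeout = '$1';" &&
    # ... make the running server re-read its configuration ...
    "$PSQL" -U "$DBSUSER" -d postgres \
        -c "SELECT pg_reload_conf();" &&
    # ... and print the value seen by a new session.
    "$PSQL" -U "$DBSUSER" -d postgres \
        -c "SHOW wal_sender_timeout;"
}
```

Calling `set_wal_sender_timeout 0` before the backup would disable the timeout entirely; a finite value such as `set_wal_sender_timeout 10min` still lets the server drop walsender connections that are really dead.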

Christoph Moench-Tegeder

cmt@burggraben.net

over 2 years ago

In reply to: Matthias Apitz (#1)

Re: pg_basebackup

## Matthias Apitz (guru@unixarea.de):

> 2023-11-16 20:34:13.538 CET [6250] LOG: terminating walsender process due to replication timeout

Besides "what Laurenz said" (especially about the horribly outdated
PostgreSQL version): check I/O speed and saturation during the backup
and make sure you're not stalling. I've seen this behaviour a few
times, mostly in conjunction with btrfs - using a suitably proven
filesystem usually solved the problem (overloaded hardware can
be a problem, too - but modern systems can take quite a bit more
than in the olden days of spinning rust).

Regards,
Christoph

--
Spare Space.
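
One way to follow the advice about checking I/O during the backup is to sample device statistics while pg_basebackup runs. The following is a minimal sketch, assuming the sysstat package (which provides iostat) is installed; the function name run_with_io_log, the IOSTAT variable and the log path are hypothetical and not part of the original script.

```shell
#!/bin/sh
# iostat comes from the sysstat package (assumed to be installed).
IOSTAT="${IOSTAT:-iostat}"

# run_with_io_log LOGFILE COMMAND [ARGS...]
#   Samples extended device statistics every 5 seconds into LOGFILE
#   while COMMAND runs, and returns COMMAND's exit status.
run_with_io_log() {
    log=$1
    shift
    # -d: device report, -x: extended statistics, -t: timestamps
    "$IOSTAT" -dxt 5 > "$log" 2>&1 &
    sampler=$!
    "$@"
    rc=$?
    # Stop the background sampler once the command has finished.
    kill "$sampler" 2>/dev/null
    wait "$sampler" 2>/dev/null
    return $rc
}

# Possible use in the backup script (log path is a placeholder):
#   run_with_io_log /tmp/iostat-backup.log \
#       ${BINDIR}/pg_basebackup -U ${DBSUSER} -Ft -z -D ${BACKUPDIR}-${DATE}-${NUM}
```

If the log shows the backup device with sustained high utilisation or long write waits in the minute before the walsender was terminated, that would support the stalled-I/O explanation; if the device looks idle, the cause probably lies elsewhere.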