postgres replication without pg_basebackup? postgres 13.3

Started by Pilar de Teodoroover 3 years ago4 messagesgeneral
Jump to latest
#1Pilar de Teodoro
pilar.deteodoro@gmail.com

Dear all,

We have a very large database of 37TB and we had to promote our standby to
primary due to some disk failures. Now the issues are solved, we would like
to make standby the old primary from a copy of the new primary which is
already in place. Is it possible without using pg_basebackup? We have
already a copy created than can be started but if we open as standby
(standby.signal created) the log says when started:

[2022-10-28 21:27:12 CEST-]LOG: starting PostgreSQL 13.3 on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat
8.5.0-10), 64-bit

[2022-10-28 21:27:12 CEST-]LOG: listening on IPv4 address "0.0.0.0", port
8300

[2022-10-28 21:27:12 CEST-]LOG: could not create IPv6 socket for address
"::": Address family not supported by protocol

[2022-10-28 21:27:12 CEST-]LOG: listening on Unix socket
"/tmp/.s.PGSQL.8300"

[2022-10-28 21:27:12 CEST-]LOG: database system was interrupted; last
known up at 2022-10-28 20:31:08 CEST

[2022-10-28 21:28:06 CEST-]LOG: restored log file "00000003.history" from
archive

cp: cannot stat '/PostgresWalLogArchive/new/00000004.history': No such file
or directory

[2022-10-28 21:28:06 CEST-]LOG: entering standby mode

[2022-10-28 21:28:06 CEST-]LOG: restored log file "00000003.history" from
archive

cp: cannot stat '/PostgresWalLogArchive/new/000000030000BF720000002F': No
such file or directory

[2022-10-28 21:28:06 CEST-]LOG: restored log file
"000000020000BF720000002F" from archive

[2022-10-28 21:28:06 CEST-]FATAL: requested timeline 3 is not a child of
this server's history

*[2022-10-28 21:28:06 CEST-]DETAIL: Latest checkpoint is at BF72/2F0309C0
on timeline 2, but in the history of the requested timeline, the server
forked off from that timeline at B1D6/60000000.*

*[2022-10-28 21:28:06 CEST-]LOG: startup process (PID 1298266) exited with
exit code 1*

[2022-10-28 21:28:06 CEST-]LOG: aborting startup due to startup process
failure

[2022-10-28 21:28:07 CEST-]LOG: database system is shut down

Do you know if we can make it work without having to run pg_basebackup for
37TB which will take about 3 days to copy? Why it is requesting a different
timeline if it is a copy of the primary?

Primary replica parameters:

wal_level = replica

archive_mode = on

archive_command = 'cp -i %p /PostgresWalLogArchive/new/%f'

standby replica parameters:

primary_conninfo = 'host=XXX port=8300 user=postgres password=***'

restore_command = 'cp /PostgresWalLogArchive/new/%f %p'

archive_cleanup_command =
'/home/postgres/software/postgresql/bin/pg_archivecleanup
/PostgresWalLogArchive/new/%f %r'

Thank you very much for your help in advance.

Pilar de Teodoro

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Pilar de Teodoro (#1)
Re: postgres replication without pg_basebackup? postgres 13.3

On Mon, 2022-11-07 at 11:02 +0100, Pilar de Teodoro wrote:

We have a very large database of 37TB and we had to promote our standby to primary due to
some disk failures. Now the issues are solved, we would like to make standby  the old primary
from a copy of the new primary which is already in place.  Is it possible without using pg_basebackup?

That's exactly what "pg_rewind" is for. It is a fast version of "pg_basebackup" for exactly
that case.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#3Pilar de Teodoro
pilar.deteodoro@gmail.com
In reply to: Laurenz Albe (#2)
Re: postgres replication without pg_basebackup? postgres 13.3

Dear Laurenz,
Thank you very much for the idea.
We ran pg_rewind correctly:

[postgres@gacsdb05 data-13.3]$ pg_rewind -c -R
--target-pgdata=/PostgresDB/sas_hdd/data-13.3/ --source-server="host=XXXX
port=XXXX user=postgres password=XXX"

pg_rewind: source and target cluster are on the same timeline

pg_rewind: no rewind required

but then we got the following error:

[2022-11-07 22:57:55 CET-]LOG: starting PostgreSQL 13.3 on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat
8.5.0-10), 64-bit

[2022-11-07 22:57:55 CET-]LOG: listening on IPv4 address "0.0.0.0", port
XXXX

[2022-11-07 22:57:55 CET-]LOG: could not create IPv6 socket for address
"::": Address family not supported by protocol

[2022-11-07 22:57:55 CET-]LOG: listening on Unix socket
"/tmp/.s.PGSQL.XXXX"

[2022-11-07 22:57:55 CET-]LOG: database system was shut down in recovery
at 2022-11-07 22:57:47 CET

[2022-11-07 22:57:55 CET-]LOG: restored log file "00000003.history" from
archive

cp: cannot stat '/PostgresWalLogArchive/new/00000004.history': No such file
or directory

[2022-11-07 22:57:55 CET-]LOG: entering standby mode

[2022-11-07 22:57:55 CET-]LOG: restored log file "00000003.history" from
archive

[2022-11-07 22:57:55 CET-]LOG: invalid primary checkpoint record

[2022-11-07 22:57:55 CET-]PANIC: could not locate a valid checkpoint record

[2022-11-07 22:57:55 CET-]LOG: startup process (PID 3011860) was
terminated by signal 6: Aborted

[2022-11-07 22:57:55 CET-]LOG: aborting startup due to startup process
failure

[2022-11-07 22:57:55 CET-]LOG: database system is shut down

We have read we can run pg_resetwal but the Wal folder is the folder where
the Wals are archived in the primary. Would that be correct to reset them?

Any suggestion?

Thank you so much,

Pilar

El lun, 7 nov 2022 a las 12:21, Laurenz Albe (<laurenz.albe@cybertec.at>)
escribió:

Show quoted text

On Mon, 2022-11-07 at 11:02 +0100, Pilar de Teodoro wrote:

We have a very large database of 37TB and we had to promote our standby

to primary due to

some disk failures. Now the issues are solved, we would like to make

standby the old primary

from a copy of the new primary which is already in place. Is it

possible without using pg_basebackup?

That's exactly what "pg_rewind" is for. It is a fast version of
"pg_basebackup" for exactly
that case.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Pilar de Teodoro (#3)
Re: postgres replication without pg_basebackup? postgres 13.3

On Mon, 2022-11-07 at 23:11 +0100, Pilar de Teodoro wrote:

Thank you very much for the idea. [of running pg_rewind]
We ran pg_rewind correctly:
[postgres@gacsdb05 data-13.3]$ pg_rewind -c -R --target-pgdata=/PostgresDB/sas_hdd/data-13.3/  --source-server="host=XXXX port=XXXX user=postgres password=XXX" 
pg_rewind: source and target cluster are on the same timeline
pg_rewind: no rewind required

I cannot verify that you ran it correctly.

"Target" should be the old server with the extra transactions.
"Source" should be the promoted standby server. Promotion switches to a new timeline,
so it looks like you did something wrong.

but then we got the following error:

[2022-11-07 22:57:55 CET-]LOG:  starting PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
[2022-11-07 22:57:55 CET-]LOG:  listening on IPv4 address "0.0.0.0", port XXXX
[2022-11-07 22:57:55 CET-]LOG:  could not create IPv6 socket for address "::": Address family not supported by protocol
[2022-11-07 22:57:55 CET-]LOG:  listening on Unix socket "/tmp/.s.PGSQL.XXXX"
[2022-11-07 22:57:55 CET-]LOG:  database system was shut down in recovery at 2022-11-07 22:57:47 CET
[2022-11-07 22:57:55 CET-]LOG:  restored log file "00000003.history" from archive
cp: cannot stat '/PostgresWalLogArchive/new/00000004.history': No such file or directory
[2022-11-07 22:57:55 CET-]LOG:  entering standby mode
[2022-11-07 22:57:55 CET-]LOG:  restored log file "00000003.history" from archive
[2022-11-07 22:57:55 CET-]LOG:  invalid primary checkpoint record
[2022-11-07 22:57:55 CET-]PANIC:  could not locate a valid checkpoint record
[2022-11-07 22:57:55 CET-]LOG:  startup process (PID 3011860) was terminated by signal 6: Aborted
[2022-11-07 22:57:55 CET-]LOG:  aborting startup due to startup process failure
[2022-11-07 22:57:55 CET-]LOG:  database system is shut down

We have read we can run pg_resetwal but the Wal folder is the folder where the Wals are archived in the primary. Would that be correct to reset them?

Don't run "pg_resetwal". At the very least, it will break your standby.

At this point, your standby seems to be broken. I don't know what exactly you
did, but it leeks like you should run a "pg_basebackup" after all.

Yours,
Laurenz Albe