Trouble using pg_rewind to undo standby promotion

Started by Craig McIlweeover 1 year ago4 messagesgeneral

craigm@vt.edu

over 1 year ago

I have a primary -> standby 1 -> standby 2 setup with all servers running
PG 13.8 (this effort is part of getting on to a newer version, but I think
those details aren't relevant to this problem). The first standby uses
streaming replication from the primary and the second standby is using a
WAL archive with a restore_command. To make this standby chain work,
standby 1 is set to archive_mode = always with a command that populates the
WAL archive.

I would like to be able to promote standby 2 (hereon referred to just as
'standby'), perform some writes, then rewind it back to the point before
promotion so it can become a standby again. The documentation for
pg_rewind says that this is supported and it seems like it should be
straightforward, but I'm not having any luck getting this to work so I'm
hoping someone can point out what I'm doing wrong. Here's what I did:

First, observe that WAL is properly being applied from the archive. Note
that we are currently on timeline 1.

2024-11-06 09:51:23.286 EST [5438] LOG: restored log file
"0000000100007711000000F9" from archive
2024-11-06 09:51:23.434 EST [5438] LOG: restored log file
"0000000100007711000000FA" from archive
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0000000100007711000000FB.gz: No such file or
directory
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/00000002.history.gz: No such file or directory

Next, stop postgres, set wal_log_hints = on as required by pg_rewind, and
restart postgres. I also make a copy of the data directory while the
postgres is not running so I can repeat my test, which works fine on a
small test database but won't be possible for the multi TB database that I
will eventually be doing this on.

Now promote the standby using "select pg_promote()" and see that it
switches to a new timeline. You can also see that the last WAL applied
from the archive is 7718/BF.

2024-11-06 12:10:10.831 EST [4336] LOG: restored log file
"0000000100007718000000BD" from archive
2024-11-06 12:10:10.996 EST [4336] LOG: restored log file
"0000000100007718000000BE" from archive
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0000000100007718000000BF.gz: No such file or
directory
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/00000002.history.gz: No such file or directory
2024-11-06 12:10:15.384 EST [4336] LOG: restored log file
"0000000100007718000000BF" from archive
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0000000100007718000000C0.gz: No such file or
directory
2024-11-06 12:10:15.457 EST [4336] LOG: received promote request
2024-11-06 12:10:15.457 EST [4336] LOG: redo done at 7718/BFFFFF30
2024-11-06 12:10:15.457 EST [4336] LOG: last completed transaction was at
log time 2024-11-06 12:10:22.627074-05
2024-11-06 12:10:15.593 EST [4336] LOG: restored log file
"0000000100007718000000BF" from archive
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/00000002.history.gz: No such file or directory
2024-11-06 12:10:15.611 EST [4336] LOG: selected new timeline ID: 2
2024-11-06 12:10:15.640 EST [4336] LOG: archive recovery complete
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/00000001.history.gz: No such file or directory
2024-11-06 12:10:17.028 EST [4329] LOG: database system is ready to accept
connections

Next, insert a record into just to make some changes that I want to
rollback later. (What I will eventually be doing is creating a publication
to ship data to a newer version, but again, that's not what's important
here.)

Finally, shutdown postgres and attempt a rewind. The address used in the
--source-server connection string is the address of the primary.

2024-11-06 12:11:11.139 EST [4329] LOG: received fast shutdown request
2024-11-06 12:11:11.143 EST [4329] LOG: aborting any active transactions
2024-11-06 12:11:11.144 EST [4329] LOG: background worker "logical
replication launcher" (PID 5923) exited with exit code 1
2024-11-06 12:11:40.933 EST [4342] LOG: shutting down
2024-11-06 12:11:41.753 EST [4329] LOG: database system is shut down

/usr/pgsql-13/bin/pg_rewind --target-pgdata=/data/pgsql/operational
--source-server="host=x.x.x.x dbname=postgres user=xxx password=xxx"
--dry-run --progress --restore-target-wal

pg_rewind: connected to server
pg_rewind: servers diverged at WAL location 7718/BFFFFFE8 on timeline 1
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0000000200007718000000BF.gz: No such file or
directory
pg_rewind: error: could not restore file "0000000200007718000000BF" from
archive
pg_rewind: fatal: could not find previous WAL record at 7718/BFFFFFE8

pg_rewind shows the point of divergence as 7718/BF on timeline 1, but when
it tries to replay WAL using the restore command it is trying to find WAL
from timeline 2 rather than picking back up on timeline 1. I tried
setting recovery_target_timeline on the target database to 'current' and
'1' but that gave the same result. Searching the archives, [1]/messages/by-id/e7b16ddea93a92575cb6d143b6ef602cab22432e.camel@cybertec.at mentions the
need to force a checkpoint after promotion which I tried even though the
problem description isn't the same. [2]/messages/by-id/CALp3DH1fLZmPvkOteAbUo4TOLZP-LstKOs6Gcw3Bm7acmJqk=w@mail.gmail.com mentions a problem that looks more
like the one I am facing but has no responses. At this point I don't know
what to do next and hope someone can point me in the right direction.

[1]: /messages/by-id/e7b16ddea93a92575cb6d143b6ef602cab22432e.camel@cybertec.at
/messages/by-id/e7b16ddea93a92575cb6d143b6ef602cab22432e.camel@cybertec.at
[2]: /messages/by-id/CALp3DH1fLZmPvkOteAbUo4TOLZP-LstKOs6Gcw3Bm7acmJqk=w@mail.gmail.com
/messages/by-id/CALp3DH1fLZmPvkOteAbUo4TOLZP-LstKOs6Gcw3Bm7acmJqk=w@mail.gmail.com

Craig

Torsten Förtsch

tfoertsch123@gmail.com

over 1 year ago

In reply to: Craig McIlwee (#1)

Re: Trouble using pg_rewind to undo standby promotion

Your point of divergence is in the middle of the 7718/000000BF file. So,
you should have 2 such files eventually, one on timeline 1 and the other on
timeline 2.

Are you archiving WAL on the promoted machine in a way that your
restore_command can find it? Check archive_command and archive_mode on the
promoted machine.

Also, do your archive/restore scripts work properly for history files?

On Wed, Nov 6, 2024 at 7:48 PM Craig McIlwee <craigm@vt.edu> wrote:

Show quoted text

I have a primary -> standby 1 -> standby 2 setup with all servers running
PG 13.8 (this effort is part of getting on to a newer version, but I think
those details aren't relevant to this problem). The first standby uses
streaming replication from the primary and the second standby is using a
WAL archive with a restore_command. To make this standby chain work,
standby 1 is set to archive_mode = always with a command that populates the
WAL archive.

I would like to be able to promote standby 2 (hereon referred to just as
'standby'), perform some writes, then rewind it back to the point before
promotion so it can become a standby again. The documentation for
pg_rewind says that this is supported and it seems like it should be
straightforward, but I'm not having any luck getting this to work so I'm
hoping someone can point out what I'm doing wrong. Here's what I did:

First, observe that WAL is properly being applied from the archive. Note
that we are currently on timeline 1.

2024-11-06 09:51:23.286 EST [5438] LOG: restored log file
"0000000100007711000000F9" from archive
2024-11-06 09:51:23.434 EST [5438] LOG: restored log file
"0000000100007711000000FA" from archive
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0000000100007711000000FB.gz: No such file or
directory
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/00000002.history.gz: No such file or directory

Next, stop postgres, set wal_log_hints = on as required by pg_rewind, and
restart postgres. I also make a copy of the data directory while the
postgres is not running so I can repeat my test, which works fine on a
small test database but won't be possible for the multi TB database that I
will eventually be doing this on.

Now promote the standby using "select pg_promote()" and see that it
switches to a new timeline. You can also see that the last WAL applied
from the archive is 7718/BF.

2024-11-06 12:10:10.831 EST [4336] LOG: restored log file
"0000000100007718000000BD" from archive
2024-11-06 12:10:10.996 EST [4336] LOG: restored log file
"0000000100007718000000BE" from archive
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0000000100007718000000BF.gz: No such file or
directory
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/00000002.history.gz: No such file or directory
2024-11-06 12:10:15.384 EST [4336] LOG: restored log file
"0000000100007718000000BF" from archive
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0000000100007718000000C0.gz: No such file or
directory
2024-11-06 12:10:15.457 EST [4336] LOG: received promote request
2024-11-06 12:10:15.457 EST [4336] LOG: redo done at 7718/BFFFFF30
2024-11-06 12:10:15.457 EST [4336] LOG: last completed transaction was at
log time 2024-11-06 12:10:22.627074-05
2024-11-06 12:10:15.593 EST [4336] LOG: restored log file
"0000000100007718000000BF" from archive
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/00000002.history.gz: No such file or directory
2024-11-06 12:10:15.611 EST [4336] LOG: selected new timeline ID: 2
2024-11-06 12:10:15.640 EST [4336] LOG: archive recovery complete
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/00000001.history.gz: No such file or directory
2024-11-06 12:10:17.028 EST [4329] LOG: database system is ready to
accept connections

Next, insert a record into just to make some changes that I want to
rollback later. (What I will eventually be doing is creating a publication
to ship data to a newer version, but again, that's not what's important
here.)

Finally, shutdown postgres and attempt a rewind. The address used in the
--source-server connection string is the address of the primary.

2024-11-06 12:11:11.139 EST [4329] LOG: received fast shutdown request
2024-11-06 12:11:11.143 EST [4329] LOG: aborting any active transactions
2024-11-06 12:11:11.144 EST [4329] LOG: background worker "logical
replication launcher" (PID 5923) exited with exit code 1
2024-11-06 12:11:40.933 EST [4342] LOG: shutting down
2024-11-06 12:11:41.753 EST [4329] LOG: database system is shut down

/usr/pgsql-13/bin/pg_rewind --target-pgdata=/data/pgsql/operational
--source-server="host=x.x.x.x dbname=postgres user=xxx password=xxx"
--dry-run --progress --restore-target-wal

pg_rewind: connected to server
pg_rewind: servers diverged at WAL location 7718/BFFFFFE8 on timeline 1
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0000000200007718000000BF.gz: No such file or
directory
pg_rewind: error: could not restore file "0000000200007718000000BF" from
archive
pg_rewind: fatal: could not find previous WAL record at 7718/BFFFFFE8

pg_rewind shows the point of divergence as 7718/BF on timeline 1, but when
it tries to replay WAL using the restore command it is trying to find WAL
from timeline 2 rather than picking back up on timeline 1. I tried
setting recovery_target_timeline on the target database to 'current' and
'1' but that gave the same result. Searching the archives, [1] mentions the
need to force a checkpoint after promotion which I tried even though the
problem description isn't the same. [2] mentions a problem that looks more
like the one I am facing but has no responses. At this point I don't know
what to do next and hope someone can point me in the right direction.

[1]
/messages/by-id/e7b16ddea93a92575cb6d143b6ef602cab22432e.camel@cybertec.at
[2]
/messages/by-id/CALp3DH1fLZmPvkOteAbUo4TOLZP-LstKOs6Gcw3Bm7acmJqk=w@mail.gmail.com

Craig

Craig McIlwee

craigm@vt.edu

over 1 year ago

In reply to: Torsten Förtsch (#2)

Re: Trouble using pg_rewind to undo standby promotion

On Thu, Nov 7, 2024 at 4:47 AM Torsten Förtsch <tfoertsch123@gmail.com>
wrote:

Your point of divergence is in the middle of the 7718/000000BF file. So,
you should have 2 such files eventually, one on timeline 1 and the other on
timeline 2.

Are you archiving WAL on the promoted machine in a way that your
restore_command can find it? Check archive_command and archive_mode on the
promoted machine.

No, the promoted machine is not archiving. How should that work? Is it OK
for a log shipping standby that uses restore_command to also push to the
same directory with an archive_command or would that cause issues of trying
to read and write the same file simultaneously during WAL replay? Or
should I be setting up an archive_command that pushes to a separate
directory and have a restore_command that knows to check both locations?

Hmm, as I write that out, I realize that I could use archive_mode = on
instead of archive_mode = always to avoid the potential for read/write
conflicts during WAL replay. I can try this later and report back.

Also, do your archive/restore scripts work properly for history files?

The scripts don't do anything special with history files. They are based
on the continuous archive docs [1]https://www.postgresql.org/docs/13/continuous-archiving.html#COMPRESSED-ARCHIVE-LOGS and this [2]https://www.rockdata.net/tutorial/admin-archive-command/#compressing-and-archiving article the with slight
modification to include a throttled scp since the log shipping server is
located in a different data center from the promoted standby and there is
limited bandwidth between the two. (Also note that the archive script from
[2]: https://www.rockdata.net/tutorial/admin-archive-command/#compressing-and-archiving
article will use the exit code of the rm command so postgres won't be
informed the file transfer fails resulting in missing WAL in the archive.)

Archive script:
---
#!/bin/bash

# $1 = %p
# $2 = %f

limit=10240 # 10Mbps

gzip < /var/lib/pgsql/13/data/$1 > /tmp/archive/$2.gz

scp -l $limit /tmp/archive/$2.gz postgres@x.x.x.x
:/data/wal_archive/operational/$2.gz
exit_code=$?

rm /tmp/archive/$2.gz

exit $exit_code
---

Restore script:
---
gunzip < /data/wal_archive/operational/$2.gz > $1
---

[1]: https://www.postgresql.org/docs/13/continuous-archiving.html#COMPRESSED-ARCHIVE-LOGS
https://www.postgresql.org/docs/13/continuous-archiving.html#COMPRESSED-ARCHIVE-LOGS
[2]: https://www.rockdata.net/tutorial/admin-archive-command/#compressing-and-archiving
https://www.rockdata.net/tutorial/admin-archive-command/#compressing-and-archiving

Craig

Show quoted text

Craig McIlwee

craigm@vt.edu

over 1 year ago

In reply to: Craig McIlwee (#3)

Re: Trouble using pg_rewind to undo standby promotion

Are you archiving WAL on the promoted machine in a way that your
restore_command can find it? Check archive_command and archive_mode on the
promoted machine.

No, the promoted machine is not archiving. How should that work? Is it
OK for a log shipping standby that uses restore_command to also push to the
same directory with an archive_command or would that cause issues of trying
to read and write the same file simultaneously during WAL replay? Or
should I be setting up an archive_command that pushes to a separate
directory and have a restore_command that knows to check both locations?

Hmm, as I write that out, I realize that I could use archive_mode = on
instead of archive_mode = always to avoid the potential for read/write
conflicts during WAL replay. I can try this later and report back.

Setting archive_mode = on and a restore_command that reads from the WAL
archive did the trick. With those changes in place, I was able to
successfully run pg_rewind and get the promoted standby back onto timeline
1. Thanks for the tips.

Craig