BUG #19432: recovery fails at invalid checkpoint record
The following bug has been logged on the website:
Bug reference: 19432
Logged by: Felix Hamme
Email address: felix.hamme@ionos.com
PostgreSQL version: 17.9
Operating system: Debian 13
Description:
Hi, I'm trying to restore from a pg_basebackup taken on timeline 1 to a
recovery_target_time on timeline 2.
It fails at "invalid checkpoint record", "could not locate required
checkpoint record at 0/3000080".
All relevant WAL files are in the archive, the restore_command works, and
backup_label, pg_controldata, pg_waldump and 00000002.history all look as if
recovery should succeed.
Recovering within timeline 1 alone works, but recovery fails as soon as it
has to continue on timeline 2.
A 6.7MB tar of the basebackup and the wal archive is available at
https://get.hidrive.com/i/PwMejRQG . This link expires on 2026-03-13, I can
provide a new link if needed.
Why does this recovery fail?
Funny. I unpacked your data directory and reduced your postgresql.auto.conf
to something that fits my system:
log_min_messages = 'DEBUG5'
restore_command = 'cp /home/laurenz/hamme/fakearchive/%f %p'
recovery_target_time = '2026-03-11 14:51:28 UTC'
recovery_target_action = 'promote'
hot_standby_feedback = 'on'
log_destination = 'csvlog'
log_directory = '/home/laurenz/hamme/log'
logging_collector = 'on'
wal_level = 'logical'
port = 5433
unix_socket_directories = '/home/laurenz/hamme'
max_connections = 300
Recovery worked like a charm. pg_waldump shows the checkpoint record in
000000010000000000000003 at the correct position.
Not sure what you did wrong.
Yours,
Laurenz Albe
Thank you for checking, now I found what I did wrong: "mv" doesn't
work as a restore_command because the same .history file is restored
multiple times during recovery.
I successfully recovered using a restore_command which does a cp for
history files and mv for wal files. It logged this:
cp /DBDATA/test/fakearchive/00000002.history pg_wal/RECOVERYHISTORY success
cp /DBDATA/test/fakearchive/00000003.history pg_wal/RECOVERYHISTORY not found
cp /DBDATA/test/fakearchive/00000002.history pg_wal/RECOVERYHISTORY success
mv /DBDATA/test/fakearchive/000000010000000000000003 pg_wal/RECOVERYXLOG success
mv /DBDATA/test/fakearchive/000000010000000000000004 pg_wal/RECOVERYXLOG success
mv /DBDATA/test/fakearchive/000000020000000000000005 pg_wal/RECOVERYXLOG success
mv /DBDATA/test/fakearchive/000000020000000000000006 pg_wal/RECOVERYXLOG success
mv /DBDATA/test/fakearchive/000000020000000000000007 pg_wal/RECOVERYXLOG success
cp /DBDATA/test/fakearchive/00000003.history pg_wal/RECOVERYHISTORY not found
cp /DBDATA/test/fakearchive/00000002.history pg_wal/RECOVERYHISTORY success
Is it safe in general to use mv for wal files? In other words, do the
currently supported postgres versions run restore_command only once
per wal file?
Best regards
Felix Hamme
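The difference between the two commands can be seen outside PostgreSQL with a throwaway directory. The following is a minimal sketch, not the reporter's actual setup: the temporary directory layout and the function names restore_mv and restore_cp are made up for illustration, and only the file names are taken from the log above. It requests the same history file twice with each strategy.

```shell
#!/bin/sh
# Sketch only: simulates an archive and two restore strategies outside PostgreSQL.
work=$(mktemp -d)
mkdir "$work/archive" "$work/pg_wal"
echo "timeline history" > "$work/archive/00000002.history"

# Hypothetical mv-based restore: consumes the archived file.
restore_mv() { mv "$work/archive/$1" "$work/pg_wal/$2" 2>/dev/null; }
# Hypothetical cp-based restore: leaves the archive untouched.
restore_cp() { cp "$work/archive/$1" "$work/pg_wal/$2" 2>/dev/null; }

# Request the same history file twice with each strategy and record the exit codes.
restore_cp 00000002.history RECOVERYHISTORY; cp_first=$?
restore_cp 00000002.history RECOVERYHISTORY; cp_second=$?
restore_mv 00000002.history RECOVERYHISTORY; mv_first=$?
restore_mv 00000002.history RECOVERYHISTORY; mv_second=$?

echo "cp: first=$cp_first second=$cp_second"
echo "mv: first=$mv_first second=$mv_second"
rm -rf "$work"
```

Both cp requests succeed, while the second mv request returns a nonzero status because the first one already removed the file from the archive.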
On Fri, 2026-03-13 at 09:35 +0100, Felix Hamme wrote:
Is it safe in general to use mv for wal files? In other words, do the
currently supported postgres versions run restore_command only once
per wal file?
As you found out, no...
Yours,
Laurenz Albe
Timeline history files can be needed multiple times, ok. My question
was about WAL files only.
I'm tempted to use a restore_command which does cp for history files
and mv for WAL files, to optimize performance and disk usage.
An AI told me that a second restore attempt for the same WAL file
could only happen if recovery is resumed after a crash.
Kind regards
Felix Hamme
On Mon, 2026-03-16 at 14:56 +0100, Felix Hamme wrote:
I'm tempted to use a restore_command which does cp for history files
and mv for WAL files, to optimize performance and disk usage.
An AI told me that a second restore attempt for the same WAL file
could only happen if recovery is resumed after a crash.
Don't do that. Make the restore_command idempotent.
Trying to optimize for storage space often causes problems elsewhere.
Yours,
Laurenz Albe
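One possible reading of that advice is a restore script that only ever copies from the archive, so that a repeated request for the same file gives the same result. The sketch below makes assumptions that do not come from the thread: the script name restore_wal.sh, the ARCHIVE_DIR variable and the restore_file function are hypothetical, and the default archive path is simply the one shown in the log earlier in this thread.

```shell
#!/bin/sh
# restore_wal.sh (hypothetical name): copy-only restore helper.
# Assumed configuration: restore_command = '/path/to/restore_wal.sh %f %p'

# Archive location; the default is the path from the log earlier in the thread.
ARCHIVE_DIR="${ARCHIVE_DIR:-/DBDATA/test/fakearchive}"

# restore_file NAME DEST
# Copies NAME from the archive to DEST without modifying the archive.
# Returns nonzero when the file is absent, which tells PostgreSQL that the
# requested file is not available in the archive.
restore_file() {
    src="$ARCHIVE_DIR/$1"
    [ -f "$src" ] || return 1
    cp "$src" "$2"
}

# Act as a script only when called with the two arguments PostgreSQL passes.
if [ "$#" -eq 2 ]; then
    restore_file "$1" "$2"
    exit $?
fi
```

Because the archive is never changed, it does not matter how often or in which order PostgreSQL asks for a history or WAL file. Requests for files that do not exist, such as 00000003.history in the log above, are expected during recovery and simply yield a nonzero exit status.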