BUG #19432: recovery fails at invalid checkpoint record

Started by PG Bug reporting form | 3 months ago | 6 messages | bugs

noreply@postgresql.org

3 months ago

The following bug has been logged on the website:

Bug reference: 19432
Logged by: Felix Hamme
Email address: felix.hamme@ionos.com
PostgreSQL version: 17.9
Operating system: Debian 13
Description:

Hi, I'm trying to restore from a pg_basebackup at timeline 1 to a
recovery_target_time in timeline 2.
It fails at "invalid checkpoint record", "could not locate required
checkpoint record at 0/3000080".
All relevant wal files are in the archive, the restore_command works and
backup_label, pg_controldata, pg_waldump and 00000002.history look like
everything should work.
Recovering only timeline 1 works, but it fails as soon as it should proceed
in timeline 2.
A 6.7MB tar of the basebackup and the wal archive is available at
https://get.hidrive.com/i/PwMejRQG . This link expires on 2026-03-13, I can
provide a new link if needed.
Why does this recovery fail?

Laurenz Albe

laurenz.albe@cybertec.at

3 months ago

In reply to: PG Bug reporting form (#1)

Re: BUG #19432: recovery fails at invalid checkpoint record

On Thu, 2026-03-12 at 16:20 +0000, PG Bug reporting form wrote:

Hi, I'm trying to restore from a pg_basebackup at timeline 1 to a
recovery_target_time in timeline 2.
It fails at "invalid checkpoint record", "could not locate required
checkpoint record at 0/3000080".
All relevant wal files are in the archive, the restore_command works and
backup_label, pg_controldata, pg_waldump and 00000002.history look like
everything should work.
Recovering only timeline 1 works, but it fails as soon as it should proceed
in timeline 2.
A 6.7MB tar of the basebackup and the wal archive is available at
https://get.hidrive.com/i/PwMejRQG . This link expires on 2026-03-13, I can
provide a new link if needed.
Why does this recovery fail?

Funny. I unpacked your data directory and reduced your postgresql.auto.conf
to something that fits my system:

log_min_messages = 'DEBUG5'
restore_command = 'cp /home/laurenz/hamme/fakearchive/%f %p'
recovery_target_time = '2026-03-11 14:51:28 UTC'
recovery_target_action = 'promote'
hot_standby_feedback = 'on'
log_destination = 'csvlog'
log_directory = '/home/laurenz/hamme/log'
logging_collector = 'on'
wal_level = 'logical'
port = 5433
unix_socket_directories = '/home/laurenz/hamme'
max_connections = 300

Recovery worked like a charm. pg_waldump shows the checkpoint record in
000000010000000000000003 at the correct position.

Not sure what you did wrong.

Yours,
Laurenz Albe
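
The segment file named above, 000000010000000000000003, follows from the timeline ID and the checkpoint location 0/3000080 quoted in the error message. A minimal bash sketch of that arithmetic, assuming the default 16 MB WAL segment size (a cluster initialized with a different --wal-segsize needs the third argument); the function name wal_segment_name is hypothetical and not part of PostgreSQL:

```shell
# wal_segment_name TIMELINE LSN [SEGMENT_SIZE_BYTES]
# Prints the name of the WAL segment file that contains the given LSN.
# Assumption: 16 MB segments unless a size is passed explicitly.
wal_segment_name() {
    local tli=$1 lsn=$2 seg_size=${3:-16777216}
    local hi=${lsn%%/*} lo=${lsn##*/}        # the two hex halves of the LSN

    # Combine both halves into one 64-bit byte position.
    local pos=$(( (16#$hi << 32) | 16#$lo ))
    local segno=$(( pos / seg_size ))
    local segs_per_id=$(( 0x100000000 / seg_size ))

    # File name: timeline, then the segment number split into two fields.
    printf '%08X%08X%08X\n' "$tli" \
        $(( segno / segs_per_id )) $(( segno % segs_per_id ))
}

wal_segment_name 1 0/3000080   # prints 000000010000000000000003
```

This only maps a location to a file name. Whether the record at that location is valid still has to be checked with pg_waldump, as described above.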

Felix Hamme

felix.hamme@ionos.com

3 months ago

In reply to: Laurenz Albe (#2)

Re: BUG #19432: recovery fails at invalid checkpoint record

Thank you for checking, now I found what I did wrong: "mv" doesn't
work as a restore_command because the same .history file is restored
multiple times during recovery.
I successfully recovered using a restore_command which does a cp for
history files and mv for wal files. It logged this:

cp /DBDATA/test/fakearchive/00000002.history pg_wal/RECOVERYHISTORY success
cp /DBDATA/test/fakearchive/00000003.history pg_wal/RECOVERYHISTORY not found
cp /DBDATA/test/fakearchive/00000002.history pg_wal/RECOVERYHISTORY success
mv /DBDATA/test/fakearchive/000000010000000000000003 pg_wal/RECOVERYXLOG success
mv /DBDATA/test/fakearchive/000000010000000000000004 pg_wal/RECOVERYXLOG success
mv /DBDATA/test/fakearchive/000000020000000000000005 pg_wal/RECOVERYXLOG success
mv /DBDATA/test/fakearchive/000000020000000000000006 pg_wal/RECOVERYXLOG success
mv /DBDATA/test/fakearchive/000000020000000000000007 pg_wal/RECOVERYXLOG success
cp /DBDATA/test/fakearchive/00000003.history pg_wal/RECOVERYHISTORY not found
cp /DBDATA/test/fakearchive/00000002.history pg_wal/RECOVERYHISTORY success

Is it safe in general to use mv for wal files? In other words, do the
currently supported postgres versions run restore_command only once
per wal file?

Best regards
Felix Hamme
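
The log above can be summarized to show which files were requested more than once. A small sketch, assuming a log file whose lines have the same shape as the ones quoted in this message (command, source path, destination, result); the function name repeated_requests is hypothetical:

```shell
# repeated_requests LOGFILE
# Counts how often each source path (second field) was requested and
# prints only those requested more than once, as "COUNT PATH".
repeated_requests() {
    awk '{ count[$2]++ }
         END { for (f in count) if (count[f] > 1) print count[f], f }' "$1" |
        sort
}
```

For the ten log lines above this reports 00000003.history twice and 00000002.history three times, and no WAL segment. That describes this one run only and does not show that a WAL segment is never requested twice.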

Laurenz Albe

laurenz.albe@cybertec.at

3 months ago

In reply to: Felix Hamme (#3)

Re: BUG #19432: recovery fails at invalid checkpoint record

On Fri, 2026-03-13 at 09:35 +0100, Felix Hamme wrote:

Is it safe in general to use mv for wal files? In other words, do the
currently supported postgres versions run restore_command only once
per wal file?

As you found out, no...

Yours,
Laurenz Albe

Felix Hamme

felix.hamme@ionos.com

3 months ago

In reply to: Laurenz Albe (#4)

Re: BUG #19432: recovery fails at invalid checkpoint record

Timeline history files can be needed multiple times, ok. My question
was about WAL files only.
I'm tempted to use a restore_command which does cp for history files
and mv for WAL files, to optimize performance and disk usage.
An AI told me that a second restore attempt for the same WAL file
could only happen if recovery is resumed after a crash.

Kind regards
Felix Hamme

Laurenz Albe

laurenz.albe@cybertec.at

3 months ago

In reply to: Felix Hamme (#5)

Re: BUG #19432: recovery fails at invalid checkpoint record

On Mon, 2026-03-16 at 14:56 +0100, Felix Hamme wrote:

I'm tempted to use a restore_command which does cp for history files
and mv for WAL files, to optimize performance and disk usage.
An AI told me that a second restore attempt for the same WAL file
could only happen if recovery is resumed after a crash.

Don't do that. Make the restore_command idempotent.
Trying to optimize for storage space often causes problems elsewhere.

Yours,
Laurenz Albe
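
One way to read "idempotent" here: the command can be run any number of times for the same file name, with the same result, because it never changes the archive. A minimal bash sketch under that reading; the function name restore_from_archive is hypothetical, and this is an illustration rather than a recommended production script:

```shell
# restore_from_archive ARCHIVE_DIR FILE_NAME DEST_PATH
# Copies the requested file out of the archive and leaves the archive
# untouched, so a repeated request for the same file succeeds again.
restore_from_archive() {
    local archive_dir=$1 file_name=$2 dest_path=$3
    local src="$archive_dir/$file_name"

    # Recovery also asks for files that do not exist (for example
    # 00000003.history in the log earlier in this thread); answer
    # those with a nonzero exit status.
    [ -f "$src" ] || return 1

    # cp, not mv: the source stays in place for any later request.
    cp -- "$src" "$dest_path"
}
```

Saved in a script that ends with a call such as restore_from_archive /DBDATA/test/fakearchive "$1" "$2", it could be configured as restore_command = '/path/to/script %f %p', where /path/to/script is a placeholder. The plain restore_command = 'cp /home/laurenz/hamme/fakearchive/%f %p' used earlier in the thread has the same property.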