BUG #14326: Unexpected status after crash during exclusive backup

Started by Marco Nenciarini, over 9 years ago, 3 messages, bugs

marco.nenciarini@2ndquadrant.it

over 9 years ago

The following bug has been logged on the website:

Bug reference: 14326
Logged by: Marco Nenciarini
Email address: marco.nenciarini@2ndquadrant.it
PostgreSQL version: 9.6rc1
Operating system: Any
Description:

I was investigating a Postgres standby that was never reaching the
consistent recovery state and I discovered something unexpected in the
pg_controldata output:

Backup start location: 5E4/7C000028
Backup end location: 0/0

The standby was built using a cold backup of the master data directory, so I
was surprised to find "Backup start location" different from 0/0.
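
A check for this condition can be scripted. The following is a minimal Python sketch, assuming the pg_controldata output has already been captured as text; the function names are hypothetical and only the two field labels quoted above are relied on.

```python
def backup_locations(controldata_text):
    """Return the backup start/end locations found in pg_controldata output."""
    wanted = ("Backup start location", "Backup end location")
    found = {}
    for line in controldata_text.splitlines():
        # Each line has the form "<label>: <value>", possibly padded with spaces.
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            found[key.strip()] = value.strip()
    return found


def backup_recovery_pending(controldata_text):
    """True when a backup start location other than 0/0 is recorded."""
    start = backup_locations(controldata_text).get("Backup start location", "0/0")
    return start != "0/0"


# Sample text copied from the output quoted above.
sample = """\
Backup start location: 5E4/7C000028
Backup end location: 0/0
"""
print(backup_recovery_pending(sample))  # True
```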

The replication was working correctly and the standby was perfectly aligned
with the master. Moreover, the position 5E4/7C000028 was very old compared
to the latest checkpoint location, which was 675/78329748.
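
To make the comparison concrete: a WAL location is a 64-bit byte position printed as two 32-bit hexadecimal halves, so the two values can be compared numerically. A small Python sketch (the helper name parse_lsn is hypothetical):

```python
def parse_lsn(text):
    """Convert an 'X/Y' WAL location into a single integer byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


backup_start = parse_lsn("5E4/7C000028")
latest_checkpoint = parse_lsn("675/78329748")

# The recorded backup start lies well behind the latest checkpoint.
print(backup_start < latest_checkpoint)  # True
print(f"distance: {(latest_checkpoint - backup_start) / 2**30:.1f} GiB of WAL")
```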

After further investigation I discovered that the cause of the issue was a
system crash which had happened a month earlier. Unfortunately, an exclusive
backup was running when the system crashed, so at restart Postgres found a
valid backup_label and, given that the WAL file containing the backup start
point was still available, it started a backup recovery.

Here is the issue: Postgres will never find the XLOG_BACKUP_END record
corresponding to the backupStartPoint recorded in control data, because it
was never written, so it will never reach the consistency point.
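
The stuck state can be illustrated with a toy model. This Python sketch is a deliberate simplification and not the actual server code: it ignores the minimum recovery point and every other condition, and the record names and tuple layout are assumptions made for illustration only.

```python
def replay(records, backup_start_point):
    """Toy model of WAL replay: return True if consistency is reached.

    records is a list of (record_type, payload) tuples. For a "BACKUP_END"
    record, the payload is the start point of the backup that ended.
    backup_start_point is the value recorded in control data, or None.
    """
    pending = backup_start_point  # None means no backup recovery in progress
    for record_type, payload in records:
        # Only a matching end-of-backup record clears the pending start point.
        if record_type == "BACKUP_END" and payload == pending:
            pending = None
    return pending is None


# Backup completed normally: the end record exists in the WAL stream.
completed = [("CHECKPOINT", None), ("BACKUP_END", "5E4/7C000028")]
# Crash during the backup: the end record was never written.
crashed = [("CHECKPOINT", None), ("COMMIT", None)]

print(replay(completed, "5E4/7C000028"))  # True
print(replay(crashed, "5E4/7C000028"))    # False
```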

This has no user-visible effects unless the Postgres instance enters the
archive recovery state; in that case hot standby will never be activated.
Also, it doesn't affect any backup later taken from the instance, because
recovering from that backup goes through a backup recovery that resets the
backupStartPoint value.

The workaround I found to reset this state is to force the instance through
another backup recovery, by starting an exclusive backup, saving the
backup_label, stopping the backup and restarting the instance with the saved
backup_label in place.
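
The sequence can be written down as an ordered list of commands. The Python sketch below only builds and prints the commands; it does not run anything. The data directory, the temporary path, the backup label, and the assumption that psql connects with default settings are all placeholders for illustration, and the steps should be reviewed before being used on a real instance.

```python
import shlex


def workaround_plan(pgdata, saved_label="/tmp/backup_label.saved",
                    label="reset_backup_state"):
    """Return the workaround as a list of argv lists (nothing is executed)."""
    live_label = f"{pgdata}/backup_label"
    return [
        # 1. Start an exclusive backup, which writes backup_label.
        ["psql", "-c", f"SELECT pg_start_backup('{label}', true)"],
        # 2. Save a copy of backup_label outside the data directory.
        ["cp", live_label, saved_label],
        # 3. Stop the backup, which writes the end-of-backup record.
        ["psql", "-c", "SELECT pg_stop_backup()"],
        # 4. Stop the instance.
        ["pg_ctl", "-D", pgdata, "stop"],
        # 5. Put the saved backup_label back in place.
        ["cp", saved_label, live_label],
        # 6. Start the instance so it goes through backup recovery.
        ["pg_ctl", "-D", pgdata, "start"],
    ]


# /var/lib/pgsql/data is a placeholder data directory.
for step in workaround_plan("/var/lib/pgsql/data"):
    print(shlex.join(step))
```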

I don't know the best way to handle this situation, but at least, I'd like a
warning message when the instance exits from the crash recovery while
backupStartPoint is still set.

This behaviour is present in every supported Postgres release and on master
as well.

Regards,
Marco

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Michael Paquier

michael@paquier.xyz

over 9 years ago

In reply to: Marco Nenciarini (#1)

Re: BUG #14326: Unexpected status after crash during exclusive backup

On Fri, Sep 16, 2016 at 6:54 PM, <marco.nenciarini@2ndquadrant.it> wrote:

> The workaround I found to reset this state is to force the instance through
> another backup recovery, by starting an exclusive backup, saving the
> backup_label, stopping the backup and restarting the instance with the saved
> backup_label in place.

That's not user-friendly.

> I don't know the best way to handle this situation, but at least, I'd like a
> warning message when the instance exits from the crash recovery while
> backupStartPoint is still set.

So you would get such a warning even when you restore from a backup
willingly, no? That may confuse users. Now, the case you are referring
to is unfortunately a known problem with exclusive backups... There is
no way to tell the difference between a node restored from a backup
and a node that crashed while a backup was being taken. And that may be a
reason to make non-exclusive backups more popular because they are
more reliable.
--
Michael


Marco Nenciarini

marco.nenciarini@2ndquadrant.it

over 9 years ago

In reply to: Michael Paquier (#2)

Re: BUG #14326: Unexpected status after crash during exclusive backup

On 21/09/16 08:50, Michael Paquier wrote:

> On Fri, Sep 16, 2016 at 6:54 PM, <marco.nenciarini@2ndquadrant.it> wrote:

>> The workaround I found to reset this state is to force the instance through
>> another backup recovery, by starting an exclusive backup, saving the
>> backup_label, stopping the backup and restarting the instance with the saved
>> backup_label in place.

> That's not user-friendly.

I agree, it isn't. But it's the only way to reset that state with the
currently available tools. Perhaps the pg_resetxlog tool could be
modified to let the user reset only that value.

>> I don't know the best way to handle this situation, but at least, I'd like a
>> warning message when the instance exits from the crash recovery while
>> backupStartPoint is still set.

> So you would get such a warning even when you restore from a backup
> willingly, no? That may confuse users. Now, the case you are referring
> to is unfortunately a known problem with exclusive backups... There is
> no way to tell the difference between a node restored from a backup
> and a node that crashed while a backup was being taken. And that may be a
> reason to make non-exclusive backups more popular because they are
> more reliable.

You are right, any solution to this issue must not interfere
with normal recovery from a backup.

To mitigate the effect, we could reset the backupStartPoint field
during the pg_start_backup invocation. That way, if a backup is
interrupted by a reboot, the instance state will be cleaned up
during the next backup.

Another possibility could be to emit a warning (and maybe reset
backupStartPoint value) during the shutdown of an instance that is fully
"in production".

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it