pgsql: Fix an assertion failure related to an exclusive backup.

Started by Fujii Masao, about 9 years ago, 4 messages, hackers
#1Fujii Masao
masao.fujii@gmail.com

Fix an assertion failure related to an exclusive backup.

Previously multiple sessions could execute pg_start_backup() and
pg_stop_backup() to start and stop an exclusive backup at the same time.
This could trigger the assertion failure of
"FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)".
This happened because, even while pg_start_backup() was starting
an exclusive backup, another session could run pg_stop_backup()
concurrently and unconditionally mark the backup as not in progress.

This patch introduces ExclusiveBackupState indicating the state of
an exclusive backup. This state is used to ensure that there is only
one session running pg_start_backup() or pg_stop_backup() at
the same time, to avoid the assertion failure.
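The state tracking described above can be sketched as a small state machine. This is a minimal model, not the code in xlog.c: the source names only ExclusiveBackupState, so the four state names, the helper functions begin_start(), finish_start(), begin_stop() and finish_stop(), and the plain global variable standing in for shared memory are all assumptions made for illustration. Real code would take a lock around every access to the shared state.

```c
#include <assert.h>
#include <stdbool.h>

/* State names are assumed for illustration; the message only names the type. */
typedef enum ExclusiveBackupState
{
    EXCLUSIVE_BACKUP_NONE = 0,
    EXCLUSIVE_BACKUP_STARTING,
    EXCLUSIVE_BACKUP_IN_PROGRESS,
    EXCLUSIVE_BACKUP_STOPPING
} ExclusiveBackupState;

/* Stand-in for the shared-memory field (hypothetical; no locking shown). */
ExclusiveBackupState backup_state = EXCLUSIVE_BACKUP_NONE;

/* Entry check for a start routine: only allowed when no backup exists. */
bool begin_start(void)
{
    if (backup_state != EXCLUSIVE_BACKUP_NONE)
        return false;           /* another session is starting, running or stopping */
    backup_state = EXCLUSIVE_BACKUP_STARTING;
    return true;
}

/* End of the start routine: commit on success, roll back on failure. */
void finish_start(bool ok)
{
    assert(backup_state == EXCLUSIVE_BACKUP_STARTING);
    backup_state = ok ? EXCLUSIVE_BACKUP_IN_PROGRESS : EXCLUSIVE_BACKUP_NONE;
}

/* Entry check for a stop routine: only allowed on a fully started backup. */
bool begin_stop(void)
{
    if (backup_state != EXCLUSIVE_BACKUP_IN_PROGRESS)
        return false;           /* nothing to stop, or a start/stop is in flight */
    backup_state = EXCLUSIVE_BACKUP_STOPPING;
    return true;
}

/* End of the stop routine: clear on success, restore on failure. */
void finish_stop(bool ok)
{
    assert(backup_state == EXCLUSIVE_BACKUP_STOPPING);
    backup_state = ok ? EXCLUSIVE_BACKUP_NONE : EXCLUSIVE_BACKUP_IN_PROGRESS;
}
```

In this model a stop request that arrives while a start is still in flight is rejected rather than clearing the state, which is the interleaving the commit message describes as the cause of the assertion failure.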

Back-patch to all supported versions.

Author: Michael Paquier
Reviewed-By: Kyotaro Horiguchi and me
Reported-By: Andreas Seltenreich
Discussion: <87mvktojme.fsf@credativ.de>

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/974ece58bbb3c0ef185a9d44b1cedae51cd56b04

Modified Files
--------------
src/backend/access/transam/xlog.c | 223 +++++++++++++++++++++++++-------------
1 file changed, 150 insertions(+), 73 deletions(-)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#2Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#1)
Re: [COMMITTERS] pgsql: Fix an assertion failure related to an exclusive backup.

On Tue, Jan 17, 2017 at 5:40 PM, Fujii Masao <fujii@postgresql.org> wrote:

> Fix an assertion failure related to an exclusive backup.
>
> Previously multiple sessions could execute pg_start_backup() and
> pg_stop_backup() to start and stop an exclusive backup at the same time.
> This could trigger the assertion failure of
> "FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)".
> This happened because, even while pg_start_backup() was starting
> an exclusive backup, another session could run pg_stop_backup()
> concurrently and unconditionally mark the backup as not in progress.
>
> This patch introduces ExclusiveBackupState indicating the state of
> an exclusive backup. This state is used to ensure that there is only
> one session running pg_start_backup() or pg_stop_backup() at
> the same time, to avoid the assertion failure.

Please note that this commit message is not completely exact. This fix
does not only avoid triggering this assertion failure, it also makes
sure that no manual on-disk intervention is needed by the user to
remove a backup_label file after a failure of pg_stop_backup(). Before
this patch, the exclusive backup counter in XLogCtl was decremented
before backup_label was removed. If an error occurred after the
decrement, the shared memory counter would have been at 0 with a
backup_label file still on disk. Subsequent attempts to run
pg_start_backup() would have failed, and putting the system back into
a consistent state would have required an operator to remove the
backup_label file by hand. The heart of the logic here is in the
callback of pg_stop_backup() that runs when an error happens during
the deletion of the backup_label file.
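The ordering problem described above can be illustrated with a small model. This is only a sketch under stated assumptions: the Model struct, remove_label(), stop_backup_old(), stop_backup_new() and is_consistent() are hypothetical names, an error is modeled as a false return value rather than a real error callback, and the on-disk file is reduced to a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { BACKUP_NONE, BACKUP_IN_PROGRESS, BACKUP_STOPPING } BackupState;

/* Hypothetical model: shared-memory state plus "backup_label exists on disk". */
typedef struct
{
    BackupState state;
    bool        label_on_disk;
} Model;

/* Stand-in for deleting backup_label; 'fail' simulates an I/O error. */
bool remove_label(Model *m, bool fail)
{
    if (fail)
        return false;
    m->label_on_disk = false;
    return true;
}

/* Old ordering: shared state is cleared before the file is removed. */
bool stop_backup_old(Model *m, bool fail)
{
    m->state = BACKUP_NONE;
    return remove_label(m, fail);   /* on failure the label is left behind */
}

/* New ordering: mark "stopping", remove the file, then clear the state.
 * The rollback branch plays the role of the error callback. */
bool stop_backup_new(Model *m, bool fail)
{
    if (m->state != BACKUP_IN_PROGRESS)
        return false;
    m->state = BACKUP_STOPPING;
    if (!remove_label(m, fail))
    {
        m->state = BACKUP_IN_PROGRESS;  /* backup is still considered running */
        return false;
    }
    m->state = BACKUP_NONE;             /* reset right after the removal */
    return true;
}

/* Consistent means: "no backup" in memory exactly when no label is on disk. */
bool is_consistent(const Model *m)
{
    return (m->state == BACKUP_NONE) == !m->label_on_disk;
}
```

With the old ordering a failed removal leaves the model at "no backup" with the label still present, which is the case that needed manual cleanup. With the new ordering the same failure leaves the backup marked as in progress, so the stop can simply be retried.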
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#2)
Re: Re: [COMMITTERS] pgsql: Fix an assertion failure related to an exclusive backup.

On Tue, Jan 17, 2017 at 10:37 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

> On Tue, Jan 17, 2017 at 5:40 PM, Fujii Masao <fujii@postgresql.org> wrote:
>
>> Fix an assertion failure related to an exclusive backup.
>>
>> Previously multiple sessions could execute pg_start_backup() and
>> pg_stop_backup() to start and stop an exclusive backup at the same time.
>> This could trigger the assertion failure of
>> "FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)".
>> This happened because, even while pg_start_backup() was starting
>> an exclusive backup, another session could run pg_stop_backup()
>> concurrently and unconditionally mark the backup as not in progress.
>>
>> This patch introduces ExclusiveBackupState indicating the state of
>> an exclusive backup. This state is used to ensure that there is only
>> one session running pg_start_backup() or pg_stop_backup() at
>> the same time, to avoid the assertion failure.
>
> Please note that this commit message is not completely exact. This fix
> does not only avoid triggering this assertion failure, it also makes
> sure that no manual on-disk intervention is needed by the user to
> remove a backup_label file after a failure of pg_stop_backup(). Before
> this patch, the exclusive backup counter in XLogCtl was decremented
> before backup_label was removed. If an error occurred after the
> decrement, the shared memory counter would have been at 0 with a
> backup_label file still on disk. Subsequent attempts to run
> pg_start_backup() would have failed, and putting the system back into
> a consistent state would have required an operator to remove the
> backup_label file by hand. The heart of the logic here is in the
> callback of pg_stop_backup() that runs when an error happens during
> the deletion of the backup_label file.

With the patch, what happens if pg_stop_backup() exits with an error
after removing the backup_label file but before resetting the backup
state to none?

Regards,

--
Fujii Masao

#4Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#3)
Re: Re: [COMMITTERS] pgsql: Fix an assertion failure related to an exclusive backup.

On Tue, Jan 17, 2017 at 11:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

> With the patch, what happens if pg_stop_backup() exits with an error
> after removing the backup_label file but before resetting the backup
> state to none?

Removing the backup_label file is the last step that can fail while
the callback is set, and the counter is reset immediately after.
--
Michael
