Crash after a call to pg_backup_start()
While playing with a proposed patch, I noticed that a session crashes
after a failed call to pg_backup_start().
postgres=# select pg_backup_start(repeat('x', 1026));
ERROR: backup label too long (max 1024 bytes)
postgres=# \q
TRAP: failed Assert("during_backup_start ^ (sessionBackupState == SESSION_BACKUP_RUNNING)"), File: "xlog.c", Line: 8846, PID: 165835
Surprisingly this happens by a series of succussful calls to
pg_backup_start and stop.
postgres=# select pg_backup_start('x');
postgres=# select pg_backup_top();
postgres=# \q
TRAP: failed Assert("durin..
do_pg_abort_backup(int code, Datum arg)
/* Only one of these conditions can be true */
Assert(during_backup_start ^
(sessionBackupState == SESSION_BACKUP_RUNNING));
It seems to me that the comment is true and the condition is a thinko.
This is introduced by df3737a651 so it is master only.
Please find the attached fix.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
fix_crash_after_backup_start.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dea978a962..888f5b1bff 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8842,8 +8842,8 @@ do_pg_abort_backup(int code, Datum arg)
bool during_backup_start = DatumGetBool(arg);
/* Only one of these conditions can be true */
- Assert(during_backup_start ^
- (sessionBackupState == SESSION_BACKUP_RUNNING));
+ Assert(!(during_backup_start &&
+ (sessionBackupState == SESSION_BACKUP_RUNNING)));
if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE)
{
On Fri, Oct 21, 2022 at 3:10 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:
It seems to me that the comment is true and the condition is a thinko.
Yeah, the two conditions could be both false. How about we update the
comment a bit to emphasize this? Such as
/* At most one of these conditions can be true */
or
/* These conditions can not be both true */
Please find the attached fix.
+1
Thanks
Richard
On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
Yeah, the two conditions could be both false. How about we update the
comment a bit to emphasize this? Such as/* At most one of these conditions can be true */
or
/* These conditions can not be both true */
If you do that, it would be a bit easier to read as of the following
assertion instead? Like:
Assert(!during_backup_start ||
sessionBackupState == SESSION_BACKUP_NONE);
Please note that I have not checked in details all the interactions
behind register_persistent_abort_backup_handler() before entering in
do_pg_backup_start() and the ERROR_CLEANUP block used in this
routine (just a matter of some elog(ERROR)s put here and there, for
example). Anyway, yes, both conditions can be false, and that's easy
to reproduce:
1) Do pg_backup_start().
2) Do pg_backup_stop().
3) Stop the session to kick do_pg_abort_backup()
4) Assert()-boom.
--
Michael
On Fri, Oct 21, 2022 at 7:06 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
Yeah, the two conditions could be both false. How about we update the
comment a bit to emphasize this? Such as/* At most one of these conditions can be true */
or
/* These conditions can not be both true */If you do that, it would be a bit easier to read as of the following
assertion instead? Like:
Assert(!during_backup_start ||
sessionBackupState == SESSION_BACKUP_NONE);Please note that I have not checked in details all the interactions
behind register_persistent_abort_backup_handler() before entering in
do_pg_backup_start() and the ERROR_CLEANUP block used in this
routine (just a matter of some elog(ERROR)s put here and there, for
example). Anyway, yes, both conditions can be false, and that's easy
to reproduce:
1) Do pg_backup_start().
2) Do pg_backup_stop().
3) Stop the session to kick do_pg_abort_backup()
4) Assert()-boom.
I'm wondering if we need the assertion at all. We know that when the
arg is true or the sessionBackupState is SESSION_BACKUP_RUNNING, the
runningBackups would've been incremented and we can just go ahead and
decrement it, like the attached patch. This is a cleaner approach IMO
unless I'm missing something here.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-Fix-assertion-failure-in-do_pg_abort_backup.patchapplication/x-patch; name=v1-0001-Fix-assertion-failure-in-do_pg_abort_backup.patchDownload
From b147720ca68a120efdf8f20c58cb3499901e6d61 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Fri, 21 Oct 2022 14:29:27 +0000
Subject: [PATCH v1] Fix assertion failure in do_pg_abort_backup()
---
src/backend/access/transam/xlog.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dea978a962..104309c56f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8841,11 +8841,12 @@ do_pg_abort_backup(int code, Datum arg)
{
bool during_backup_start = DatumGetBool(arg);
- /* Only one of these conditions can be true */
- Assert(during_backup_start ^
- (sessionBackupState == SESSION_BACKUP_RUNNING));
-
- if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE)
+ /*
+ * When either of these were true, we know that the runningBackups has
+ * already been incremented, hence decrement it.
+ */
+ if (during_backup_start ||
+ sessionBackupState == SESSION_BACKUP_RUNNING)
{
WALInsertLockAcquireExclusive();
Assert(XLogCtl->Insert.runningBackups > 0);
--
2.34.1
On 2022-Oct-21, Michael Paquier wrote:
On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
/* These conditions can not be both true */
If you do that, it would be a bit easier to read as of the following
assertion instead? Like:
Assert(!during_backup_start ||
sessionBackupState == SESSION_BACKUP_NONE);
My intention here was that the Assert should be inside the block, that
is, we already know that at least one is true, and we want to make sure
that they are not *both* true.
AFAICT the attached patch also fixes the bug without making the assert
weaker.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Attachments:
0001-Fix-misplaced-assertion.patchtext/x-diff; charset=us-asciiDownload
From 710eef9259f4286f8d8660ac1dd898323205a037 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Sat, 22 Oct 2022 09:54:24 +0200
Subject: [PATCH] Fix misplaced assertion
---
src/backend/access/transam/xlog.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dea978a962..fb3fbcd2be 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8841,12 +8841,12 @@ do_pg_abort_backup(int code, Datum arg)
{
bool during_backup_start = DatumGetBool(arg);
- /* Only one of these conditions can be true */
- Assert(during_backup_start ^
- (sessionBackupState == SESSION_BACKUP_RUNNING));
-
if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE)
{
+ /* We should be here only by one of these reasons, never both */
+ Assert(during_backup_start ^
+ (sessionBackupState == SESSION_BACKUP_RUNNING));
+
WALInsertLockAcquireExclusive();
Assert(XLogCtl->Insert.runningBackups > 0);
XLogCtl->Insert.runningBackups--;
--
2.30.2
On Sat, Oct 22, 2022 at 1:26 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Oct-21, Michael Paquier wrote:
On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
/* These conditions can not be both true */
If you do that, it would be a bit easier to read as of the following
assertion instead? Like:
Assert(!during_backup_start ||
sessionBackupState == SESSION_BACKUP_NONE);My intention here was that the Assert should be inside the block, that
is, we already know that at least one is true, and we want to make sure
that they are not *both* true.AFAICT the attached patch also fixes the bug without making the assert
weaker.
+ /* We should be here only by one of these reasons, never both */
+ Assert(during_backup_start ^
+ (sessionBackupState == SESSION_BACKUP_RUNNING));
+
What's the problem even if we're here when both of them are true? The
runningBackups is incremented anyways, right? Why can't we just get
rid of the Assert and treat during_backup_start as
backup_marked_active_in_shmem or something like that to keep things
simple?
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 2022-Oct-22, Bharath Rupireddy wrote:
+ /* We should be here only by one of these reasons, never both */ + Assert(during_backup_start ^ + (sessionBackupState == SESSION_BACKUP_RUNNING)); +What's the problem even if we're here when both of them are true?
In what case should we be there with both conditions true?
The runningBackups is incremented anyways, right?
In the current code, yes, but it seems to be easier to reason about if
we know precisely why we're there and whether we should be running the
cleanup or not. Otherwise we might end up with a bug where we run the
function but it doesn't do anything because we failed to understand the
preconditions. At the very least, this forces a developer changing this
code to think through it.
Why can't we just get rid of the Assert and treat during_backup_start
as backup_marked_active_in_shmem or something like that to keep things
simple?
Why is that simpler?
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"La verdad no siempre es bonita, pero el hambre de ella sí"
On Sat, Oct 22, 2022 at 1:56 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Why can't we just get rid of the Assert and treat during_backup_start
as backup_marked_active_in_shmem or something like that to keep things
simple?Why is that simpler?
IMO, the assertion looks complex there and thinking if we can remove it.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
At Sat, 22 Oct 2022 09:56:06 +0200, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
On 2022-Oct-21, Michael Paquier wrote:
On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
/* These conditions can not be both true */
If you do that, it would be a bit easier to read as of the following
assertion instead? Like:
Assert(!during_backup_start ||
sessionBackupState == SESSION_BACKUP_NONE);My intention here was that the Assert should be inside the block, that
is, we already know that at least one is true, and we want to make sure
that they are not *both* true.AFAICT the attached patch also fixes the bug without making the assert
weaker.
I'm fine with either of them, but..
The reason for that works the same way is that the if() block excludes
the case of (!during_backup_start && S_B_RUNNING)<*1>. In other words
the strictness is a kind of illusion [*a]. Actually the assertion does
not detect the case <*1>. In this regard, moving the current
assertion into the if() block might be confusing.
regards,
<*1>: It's evidently superfluous but "strictness" and "illusion" share
the exactly the same pronounciation in Japanese "Ghen-Ka-Ku".
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Mon, Oct 24, 2022 at 11:42:58AM +0900, Kyotaro Horiguchi wrote:
At Sat, 22 Oct 2022 09:56:06 +0200, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
My intention here was that the Assert should be inside the block, that
is, we already know that at least one is true, and we want to make sure
that they are not *both* true.AFAICT the attached patch also fixes the bug without making the assert
weaker.
On the contrary, it seems to me that putting the assertion within the
if() block makes the assertion weaker, because we would never check
for an incorrect state after do_pg_abort_backup() is registered (aka
any pg_backup_start() call) when not entering in this if() block.
Saying that, if you feel otherwise I am fine with your conclusion as
well, so feel free to solve this issue as you see fit. :p
--
Michael
On 2022-Oct-24, Michael Paquier wrote:
On the contrary, it seems to me that putting the assertion within the
if() block makes the assertion weaker, because we would never check
for an incorrect state after do_pg_abort_backup() is registered (aka
any pg_backup_start() call) when not entering in this if() block.
Reading it again, I agree with your conclusion, so I'll push as you
proposed with some extra tests, after they complete running.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"La verdad no siempre es bonita, pero el hambre de ella sí"