Assertion failure in pg_copy_logical_replication_slot()

Started by Fujii Masaoover 5 years ago7 messages
#1Fujii Masao
masao.fujii@oss.nttdata.com
1 attachment(s)

Hi,

If restart_lsn of logical replication slot gets behind more than
max_slot_wal_keep_size from the current LSN, the logical replication
slot would be invalidated and its restart_lsn is reset to an invalid LSN.
If this logical replication slot with an invalid restart_lsn is specified
as the source slot in pg_copy_logical_replication_slot(), the function
causes the following assertion failure.

TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)

This assertion failure is caused by

/* Copying non-reserved slot doesn't make sense */
if (XLogRecPtrIsInvalid(src_restart_lsn))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot copy a replication slot that doesn't reserve WAL")));

I *guess* this assertion check was added because restart_lsn should
not be invalid before. But in v13, it can be invalid thanks to max_slot_wal_keep_size.
I think that this assertion check seems useless and should be removed in v13.
Patch attached. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

fix_assertion_in_pg_copy_logical_replication_slot.patchtext/plain; charset=UTF-8; name=fix_assertion_in_pg_copy_logical_replication_slot.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 06e4955de7..bc3fcbff4d 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -723,12 +723,9 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 
 	/* Copying non-reserved slot doesn't make sense */
 	if (XLogRecPtrIsInvalid(src_restart_lsn))
-	{
-		Assert(!logical_slot);
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot copy a replication slot that doesn't reserve WAL")));
-	}
 
 	/* Overwrite params from optional arguments */
 	if (PG_NARGS() >= 3)
#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#1)
Re: Assertion failure in pg_copy_logical_replication_slot()

At Tue, 23 Jun 2020 00:17:47 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

Hi,

If restart_lsn of logical replication slot gets behind more than
max_slot_wal_keep_size from the current LSN, the logical replication
slot would be invalidated and its restart_lsn is reset to an invalid
LSN.
If this logical replication slot with an invalid restart_lsn is
specified
as the source slot in pg_copy_logical_replication_slot(), the function
causes the following assertion failure.

Good catch!

TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)

This assertion failure is caused by

/* Copying non-reserved slot doesn't make sense */
if (XLogRecPtrIsInvalid(src_restart_lsn))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot copy a replication slot that doesn't reserve
WAL")));

I *guess* this assertion check was added because restart_lsn should
not be invalid before. But in v13, it can be invalid thanks to
max_slot_wal_keep_size.
I think that this assertion check seems useless and should be removed
in v13.
Patch attached. Thought?

Your diagnosis looks correct to me. The assertion failure means that
copy_replication_slot was not exercised at least for a non-reserving
logical slots. Greping "pg_copy_logical_replication_slot" on src/test
showed nothing so I doubt we are exercising the function.

Don't we need some?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Kyotaro Horiguchi (#2)
Re: Assertion failure in pg_copy_logical_replication_slot()

On 2020/06/23 18:42, Kyotaro Horiguchi wrote:

At Tue, 23 Jun 2020 00:17:47 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

Hi,

If restart_lsn of logical replication slot gets behind more than
max_slot_wal_keep_size from the current LSN, the logical replication
slot would be invalidated and its restart_lsn is reset to an invalid
LSN.
If this logical replication slot with an invalid restart_lsn is
specified
as the source slot in pg_copy_logical_replication_slot(), the function
causes the following assertion failure.

Good catch!

TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)

This assertion failure is caused by

/* Copying non-reserved slot doesn't make sense */
if (XLogRecPtrIsInvalid(src_restart_lsn))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot copy a replication slot that doesn't reserve
WAL")));

I *guess* this assertion check was added because restart_lsn should
not be invalid before. But in v13, it can be invalid thanks to
max_slot_wal_keep_size.
I think that this assertion check seems useless and should be removed
in v13.
Patch attached. Thought?

Your diagnosis looks correct to me.

Thanks for the check! I will commit the patch later.

The assertion failure means that
copy_replication_slot was not exercised at least for a non-reserving
logical slots. Greping "pg_copy_logical_replication_slot" on src/test
showed nothing so I doubt we are exercising the function.

Don't we need some?

Yes, increasing the test coverage sounds helpful!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Fujii Masao (#1)
Re: Assertion failure in pg_copy_logical_replication_slot()

On 2020-Jun-23, Fujii Masao wrote:

If restart_lsn of logical replication slot gets behind more than
max_slot_wal_keep_size from the current LSN, the logical replication
slot would be invalidated and its restart_lsn is reset to an invalid LSN.
If this logical replication slot with an invalid restart_lsn is specified
as the source slot in pg_copy_logical_replication_slot(), the function
causes the following assertion failure.

TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)

Oops.

This assertion failure is caused by

/* Copying non-reserved slot doesn't make sense */
if (XLogRecPtrIsInvalid(src_restart_lsn))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot copy a replication slot that doesn't reserve WAL")));

Heh, you pasted the code after your patch rather than the original.

I think the errcode is a bit bogus considering the new case.
IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate.

One could argue that the error message could also be different for the
case of a logical slot (or even a physical slot that has the upcoming
"invalidated_at" LSN set), maybe "cannot copy a replication slot that
has been invalidated" but maybe that's a pointless distinction.
I don't object to the patch as presented.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Alvaro Herrera (#4)
1 attachment(s)
Re: Assertion failure in pg_copy_logical_replication_slot()

On 2020/06/24 9:38, Alvaro Herrera wrote:

On 2020-Jun-23, Fujii Masao wrote:

If restart_lsn of logical replication slot gets behind more than
max_slot_wal_keep_size from the current LSN, the logical replication
slot would be invalidated and its restart_lsn is reset to an invalid LSN.
If this logical replication slot with an invalid restart_lsn is specified
as the source slot in pg_copy_logical_replication_slot(), the function
causes the following assertion failure.

TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)

Oops.

This assertion failure is caused by

/* Copying non-reserved slot doesn't make sense */
if (XLogRecPtrIsInvalid(src_restart_lsn))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot copy a replication slot that doesn't reserve WAL")));

Heh, you pasted the code after your patch rather than the original.

oh.... sorry.

I think the errcode is a bit bogus considering the new case.
IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate.

Agreed. So I updated the patch so this errcode is used instead.
Patch attached.

One could argue that the error message could also be different for the
case of a logical slot (or even a physical slot that has the upcoming
"invalidated_at" LSN set), maybe "cannot copy a replication slot that
has been invalidated" but maybe that's a pointless distinction.
I don't object to the patch as presented.

I have no strong opinion about this, but for now I kept the message as it is.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

fix_assertion_in_pg_copy_logical_replication_slot_v2.patchtext/plain; charset=UTF-8; name=fix_assertion_in_pg_copy_logical_replication_slot_v2.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 06e4955de7..ce17d44227 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -723,12 +723,9 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 
 	/* Copying non-reserved slot doesn't make sense */
 	if (XLogRecPtrIsInvalid(src_restart_lsn))
-	{
-		Assert(!logical_slot);
 		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("cannot copy a replication slot that doesn't reserve WAL")));
-	}
 
 	/* Overwrite params from optional arguments */
 	if (PG_NARGS() >= 3)
#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Fujii Masao (#5)
Re: Assertion failure in pg_copy_logical_replication_slot()

On 2020-Jun-24, Fujii Masao wrote:

I think the errcode is a bit bogus considering the new case.
IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate.

Agreed. So I updated the patch so this errcode is used instead.
Patch attached.

LGTM.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Alvaro Herrera (#6)
Re: Assertion failure in pg_copy_logical_replication_slot()

On 2020/06/24 23:58, Alvaro Herrera wrote:

On 2020-Jun-24, Fujii Masao wrote:

I think the errcode is a bit bogus considering the new case.
IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate.

Agreed. So I updated the patch so this errcode is used instead.
Patch attached.

LGTM.

Thanks! Pushed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION