Assertion failure in pg_copy_logical_replication_slot()
Hi,
If restart_lsn of logical replication slot gets behind more than
max_slot_wal_keep_size from the current LSN, the logical replication
slot would be invalidated and its restart_lsn is reset to an invalid LSN.
If this logical replication slot with an invalid restart_lsn is specified
as the source slot in pg_copy_logical_replication_slot(), the function
causes the following assertion failure.
TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)
This assertion failure is caused by
/* Copying non-reserved slot doesn't make sense */
if (XLogRecPtrIsInvalid(src_restart_lsn))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot copy a replication slot that doesn't reserve WAL")));
I *guess* this assertion check was added because restart_lsn should
not be invalid before. But in v13, it can be invalid thanks to max_slot_wal_keep_size.
I think that this assertion check seems useless and should be removed in v13.
Patch attached. Thought?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
fix_assertion_in_pg_copy_logical_replication_slot.patchtext/plain; charset=UTF-8; name=fix_assertion_in_pg_copy_logical_replication_slot.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 06e4955de7..bc3fcbff4d 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -723,12 +723,9 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
/* Copying non-reserved slot doesn't make sense */
if (XLogRecPtrIsInvalid(src_restart_lsn))
- {
- Assert(!logical_slot);
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot copy a replication slot that doesn't reserve WAL")));
- }
/* Overwrite params from optional arguments */
if (PG_NARGS() >= 3)
At Tue, 23 Jun 2020 00:17:47 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
Hi,
If restart_lsn of logical replication slot gets behind more than
max_slot_wal_keep_size from the current LSN, the logical replication
slot would be invalidated and its restart_lsn is reset to an invalid
LSN.
If this logical replication slot with an invalid restart_lsn is
specified
as the source slot in pg_copy_logical_replication_slot(), the function
causes the following assertion failure.
Good catch!
TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)
This assertion failure is caused by
/* Copying non-reserved slot doesn't make sense */
if (XLogRecPtrIsInvalid(src_restart_lsn))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot copy a replication slot that doesn't reserve
WAL")));I *guess* this assertion check was added because restart_lsn should
not be invalid before. But in v13, it can be invalid thanks to
max_slot_wal_keep_size.
I think that this assertion check seems useless and should be removed
in v13.
Patch attached. Thought?
Your diagnosis looks correct to me. The assertion failure means that
copy_replication_slot was not exercised at least for a non-reserving
logical slots. Greping "pg_copy_logical_replication_slot" on src/test
showed nothing so I doubt we are exercising the function.
Don't we need some?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020/06/23 18:42, Kyotaro Horiguchi wrote:
At Tue, 23 Jun 2020 00:17:47 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
Hi,
If restart_lsn of logical replication slot gets behind more than
max_slot_wal_keep_size from the current LSN, the logical replication
slot would be invalidated and its restart_lsn is reset to an invalid
LSN.
If this logical replication slot with an invalid restart_lsn is
specified
as the source slot in pg_copy_logical_replication_slot(), the function
causes the following assertion failure.Good catch!
TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)
This assertion failure is caused by
/* Copying non-reserved slot doesn't make sense */
if (XLogRecPtrIsInvalid(src_restart_lsn))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot copy a replication slot that doesn't reserve
WAL")));I *guess* this assertion check was added because restart_lsn should
not be invalid before. But in v13, it can be invalid thanks to
max_slot_wal_keep_size.
I think that this assertion check seems useless and should be removed
in v13.
Patch attached. Thought?Your diagnosis looks correct to me.
Thanks for the check! I will commit the patch later.
The assertion failure means that
copy_replication_slot was not exercised at least for a non-reserving
logical slots. Greping "pg_copy_logical_replication_slot" on src/test
showed nothing so I doubt we are exercising the function.Don't we need some?
Yes, increasing the test coverage sounds helpful!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020-Jun-23, Fujii Masao wrote:
If restart_lsn of logical replication slot gets behind more than
max_slot_wal_keep_size from the current LSN, the logical replication
slot would be invalidated and its restart_lsn is reset to an invalid LSN.
If this logical replication slot with an invalid restart_lsn is specified
as the source slot in pg_copy_logical_replication_slot(), the function
causes the following assertion failure.TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)
Oops.
This assertion failure is caused by
/* Copying non-reserved slot doesn't make sense */
if (XLogRecPtrIsInvalid(src_restart_lsn))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot copy a replication slot that doesn't reserve WAL")));
Heh, you pasted the code after your patch rather than the original.
I think the errcode is a bit bogus considering the new case.
IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate.
One could argue that the error message could also be different for the
case of a logical slot (or even a physical slot that has the upcoming
"invalidated_at" LSN set), maybe "cannot copy a replication slot that
has been invalidated" but maybe that's a pointless distinction.
I don't object to the patch as presented.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/06/24 9:38, Alvaro Herrera wrote:
On 2020-Jun-23, Fujii Masao wrote:
If restart_lsn of logical replication slot gets behind more than
max_slot_wal_keep_size from the current LSN, the logical replication
slot would be invalidated and its restart_lsn is reset to an invalid LSN.
If this logical replication slot with an invalid restart_lsn is specified
as the source slot in pg_copy_logical_replication_slot(), the function
causes the following assertion failure.TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)
Oops.
This assertion failure is caused by
/* Copying non-reserved slot doesn't make sense */
if (XLogRecPtrIsInvalid(src_restart_lsn))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot copy a replication slot that doesn't reserve WAL")));Heh, you pasted the code after your patch rather than the original.
oh.... sorry.
I think the errcode is a bit bogus considering the new case.
IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate.
Agreed. So I updated the patch so this errcode is used instead.
Patch attached.
One could argue that the error message could also be different for the
case of a logical slot (or even a physical slot that has the upcoming
"invalidated_at" LSN set), maybe "cannot copy a replication slot that
has been invalidated" but maybe that's a pointless distinction.
I don't object to the patch as presented.
I have no strong opinion about this, but for now I kept the message as it is.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
fix_assertion_in_pg_copy_logical_replication_slot_v2.patchtext/plain; charset=UTF-8; name=fix_assertion_in_pg_copy_logical_replication_slot_v2.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 06e4955de7..ce17d44227 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -723,12 +723,9 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
/* Copying non-reserved slot doesn't make sense */
if (XLogRecPtrIsInvalid(src_restart_lsn))
- {
- Assert(!logical_slot);
ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot copy a replication slot that doesn't reserve WAL")));
- }
/* Overwrite params from optional arguments */
if (PG_NARGS() >= 3)
On 2020-Jun-24, Fujii Masao wrote:
I think the errcode is a bit bogus considering the new case.
IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate.Agreed. So I updated the patch so this errcode is used instead.
Patch attached.
LGTM.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/06/24 23:58, Alvaro Herrera wrote:
On 2020-Jun-24, Fujii Masao wrote:
I think the errcode is a bit bogus considering the new case.
IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate.Agreed. So I updated the patch so this errcode is used instead.
Patch attached.LGTM.
Thanks! Pushed.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION