Forbid the use of invalidated physical slots in streaming replication.
Hi,
When testing streaming replication with a physical slot, I found an unexpected
behavior: the walsender could use an invalidated physical slot for
streaming.
This occurs when the primary slot is invalidated due to reaching
max_slot_wal_keep_size before streaming replication is initialized
(e.g. before creating the standby). Attached is a draft script (test_invalidated_slot.sh)
which can reproduce this.
Once the slot is invalidated, it can no longer protect the WAL and
rows it needs, because invalidated slots are not considered in functions like
ReplicationSlotsComputeRequiredXmin().
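For reference, the skip in ReplicationSlotsComputeRequiredXmin() looks roughly
like this (paraphrased from memory, not the exact code):

--- code ---
	/*
	 * Paraphrased sketch of the loop in ReplicationSlotsComputeRequiredXmin():
	 * slots whose 'invalidated' field is set are skipped, so they no longer
	 * hold back the xmin horizon.
	 */
	for (i = 0; i < max_replication_slots; i++)
	{
		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
		TransactionId	effective_xmin;
		TransactionId	effective_catalog_xmin;
		bool			invalidated;

		if (!s->in_use)
			continue;

		SpinLockAcquire(&s->mutex);
		effective_xmin = s->effective_xmin;
		effective_catalog_xmin = s->effective_catalog_xmin;
		invalidated = (s->data.invalidated != RS_INVAL_NONE);
		SpinLockRelease(&s->mutex);

		/* invalidated slots need not apply */
		if (invalidated)
			continue;

		/* ... accumulate the oldest xmin / catalog_xmin here ... */
	}
--- code ---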
Besides, the walsender could advance the restart_lsn of an invalidated slot,
and then the user has no way to tell whether the slot is actually valid or not,
because the 'conflicting' column of the pg_replication_slots view could be set
back to null.
So I think this is a bug, and one idea to fix it is to check the validity of the
physical slot in StartReplication() after acquiring the slot, as the attached
patch does. What do you think?
Best Regards,
Hou Zhijie
Attachments:
0001-Forbid-the-use-of-invalidated-slot-in-streaming-repl.patch
From 6db46eed2c2d99beb1defc4470c598cfe796cb3a Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Wed, 6 Dec 2023 11:38:00 +0800
Subject: [PATCH] Forbid the use of invalidated slot in streaming replication
---
src/backend/replication/walsender.c | 9 +++++++++
src/test/recovery/t/019_replslot_limit.pl | 2 +-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3bc9c82389..53073dddaa 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -703,6 +703,15 @@ StartReplication(StartReplicationCmd *cmd)
 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 					 errmsg("cannot use a logical replication slot for physical replication")));
 
+		if (MyReplicationSlot->data.invalidated == RS_INVAL_WAL_REMOVED)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("cannot use replication slot \"%s\" for physical replication",
+							NameStr(MyReplicationSlot->data.name)),
+					 errdetail("This slot has been invalidated because it exceeded the maximum reserved size.")));
+
+		Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
 		/*
 		 * We don't need to verify the slot's restart_lsn here; instead we
 		 * rely on the caller requesting the starting point to use. If the
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 7d94f15778..7c350e86f7 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -236,7 +236,7 @@ my $failed = 0;
 for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
 {
 	if ($node_standby->log_contains(
-			"requested WAL segment [0-9A-F]+ has already been removed",
+			"cannot use replication slot \"rep1\" for physical replication",
 			$logstart))
 	{
 		$failed = 1;
--
2.31.1
On Wed, Dec 6, 2023 at 6:25 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
Hi,
When testing streaming replication with a physical slot, I found an unexpected
behavior: the walsender could use an invalidated physical slot for streaming.
This occurs when the primary slot is invalidated due to reaching
max_slot_wal_keep_size before streaming replication is initialized
(e.g. before creating the standby). Attached is a draft script
(test_invalidated_slot.sh) which can reproduce this.
Interesting. Thanks for the script. It reproduces the problem for me easily.
Once the slot is invalidated, it can no longer protect the WAL and
rows it needs, because invalidated slots are not considered in functions like
ReplicationSlotsComputeRequiredXmin(). Besides, the walsender could advance
the restart_lsn of an invalidated slot, and then the user has no way to tell
whether the slot is actually valid or not, because the 'conflicting' column of
the pg_replication_slots view could be set back to null.
In this case, since the basebackup was taken after the slot was
invalidated, it does not require the WAL that was removed. But it
seems that once the streaming starts, the slot sprints to life again
and gets validated again. Here's the pg_replication_slots output after the
standby starts:
# select * from pg_replication_slots;
  slot_name  |    plugin     | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase | conflicting
-------------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------+-------------
 logicalslot | test_decoding | logical   |      5 | postgres | f         | f      |            |      |          739 |             | 0/1513B08           | lost       |               | f         | t
 physical    |               | physical  |        |          | f         | t      |     341925 |  752 |              | 0/404CB78   |                     | unreserved |      16462984 | f         |
(2 rows)
which looks quite similar to the output when the slot was valid after creation:
  slot_name  |    plugin     | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase | conflicting
-------------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------+-------------
 logicalslot | test_decoding | logical   |      5 | postgres | f         | f      |            |      |          739 | 0/1513AD0   | 0/1513B08           | unreserved |      -1591888 | f         | f
 physical    |               | physical  |        |          | f         | f      |            |      |              | 0/14F0DF0   |                     | unreserved |      -1591888 | f         |
(2 rows)
So I think this is a bug, and one idea to fix it is to check the validity of the
physical slot in StartReplication() after acquiring the slot, as the attached
patch does. What do you think?
I am not sure whether that's necessarily a bug. Of course, we don't
expect invalid slots to be used but in this case I don't see any harm.
The invalid slot has been revived and has all the properties set just
like a valid slot. So it must be considered in
ReplicationSlotsComputeRequiredXmin() as well. I haven't verified it
myself though. In case the WAL is really lost and is requested by the
standby it will throw an error "requested WAL segment [0-9A-F]+ has
already been removed". So no harm there as well.
I haven't been able to locate the code which makes the slot valid
though. So I can't say whether the behaviour is intended or not.
Looking at the StartReplication() comment:
	/*
	 * We don't need to verify the slot's restart_lsn here; instead we
	 * rely on the caller requesting the starting point to use. If the
	 * WAL segment doesn't exist, we'll fail later.
	 */
it looks like the behaviour is not completely unexpected.
--
Best Wishes,
Ashutosh Bapat
On Thursday, December 7, 2023 7:43 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
Hi,
On Wed, Dec 6, 2023 at 6:25 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
wrote:
When testing streaming replication with a physical slot, I found an unexpected
behavior: the walsender could use an invalidated physical slot for streaming.
This occurs when the primary slot is invalidated due to reaching
max_slot_wal_keep_size before streaming replication is initialized
(e.g. before creating the standby). Attached is a draft script
(test_invalidated_slot.sh) which can reproduce this.

Interesting. Thanks for the script. It reproduces the problem for me easily.
Thanks for testing and replying!
Once the slot is invalidated, it can no longer protect the WAL and
rows it needs, because invalidated slots are not considered in functions like
ReplicationSlotsComputeRequiredXmin(). Besides, the walsender could advance
the restart_lsn of an invalidated slot, and then the user has no way to tell
whether the slot is actually valid or not, because the 'conflicting' column of
the pg_replication_slots view could be set back to null.

In this case, since the basebackup was taken after the slot was invalidated, it
does not require the WAL that was removed. But it seems that once the
streaming starts, the slot sprints to life again and gets validated again. Here's
the pg_replication_slots output after the standby starts.
Actually, it doesn't bring the invalidated slot back to life completely.
The slot's view data looks valid, while the 'invalidated' flag of this slot is still
RS_INVAL_WAL_REMOVED (the user is not aware of it).
So I think this is a bug, and one idea to fix it is to check the validity of the
physical slot in StartReplication() after acquiring the slot, as the attached
patch does. What do you think?

I am not sure whether that's necessarily a bug. Of course, we don't expect
invalid slots to be used, but in this case I don't see any harm.
The invalid slot has been revived and has all the properties set just like a valid
slot. So it must be considered in ReplicationSlotsComputeRequiredXmin() as well.
I haven't verified it myself though. In case the WAL is really lost and is
requested by the standby, it will throw an error "requested WAL segment
[0-9A-F]+ has already been removed". So no harm there as well.
Since the 'invalidated' field of the slot is still RS_INVAL_WAL_REMOVED, even
if the walsender advances the restart_lsn, the slot will not be considered in
ReplicationSlotsComputeRequiredXmin(), so the WAL and rows are not safe,
and that's why I think it's a bug.
After looking closer, it seems this behavior started with 15f8203, which introduced
ReplicationSlotInvalidationCause 'invalidated'; since then we check the invalidated
enum in the Xmin/LSN computation functions.
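For example, the LSN-side computation does the same thing; roughly
(paraphrased, not the exact code):

--- code ---
	/*
	 * Paraphrased sketch of ReplicationSlotsComputeRequiredLSN(): just like
	 * the xmin computation, slots with a non-RS_INVAL_NONE 'invalidated'
	 * value are skipped, so they no longer retain WAL.
	 */
	for (i = 0; i < max_replication_slots; i++)
	{
		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
		XLogRecPtr	restart_lsn;
		bool		invalidated;

		if (!s->in_use)
			continue;

		SpinLockAcquire(&s->mutex);
		restart_lsn = s->data.restart_lsn;
		invalidated = (s->data.invalidated != RS_INVAL_NONE);
		SpinLockRelease(&s->mutex);

		/* invalidated slots need not apply */
		if (invalidated)
			continue;

		if (restart_lsn != InvalidXLogRecPtr &&
			(min_required == InvalidXLogRecPtr || restart_lsn < min_required))
			min_required = restart_lsn;
	}
--- code ---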
If we want to go back to the previous behavior, we need to revert/adjust the check
for 'invalidated' in ReplicationSlotsComputeRequiredXmin(); but since logical
decoding (walsender) disallows using an invalidated slot, I feel it's consistent
to do a similar check for physical slots. Besides, pg_replication_slot_advance()
also disallows passing an invalidated slot to it. What do you think?
Best Regards,
Hou zj
...the 'conflicting' column of the pg_replication_slots view could be set back to null.

In this case, since the basebackup was taken after the slot was invalidated, it
does not require the WAL that was removed. But it seems that once the
streaming starts, the slot sprints to life again and gets validated again. Here's
the pg_replication_slots output after the standby starts.

Actually, it doesn't bring the invalidated slot back to life completely.
The slot's view data looks valid, while the 'invalidated' flag of this slot is still
RS_INVAL_WAL_REMOVED (the user is not aware of it).
I was misled by the code in pg_get_replication_slots(). I did not
read it till the following:
--- code ---
		case WALAVAIL_REMOVED:

			/*
			 * If we read the restart_lsn long enough ago, maybe that file
			 * has been removed by now. However, the walsender could have
			 * moved forward enough that it jumped to another file after
			 * we looked. If checkpointer signalled the process to
			 * termination, then it's definitely lost; but if a process is
			 * still alive, then "unreserved" seems more appropriate.
			 *
			 * If we do change it, save the state for safe_wal_size below.
			 */
--- code ---
I see now how an invalid slot's wal status can be reported as
unreserved. So I think it's a bug.
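The code right after that comment does roughly the following (paraphrased, not
verbatim): if the slot still has a valid restart_lsn and some process is
actively using it, the status is reported as "unreserved" rather than "lost":

--- code ---
			/*
			 * Paraphrased sketch of what follows that comment in
			 * pg_get_replication_slots(): an invalidated slot whose
			 * restart_lsn has been refreshed by an active walsender is
			 * reported as "unreserved" instead of "lost".
			 */
			if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
			{
				int			pid;

				SpinLockAcquire(&slot->mutex);
				pid = slot->active_pid;
				slot_contents.data.restart_lsn = slot->data.restart_lsn;
				SpinLockRelease(&slot->mutex);

				if (pid != 0)
				{
					values[i++] = CStringGetTextDatum("unreserved");
					walstate = WALAVAIL_UNRESERVED;
					break;
				}
			}
			values[i++] = CStringGetTextDatum("lost");
			break;
--- code ---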
So I think this is a bug, and one idea to fix it is to check the validity of the
physical slot in StartReplication() after acquiring the slot, as the attached
patch does. What do you think?

I am not sure whether that's necessarily a bug. Of course, we don't expect
invalid slots to be used, but in this case I don't see any harm.
The invalid slot has been revived and has all the properties set just like a valid
slot. So it must be considered in ReplicationSlotsComputeRequiredXmin() as well.
I haven't verified it myself though. In case the WAL is really lost and is
requested by the standby, it will throw an error "requested WAL segment
[0-9A-F]+ has already been removed". So no harm there as well.

Since the 'invalidated' field of the slot is still RS_INVAL_WAL_REMOVED, even
if the walsender advances the restart_lsn, the slot will not be considered in
ReplicationSlotsComputeRequiredXmin(), so the WAL and rows are not safe,
and that's why I think it's a bug.

After looking closer, it seems this behavior started with 15f8203, which introduced
ReplicationSlotInvalidationCause 'invalidated'; since then we check the invalidated
enum in the Xmin/LSN computation functions.

If we want to go back to the previous behavior, we need to revert/adjust the check
for 'invalidated' in ReplicationSlotsComputeRequiredXmin(); but since logical
decoding (walsender) disallows using an invalidated slot, I feel it's consistent
to do a similar check for physical slots. Besides, pg_replication_slot_advance()
also disallows passing an invalidated slot to it. What do you think?
What happens if you run your script on a build prior to 15f8203?
Purely from reading the code, it looks like the physical slot would
sprint back to life since its restart_lsn would be updated. But I am
not able to see what happens to invalidated_at. It probably remains a
valid LSN and the slot would still not be considered in the xmin
calculation. It will be good to be compatible with the pre-15f8203
behaviour.
I think making logical and physical slot behaviour consistent would be
better, but if the inconsistent behaviour has been out there for some
releases, changing it now will break backward compatibility.
--
Best Wishes,
Ashutosh Bapat
On Fri, 8 Dec 2023 at 19:15, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
What happens if you run your script on a build prior to 15f8203?
Purely from reading the code, it looks like the physical slot would
sprint back to life since its restart_lsn would be updated. But I am
not able to see what happens to invalidated_at. It probably remains a
valid LSN and the slot would still not be considered in the xmin
calculation. It will be good to be compatible with the pre-15f8203
behaviour.
I have changed the patch status to "Waiting on Author", as some of the
queries requested by Ashutosh are not yet addressed.
Regards,
Vignesh
On Thu, Dec 7, 2023 at 8:00 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
After looking closer, it seems this behavior started with 15f8203, which introduced
ReplicationSlotInvalidationCause 'invalidated'; since then we check the invalidated
enum in the Xmin/LSN computation functions.
Adding Andres in Cc, as that was his commit.
It's not entirely clear to me how this feature was intended to
interact with physical replication slots. I found seemingly-relevant
documentation in two places:
https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE
https://www.postgresql.org/docs/current/view-pg-replication-slots.html
In the latter, it says "unreserved means that the slot no longer
retains the required WAL files and some of them are to be removed at
the next checkpoint. This state can return to reserved or extended."
But if a slot becomes invalid in such a way that it cannot return to a
valid state later, then this isn't accurate.
I have a feeling that the actual behavior here has evolved and the
documentation hasn't kept up. And I wonder whether we need a more
comprehensive explanation of the intended behavior in this section:
https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION-SLOTS
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, 8 Jan 2024 at 10:25, vignesh C <vignesh21@gmail.com> wrote:
I have changed the patch status to "Waiting on Author", as some of the
queries requested by Ashutosh are not yet addressed.
The patch which you submitted has been awaiting your attention for
quite some time now. As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible. Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.
Regards,
Vignesh