Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

Started by Masahiko Sawadaover 2 years ago15 messages
#1Masahiko Sawada
sawada.mshk@gmail.com
1 attachment(s)

Hi,

We call pgstat_drop_subscription() at the end of DropSubscription()
but we could leave from this function earlier e.g. when no slot is
associated with the subscription. In this case, the statistics entry
for the subscription remains. To fix it, I think we need to call it
earlier, just after removing the catalog tuple. There is a chance the
transaction dropping the subscription fails due to network error etc
but we don't need to worry about it as reporting the subscription drop
is transactional.

I've attached the patch. Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

fix_drop_subscription_stats.patchapplication/octet-stream; name=fix_drop_subscription_stats.patchDownload
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 8b3de032ee..09971ae45a 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1644,6 +1644,12 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 	ReplicationOriginNameForLogicalRep(subid, InvalidOid, originname, sizeof(originname));
 	replorigin_drop_by_name(originname, true, false);
 
+	/*
+	 * Tell the cumulative stats system that the subscription is getting
+	 * dropped.
+	 */
+	pgstat_drop_subscription(subid);
+
 	/*
 	 * If there is no slot associated with the subscription, we can finish
 	 * here.
@@ -1732,12 +1738,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 	}
 	PG_END_TRY();
 
-	/*
-	 * Tell the cumulative stats system that the subscription is getting
-	 * dropped.
-	 */
-	pgstat_drop_subscription(subid);
-
 	table_close(rel, NoLock);
 }
 
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index d736246259..e9d6166d27 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -73,6 +73,8 @@ SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE s
  t
 (1 row)
 
+-- Remember the subscription oid for checking later if the statistics gets removed.
+SELECT oid as suboid FROM pg_subscription WHERE subname = 'regress_testsub' \gset
 -- fail - name already exists
 CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false);
 ERROR:  subscription "regress_testsub" already exists
@@ -240,6 +242,14 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 BEGIN;
 DROP SUBSCRIPTION regress_testsub;
 COMMIT;
+-- check if the statistics of the dropped subscription doesn't remain, by trying
+-- directly fetching the stats.
+SELECT stats_reset IS NULL stats_reset_is_null FROM pg_stat_get_subscription_stats(:'suboid');
+ stats_reset_is_null 
+---------------------
+ t
+(1 row)
+
 DROP SUBSCRIPTION IF EXISTS regress_testsub;
 NOTICE:  subscription "regress_testsub" does not exist, skipping
 DROP SUBSCRIPTION regress_testsub;  -- fail
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 55d7dbc9ab..68be70dce5 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -42,6 +42,9 @@ SELECT stats_reset as prev_stats_reset FROM pg_stat_subscription_stats WHERE sub
 SELECT pg_stat_reset_subscription_stats(oid) FROM pg_subscription WHERE subname = 'regress_testsub';
 SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
 
+-- Remember the subscription oid for checking later if the statistics gets removed.
+SELECT oid as suboid FROM pg_subscription WHERE subname = 'regress_testsub' \gset
+
 -- fail - name already exists
 CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false);
 
@@ -158,6 +161,10 @@ BEGIN;
 DROP SUBSCRIPTION regress_testsub;
 COMMIT;
 
+-- check if the statistics of the dropped subscription doesn't remain, by trying
+-- directly fetching the stats.
+SELECT stats_reset IS NULL stats_reset_is_null FROM pg_stat_get_subscription_stats(:'suboid');
+
 DROP SUBSCRIPTION IF EXISTS regress_testsub;
 DROP SUBSCRIPTION regress_testsub;  -- fail
 
#2Nathan Bossart
nathandbossart@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

On Mon, May 08, 2023 at 04:23:15PM +0900, Masahiko Sawada wrote:

We call pgstat_drop_subscription() at the end of DropSubscription()
but we could leave from this function earlier e.g. when no slot is
associated with the subscription. In this case, the statistics entry
for the subscription remains. To fix it, I think we need to call it
earlier, just after removing the catalog tuple. There is a chance the
transaction dropping the subscription fails due to network error etc
but we don't need to worry about it as reporting the subscription drop
is transactional.

Looks reasonable to me. IIUC calling pgstat_drop_subscription() earlier
makes no real difference (besides avoiding this bug) because it is uѕing
pgstat_drop_transactional() behind the scenes.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#3Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Masahiko Sawada (#1)
RE: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

Dear Sawada-san,

Thank you for giving the patch! I confirmed that the problem you raised could be
occurred on the HEAD, and the test you added could reproduce that. When the stats
entry has been removed but pg_stat_get_subscription_stats() is called, the returned
values are set as 0x0.
Additionally, I have checked other pgstat_drop_* functions, and I could not find
any similar problems.

A comment:

```
+       /*
+        * Tell the cumulative stats system that the subscription is getting
+        * dropped.
+        */
+       pgstat_drop_subscription(subid);
```

Isn't it better to write down something you said as comment? Or is it quite trivial?

There is a chance the
transaction dropping the subscription fails due to network error etc
but we don't need to worry about it as reporting the subscription drop
is transactional.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#4Melih Mutlu
m.melihmutlu@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

Hi,

Masahiko Sawada <sawada.mshk@gmail.com>, 8 May 2023 Pzt, 10:24 tarihinde
şunu yazdı:

I've attached the patch. Feedback is very welcome.

Thanks for the patch, nice catch.
I can confirm that the issue exists on HEAD and gets resolved by this
patch. Also it looks like stats are really not affected if transaction
fails for some reason, as you explained.
IMO, the patch will be OK after commit message is added.

Thanks,
--
Melih Mutlu
Microsoft

#5Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Melih Mutlu (#4)
Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

On Wed, May 10, 2023 at 8:58 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:

Hi,

Masahiko Sawada <sawada.mshk@gmail.com>, 8 May 2023 Pzt, 10:24 tarihinde şunu yazdı:

I've attached the patch. Feedback is very welcome.

Thanks for the patch, nice catch.
I can confirm that the issue exists on HEAD and gets resolved by this patch. Also it looks like stats are really not affected if transaction fails for some reason, as you explained.
IMO, the patch will be OK after commit message is added.

Thank you for reviewing the patch. I'll push the patch early next
week, barring any objections.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#6Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#5)
1 attachment(s)
Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

On Thu, May 11, 2023 at 5:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, May 10, 2023 at 8:58 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:

Hi,

Masahiko Sawada <sawada.mshk@gmail.com>, 8 May 2023 Pzt, 10:24 tarihinde şunu yazdı:

I've attached the patch. Feedback is very welcome.

Thanks for the patch, nice catch.
I can confirm that the issue exists on HEAD and gets resolved by this patch. Also it looks like stats are really not affected if transaction fails for some reason, as you explained.
IMO, the patch will be OK after commit message is added.

Thank you for reviewing the patch. I'll push the patch early next
week, barring any objections.

After thinking more about it, I realized that this is not a problem
specific to HEAD. ISTM the problem is that by commit 7b64e4b3, we drop
the stats entry of subscription that is not associated with a
replication slot for apply worker, but we missed the case where the
subscription is not associated with both replication slots for apply
and tablesync. So IIUC we should backpatch it down to 15.

Since in pg15, since we don't create the subscription stats at CREATE
SUBSCRIPTION time but do when the first error is reported, we cannot
rely on the regression test suite. Also, to check if the subscription
stats is surely removed, using pg_stat_have_stats() is clearer. So I
added a test case to TAP tests (026_stats.pl).

On Tue, May 9, 2023 at 1:51 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

A comment:

```
+       /*
+        * Tell the cumulative stats system that the subscription is getting
+        * dropped.
+        */
+       pgstat_drop_subscription(subid);
```

Isn't it better to write down something you said as comment? Or is it quite trivial?

There is a chance the
transaction dropping the subscription fails due to network error etc
but we don't need to worry about it as reporting the subscription drop
is transactional.

I'm not sure it's worth mentioning as we don't have such a comment
around other pgstat_drop_XXX functions.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-pgstat-fix-subscription-stats-entry-leak.patchapplication/octet-stream; name=v2-0001-pgstat-fix-subscription-stats-entry-leak.patchDownload
From c2f5d79611bd97c5c7f5bc18e8ffdcb4c5ba3b11 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 16 May 2023 09:56:40 +0900
Subject: [PATCH v2] pgstat: fix subscription stats entry leak

Commit 7b64e4b3 taught DropSubscription() to drop stats entry of
subscription that is not associated with a replication slot for apply
worker at DROP SUBSCRIPTION but missed covering the case where the
subscription is not associated with replication slots for both apply
worker and tablesync worker.

Also add a test to verify that the stats for slot-less subscription is
removed at DROP SUBSCRIPTION time.

Backpatch down to 15.

Author: Masahiko Sawada
Reviewed-by: Nathan Bossart, Hayato Kuroda, Melih Mutlu
Discussion: https://postgr.es/m/CAD21AoB71zkP7uPT7JDPsZcvp0749ExEQnOJxeNKPDFisHar+w@mail.gmail.com
Backpatch-through: 15
---
 src/backend/commands/subscriptioncmds.c | 12 ++++++------
 src/test/subscription/t/026_stats.pl    | 20 ++++++++++++++++++++
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index e8b288d01c..128d03608f 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1646,6 +1646,12 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 	ReplicationOriginNameForLogicalRep(subid, InvalidOid, originname, sizeof(originname));
 	replorigin_drop_by_name(originname, true, false);
 
+	/*
+	 * Tell the cumulative stats system that the subscription is getting
+	 * dropped.
+	 */
+	pgstat_drop_subscription(subid);
+
 	/*
 	 * If there is no slot associated with the subscription, we can finish
 	 * here.
@@ -1734,12 +1740,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 	}
 	PG_END_TRY();
 
-	/*
-	 * Tell the cumulative stats system that the subscription is getting
-	 * dropped.
-	 */
-	pgstat_drop_subscription(subid);
-
 	table_close(rel, NoLock);
 }
 
diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index 96a6d686eb..f24b27b09c 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -267,6 +267,26 @@ is( $node_subscriber->safe_psql(
 	qq(f),
 	qq(Subscription stats for subscription '$sub1_name' should be removed.));
 
+# Get subscription 2 oid
+my $sub2_oid = $node_subscriber->safe_psql($db,
+	qq(SELECT oid FROM pg_subscription WHERE subname = '$sub2_name'));
+
+# Diassociate the subscription 2 from its replication slot and drop it
+$node_subscriber->safe_psql(
+	$db,
+	qq(
+ALTER SUBSCRIPTION $sub2_name DISABLE;
+ALTER SUBSCRIPTION $sub2_name SET (slot_name = NONE);
+DROP SUBSCRIPTION $sub2_name;
+			    ));
+
+# Subscription stats for sub2 should be gone
+is( $node_subscriber->safe_psql(
+		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid))),
+	qq(f),
+	qq(Subscription stats for subscription '$sub2_name' should be removed.));
+$node_publisher->safe_psql($db,
+	qq(SELECT pg_drop_replication_slot('$sub2_name')));
 
 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');
-- 
2.31.1

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#6)
Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

On Tue, May 16, 2023 at 8:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 11, 2023 at 5:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

After thinking more about it, I realized that this is not a problem
specific to HEAD. ISTM the problem is that by commit 7b64e4b3, we drop
the stats entry of subscription that is not associated with a
replication slot for apply worker, but we missed the case where the
subscription is not associated with both replication slots for apply
and tablesync. So IIUC we should backpatch it down to 15.

I agree that it should be backpatched to 15.

Since in pg15, since we don't create the subscription stats at CREATE
SUBSCRIPTION time but do when the first error is reported,

AFAICS, the call to pgstat_create_subscription() is present in
CreateSubscription() in 15 as well, so, I don't get your point.

--
With Regards,
Amit Kapila.

#8Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#7)
Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

On Sat, Jun 17, 2023 at 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 16, 2023 at 8:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 11, 2023 at 5:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

After thinking more about it, I realized that this is not a problem
specific to HEAD. ISTM the problem is that by commit 7b64e4b3, we drop
the stats entry of subscription that is not associated with a
replication slot for apply worker, but we missed the case where the
subscription is not associated with both replication slots for apply
and tablesync. So IIUC we should backpatch it down to 15.

I agree that it should be backpatched to 15.

Since in pg15, since we don't create the subscription stats at CREATE
SUBSCRIPTION time but do when the first error is reported,

AFAICS, the call to pgstat_create_subscription() is present in
CreateSubscription() in 15 as well, so, I don't get your point.

IIUC in 15, pgstat_create_subscription() doesn't create the stats
entry. See commit e0b01429590.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#9Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#8)
Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

On Mon, Jun 19, 2023 at 6:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Jun 17, 2023 at 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 16, 2023 at 8:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 11, 2023 at 5:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

After thinking more about it, I realized that this is not a problem
specific to HEAD. ISTM the problem is that by commit 7b64e4b3, we drop
the stats entry of subscription that is not associated with a
replication slot for apply worker, but we missed the case where the
subscription is not associated with both replication slots for apply
and tablesync. So IIUC we should backpatch it down to 15.

I agree that it should be backpatched to 15.

Since in pg15, since we don't create the subscription stats at CREATE
SUBSCRIPTION time but do when the first error is reported,

AFAICS, the call to pgstat_create_subscription() is present in
CreateSubscription() in 15 as well, so, I don't get your point.

IIUC in 15, pgstat_create_subscription() doesn't create the stats
entry. See commit e0b01429590.

Thanks for the clarification. Your changes looks good to me though I
haven't tested it.

--
With Regards,
Amit Kapila.

#10Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#9)
Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

On Mon, Jun 19, 2023 at 12:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 19, 2023 at 6:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Jun 17, 2023 at 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 16, 2023 at 8:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 11, 2023 at 5:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

After thinking more about it, I realized that this is not a problem
specific to HEAD. ISTM the problem is that by commit 7b64e4b3, we drop
the stats entry of subscription that is not associated with a
replication slot for apply worker, but we missed the case where the
subscription is not associated with both replication slots for apply
and tablesync. So IIUC we should backpatch it down to 15.

I agree that it should be backpatched to 15.

Since in pg15, since we don't create the subscription stats at CREATE
SUBSCRIPTION time but do when the first error is reported,

AFAICS, the call to pgstat_create_subscription() is present in
CreateSubscription() in 15 as well, so, I don't get your point.

IIUC in 15, pgstat_create_subscription() doesn't create the stats
entry. See commit e0b01429590.

Thanks for the clarification. Your changes looks good to me though I
haven't tested it.

Thanks for reviewing the patch. Pushed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#11Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#10)
RE: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

On Wednesday, July 5, 2023 2:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

Thanks for reviewing the patch. Pushed.

My colleague Vignesh noticed that the newly added test cases were failing in BF animal sungazer[1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&amp;dt=2023-09-02%2002%3A17%3A01.

The test failed to drop the slot which is active on publisher.

--error-log--
This failure is because pg_drop_replication_slot fails with "replication slot "test_tab2_sub" is active for PID 55771638":
2023-09-02 09:00:04.806 UTC [12910732:4] 026_stats.pl LOG: statement: SELECT pg_drop_replication_slot('test_tab2_sub')
2023-09-02 09:00:04.807 UTC [12910732:5] 026_stats.pl ERROR: replication slot "test_tab2_sub" is active for PID 55771638
2023-09-02 09:00:04.807 UTC [12910732:6] 026_stats.pl STATEMENT: SELECT pg_drop_replication_slot('test_tab2_sub')
-------------

I the reason is that the test DISABLEd the subscription before dropping the
slot, while "ALTER SUBSCRIPTION DISABLE" doesn't wait for the walsender to
release the slot, so it's possible that the walsender is still alive when calling
pg_drop_replication_slot() to drop the slot on publisher(pg_drop_xxxslot()
doesn't wait for slot to be released).

I think we can wait for the slot to become inactive before dropping like:
$node_primary->poll_query_until('otherdb',
"SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE active_pid IS NOT NULL)"
)

Or we can just don't drop the slot as it’s the last testcase.

Another thing might be worth considering is we can add one parameter in
pg_drop_replication_slot() to let user control whether to wait or not, and the
test can be fixed as well by passing nowait=false to the func.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&amp;dt=2023-09-02%2002%3A17%3A01

Best Regards,
Hou zj

#12Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#11)
1 attachment(s)
Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

Hi,

On Mon, Sep 4, 2023 at 9:38 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Wednesday, July 5, 2023 2:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

Thanks for reviewing the patch. Pushed.

My colleague Vignesh noticed that the newly added test cases were failing in BF animal sungazer[1].

Thank you for reporting!

The test failed to drop the slot which is active on publisher.

--error-log--
This failure is because pg_drop_replication_slot fails with "replication slot "test_tab2_sub" is active for PID 55771638":
2023-09-02 09:00:04.806 UTC [12910732:4] 026_stats.pl LOG: statement: SELECT pg_drop_replication_slot('test_tab2_sub')
2023-09-02 09:00:04.807 UTC [12910732:5] 026_stats.pl ERROR: replication slot "test_tab2_sub" is active for PID 55771638
2023-09-02 09:00:04.807 UTC [12910732:6] 026_stats.pl STATEMENT: SELECT pg_drop_replication_slot('test_tab2_sub')
-------------

I the reason is that the test DISABLEd the subscription before dropping the
slot, while "ALTER SUBSCRIPTION DISABLE" doesn't wait for the walsender to
release the slot, so it's possible that the walsender is still alive when calling
pg_drop_replication_slot() to drop the slot on publisher(pg_drop_xxxslot()
doesn't wait for slot to be released).

I agree with your analysis.

I think we can wait for the slot to become inactive before dropping like:
$node_primary->poll_query_until('otherdb',
"SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE active_pid IS NOT NULL)"
)

I prefer this approach but it would be better to specify the slot name
in the query.

Or we can just don't drop the slot as it’s the last testcase.

Since we might add other tests after that in the future, I think it's
better to drop the replication slot (and subscription).

Another thing might be worth considering is we can add one parameter in
pg_drop_replication_slot() to let user control whether to wait or not, and the
test can be fixed as well by passing nowait=false to the func.

While it could be useful in general we cannot use this approach for
this issue since it cannot be backpatched to older branches. We might
want to discuss it on a new thread.

I've attached a draft patch to fix this issue. Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

fix.patchapplication/octet-stream; name=fix.patchDownload
diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index 0bcda006cd..a033588008 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -285,6 +285,14 @@ is( $node_subscriber->safe_psql(
 		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub2_name' should be removed.));
+
+# Since disabling subscription doesn't wait for walsender to release the replication
+# slot and exit, wait for the slot to become inactive.
+$node_publisher->poll_query_until(
+	$db,
+	qq(SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = '$sub2_name' AND active_pid IS NULL))
+) or die "slot never became inactive";
+
 $node_publisher->safe_psql($db,
 	qq(SELECT pg_drop_replication_slot('$sub2_name')));
 
#13Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#12)
RE: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

On Monday, September 4, 2023 10:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

On Mon, Sep 4, 2023 at 9:38 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
wrote:

On Wednesday, July 5, 2023 2:53 PM Masahiko Sawada

<sawada.mshk@gmail.com> wrote:

Thanks for reviewing the patch. Pushed.

My colleague Vignesh noticed that the newly added test cases were failing in

BF animal sungazer[1].

Thank you for reporting!

The test failed to drop the slot which is active on publisher.

--error-log--
This failure is because pg_drop_replication_slot fails with "replication slot

"test_tab2_sub" is active for PID 55771638":

2023-09-02 09:00:04.806 UTC [12910732:4] 026_stats.pl LOG: statement:
SELECT pg_drop_replication_slot('test_tab2_sub')
2023-09-02 09:00:04.807 UTC [12910732:5] 026_stats.pl ERROR:
replication slot "test_tab2_sub" is active for PID 55771638
2023-09-02 09:00:04.807 UTC [12910732:6] 026_stats.pl STATEMENT:
SELECT pg_drop_replication_slot('test_tab2_sub')
-------------

I the reason is that the test DISABLEd the subscription before
dropping the slot, while "ALTER SUBSCRIPTION DISABLE" doesn't wait for
the walsender to release the slot, so it's possible that the walsender
is still alive when calling
pg_drop_replication_slot() to drop the slot on
publisher(pg_drop_xxxslot() doesn't wait for slot to be released).

I agree with your analysis.

I think we can wait for the slot to become inactive before dropping like:
$node_primary->poll_query_until('otherdb',
"SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots

WHERE active_pid IS NOT NULL)"

)

I prefer this approach but it would be better to specify the slot name in the
query.

Or we can just don't drop the slot as it’s the last testcase.

Since we might add other tests after that in the future, I think it's better to drop
the replication slot (and subscription).

Another thing might be worth considering is we can add one parameter
in
pg_drop_replication_slot() to let user control whether to wait or not,
and the test can be fixed as well by passing nowait=false to the func.

While it could be useful in general we cannot use this approach for this issue
since it cannot be backpatched to older branches. We might want to discuss it
on a new thread.

I've attached a draft patch to fix this issue. Feedback is very welcome.

Thanks, it looks good to me.

Best Regards,
Hou zj

#14Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#13)
1 attachment(s)
Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

On Tue, Sep 5, 2023 at 11:32 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Monday, September 4, 2023 10:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

On Mon, Sep 4, 2023 at 9:38 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
wrote:

On Wednesday, July 5, 2023 2:53 PM Masahiko Sawada

<sawada.mshk@gmail.com> wrote:

Thanks for reviewing the patch. Pushed.

My colleague Vignesh noticed that the newly added test cases were failing in

BF animal sungazer[1].

Thank you for reporting!

The test failed to drop the slot which is active on publisher.

--error-log--
This failure is because pg_drop_replication_slot fails with "replication slot

"test_tab2_sub" is active for PID 55771638":

2023-09-02 09:00:04.806 UTC [12910732:4] 026_stats.pl LOG: statement:
SELECT pg_drop_replication_slot('test_tab2_sub')
2023-09-02 09:00:04.807 UTC [12910732:5] 026_stats.pl ERROR:
replication slot "test_tab2_sub" is active for PID 55771638
2023-09-02 09:00:04.807 UTC [12910732:6] 026_stats.pl STATEMENT:
SELECT pg_drop_replication_slot('test_tab2_sub')
-------------

I the reason is that the test DISABLEd the subscription before
dropping the slot, while "ALTER SUBSCRIPTION DISABLE" doesn't wait for
the walsender to release the slot, so it's possible that the walsender
is still alive when calling
pg_drop_replication_slot() to drop the slot on
publisher(pg_drop_xxxslot() doesn't wait for slot to be released).

I agree with your analysis.

I think we can wait for the slot to become inactive before dropping like:
$node_primary->poll_query_until('otherdb',
"SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots

WHERE active_pid IS NOT NULL)"

)

I prefer this approach but it would be better to specify the slot name in the
query.

Or we can just don't drop the slot as it’s the last testcase.

Since we might add other tests after that in the future, I think it's better to drop
the replication slot (and subscription).

Another thing might be worth considering is we can add one parameter
in
pg_drop_replication_slot() to let user control whether to wait or not,
and the test can be fixed as well by passing nowait=false to the func.

While it could be useful in general we cannot use this approach for this issue
since it cannot be backpatched to older branches. We might want to discuss it
on a new thread.

I've attached a draft patch to fix this issue. Feedback is very welcome.

Thanks, it looks good to me.

Thank you for reviewing the patch.

I'll push the attached patch to master, v16, and v15 tomorrow, barring
any objections.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-Stabilize-subscription-stats-test.patchapplication/octet-stream; name=v1-0001-Stabilize-subscription-stats-test.patchDownload
From ced53920dbdc77215c7121cde165c2d500f85253 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 7 Sep 2023 16:00:20 +0900
Subject: [PATCH v1] Stabilize subscription stats test.

The new test added by commits 68a59f9e9 et al disables the
subscription and manually drops the associated replication
slot. However, since disabling the subsubscription doesn't wait for a
walsender to release the replication slot and exit,
pg_drop_replication_slot() could fail. Avoid failure by adding a wait
for the replication slot to become inactive.

Reported-by: Hou Zhijie, as per buildfarm
Reviewed-by: Hou Zhijie
Discussion: https://postgr.es/m/OS0PR01MB571682316378379AA34854F694E9A%40OS0PR01MB5716.jpnprd01.prod.outlook.com
Backpatch-through: 15
---
 src/test/subscription/t/026_stats.pl | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index 0bcda006cd..61d9578b65 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -285,6 +285,13 @@ is( $node_subscriber->safe_psql(
 		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub2_name' should be removed.));
+
+# Since disabling subscription doesn't wait for walsender to release the replication
+# slot and exit, wait for the slot to become inactive.
+$node_publisher->poll_query_until($db,
+	qq(SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = '$sub2_name' AND active_pid IS NULL))
+) or die "slot never became inactive";
+
 $node_publisher->safe_psql($db,
 	qq(SELECT pg_drop_replication_slot('$sub2_name')));
 
-- 
2.31.1

#15Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#14)
Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

On Thu, Sep 7, 2023 at 10:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Sep 5, 2023 at 11:32 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Monday, September 4, 2023 10:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

On Mon, Sep 4, 2023 at 9:38 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
wrote:

On Wednesday, July 5, 2023 2:53 PM Masahiko Sawada

<sawada.mshk@gmail.com> wrote:

Thanks for reviewing the patch. Pushed.

My colleague Vignesh noticed that the newly added test cases were failing in

BF animal sungazer[1].

Thank you for reporting!

The test failed to drop the slot which is active on publisher.

--error-log--
This failure is because pg_drop_replication_slot fails with "replication slot

"test_tab2_sub" is active for PID 55771638":

2023-09-02 09:00:04.806 UTC [12910732:4] 026_stats.pl LOG: statement:
SELECT pg_drop_replication_slot('test_tab2_sub')
2023-09-02 09:00:04.807 UTC [12910732:5] 026_stats.pl ERROR:
replication slot "test_tab2_sub" is active for PID 55771638
2023-09-02 09:00:04.807 UTC [12910732:6] 026_stats.pl STATEMENT:
SELECT pg_drop_replication_slot('test_tab2_sub')
-------------

I the reason is that the test DISABLEd the subscription before
dropping the slot, while "ALTER SUBSCRIPTION DISABLE" doesn't wait for
the walsender to release the slot, so it's possible that the walsender
is still alive when calling
pg_drop_replication_slot() to drop the slot on
publisher(pg_drop_xxxslot() doesn't wait for slot to be released).

I agree with your analysis.

I think we can wait for the slot to become inactive before dropping like:
$node_primary->poll_query_until('otherdb',
"SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots

WHERE active_pid IS NOT NULL)"

)

I prefer this approach but it would be better to specify the slot name in the
query.

Or we can just don't drop the slot as it’s the last testcase.

Since we might add other tests after that in the future, I think it's better to drop
the replication slot (and subscription).

Another thing might be worth considering is we can add one parameter
in
pg_drop_replication_slot() to let user control whether to wait or not,
and the test can be fixed as well by passing nowait=false to the func.

While it could be useful in general we cannot use this approach for this issue
since it cannot be backpatched to older branches. We might want to discuss it
on a new thread.

I've attached a draft patch to fix this issue. Feedback is very welcome.

Thanks, it looks good to me.

Thank you for reviewing the patch.

I'll push the attached patch to master, v16, and v15 tomorrow, barring
any objections.

Pushed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com