Log a warning in pg_createsubscriber for max_slot_wal_keep_size

Started by Shubham Khannaabout 1 year ago36 messages
#1Shubham Khanna
khannashubham1197@gmail.com
1 attachment(s)

Hi,

Currently, there is a risk that pg_createsubscriber may fail to
complete successfully when the max_slot_wal_keep_size value is set too
low. This can occur if the WAL is removed before the standby using the
replication slot is able to complete replication, as the required WAL
files are no longer available.

I was able to reproduce this issue using the following steps:
Set up a streaming replication environment.
Run pg_createsubscriber in a debugger.
Pause pg_createsubscriber at the setup_recovery stage.
Perform several operations on the primary node to generate a large
volume of WAL, causing older WAL segments to be removed due to the low
max_slot_wal_keep_size setting.
Once the necessary WAL segments are deleted, continue the execution of
pg_createsubscriber.
At this point, pg_createsubscriber fails with the following error:
2024-12-29 01:21:37.590 IST [427353] FATAL: could not receive data
from WAL stream: ERROR: requested WAL segment
000000010000000000000003 has already been removed
2024-12-29 01:21:37.592 IST [427345] LOG: waiting for WAL to become
available at 0/3000110
2024-12-29 01:21:42.593 IST [427358] LOG: started streaming WAL from
primary at 0/3000000 on timeline 1
2024-12-29 01:21:42.593 IST [427358] FATAL: could not receive data
from WAL stream: ERROR: requested WAL segment
000000010000000000000003 has already been removed

This issue was previously reported in [1]/messages/by-id/TY3PR01MB9889FEDBBF74A9F79CA7CC87F57A2@TY3PR01MB9889.jpnprd01.prod.outlook.com, with a suggestion to raise
a warning in [2]/messages/by-id/be92c57b-82e1-4920-ac31-a8a04206db7b@app.fastmail.com. I’ve implemented a patch that logs a warning in
dry-run mode. This will give users the opportunity to adjust the
max_slot_wal_keep_size value before running the command.

Thoughts?

[1]: /messages/by-id/TY3PR01MB9889FEDBBF74A9F79CA7CC87F57A2@TY3PR01MB9889.jpnprd01.prod.outlook.com
[2]: /messages/by-id/be92c57b-82e1-4920-ac31-a8a04206db7b@app.fastmail.com

Thanks and regards,
Shubham Khanna.

Attachments:

v1-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchapplication/octet-stream; name=v1-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchDownload
From 602315443dab0603bb4316eefc523485a0dc8fee Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Tue, 24 Dec 2024 12:27:27 +0530
Subject: [PATCH v1] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

This patch updates 'pg_createsubscriber' to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher.
The test case is added to validate correct warning reporting when
'max_slot_wal_keep_size' is misconfigured.

These changes ensures that logical replication does not fail due to improper
configuration of 'max_slot_wal_keep_size' on the publisher.
---
 src/bin/pg_basebackup/pg_createsubscriber.c      | 15 ++++++++++++++-
 .../pg_basebackup/t/040_pg_createsubscriber.pl   | 16 ++++++++++++++--
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index e96370a9ec..1fa37d80fe 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -849,6 +849,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	int			max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -880,7 +881,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.current_setting('max_slot_wal_keep_size')");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +897,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = atoi(PQgetvalue(res, 0, 6));
 
 	PQclear(res);
 
@@ -905,6 +908,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: %d",
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +944,14 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+	/* Validate max_slot_wal_keep_size */
+	if (max_slot_wal_keep_size != -1)
+	{
+		pg_log_warning("publisher requires 'max_slot_wal_keep_size = -1', but only %d remain",
+					   max_slot_wal_keep_size);
+		pg_log_warning_detail("Change the 'max_slot_wal_keep_size' configuration on the publisher to -1.");
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index 0a900edb65..569065a579 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -318,8 +318,13 @@ $node_p->safe_psql($db1,
 $node_p->wait_for_replay_catchup($node_s);
 $node_s->stop;
 
+# Configure 'max_slot_wal_keep_size = 1' on the publisher and verify the
+# warning message
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1');
+$node_p->restart;
+
 # dry run mode on node S
-command_ok(
+command_checks_all(
 	[
 		'pg_createsubscriber', '--verbose',
 		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
@@ -335,7 +340,14 @@ command_ok(
 		$db1, '--database',
 		$db2
 	],
-	'run pg_createsubscriber --dry-run on node S');
+	0,
+	[qr/./],
+	[
+		qr/pg_createsubscriber: warning: publisher requires 'max_slot_wal_keep_size = -1'/,
+		qr/Change the 'max_slot_wal_keep_size' configuration on the publisher to -1./,
+	],
+	'Validate warning for misconfigured max_slot_wal_keep_size on the publisher'
+);
 
 # Check if node S is still a standby
 $node_s->start;
-- 
2.34.1

#2vignesh C
vignesh21@gmail.com
In reply to: Shubham Khanna (#1)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Mon, 30 Dec 2024 at 09:34, Shubham Khanna
<khannashubham1197@gmail.com> wrote:

Hi,

Currently, there is a risk that pg_createsubscriber may fail to
complete successfully when the max_slot_wal_keep_size value is set too
low. This can occur if the WAL is removed before the standby using the
replication slot is able to complete replication, as the required WAL
files are no longer available.

I was able to reproduce this issue using the following steps:
Set up a streaming replication environment.
Run pg_createsubscriber in a debugger.
Pause pg_createsubscriber at the setup_recovery stage.
Perform several operations on the primary node to generate a large
volume of WAL, causing older WAL segments to be removed due to the low
max_slot_wal_keep_size setting.
Once the necessary WAL segments are deleted, continue the execution of
pg_createsubscriber.
At this point, pg_createsubscriber fails with the following error:
2024-12-29 01:21:37.590 IST [427353] FATAL: could not receive data
from WAL stream: ERROR: requested WAL segment
000000010000000000000003 has already been removed
2024-12-29 01:21:37.592 IST [427345] LOG: waiting for WAL to become
available at 0/3000110
2024-12-29 01:21:42.593 IST [427358] LOG: started streaming WAL from
primary at 0/3000000 on timeline 1
2024-12-29 01:21:42.593 IST [427358] FATAL: could not receive data
from WAL stream: ERROR: requested WAL segment
000000010000000000000003 has already been removed

This issue was previously reported in [1], with a suggestion to raise
a warning in [2]. I’ve implemented a patch that logs a warning in
dry-run mode. This will give users the opportunity to adjust the
max_slot_wal_keep_size value before running the command.

Thoughts?

+1 for throwing a warning in dry-run mode

Few comments:
1) We can have this check only in dry-run mode, it is not required in
non dry-run mode as there is nothing much user can do once the tool is
running, we can change this:
+       if (max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the
'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }
to:
+       if (dry_run && max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the
'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }
2) This error message is not quite right, can we change it to
"publisher requires max_slot_wal_keep_size to be -1, but is set to %d"
+       if (max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the
'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }

3) Also the configuration could be specified in format specifier like
it is specified in the earlier case

Regards,
Vignesh

#3Shubham Khanna
khannashubham1197@gmail.com
In reply to: vignesh C (#2)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Mon, Dec 30, 2024 at 10:10 AM vignesh C <vignesh21@gmail.com> wrote:

On Mon, 30 Dec 2024 at 09:34, Shubham Khanna
<khannashubham1197@gmail.com> wrote:

Hi,

Currently, there is a risk that pg_createsubscriber may fail to
complete successfully when the max_slot_wal_keep_size value is set too
low. This can occur if the WAL is removed before the standby using the
replication slot is able to complete replication, as the required WAL
files are no longer available.

I was able to reproduce this issue using the following steps:
Set up a streaming replication environment.
Run pg_createsubscriber in a debugger.
Pause pg_createsubscriber at the setup_recovery stage.
Perform several operations on the primary node to generate a large
volume of WAL, causing older WAL segments to be removed due to the low
max_slot_wal_keep_size setting.
Once the necessary WAL segments are deleted, continue the execution of
pg_createsubscriber.
At this point, pg_createsubscriber fails with the following error:
2024-12-29 01:21:37.590 IST [427353] FATAL: could not receive data
from WAL stream: ERROR: requested WAL segment
000000010000000000000003 has already been removed
2024-12-29 01:21:37.592 IST [427345] LOG: waiting for WAL to become
available at 0/3000110
2024-12-29 01:21:42.593 IST [427358] LOG: started streaming WAL from
primary at 0/3000000 on timeline 1
2024-12-29 01:21:42.593 IST [427358] FATAL: could not receive data
from WAL stream: ERROR: requested WAL segment
000000010000000000000003 has already been removed

This issue was previously reported in [1], with a suggestion to raise
a warning in [2]. I’ve implemented a patch that logs a warning in
dry-run mode. This will give users the opportunity to adjust the
max_slot_wal_keep_size value before running the command.

Thoughts?

+1 for throwing a warning in dry-run mode

Few comments:
1) We can have this check only in dry-run mode, it is not required in
non dry-run mode as there is nothing much user can do once the tool is
running, we can change this:
+       if (max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the
'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }
to:
+       if (dry_run && max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the
'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }
2) This error message is not quite right, can we change it to
"publisher requires max_slot_wal_keep_size to be -1, but is set to %d"
+       if (max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the
'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }

3) Also the configuration could be specified in format specifier like
it is specified in the earlier case

I have fixed the given comments. The attached patch contains the
suggested changes.

Thanks and regards,
Shubham Khanna.

Attachments:

v2-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchapplication/octet-stream; name=v2-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchDownload
From afc12bc405bdb008bacbfb8d81dc2d7533232b65 Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Tue, 24 Dec 2024 12:27:27 +0530
Subject: [PATCH v2] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

This patch updates 'pg_createsubscriber' to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher.
The test case is added to validate correct warning reporting when
'max_slot_wal_keep_size' is misconfigured.

These changes ensures that logical replication does not fail due to improper
configuration of 'max_slot_wal_keep_size' on the publisher.
---
 src/bin/pg_basebackup/pg_createsubscriber.c      | 16 +++++++++++++++-
 .../pg_basebackup/t/040_pg_createsubscriber.pl   | 16 ++++++++++++++--
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index e96370a9ec..34d88900b4 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -849,6 +849,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	int			max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -880,7 +881,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.current_setting('max_slot_wal_keep_size')");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +897,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = atoi(PQgetvalue(res, 0, 6));
 
 	PQclear(res);
 
@@ -905,6 +908,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: %d",
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +944,15 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+	/* Validate max_slot_wal_keep_size */
+	if (dry_run && max_slot_wal_keep_size != -1)
+	{
+		pg_log_warning("publisher requires max_slot_wal_keep_size to be -1, but is set to %d",
+					   max_slot_wal_keep_size);
+		pg_log_warning_detail("Change the configuration parameter \"%s\" on the publisher to %d.",
+							  "max_slot_wal_keep_size", -1);
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index 0a900edb65..daa2c8b25b 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -318,8 +318,13 @@ $node_p->safe_psql($db1,
 $node_p->wait_for_replay_catchup($node_s);
 $node_s->stop;
 
+# Configure 'max_slot_wal_keep_size = 1' on the publisher and verify the
+# warning message
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1');
+$node_p->restart;
+
 # dry run mode on node S
-command_ok(
+command_checks_all(
 	[
 		'pg_createsubscriber', '--verbose',
 		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
@@ -335,7 +340,14 @@ command_ok(
 		$db1, '--database',
 		$db2
 	],
-	'run pg_createsubscriber --dry-run on node S');
+	0,
+	[qr/./],
+	[
+		qr/pg_createsubscriber: warning: publisher requires max_slot_wal_keep_size to be -1/,
+		qr/Change the configuration parameter "max_slot_wal_keep_size" on the publisher to -1./,
+	],
+	'Validate warning for misconfigured max_slot_wal_keep_size on the publisher'
+);
 
 # Check if node S is still a standby
 $node_s->start;
-- 
2.34.1

#4vignesh C
vignesh21@gmail.com
In reply to: Shubham Khanna (#3)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Mon, 30 Dec 2024 at 12:04, Shubham Khanna
<khannashubham1197@gmail.com> wrote:

On Mon, Dec 30, 2024 at 10:10 AM vignesh C <vignesh21@gmail.com> wrote:

On Mon, 30 Dec 2024 at 09:34, Shubham Khanna
<khannashubham1197@gmail.com> wrote:

Hi,

Currently, there is a risk that pg_createsubscriber may fail to
complete successfully when the max_slot_wal_keep_size value is set too
low. This can occur if the WAL is removed before the standby using the
replication slot is able to complete replication, as the required WAL
files are no longer available.

I was able to reproduce this issue using the following steps:
Set up a streaming replication environment.
Run pg_createsubscriber in a debugger.
Pause pg_createsubscriber at the setup_recovery stage.
Perform several operations on the primary node to generate a large
volume of WAL, causing older WAL segments to be removed due to the low
max_slot_wal_keep_size setting.
Once the necessary WAL segments are deleted, continue the execution of
pg_createsubscriber.
At this point, pg_createsubscriber fails with the following error:
2024-12-29 01:21:37.590 IST [427353] FATAL: could not receive data
from WAL stream: ERROR: requested WAL segment
000000010000000000000003 has already been removed
2024-12-29 01:21:37.592 IST [427345] LOG: waiting for WAL to become
available at 0/3000110
2024-12-29 01:21:42.593 IST [427358] LOG: started streaming WAL from
primary at 0/3000000 on timeline 1
2024-12-29 01:21:42.593 IST [427358] FATAL: could not receive data
from WAL stream: ERROR: requested WAL segment
000000010000000000000003 has already been removed

This issue was previously reported in [1], with a suggestion to raise
a warning in [2]. I’ve implemented a patch that logs a warning in
dry-run mode. This will give users the opportunity to adjust the
max_slot_wal_keep_size value before running the command.

Thoughts?

+1 for throwing a warning in dry-run mode

Few comments:
1) We can have this check only in dry-run mode, it is not required in
non dry-run mode as there is nothing much user can do once the tool is
running, we can change this:
+       if (max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the
'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }
to:
+       if (dry_run && max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the
'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }
2) This error message is not quite right, can we change it to
"publisher requires max_slot_wal_keep_size to be -1, but is set to %d"
+       if (max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the
'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }

3) Also the configuration could be specified in format specifier like
it is specified in the earlier case

I have fixed the given comments. The attached patch contains the
suggested changes.

Few comments:
1) Since this configuration will be updated after reload, you can
reload instead of restarting:
+# Configure 'max_slot_wal_keep_size = 1' on the publisher and verify the
+# warning message
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1');
+$node_p->restart;
2) You can reset max_slot_wal_keep_size configuration after this test:
+       0,
+       [qr/./],
+       [
+               qr/pg_createsubscriber: warning: publisher requires
max_slot_wal_keep_size to be -1/,
+               qr/Change the configuration parameter
"max_slot_wal_keep_size" on the publisher to -1./,
+       ],
+       'Validate warning for misconfigured max_slot_wal_keep_size on
the publisher'
+);
3) We could update the comment to also mention the possibility of
required wal files being deleted with non-default
max_slot_wal_keep_size.
+       /* Validate max_slot_wal_keep_size */
+       if (dry_run && max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
max_slot_wal_keep_size to be -1, but is set to %d",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the configuration
parameter \"%s\" on the publisher to %d.",
+
"max_slot_wal_keep_size", -1);
+       }

4) You can update the commit message saying "the possibility of
required wal files being deleted with non-default
max_slot_wal_keep_size" and also mention that this warning will be
raised in dry_run mode.

Regards,
Vignesh

#5Shubham Khanna
khannashubham1197@gmail.com
In reply to: vignesh C (#4)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Mon, Dec 30, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote:

On Mon, 30 Dec 2024 at 12:04, Shubham Khanna
<khannashubham1197@gmail.com> wrote:

On Mon, Dec 30, 2024 at 10:10 AM vignesh C <vignesh21@gmail.com> wrote:

On Mon, 30 Dec 2024 at 09:34, Shubham Khanna
<khannashubham1197@gmail.com> wrote:

Hi,

Currently, there is a risk that pg_createsubscriber may fail to
complete successfully when the max_slot_wal_keep_size value is set too
low. This can occur if the WAL is removed before the standby using the
replication slot is able to complete replication, as the required WAL
files are no longer available.

I was able to reproduce this issue using the following steps:
Set up a streaming replication environment.
Run pg_createsubscriber in a debugger.
Pause pg_createsubscriber at the setup_recovery stage.
Perform several operations on the primary node to generate a large
volume of WAL, causing older WAL segments to be removed due to the low
max_slot_wal_keep_size setting.
Once the necessary WAL segments are deleted, continue the execution of
pg_createsubscriber.
At this point, pg_createsubscriber fails with the following error:
2024-12-29 01:21:37.590 IST [427353] FATAL: could not receive data
from WAL stream: ERROR: requested WAL segment
000000010000000000000003 has already been removed
2024-12-29 01:21:37.592 IST [427345] LOG: waiting for WAL to become
available at 0/3000110
2024-12-29 01:21:42.593 IST [427358] LOG: started streaming WAL from
primary at 0/3000000 on timeline 1
2024-12-29 01:21:42.593 IST [427358] FATAL: could not receive data
from WAL stream: ERROR: requested WAL segment
000000010000000000000003 has already been removed

This issue was previously reported in [1], with a suggestion to raise
a warning in [2]. I’ve implemented a patch that logs a warning in
dry-run mode. This will give users the opportunity to adjust the
max_slot_wal_keep_size value before running the command.

Thoughts?

+1 for throwing a warning in dry-run mode

Few comments:
1) We can have this check only in dry-run mode, it is not required in
non dry-run mode as there is nothing much user can do once the tool is
running, we can change this:
+       if (max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the
'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }
to:
+       if (dry_run && max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the
'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }
2) This error message is not quite right, can we change it to
"publisher requires max_slot_wal_keep_size to be -1, but is set to %d"
+       if (max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the
'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }

3) Also the configuration could be specified in format specifier like
it is specified in the earlier case

I have fixed the given comments. The attached patch contains the
suggested changes.

Few comments:
1) Since this configuration will be updated after reload, you can
reload instead of restarting:
+# Configure 'max_slot_wal_keep_size = 1' on the publisher and verify the
+# warning message
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1');
+$node_p->restart;
2) You can reset max_slot_wal_keep_size configuration after this test:
+       0,
+       [qr/./],
+       [
+               qr/pg_createsubscriber: warning: publisher requires
max_slot_wal_keep_size to be -1/,
+               qr/Change the configuration parameter
"max_slot_wal_keep_size" on the publisher to -1./,
+       ],
+       'Validate warning for misconfigured max_slot_wal_keep_size on
the publisher'
+);
3) We could update the comment to also mention the possibility of
required wal files being deleted with non-default
max_slot_wal_keep_size.
+       /* Validate max_slot_wal_keep_size */
+       if (dry_run && max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires
max_slot_wal_keep_size to be -1, but is set to %d",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the configuration
parameter \"%s\" on the publisher to %d.",
+
"max_slot_wal_keep_size", -1);
+       }

4) You can update the commit message saying "the possibility of
required wal files being deleted with non-default
max_slot_wal_keep_size" and also mention that this warning will be
raised in dry_run mode.

I have fixed the given comments. The attached patch contains the
suggested changes.

Thanks and regards,
Shubham Khanna.

Attachments:

v3-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchapplication/octet-stream; name=v3-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchDownload
From 0be6949498ec53011a6e62733e31a42dc2a754d8 Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Tue, 24 Dec 2024 12:27:27 +0530
Subject: [PATCH v3] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

By ensuring 'max_slot_wal_keep_size' is -1, this patch prevents the potential
deletion of required WAL files on the publisher that could disrupt logical
replication. A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.

These changes ensure that logical replication does not fail due to improper
configuration of 'max_slot_wal_keep_size' on the publisher.
---
 src/bin/pg_basebackup/pg_createsubscriber.c   | 23 ++++++++++++++++++-
 .../t/040_pg_createsubscriber.pl              | 23 ++++++++++++++++---
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index e96370a9ec..057ffc64eb 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -849,6 +849,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	int			max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -880,7 +881,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.current_setting('max_slot_wal_keep_size')");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +897,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = atoi(PQgetvalue(res, 0, 6));
 
 	PQclear(res);
 
@@ -905,6 +908,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: %d",
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +944,22 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+/*
+ * Validate max_slot_wal_keep_size
+ * Logical replication requires max_slot_wal_keep_size to be set to -1 on the
+ * publisher to prevent the deletion of WAL files that are still needed by
+ * replication slots. If this parameter is set to a non-default value, it may
+ * cause replication failures due to required WAL files being
+ * prematurely removed.
+ */
+	if (dry_run && max_slot_wal_keep_size != -1)
+	{
+		pg_log_warning("publisher requires max_slot_wal_keep_size to be -1, but is set to %d",
+					   max_slot_wal_keep_size);
+		pg_log_warning_detail("Change the configuration parameter \"%s\" on the publisher to %d.",
+							  "max_slot_wal_keep_size", -1);
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index 0a900edb65..9f57ceb085 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -318,8 +318,14 @@ $node_p->safe_psql($db1,
 $node_p->wait_for_replay_catchup($node_s);
 $node_s->stop;
 
-# dry run mode on node S
-command_ok(
+# Configure 'max_slot_wal_keep_size = 1' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1');
+$node_p->reload;
+
+# dry run mode on node S and verify the warning message for misconfigured
+# 'max_slot_wal_keep_size'
+command_checks_all(
 	[
 		'pg_createsubscriber', '--verbose',
 		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
@@ -335,7 +341,18 @@ command_ok(
 		$db1, '--database',
 		$db2
 	],
-	'run pg_createsubscriber --dry-run on node S');
+	0,
+	[qr/./],
+	[
+		qr/pg_createsubscriber: warning: publisher requires max_slot_wal_keep_size to be -1/,
+		qr/Change the configuration parameter "max_slot_wal_keep_size" on the publisher to -1./,
+	],
+	'Validate warning for misconfigured max_slot_wal_keep_size on the publisher'
+);
+
+# Reset 'max_slot_wal_keep_size' to default after the test
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = -1');
+$node_p->reload;
 
 # Check if node S is still a standby
 $node_s->start;
-- 
2.41.0.windows.3

#6Peter Smith
smithpb2250@gmail.com
In reply to: Shubham Khanna (#5)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

Hi Shubham.

Here are some review comments for the patch v3-0001.

======
Commit message.

1.
I can't pinpoint anything specifically wrong about this commit
message, but it seems to be repeating the same information over and
over.

======
Missing pg_createsubscriber documentation?

2.
I thought any prerequisite for 'max_slot_wal_keep_size' should be
documented here [1]https://www.postgresql.org/docs/17/app-pgcreatesubscriber.html along with the other setting requirements.

======
src/bin/pg_basebackup/pg_createsubscriber.c

3.
- " pg_catalog.current_setting('max_prepared_transactions')");
+ " pg_catalog.current_setting('max_prepared_transactions'),"
+ " pg_catalog.current_setting('max_slot_wal_keep_size')");

The code comment above this SQL looks like it should also mention the
'max_slot_wal_keep_size' value requirement.

~~~

4.
+/*
+ * Validate max_slot_wal_keep_size
+ * Logical replication requires max_slot_wal_keep_size to be set to -1 on the
+ * publisher to prevent the deletion of WAL files that are still needed by
+ * replication slots. If this parameter is set to a non-default value, it may
+ * cause replication failures due to required WAL files being
+ * prematurely removed.
+ */
+ if (dry_run && max_slot_wal_keep_size != -1)
+ {
+ pg_log_warning("publisher requires max_slot_wal_keep_size to be -1,
but is set to %d",
+    max_slot_wal_keep_size);
+ pg_log_warning_detail("Change the configuration parameter \"%s\" on
the publisher to %d.",
+   "max_slot_wal_keep_size", -1);
+ }
+

4a.
The code comment is not indented correctly.

~

4b.
It seems strange that the 'max_slot_wal_keep_size' GUC name is not
substituted in the pg_log_warning, but it is in the
pg_log_warning_detail.

Furthermore, GUC names appearing in messages should always be quoted.

~

4c.
Was there some reason to make this only a warning, instead of an error?

The risk "may cause replication failures" seems quite serious, so in
that case it might be a poor user experience if we are effectively
saying: "Too bad, it's all broken now; we warned you (only if you
tried a dry run), but you did not listen".

In other words, why not make this an error condition up-front to
entirely remove any chance of this failure?

======
[1]: https://www.postgresql.org/docs/17/app-pgcreatesubscriber.html

Kind Regards,
Peter Smith.
Fujitsu Australia

#7Shubham Khanna
khannashubham1197@gmail.com
In reply to: Peter Smith (#6)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Tue, Dec 31, 2024 at 4:29 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Shubham.

Here are some review comments for the patch v3-0001.

======
Commit message.

1.
I can't pinpoint anything specifically wrong about this commit
message, but it seems to be repeating the same information over and
over.

Updated the commit message by omitting the last line from the commit message.

======
Missing pg_createsubscriber documentation?

2.
I thought any prerequisite for 'max_slot_wal_keep_size' should be
documented here [1] along with the other setting requirements.

Updated the 'pg_createsubscriber' documentation and added the
Prerequisite for 'max_slot_wal_keep_size'.

======
src/bin/pg_basebackup/pg_createsubscriber.c

3.
- " pg_catalog.current_setting('max_prepared_transactions')");
+ " pg_catalog.current_setting('max_prepared_transactions'),"
+ " pg_catalog.current_setting('max_slot_wal_keep_size')");

The code comment above this SQL looks like it should also mention the
'max_slot_wal_keep_size' value requirement.

Added the 'max_slot_wal_keep_size' value requirement in the comments.

4.
+/*
+ * Validate max_slot_wal_keep_size
+ * Logical replication requires max_slot_wal_keep_size to be set to -1 on the
+ * publisher to prevent the deletion of WAL files that are still needed by
+ * replication slots. If this parameter is set to a non-default value, it may
+ * cause replication failures due to required WAL files being
+ * prematurely removed.
+ */
+ if (dry_run && max_slot_wal_keep_size != -1)
+ {
+ pg_log_warning("publisher requires max_slot_wal_keep_size to be -1,
but is set to %d",
+    max_slot_wal_keep_size);
+ pg_log_warning_detail("Change the configuration parameter \"%s\" on
the publisher to %d.",
+   "max_slot_wal_keep_size", -1);
+ }
+

4a.
The code comment is not indented correctly.

Fixed.

~

4b.
It seems strange that the 'max_slot_wal_keep_size' GUC name is not
substituted in the pg_log_warning, but it is in the
pg_log_warning_detail.

Furthermore, GUC names appearing in messages should always be quoted.

Fixed.

4c.
Was there some reason to make this only a warning, instead of an error?

The risk "may cause replication failures" seems quite serious, so in
that case it might be a poor user experience if we are effectively
saying: "Too bad, it's all broken now; we warned you (only if you
tried a dry run), but you did not listen".

In other words, why not make this an error condition up-front to
entirely remove any chance of this failure?

Changed the warning condition to the error condition and also updated
the test case accordingly.

======

The attached Patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.

Attachments:

v4-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchapplication/octet-stream; name=v4-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchDownload
From 3641c20b2f5fec59642bb8515cf017f8188c8058 Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Tue, 24 Dec 2024 12:27:27 +0530
Subject: [PATCH v4] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

By ensuring 'max_slot_wal_keep_size' is -1, this patch prevents the potential
deletion of required WAL files on the publisher that could disrupt logical
replication. A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.
---
 doc/src/sgml/ref/pg_createsubscriber.sgml     |  7 ++++++
 src/bin/pg_basebackup/pg_createsubscriber.c   | 25 ++++++++++++++++++-
 .../t/040_pg_createsubscriber.pl              | 23 ++++++++++++++---
 3 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index 26b8e64a4e..59bc97728e 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -313,6 +313,13 @@ PostgreSQL documentation
     linkend="guc-max-wal-senders"/> configured to a value greater than or equal
     to the number of specified databases and existing WAL sender processes.
    </para>
+
+   <para>
+    The 'max_slot_wal_keep_size' must be set to -1 to prevent the automatic
+    removal of WAL files needed by replication slots. Setting this parameter to
+    a specific size may lead to replication failures if required WAL files are
+    prematurely deleted.
+   </para>
   </refsect2>
 
   <refsect2>
diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index e96370a9ec..3703dae33a 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -849,6 +849,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	int			max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -872,6 +873,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	 * - wal_level = logical
 	 * - max_replication_slots >= current + number of dbs to be converted
 	 * - max_wal_senders >= current + number of dbs to be converted
+	 * - max_slot_wal_keep_size = -1 (to prevent deletion of required WAL files)
 	 * -----------------------------------------------------------------------
 	 */
 	res = PQexec(conn,
@@ -880,7 +882,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.current_setting('max_slot_wal_keep_size')");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +898,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = atoi(PQgetvalue(res, 0, 6));
 
 	PQclear(res);
 
@@ -905,6 +909,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: %d",
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +945,23 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+/*
+ * Validate max_slot_wal_keep_size
+ * Logical replication requires max_slot_wal_keep_size to be set to -1 on the
+ * publisher to prevent the deletion of WAL files that are still needed by
+ * replication slots. If this parameter is set to a non-default value, it may
+ * cause replication failures due to required WAL files being prematurely
+ * removed.
+ */
+	if (dry_run && max_slot_wal_keep_size != -1)
+	{
+		pg_log_error("publisher requires max_slot_wal_keep_size to be -1, but only %d remain",
+					 max_slot_wal_keep_size);
+		pg_log_error_hint("Change the configuration parameter \"%s\" on the publisher to %d.",
+						  "max_slot_wal_keep_size", -1);
+		failed = true;
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index 0a900edb65..f4c1e079b1 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -318,8 +318,14 @@ $node_p->safe_psql($db1,
 $node_p->wait_for_replay_catchup($node_s);
 $node_s->stop;
 
-# dry run mode on node S
-command_ok(
+# Configure 'max_slot_wal_keep_size = 1' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1');
+$node_p->reload;
+
+# dry run mode on node S and verify the error message for misconfigured
+# 'max_slot_wal_keep_size'
+command_checks_all(
 	[
 		'pg_createsubscriber', '--verbose',
 		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
@@ -335,7 +341,18 @@ command_ok(
 		$db1, '--database',
 		$db2
 	],
-	'run pg_createsubscriber --dry-run on node S');
+	1,
+	[qr/./],
+	[
+		qr/pg_createsubscriber: error: publisher requires max_slot_wal_keep_size to be -1/,
+		qr/Change the configuration parameter "max_slot_wal_keep_size" on the publisher to -1./,
+	],
+	'Validate error for misconfigured max_slot_wal_keep_size on the publisher'
+);
+
+# Reset 'max_slot_wal_keep_size' to default after the test
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = -1');
+$node_p->reload;
 
 # Check if node S is still a standby
 $node_s->start;
-- 
2.41.0.windows.3

#8Peter Smith
smithpb2250@gmail.com
In reply to: Shubham Khanna (#7)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

Hi Shubham.

Here are some review comments for the patch v4-0001.

======
Commit message.

1.
The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

~

This says a "warning is raised", but patch v4 changed the warning to
an error, so this description is incompatible with the code.

======
doc/src/sgml/ref/pg_createsubscriber.sgml

2.
+   <para>
+    The 'max_slot_wal_keep_size' must be set to -1 to prevent the automatic
+    removal of WAL files needed by replication slots. Setting this parameter to
+    a specific size may lead to replication failures if required WAL files are
+    prematurely deleted.
+   </para>

The 'max_slot_wal_keep_size' should not be single-quoted like that. It
should use the same markup as the other nearby GUC names and also have
a link like those others do.

======
src/bin/pg_basebackup/pg_createsubscriber.c

3.
+/*
+ * Validate max_slot_wal_keep_size
+ * Logical replication requires max_slot_wal_keep_size to be set to -1 on the
+ * publisher to prevent the deletion of WAL files that are still needed by
+ * replication slots. If this parameter is set to a non-default value, it may
+ * cause replication failures due to required WAL files being prematurely
+ * removed.
+ */

3a.
Not fixed? For patch v3 I had already posted [1-#4a] that this entire
comment is wrongly indented.

~

3b.
That first sentence looks like it is missing a period and a following
blank line. OTOH, maybe the first sentence is not even necessary.

~~~

4.
+ if (dry_run && max_slot_wal_keep_size != -1)
+ {
+ pg_log_error("publisher requires max_slot_wal_keep_size to be -1,
but only %d remain",
+ max_slot_wal_keep_size);
+ pg_log_error_hint("Change the configuration parameter \"%s\" on the
publisher to %d.",
+   "max_slot_wal_keep_size", -1);
+ failed = true;
+ }
+

4a.
Not fixed? For patch v3 I had already posted [1-#4b] that it seemed
strange you did not use the \"%s\" substitution for the GUC name of
the first message (unlike the 2nd message).

I also think it is strange that the 1st message uses a hardwired -1,
but the 2nd message uses a parameter for the -1.

~~

4b.
I don't think "but only %d remain" is the appropriate wording for this
GUC. It looks like a cut/paste mistake from a previous message.
Anyway, maybe this message should be something quite different so it
is not just duplicating the same information as the error_hint. e.g.
see below for one idea. This could also resolve the previous review
comment.

SUGGESTION
"publisher requires WAL size must not be restricted"

~

4c.
I saw that this only emits the message in dry-mode, which is
apparently based on the suggestion from Vignesh [2-#1]. Meanwhile, all
the comments/docs say "it may cause replication failures", so I felt
it might be better to always stop everything up-front if there is a
wrong configuration, instead of waiting for potential failures to
happen -- that's why I had suggested using error instead of a warning,
but maybe I am in the minority.

IIUC there are two ways to address this configuration problem:

A. Give warnings, but only in dry-run mode. If the warnings are
ignored (or if the dry-run was not even done), there may be
replication failures later, but we HOPE it will not happen.

B. Give errors in all modes. The early config error prevents any
chance of later replication errors.

So, I think the implementation needs to choose either A or B.
Currently, it seems a mixture.

======
[1]: my v3 review - /messages/by-id/CAHut+PsgEphCa-P-3Q7cORA=1TRv15UUsaxN9ZkX56M1-J_QRg@mail.gmail.com
/messages/by-id/CAHut+PsgEphCa-P-3Q7cORA=1TRv15UUsaxN9ZkX56M1-J_QRg@mail.gmail.com
[2]: /messages/by-id/CALDaNm09cRzke52UN5zx33PT390whU92oXY4gfOSZEo17CLPjw@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#9Shubham Khanna
khannashubham1197@gmail.com
In reply to: Peter Smith (#8)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Thu, Jan 2, 2025 at 4:20 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Shubham.

Here are some review comments for the patch v4-0001.

======
Commit message.

1.
The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

This says a "warning is raised", but patch v4 changed the warning to
an error, so this description is incompatible with the code.

Since, this patch now raises a warning instead of an error so I think
this should be the same as before.

======
doc/src/sgml/ref/pg_createsubscriber.sgml

2.
+   <para>
+    The 'max_slot_wal_keep_size' must be set to -1 to prevent the automatic
+    removal of WAL files needed by replication slots. Setting this parameter to
+    a specific size may lead to replication failures if required WAL files are
+    prematurely deleted.
+   </para>

The 'max_slot_wal_keep_size' should not be single-quoted like that. It
should use the same markup as the other nearby GUC names and also have
a link like those others do.

Added the link for 'max_slot_wal_keep_size' and updated the
'pg_createsubscriber' documentation.

======
src/bin/pg_basebackup/pg_createsubscriber.c

3.
+/*
+ * Validate max_slot_wal_keep_size
+ * Logical replication requires max_slot_wal_keep_size to be set to -1 on the
+ * publisher to prevent the deletion of WAL files that are still needed by
+ * replication slots. If this parameter is set to a non-default value, it may
+ * cause replication failures due to required WAL files being prematurely
+ * removed.
+ */

3a.
Not fixed? For patch v3 I had already posted [1-#4a] that this entire
comment is wrongly indented.

Fixed.

3b.
That first sentence looks like it is missing a period and a following
blank line. OTOH, maybe the first sentence is not even necessary.

Fixed.

4.
+ if (dry_run && max_slot_wal_keep_size != -1)
+ {
+ pg_log_error("publisher requires max_slot_wal_keep_size to be -1,
but only %d remain",
+ max_slot_wal_keep_size);
+ pg_log_error_hint("Change the configuration parameter \"%s\" on the
publisher to %d.",
+   "max_slot_wal_keep_size", -1);
+ failed = true;
+ }
+

4a.
Not fixed? For patch v3 I had already posted [1-#4b] that it seemed
strange you did not use the \"%s\" substitution for the GUC name of
the first message (unlike the 2nd message).

I also think it is strange that the 1st message uses a hardwired -1,
but the 2nd message uses a parameter for the -1.

Updated the warning message and warning detail.

4b.
I don't think "but only %d remain" is the appropriate wording for this
GUC. It looks like a cut/paste mistake from a previous message.
Anyway, maybe this message should be something quite different so it
is not just duplicating the same information as the error_hint. e.g.
see below for one idea. This could also resolve the previous review
comment.

SUGGESTION
"publisher requires WAL size must not be restricted"

I have used your suggestion in the latest patch.

4c.
I saw that this only emits the message in dry-mode, which is
apparently based on the suggestion from Vignesh [2-#1]. Meanwhile, all
the comments/docs say "it may cause replication failures", so I felt
it might be better to always stop everything up-front if there is a
wrong configuration, instead of waiting for potential failures to
happen -- that's why I had suggested using error instead of a warning,
but maybe I am in the minority.

IIUC there are two ways to address this configuration problem:

A. Give warnings, but only in dry-run mode. If the warnings are
ignored (or if the dry-run was not even done), there may be
replication failures later, but we HOPE it will not happen.

B. Give errors in all modes. The early config error prevents any
chance of later replication errors.

So, I think the implementation needs to choose either A or B.
Currently, it seems a mixture.

======
[1] my v3 review -
/messages/by-id/CAHut+PsgEphCa-P-3Q7cORA=1TRv15UUsaxN9ZkX56M1-J_QRg@mail.gmail.com
[2] /messages/by-id/CALDaNm09cRzke52UN5zx33PT390whU92oXY4gfOSZEo17CLPjw@mail.gmail.com

If 'max_slot_wal_keep_size' is not set to -1, there is no surety that
'pg_createsubscriber' will function as expected and the removal of WAL
files will start to occur. Therefore, a warning is raised instead of
an error to alert users about the potential configuration issue.
If 'max_slot_wal_keep_size' is -1 (the default), replication slots may
retain an unlimited amount of WAL files.

The attached Patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.

Attachments:

v5-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchapplication/octet-stream; name=v5-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchDownload
From 2490ae21e675188f54ebcb99eba1c3b4d1a09caf Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Tue, 24 Dec 2024 12:27:27 +0530
Subject: [PATCH v5] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

By ensuring 'max_slot_wal_keep_size' is -1, this patch prevents the potential
deletion of required WAL files on the publisher that could disrupt logical
replication. A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.
---
 doc/src/sgml/ref/pg_createsubscriber.sgml     |  8 +++++++
 src/bin/pg_basebackup/pg_createsubscriber.c   | 20 +++++++++++++++-
 .../t/040_pg_createsubscriber.pl              | 24 ++++++++++++++++---
 3 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index 26b8e64a4e..27e3b6b734 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -313,6 +313,14 @@ PostgreSQL documentation
     linkend="guc-max-wal-senders"/> configured to a value greater than or equal
     to the number of specified databases and existing WAL sender processes.
    </para>
+
+   <para>
+    The source server must have <xref linkend="guc-max-slot-wal-keep-size"/> to
+    be set to <literal>-1</literal> to prevent the automatic removal of WAL
+    replication slots. Setting this parameter to files needed by a specific size
+    may lead to replication failures if required WAL files are
+    prematurely deleted.
+   </para>
   </refsect2>
 
   <refsect2>
diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index faf18ccf13..a2ef374db5 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -849,6 +849,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	int			max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -872,6 +873,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	 * - wal_level = logical
 	 * - max_replication_slots >= current + number of dbs to be converted
 	 * - max_wal_senders >= current + number of dbs to be converted
+	 * - max_slot_wal_keep_size = -1 (to prevent deletion of required WAL files)
 	 * -----------------------------------------------------------------------
 	 */
 	res = PQexec(conn,
@@ -880,7 +882,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.current_setting('max_slot_wal_keep_size')");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +898,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = atoi(PQgetvalue(res, 0, 6));
 
 	PQclear(res);
 
@@ -905,6 +909,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: %d",
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +945,18 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+	/*
+	 * Validate 'max_slot_wal_keep_size'. If this parameter is set to a
+	 * non-default value, it may cause replication failures due to required
+	 * WAL files being prematurely removed.
+	 */
+	if (dry_run && max_slot_wal_keep_size != -1)
+	{
+		pg_log_warning("publisher requires WAL size must not be restricted");
+		pg_log_warning_detail("The max_slot_wal_keep_size parameter is currently set to a non-default value, which may lead to replication failures. "
+							  "This parameter must be set to -1 to ensure that required WAL files are not prematurely removed.");
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index 5426159fa5..6896d3fcbf 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -318,8 +318,14 @@ $node_p->safe_psql($db1,
 $node_p->wait_for_replay_catchup($node_s);
 $node_s->stop;
 
-# dry run mode on node S
-command_ok(
+# Configure 'max_slot_wal_keep_size = 1' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1');
+$node_p->reload;
+
+# dry run mode on node S and verify the warning message for misconfigured
+# 'max_slot_wal_keep_size'
+command_checks_all(
 	[
 		'pg_createsubscriber', '--verbose',
 		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
@@ -335,7 +341,19 @@ command_ok(
 		$db1, '--database',
 		$db2
 	],
-	'run pg_createsubscriber --dry-run on node S');
+	0,
+	[qr/./],
+	[
+		qr/pg_createsubscriber: warning: publisher requires WAL size must not be restricted/,
+		qr/The max_slot_wal_keep_size parameter is currently set to a non-default value, which may lead to replication failures. /,
+		qr/This parameter must be set to -1 to ensure that required WAL files are not prematurely removed./,
+	],
+	'Validate warning for misconfigured max_slot_wal_keep_size on the publisher'
+);
+
+# Reset 'max_slot_wal_keep_size' to default after the test
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = -1');
+$node_p->reload;
 
 # Check if node S is still a standby
 $node_s->start;
-- 
2.41.0.windows.3

#10Peter Smith
smithpb2250@gmail.com
In reply to: Shubham Khanna (#9)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

Hi Shubham,

Here are some review comments for patch v5-0001.

======
doc/src/sgml/ref/pg_createsubscriber.sgml

1.
+   <para>
+    The source server must have <xref linkend="guc-max-slot-wal-keep-size"/> to
+    be set to <literal>-1</literal> to prevent the automatic removal of WAL
+    replication slots. Setting this parameter to files needed by a
specific size
+    may lead to replication failures if required WAL files are
+    prematurely deleted.
+   </para>

IIUC, this problem is all about the removal of WAL *files*, not "WAL
replication slots".

Also, the "Setting this parameter to files needed by a specific size"
part sounds strange.

I think this can be simplified anyhow like below.

SUGGESTION:
Replication failures can occur if required WAL files are prematurely
deleted. To prevent this, the source server must set <xref
linkend="guc-max-slot-wal-keep-size"/> to <literal>-1</literal>,
ensuring WAL files are not automatically removed.

======
src/bin/pg_basebackup/pg_createsubscriber.c

check_publisher:

2.
+ if (dry_run && max_slot_wal_keep_size != -1)
+ {
+ pg_log_warning("publisher requires WAL size must not be restricted");
+ pg_log_warning_detail("The max_slot_wal_keep_size parameter is
currently set to a non-default value, which may lead to replication
failures. "
+   "This parameter must be set to -1 to ensure that required WAL
files are not prematurely removed.");
+ }

2a.
Whenever you mention a GUC name in a PostgreSQL message then (by
convention) it must be double-quoted.

~

2b.
The detailed message seems verbose and repetitive to me.

~

2c.
The other nearby messages use the terminology "configuration
parameter", so this should be the same.

~

2d.
The other nearby messages use \"%s\" substitution for the GUC name, so
this should be the same.

~

2e.
IMO advising the user to change a configuration parameter should be
done using the pg_log_warning_hint function (e.g. _hint, not
_details).

~~

Result:

CURRENT (pg_log_warning_details)
The max_slot_wal_keep_size parameter is currently set to a non-default
value, which may lead to replication failures. This parameter must be
set to -1 to ensure that required WAL files are not prematurely
removed.

SUGGESTION (pg_log_warning_hint)
Set the configuration parameter \"%s\" to -1 to ensure that required
WAL files are not prematurely removed.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#11Shubham Khanna
khannashubham1197@gmail.com
In reply to: Peter Smith (#10)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Fri, Jan 3, 2025 at 8:06 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Shubham,

Here are some review comments for patch v5-0001.

======
doc/src/sgml/ref/pg_createsubscriber.sgml

1.
+   <para>
+    The source server must have <xref linkend="guc-max-slot-wal-keep-size"/> to
+    be set to <literal>-1</literal> to prevent the automatic removal of WAL
+    replication slots. Setting this parameter to files needed by a
specific size
+    may lead to replication failures if required WAL files are
+    prematurely deleted.
+   </para>

IIUC, this problem is all about the removal of WAL *files*, not "WAL
replication slots".

Also, the "Setting this parameter to files needed by a specific size"
part sounds strange.

I think this can be simplified anyhow like below.

SUGGESTION:
Replication failures can occur if required WAL files are prematurely
deleted. To prevent this, the source server must set <xref
linkend="guc-max-slot-wal-keep-size"/> to <literal>-1</literal>,
ensuring WAL files are not automatically removed.

======
src/bin/pg_basebackup/pg_createsubscriber.c

check_publisher:

2.
+ if (dry_run && max_slot_wal_keep_size != -1)
+ {
+ pg_log_warning("publisher requires WAL size must not be restricted");
+ pg_log_warning_detail("The max_slot_wal_keep_size parameter is
currently set to a non-default value, which may lead to replication
failures. "
+   "This parameter must be set to -1 to ensure that required WAL
files are not prematurely removed.");
+ }

2a.
Whenever you mention a GUC name in a PostgreSQL message then (by
convention) it must be double-quoted.

~

2b.
The detailed message seems verbose and repetitive to me.

~

2c.
The other nearby messages use the terminology "configuration
parameter", so this should be the same.

~

2d.
The other nearby messages use \"%s\" substitution for the GUC name, so
this should be the same.

~

2e.
IMO advising the user to change a configuration parameter should be
done using the pg_log_warning_hint function (e.g. _hint, not
_details).

~~

Result:

CURRENT (pg_log_warning_details)
The max_slot_wal_keep_size parameter is currently set to a non-default
value, which may lead to replication failures. This parameter must be
set to -1 to ensure that required WAL files are not prematurely
removed.

SUGGESTION (pg_log_warning_hint)
Set the configuration parameter \"%s\" to -1 to ensure that required
WAL files are not prematurely removed.

I have fixed the given comments using your suggestions. Tha attached
patch contains the suggested changes.

Thanks and Regards,
Shubham Khanna.

Attachments:

v6-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchapplication/octet-stream; name=v6-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchDownload
From dd321e23705df5ca462486e2450fc3827c1f26c1 Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Tue, 24 Dec 2024 12:27:27 +0530
Subject: [PATCH v6] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

By ensuring 'max_slot_wal_keep_size' is -1, this patch prevents the potential
deletion of required WAL files on the publisher that could disrupt logical
replication. A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.
---
 doc/src/sgml/ref/pg_createsubscriber.sgml     |  7 ++++++
 src/bin/pg_basebackup/pg_createsubscriber.c   | 20 +++++++++++++++-
 .../t/040_pg_createsubscriber.pl              | 23 ++++++++++++++++---
 3 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index 26b8e64a4e..c5ee5f4eba 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -313,6 +313,13 @@ PostgreSQL documentation
     linkend="guc-max-wal-senders"/> configured to a value greater than or equal
     to the number of specified databases and existing WAL sender processes.
    </para>
+
+   <para>
+   Replication failures can occur if required WAL files are prematurely deleted.
+   To prevent this, the source server must set <xref
+   linkend="guc-max-slot-wal-keep-size"/>  to <literal>-1</literal>, ensuring
+   WAL files are not automatically removed.
+   </para>
   </refsect2>
 
   <refsect2>
diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index faf18ccf13..2a84750c8e 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -849,6 +849,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	int			max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -872,6 +873,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	 * - wal_level = logical
 	 * - max_replication_slots >= current + number of dbs to be converted
 	 * - max_wal_senders >= current + number of dbs to be converted
+	 * - max_slot_wal_keep_size = -1 (to prevent deletion of required WAL files)
 	 * -----------------------------------------------------------------------
 	 */
 	res = PQexec(conn,
@@ -880,7 +882,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.current_setting('max_slot_wal_keep_size')");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +898,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = atoi(PQgetvalue(res, 0, 6));
 
 	PQclear(res);
 
@@ -905,6 +909,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: %d",
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +945,18 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+	/*
+	 * Validate 'max_slot_wal_keep_size'. If this parameter is set to a
+	 * non-default value, it may cause replication failures due to required
+	 * WAL files being prematurely removed.
+	 */
+	if (dry_run && max_slot_wal_keep_size != -1)
+	{
+		pg_log_warning("publisher requires WAL size must not be restricted");
+		pg_log_warning_hint("Set the configuration parameter \"%s\" to -1 to ensure that required WAL files are not prematurely removed.",
+							"max_slot_wal_keep_size");
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index 5426159fa5..76c42c19c7 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -318,8 +318,14 @@ $node_p->safe_psql($db1,
 $node_p->wait_for_replay_catchup($node_s);
 $node_s->stop;
 
-# dry run mode on node S
-command_ok(
+# Configure 'max_slot_wal_keep_size = 1' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1');
+$node_p->reload;
+
+# dry run mode on node S and verify the warning message for misconfigured
+# 'max_slot_wal_keep_size'
+command_checks_all(
 	[
 		'pg_createsubscriber', '--verbose',
 		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
@@ -335,7 +341,18 @@ command_ok(
 		$db1, '--database',
 		$db2
 	],
-	'run pg_createsubscriber --dry-run on node S');
+	0,
+	[qr/./],
+	[
+		qr/pg_createsubscriber: warning: publisher requires WAL size must not be restricted/,
+		qr/Set the configuration parameter "max_slot_wal_keep_size" to -1 to ensure that required WAL files are not prematurely removed./,
+	],
+	'Validate warning for misconfigured max_slot_wal_keep_size on the publisher'
+);
+
+# Reset 'max_slot_wal_keep_size' to default after the test
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = -1');
+$node_p->reload;
 
 # Check if node S is still a standby
 $node_s->start;
-- 
2.34.1

#12Peter Smith
smithpb2250@gmail.com
In reply to: Shubham Khanna (#11)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

Hi Shubham.

The patch v6-0001 LGTM.

OTOH, if you want to be picky, the docs wording could be slightly
modified to be more consistent with the coded warning message.

CURRENT
Replication failures can occur if required WAL files are prematurely
deleted. To prevent this, the source server must set <xref
linkend="guc-max-slot-wal-keep-size"/> to <literal>-1</literal>,
ensuring WAL files are not automatically removed.

SUGGESTION
Replication failures can occur if required WAL files are missing. To
prevent this, the source server must set <xref
linkend="guc-max-slot-wal-keep-size"/> to <literal>-1</literal> to
ensure that required WAL files are not prematurely removed.

~~~

See the attached NITPICKS diff if you want to make this change.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

PS_NITPICKS_v60001.txttext/plain; charset=US-ASCII; name=PS_NITPICKS_v60001.txtDownload
diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index c5ee5f4..40b8fac 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -315,10 +315,10 @@ PostgreSQL documentation
    </para>
 
    <para>
-   Replication failures can occur if required WAL files are prematurely deleted.
-   To prevent this, the source server must set <xref
-   linkend="guc-max-slot-wal-keep-size"/>  to <literal>-1</literal>, ensuring
-   WAL files are not automatically removed.
+   Replication failures can occur if required WAL files are missing. To prevent
+   this, the source server must set <xref linkend="guc-max-slot-wal-keep-size"/>
+   to <literal>-1</literal> to ensure that required WAL files are not
+   prematurely removed.
    </para>
   </refsect2>
 
#13Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Shubham Khanna (#11)
RE: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

Dear Shubham,

Thanks for creating a patch. I have one comment about it.

check_publisher() assumed that the SQL function `pg_catalog.current_setting('max_slot_wal_keep_size')`
will return the numeric, but it just return the text representation. I.e., if the parameter is
set to 10MB, the function returns like below:

```
postgres=# SELECT * FROM pg_catalog.current_setting('max_slot_wal_keep_size');
current_setting
-----------------
10MB
(1 row)
```

Your patch can work well because atoi() ignores the latter part of the string,
e.g., "10MB" is converted to "10", but this is not clean. I suggest either of
1) accepting the value as the string, or 2) using an SQL function pg_size_bytes()
to get max_slot_wal_keep_size.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

#14Shubham Khanna
khannashubham1197@gmail.com
In reply to: Peter Smith (#12)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Mon, Jan 6, 2025 at 3:22 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Shubham.

The patch v6-0001 LGTM.

OTOH, if you want to be picky, the docs wording could be slightly
modified to be more consistent with the coded warning message.

CURRENT
Replication failures can occur if required WAL files are prematurely
deleted. To prevent this, the source server must set <xref
linkend="guc-max-slot-wal-keep-size"/> to <literal>-1</literal>,
ensuring WAL files are not automatically removed.

SUGGESTION
Replication failures can occur if required WAL files are missing. To
prevent this, the source server must set <xref
linkend="guc-max-slot-wal-keep-size"/> to <literal>-1</literal> to
ensure that required WAL files are not prematurely removed.

~~~

See the attached NITPICKS diff if you want to make this change.

I have used your suggestion and updated the 'pg_createsubscriber'
documentation accordingly.
The attached Patch contains the suggested change.

Thanks and regards,
Shubham Khanna.

Attachments:

v7-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchapplication/octet-stream; name=v7-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchDownload
From 191c60c758770f00d64b432ce2b1e9aaeb8deb05 Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Tue, 24 Dec 2024 12:27:27 +0530
Subject: [PATCH v7] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

By ensuring 'max_slot_wal_keep_size' is -1, this patch prevents the potential
deletion of required WAL files on the publisher that could disrupt logical
replication. A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.
---
 doc/src/sgml/ref/pg_createsubscriber.sgml     |  7 +++++
 src/bin/pg_basebackup/pg_createsubscriber.c   | 28 ++++++++++++++++++-
 .../t/040_pg_createsubscriber.pl              | 23 +++++++++++++--
 3 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index 26b8e64a4e..40b8fac50e 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -313,6 +313,13 @@ PostgreSQL documentation
     linkend="guc-max-wal-senders"/> configured to a value greater than or equal
     to the number of specified databases and existing WAL sender processes.
    </para>
+
+   <para>
+   Replication failures can occur if required WAL files are missing. To prevent
+   this, the source server must set <xref linkend="guc-max-slot-wal-keep-size"/>
+   to <literal>-1</literal> to ensure that required WAL files are not
+   prematurely removed.
+   </para>
   </refsect2>
 
   <refsect2>
diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index faf18ccf13..d5dcd290c9 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -27,6 +27,7 @@
 #include "fe_utils/simple_list.h"
 #include "fe_utils/string_utils.h"
 #include "getopt_long.h"
+#include "common/int.h"			/* include for strtoi64 */
 
 #define	DEFAULT_SUB_PORT	"50432"
 
@@ -849,6 +850,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	int64		max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -872,6 +874,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	 * - wal_level = logical
 	 * - max_replication_slots >= current + number of dbs to be converted
 	 * - max_wal_senders >= current + number of dbs to be converted
+	 * - max_slot_wal_keep_size = -1 (to prevent deletion of required WAL files)
 	 * -----------------------------------------------------------------------
 	 */
 	res = PQexec(conn,
@@ -880,7 +883,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.pg_size_bytes(pg_catalog.current_setting('max_slot_wal_keep_size'))");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +899,14 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = strtoi64(PQgetvalue(res, 0, 6), NULL, 0);
+
+	/* use strtoi64 to convert the string result to a 64-bit integer */
+	if (max_slot_wal_keep_size == 0 && errno != 0)
+	{
+		/* handle conversion error */
+		pg_log_error("Failed to convert max_slot_wal_keep_size to int64");
+	}
 
 	PQclear(res);
 
@@ -905,6 +917,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: %ld",
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +953,18 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+	/*
+	 * Validate 'max_slot_wal_keep_size'. If this parameter is set to a
+	 * non-default value, it may cause replication failures due to required
+	 * WAL files being prematurely removed.
+	 */
+	if (dry_run && max_slot_wal_keep_size != -1)
+	{
+		pg_log_warning("publisher requires WAL size must not be restricted");
+		pg_log_warning_hint("Set the configuration parameter \"%s\" to -1 to ensure that required WAL files are not prematurely removed.",
+							"max_slot_wal_keep_size");
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index 5426159fa5..76c42c19c7 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -318,8 +318,14 @@ $node_p->safe_psql($db1,
 $node_p->wait_for_replay_catchup($node_s);
 $node_s->stop;
 
-# dry run mode on node S
-command_ok(
+# Configure 'max_slot_wal_keep_size = 1' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1');
+$node_p->reload;
+
+# dry run mode on node S and verify the warning message for misconfigured
+# 'max_slot_wal_keep_size'
+command_checks_all(
 	[
 		'pg_createsubscriber', '--verbose',
 		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
@@ -335,7 +341,18 @@ command_ok(
 		$db1, '--database',
 		$db2
 	],
-	'run pg_createsubscriber --dry-run on node S');
+	0,
+	[qr/./],
+	[
+		qr/pg_createsubscriber: warning: publisher requires WAL size must not be restricted/,
+		qr/Set the configuration parameter "max_slot_wal_keep_size" to -1 to ensure that required WAL files are not prematurely removed./,
+	],
+	'Validate warning for misconfigured max_slot_wal_keep_size on the publisher'
+);
+
+# Reset 'max_slot_wal_keep_size' to default after the test
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = -1');
+$node_p->reload;
 
 # Check if node S is still a standby
 $node_s->start;
-- 
2.34.1

#15Shubham Khanna
khannashubham1197@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#13)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Mon, Jan 6, 2025 at 7:59 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Shubham,

Thanks for creating a patch. I have one comment about it.

check_publisher() assumed that the SQL function `pg_catalog.current_setting('max_slot_wal_keep_size')`
will return the numeric, but it just return the text representation. I.e., if the parameter is
set to 10MB, the function returns like below:

```
postgres=# SELECT * FROM pg_catalog.current_setting('max_slot_wal_keep_size');
current_setting
-----------------
10MB
(1 row)
```

Your patch can work well because atoi() ignores the latter part of the string,
e.g., "10MB" is converted to "10", but this is not clean. I suggest either of
1) accepting the value as the string, or 2) using an SQL function pg_size_bytes()
to get max_slot_wal_keep_size.

I have fixed the given comment. The v7 version Patch attached at [1]/messages/by-id/CAHv8Rj+SxmBgz-vw5f7meRzEh6pgp9YhAjL5BPzTpevu31rY7Q@mail.gmail.com
has the changes for the same.
[1]: /messages/by-id/CAHv8Rj+SxmBgz-vw5f7meRzEh6pgp9YhAjL5BPzTpevu31rY7Q@mail.gmail.com

Thanks and regards,
Shubham Khanna.

#16Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Shubham Khanna (#14)
RE: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

Dear Shubham,

Thanks for updating the patch. Few comments.

```
+#include "common/int.h" /* include for strtoi64 */
```

I could build the source code without the inclusion. Can you remove?

```
+	max_slot_wal_keep_size = strtoi64(PQgetvalue(res, 0, 6), NULL, 0);
+
+	/* use strtoi64 to convert the string result to a 64-bit integer */
+	if (max_slot_wal_keep_size == 0 && errno != 0)
+	{
+		/* handle conversion error */
+		pg_log_error("Failed to convert max_slot_wal_keep_size to int64");
+	}
```

I'm not sure the error handling is really needed. pg_dump also uses the function and it does
not have such handlings.

```
+	pg_log_debug("publisher: max_slot_wal_keep_size: %ld",
+				 max_slot_wal_keep_size);
```

IIUC, "%ld" does not always represent int64 format. Since it is a debug message, isn't it enough to
use INT64_FORMAT macro?

```
+# Configure 'max_slot_wal_keep_size = 1' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1');
+$node_p->reload;
```

Can you use 'max_slot_wal_keep_size = 10MB' or something to test the handling is correct?

Best regards,
Hayato Kuroda
FUJITSU LIMITED

#17Shubham Khanna
khannashubham1197@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#16)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Thu, Jan 9, 2025 at 12:46 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Shubham,

Thanks for updating the patch. Few comments.

```
+#include "common/int.h" /* include for strtoi64 */
```

I could build the source code without the inclusion. Can you remove?

```
+       max_slot_wal_keep_size = strtoi64(PQgetvalue(res, 0, 6), NULL, 0);
+
+       /* use strtoi64 to convert the string result to a 64-bit integer */
+       if (max_slot_wal_keep_size == 0 && errno != 0)
+       {
+               /* handle conversion error */
+               pg_log_error("Failed to convert max_slot_wal_keep_size to int64");
+       }
```

I'm not sure the error handling is really needed. pg_dump also uses the function and it does
not have such handlings.

```
+       pg_log_debug("publisher: max_slot_wal_keep_size: %ld",
+                                max_slot_wal_keep_size);
```

IIUC, "%ld" does not always represent int64 format. Since it is a debug message, isn't it enough to
use INT64_FORMAT macro?

```
+# Configure 'max_slot_wal_keep_size = 1' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1');
+$node_p->reload;
```

Can you use 'max_slot_wal_keep_size = 10MB' or something to test the handling is correct?

The attached Patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.

Attachments:

v8-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchapplication/octet-stream; name=v8-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchDownload
From ccd1f19ccdd77ebca1272f04144d90f2a0deec14 Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Tue, 24 Dec 2024 12:27:27 +0530
Subject: [PATCH v8] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

By ensuring 'max_slot_wal_keep_size' is -1, this patch prevents the potential
deletion of required WAL files on the publisher that could disrupt logical
replication. A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.
---
 doc/src/sgml/ref/pg_createsubscriber.sgml     |  7 ++++++
 src/bin/pg_basebackup/pg_createsubscriber.c   | 20 +++++++++++++++-
 .../t/040_pg_createsubscriber.pl              | 23 ++++++++++++++++---
 3 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index 26b8e64a4e..40b8fac50e 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -313,6 +313,13 @@ PostgreSQL documentation
     linkend="guc-max-wal-senders"/> configured to a value greater than or equal
     to the number of specified databases and existing WAL sender processes.
    </para>
+
+   <para>
+   Replication failures can occur if required WAL files are missing. To prevent
+   this, the source server must set <xref linkend="guc-max-slot-wal-keep-size"/>
+   to <literal>-1</literal> to ensure that required WAL files are not
+   prematurely removed.
+   </para>
   </refsect2>
 
   <refsect2>
diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index faf18ccf13..f1892f3f33 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -849,6 +849,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	int64		max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -872,6 +873,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	 * - wal_level = logical
 	 * - max_replication_slots >= current + number of dbs to be converted
 	 * - max_wal_senders >= current + number of dbs to be converted
+	 * - max_slot_wal_keep_size = -1 (to prevent deletion of required WAL files)
 	 * -----------------------------------------------------------------------
 	 */
 	res = PQexec(conn,
@@ -880,7 +882,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.pg_size_bytes(pg_catalog.current_setting('max_slot_wal_keep_size'))");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +898,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = strtoi64(PQgetvalue(res, 0, 6), NULL, 0);
 
 	PQclear(res);
 
@@ -905,6 +909,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: " INT64_FORMAT,
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +945,18 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+	/*
+	 * Validate 'max_slot_wal_keep_size'. If this parameter is set to a
+	 * non-default value, it may cause replication failures due to required
+	 * WAL files being prematurely removed.
+	 */
+	if (dry_run && max_slot_wal_keep_size != -1)
+	{
+		pg_log_warning("publisher requires WAL size must not be restricted");
+		pg_log_warning_hint("Set the configuration parameter \"%s\" to -1 to ensure that required WAL files are not prematurely removed.",
+							"max_slot_wal_keep_size");
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index 5426159fa5..c1fc0cfe9e 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -318,8 +318,14 @@ $node_p->safe_psql($db1,
 $node_p->wait_for_replay_catchup($node_s);
 $node_s->stop;
 
-# dry run mode on node S
-command_ok(
+# Configure 'max_slot_wal_keep_size = 10MB' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 10MB');
+$node_p->reload;
+
+# dry run mode on node S and verify the warning message for misconfigured
+# 'max_slot_wal_keep_size'
+command_checks_all(
 	[
 		'pg_createsubscriber', '--verbose',
 		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
@@ -335,7 +341,18 @@ command_ok(
 		$db1, '--database',
 		$db2
 	],
-	'run pg_createsubscriber --dry-run on node S');
+	0,
+	[qr/./],
+	[
+		qr/pg_createsubscriber: warning: publisher requires WAL size must not be restricted/,
+		qr/Set the configuration parameter "max_slot_wal_keep_size" to -1 to ensure that required WAL files are not prematurely removed./,
+	],
+	'Validate warning for misconfigured max_slot_wal_keep_size on the publisher'
+);
+
+# Reset 'max_slot_wal_keep_size' to default after the test
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = -1');
+$node_p->reload;
 
 # Check if node S is still a standby
 $node_s->start;
-- 
2.34.1

#18Peter Smith
smithpb2250@gmail.com
In reply to: Shubham Khanna (#17)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

Hi Shubham,

Some review comments for patch v8=0001.

======
Commit message.

1.
By ensuring 'max_slot_wal_keep_size' is -1, this patch prevents the potential
deletion of required WAL files on the publisher that could disrupt logical
replication. A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.

~

AFAIK the patch only logs a warning. It is not *enforcing* anything at all.

So, saying "ensuring" and "preventing" is misleading.

======
src/bin/pg_basebackup/pg_createsubscriber.c

check_publisher:

2.
+ max_slot_wal_keep_size = strtoi64(PQgetvalue(res, 0, 6), NULL, 0);

Is passing base 0 here ok?

======
src/bin/pg_basebackup/t/040_pg_createsubscriber.pl

3.
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 10MB');
+$node_p->reload;
+
+# dry run mode on node S and verify the warning message for misconfigured
+# 'max_slot_wal_keep_size'
+command_checks_all(

Because of the --verbose I expected to see the pg_log_debug message
getting output showing the large (10MB) value for
max_slot_wal_keep_size.

But, I can only see a value of -1 in the log file
'tmp_check/log/regress_log_040_pg_createsubscriber', which may not
even be from the same test. Am I looking in the wrong place (???) e.g.
Where is the logging evidence of that publisher's bad GUC (10MB) value
in the logs before the warning is emitted?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#19Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#13)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Mon, Jan 6, 2025 at 1:29 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Shubham,

Thanks for creating a patch. I have one comment about it.

check_publisher() assumed that the SQL function `pg_catalog.current_setting('max_slot_wal_keep_size')`
will return the numeric, but it just return the text representation. I.e., if the parameter is
set to 10MB, the function returns like below:

```
postgres=# SELECT * FROM pg_catalog.current_setting('max_slot_wal_keep_size');
current_setting
-----------------
10MB
(1 row)
```

Your patch can work well because atoi() ignores the latter part of the string,
e.g., "10MB" is converted to "10", but this is not clean. I suggest either of
1) accepting the value as the string, or 2) using an SQL function pg_size_bytes()
to get max_slot_wal_keep_size.

Hi Shubham.

Kuroda-san gave a couple of ideas above and you chose the option 2.

FWIW, recently I've been thinking that option 1 might have been a
better choice, because:
i) Using strtoi64 for a GUC value seems to be very rare. I didn't
find any other examples of what you've ended up doing in v8-0001.
ii) When you display the value in the pg_log_debug it would be better
to display the same human-readable format that the user configured
instead of some possibly huge int64 value
iii) AFAIK, we only need to check this against "-1". The actual value
is of no real importance to the patch, so going to extra trouble to
extract an int64 that we don't care about seems unnecessary

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#20Peter Smith
smithpb2250@gmail.com
In reply to: Shubham Khanna (#17)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

Hi Shubham,

In a previous review [1, comment #3] I was complaining about the lack
of output emitted by the pg_log_debug() because I wanted to see some
LOG evidence of the bad value of max_slot_wal_keep_size.

FYI, I think that mystery has finally been solved.

AFAIK 2 separate problems were contributing to the lack of debug output.

1. Unlike the 'command_ok' subroutine, the 'command_checks_all'
subroutine apparently redirects the output before doing the requested
pattern matching on it. So there is no output to be seen.

2. After that, I did not realise the effect of '-verbose' is
cumulative on loglevel. Specifically, to see debug messages it is not
enough to just say '--verbose'. In fact, you need to specify it 2
times: '--verbose', '--verbose'

~

After addressing these (e.g. by executing the same pg_createsubscriber
using 'command_ok' and using a double '--verbose'), the expected LOGS
can be seen.

e.g. in the file tmp_check/log/regress_log_040_pg_createsubscriber
----------
...
pg_createsubscriber: checking settings on publisher
pg_createsubscriber: publisher: wal_level: logical
pg_createsubscriber: publisher: max_replication_slots: 10
pg_createsubscriber: publisher: current replication slots: 2
pg_createsubscriber: publisher: max_wal_senders: 10
pg_createsubscriber: publisher: current wal senders: 1
pg_createsubscriber: publisher: max_prepared_transactions: 0
pg_createsubscriber: publisher: max_slot_wal_keep_size: 10485760
pg_createsubscriber: warning: publisher requires WAL size must not be restricted
pg_createsubscriber: hint: Set the configuration parameter
"max_slot_wal_keep_size" to -1 to ensure that required WAL files are
not prematurely removed.
pg_createsubscriber: stopping the subscriber
...
----------

See the attached diff for the TAP changes I used to expose this
logging. Apply this atop v8.

======
[1]: /messages/by-id/CAHut+Pszk61QM+cEvq_1-A2y+JrAD0USB+NvtcidajYOfHDkyw@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

PS_show_debug_output.txttext/plain; charset=US-ASCII; name=PS_show_debug_output.txtDownload
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index c1fc0cf..773a27b 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -327,7 +327,7 @@ $node_p->reload;
 # 'max_slot_wal_keep_size'
 command_checks_all(
 	[
-		'pg_createsubscriber', '--verbose',
+		'pg_createsubscriber', '--verbose', '--verbose',
 		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
 		'--dry-run', '--pgdata',
 		$node_s->data_dir, '--publisher-server',
@@ -350,6 +350,26 @@ command_checks_all(
 	'Validate warning for misconfigured max_slot_wal_keep_size on the publisher'
 );
 
+# And, same again to see the output...
+command_ok(
+	[
+		'pg_createsubscriber', '--verbose', '--verbose',
+		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
+		'--dry-run', '--pgdata',
+		$node_s->data_dir, '--publisher-server',
+		$node_p->connstr($db1), '--socketdir',
+		$node_s->host, '--subscriber-port',
+		$node_s->port, '--publication',
+		'pub1', '--publication',
+		'pub2', '--subscription',
+		'sub1', '--subscription',
+		'sub2', '--database',
+		$db1, '--database',
+		$db2
+	],
+	'Ditto above, but this time using command_ok to retain the output'
+);
+
 # Reset 'max_slot_wal_keep_size' to default after the test
 $node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = -1');
 $node_p->reload;
#21Shubham Khanna
khannashubham1197@gmail.com
In reply to: Peter Smith (#18)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Fri, Jan 10, 2025 at 6:56 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Shubham,

Some review comments for patch v8=0001.

======
Commit message.

1.
By ensuring 'max_slot_wal_keep_size' is -1, this patch prevents the potential
deletion of required WAL files on the publisher that could disrupt logical
replication. A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.

~

AFAIK the patch only logs a warning. It is not *enforcing* anything at all.

So, saying "ensuring" and "preventing" is misleading.

======
src/bin/pg_basebackup/pg_createsubscriber.c

check_publisher:

2.
+ max_slot_wal_keep_size = strtoi64(PQgetvalue(res, 0, 6), NULL, 0);

Is passing base 0 here ok?

======
src/bin/pg_basebackup/t/040_pg_createsubscriber.pl

3.
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 10MB');
+$node_p->reload;
+
+# dry run mode on node S and verify the warning message for misconfigured
+# 'max_slot_wal_keep_size'
+command_checks_all(

Because of the --verbose I expected to see the pg_log_debug message
getting output showing the large (10MB) value for
max_slot_wal_keep_size.

But, I can only see a value of -1 in the log file
'tmp_check/log/regress_log_040_pg_createsubscriber', which may not
even be from the same test. Am I looking in the wrong place (???) e.g.
Where is the logging evidence of that publisher's bad GUC (10MB) value
in the logs before the warning is emitted?

I have fixed the given comments using your suggestions. The attached
patch contains the suggested changes.

Thanks and Regards,
Shubham Khanna.

Attachments:

v9-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchapplication/octet-stream; name=v9-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patchDownload
From 52edb90bd8bca67948fabc7a22c7708add3e186c Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Tue, 24 Dec 2024 12:27:27 +0530
Subject: [PATCH v9] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

This patch logs a warning if 'max_slot_wal_keep_size' is not set to '-1',
highlighting a potential risk of required WAL files being deleted on the
publisher, which could disrupt logical replication.
A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.
---
 doc/src/sgml/ref/pg_createsubscriber.sgml     |  7 +++
 src/bin/pg_basebackup/pg_createsubscriber.c   | 20 ++++++++-
 .../t/040_pg_createsubscriber.pl              | 43 +++++++++++++++++--
 3 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index 26b8e64a4e..40b8fac50e 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -313,6 +313,13 @@ PostgreSQL documentation
     linkend="guc-max-wal-senders"/> configured to a value greater than or equal
     to the number of specified databases and existing WAL sender processes.
    </para>
+
+   <para>
+   Replication failures can occur if required WAL files are missing. To prevent
+   this, the source server must set <xref linkend="guc-max-slot-wal-keep-size"/>
+   to <literal>-1</literal> to ensure that required WAL files are not
+   prematurely removed.
+   </para>
   </refsect2>
 
   <refsect2>
diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index faf18ccf13..7433c7ae2a 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -849,6 +849,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	char	   *max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -872,6 +873,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	 * - wal_level = logical
 	 * - max_replication_slots >= current + number of dbs to be converted
 	 * - max_wal_senders >= current + number of dbs to be converted
+	 * - max_slot_wal_keep_size = -1 (to prevent deletion of required WAL files)
 	 * -----------------------------------------------------------------------
 	 */
 	res = PQexec(conn,
@@ -880,7 +882,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.current_setting('max_slot_wal_keep_size')");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +898,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = pg_strdup(PQgetvalue(res, 0, 6));
 
 	PQclear(res);
 
@@ -905,6 +909,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: %s",
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +945,18 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+	/*
+	 * Validate 'max_slot_wal_keep_size'. If this parameter is set to a
+	 * non-default value, it may cause replication failures due to required
+	 * WAL files being prematurely removed.
+	 */
+	if (dry_run && (strcmp(max_slot_wal_keep_size, "-1") != 0))
+	{
+		pg_log_warning("publisher requires WAL size must not be restricted");
+		pg_log_warning_hint("Set the configuration parameter \"%s\" to -1 to ensure that required WAL files are not prematurely removed.",
+							"max_slot_wal_keep_size");
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index 5426159fa5..773a27bf06 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -318,10 +318,42 @@ $node_p->safe_psql($db1,
 $node_p->wait_for_replay_catchup($node_s);
 $node_s->stop;
 
-# dry run mode on node S
+# Configure 'max_slot_wal_keep_size = 10MB' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 10MB');
+$node_p->reload;
+
+# dry run mode on node S and verify the warning message for misconfigured
+# 'max_slot_wal_keep_size'
+command_checks_all(
+	[
+		'pg_createsubscriber', '--verbose', '--verbose',
+		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
+		'--dry-run', '--pgdata',
+		$node_s->data_dir, '--publisher-server',
+		$node_p->connstr($db1), '--socketdir',
+		$node_s->host, '--subscriber-port',
+		$node_s->port, '--publication',
+		'pub1', '--publication',
+		'pub2', '--subscription',
+		'sub1', '--subscription',
+		'sub2', '--database',
+		$db1, '--database',
+		$db2
+	],
+	0,
+	[qr/./],
+	[
+		qr/pg_createsubscriber: warning: publisher requires WAL size must not be restricted/,
+		qr/Set the configuration parameter "max_slot_wal_keep_size" to -1 to ensure that required WAL files are not prematurely removed./,
+	],
+	'Validate warning for misconfigured max_slot_wal_keep_size on the publisher'
+);
+
+# And, same again to see the output...
 command_ok(
 	[
-		'pg_createsubscriber', '--verbose',
+		'pg_createsubscriber', '--verbose', '--verbose',
 		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
 		'--dry-run', '--pgdata',
 		$node_s->data_dir, '--publisher-server',
@@ -335,7 +367,12 @@ command_ok(
 		$db1, '--database',
 		$db2
 	],
-	'run pg_createsubscriber --dry-run on node S');
+	'Ditto above, but this time using command_ok to retain the output'
+);
+
+# Reset 'max_slot_wal_keep_size' to default after the test
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = -1');
+$node_p->reload;
 
 # Check if node S is still a standby
 $node_s->start;
-- 
2.41.0.windows.3

#22Shubham Khanna
khannashubham1197@gmail.com
In reply to: Peter Smith (#19)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Fri, Jan 10, 2025 at 8:21 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Mon, Jan 6, 2025 at 1:29 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Shubham,

Thanks for creating a patch. I have one comment about it.

check_publisher() assumed that the SQL function `pg_catalog.current_setting('max_slot_wal_keep_size')`
will return the numeric, but it just return the text representation. I.e., if the parameter is
set to 10MB, the function returns like below:

```
postgres=# SELECT * FROM pg_catalog.current_setting('max_slot_wal_keep_size');
current_setting
-----------------
10MB
(1 row)
```

Your patch can work well because atoi() ignores the latter part of the string,
e.g., "10MB" is converted to "10", but this is not clean. I suggest either of
1) accepting the value as the string, or 2) using an SQL function pg_size_bytes()
to get max_slot_wal_keep_size.

Hi Shubham.

Kuroda-san gave a couple of ideas above and you chose the option 2.

FWIW, recently I've been thinking that option 1 might have been a
better choice, because:
i) Using strtoi64 for a GUC value seems to be very rare. I didn't
find any other examples of what you've ended up doing in v8-0001.
ii) When you display the value in the pg_log_debug it would be better
to display the same human-readable format that the user configured
instead of some possibly huge int64 value
iii) AFAIK, we only need to check this against "-1". The actual value
is of no real importance to the patch, so going to extra trouble to
extract an int64 that we don't care about seems unnecessary

I have fixed the given comment and used the string to accept the
value. The v9 version Patch attached at [1]/messages/by-id/CAHv8RjL4MFEDosz9+TFmufYCDoNCnyN6PFeSxG6MwsH1ufhZLA@mail.gmail.com has the changes for the
same.
[1]: /messages/by-id/CAHv8RjL4MFEDosz9+TFmufYCDoNCnyN6PFeSxG6MwsH1ufhZLA@mail.gmail.com

Thanks and regards,
Shubham Khanna.

#23Shubham Khanna
khannashubham1197@gmail.com
In reply to: Peter Smith (#20)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Tue, Jan 14, 2025 at 10:10 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Shubham,

In a previous review [1, comment #3] I was complaining about the lack
of output emitted by the pg_log_debug() because I wanted to see some
LOG evidence of the bad value of max_slot_wal_keep_size.

FYI, I think that mystery has finally been solved.

AFAIK 2 separate problems were contributing to the lack of debug output.

1. Unlike the 'command_ok' subroutine, the 'command_checks_all'
subroutine apparently redirects the output before doing the requested
pattern matching on it. So there is no output to be seen.

2. After that, I did not realise the effect of '-verbose' is
cumulative on loglevel. Specifically, to see debug messages it is not
enough to just say '--verbose'. In fact, you need to specify it 2
times: '--verbose', '--verbose'

~

After addressing these (e.g. by executing the same pg_createsubscriber
using 'command_ok' and using a double '--verbose'), the expected LOGS
can be seen.

e.g. in the file tmp_check/log/regress_log_040_pg_createsubscriber
----------
...
pg_createsubscriber: checking settings on publisher
pg_createsubscriber: publisher: wal_level: logical
pg_createsubscriber: publisher: max_replication_slots: 10
pg_createsubscriber: publisher: current replication slots: 2
pg_createsubscriber: publisher: max_wal_senders: 10
pg_createsubscriber: publisher: current wal senders: 1
pg_createsubscriber: publisher: max_prepared_transactions: 0
pg_createsubscriber: publisher: max_slot_wal_keep_size: 10485760
pg_createsubscriber: warning: publisher requires WAL size must not be restricted
pg_createsubscriber: hint: Set the configuration parameter
"max_slot_wal_keep_size" to -1 to ensure that required WAL files are
not prematurely removed.
pg_createsubscriber: stopping the subscriber
...
----------

See the attached diff for the TAP changes I used to expose this
logging. Apply this atop v8.

======
[1] /messages/by-id/CAHut+Pszk61QM+cEvq_1-A2y+JrAD0USB+NvtcidajYOfHDkyw@mail.gmail.com

I have used your diff to update the TAP tests accordingly. The v9
version Patch attached at [1]/messages/by-id/CAHv8RjL4MFEDosz9+TFmufYCDoNCnyN6PFeSxG6MwsH1ufhZLA@mail.gmail.com has the changes for the same.
[1]: /messages/by-id/CAHv8RjL4MFEDosz9+TFmufYCDoNCnyN6PFeSxG6MwsH1ufhZLA@mail.gmail.com

Thanks and regards,
Shubham Khanna.

#24Peter Smith
smithpb2250@gmail.com
In reply to: Shubham Khanna (#21)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

Hi Shubham.

Patch v9-0001 LGTM.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#25Shubham Khanna
khannashubham1197@gmail.com
In reply to: Peter Smith (#24)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Wed, Jan 15, 2025 at 3:28 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Shubham.

Patch v9-0001 LGTM.

Upon reviewing the v9-0001 patch, I noticed that running 'pg perltidy'
was necessary. To ensure the patch adheres to the required coding
standards, I have applied 'pg perltidy' and made the necessary
adjustments.
The attached patch includes these formatting fixes along with the
latest changes.

Thanks and regards,
Shubham Khanna.

Attachments:

v10-0001-Validate-max_slot_wal_keep_size-in-pg_createsubs.patchapplication/octet-stream; name=v10-0001-Validate-max_slot_wal_keep_size-in-pg_createsubs.patchDownload
From 07a34d2541687779fb07ce1a294454659279b4e2 Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Tue, 24 Dec 2024 12:27:27 +0530
Subject: [PATCH v10] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

This patch logs a warning if 'max_slot_wal_keep_size' is not set to '-1',
highlighting a potential risk of required WAL files being deleted on the
publisher, which could disrupt logical replication.
A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.
---
 doc/src/sgml/ref/pg_createsubscriber.sgml     |  7 ++
 src/bin/pg_basebackup/pg_createsubscriber.c   | 20 +++++-
 .../t/040_pg_createsubscriber.pl              | 64 +++++++++++++++----
 3 files changed, 76 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index 26b8e64a4e..40b8fac50e 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -313,6 +313,13 @@ PostgreSQL documentation
     linkend="guc-max-wal-senders"/> configured to a value greater than or equal
     to the number of specified databases and existing WAL sender processes.
    </para>
+
+   <para>
+   Replication failures can occur if required WAL files are missing. To prevent
+   this, the source server must set <xref linkend="guc-max-slot-wal-keep-size"/>
+   to <literal>-1</literal> to ensure that required WAL files are not
+   prematurely removed.
+   </para>
   </refsect2>
 
   <refsect2>
diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index faf18ccf13..7433c7ae2a 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -849,6 +849,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	char	   *max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -872,6 +873,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	 * - wal_level = logical
 	 * - max_replication_slots >= current + number of dbs to be converted
 	 * - max_wal_senders >= current + number of dbs to be converted
+	 * - max_slot_wal_keep_size = -1 (to prevent deletion of required WAL files)
 	 * -----------------------------------------------------------------------
 	 */
 	res = PQexec(conn,
@@ -880,7 +882,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.current_setting('max_slot_wal_keep_size')");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +898,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = pg_strdup(PQgetvalue(res, 0, 6));
 
 	PQclear(res);
 
@@ -905,6 +909,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: %s",
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +945,18 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+	/*
+	 * Validate 'max_slot_wal_keep_size'. If this parameter is set to a
+	 * non-default value, it may cause replication failures due to required
+	 * WAL files being prematurely removed.
+	 */
+	if (dry_run && (strcmp(max_slot_wal_keep_size, "-1") != 0))
+	{
+		pg_log_warning("publisher requires WAL size must not be restricted");
+		pg_log_warning_hint("Set the configuration parameter \"%s\" to -1 to ensure that required WAL files are not prematurely removed.",
+							"max_slot_wal_keep_size");
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index 5426159fa5..d55fd78009 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -318,24 +318,60 @@ $node_p->safe_psql($db1,
 $node_p->wait_for_replay_catchup($node_s);
 $node_s->stop;
 
-# dry run mode on node S
+# Configure 'max_slot_wal_keep_size = 10MB' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 10MB');
+$node_p->reload;
+
+# dry run mode on node S and verify the warning message for misconfigured
+# 'max_slot_wal_keep_size'
+command_checks_all(
+	[
+		'pg_createsubscriber', '--verbose',
+		'--verbose', '--recovery-timeout',
+		"$PostgreSQL::Test::Utils::timeout_default", '--dry-run',
+		'--pgdata', $node_s->data_dir,
+		'--publisher-server', $node_p->connstr($db1),
+		'--socketdir', $node_s->host,
+		'--subscriber-port', $node_s->port,
+		'--publication', 'pub1',
+		'--publication', 'pub2',
+		'--subscription', 'sub1',
+		'--subscription', 'sub2',
+		'--database', $db1,
+		'--database', $db2
+	],
+	0,
+	[qr/./],
+	[
+		qr/pg_createsubscriber: warning: publisher requires WAL size must not be restricted/,
+		qr/Set the configuration parameter "max_slot_wal_keep_size" to -1 to ensure that required WAL files are not prematurely removed./,
+	],
+	'Validate warning for misconfigured max_slot_wal_keep_size on the publisher'
+);
+
+# And, same again to see the output...
 command_ok(
 	[
 		'pg_createsubscriber', '--verbose',
-		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
-		'--dry-run', '--pgdata',
-		$node_s->data_dir, '--publisher-server',
-		$node_p->connstr($db1), '--socketdir',
-		$node_s->host, '--subscriber-port',
-		$node_s->port, '--publication',
-		'pub1', '--publication',
-		'pub2', '--subscription',
-		'sub1', '--subscription',
-		'sub2', '--database',
-		$db1, '--database',
-		$db2
+		'--verbose', '--recovery-timeout',
+		"$PostgreSQL::Test::Utils::timeout_default", '--dry-run',
+		'--pgdata', $node_s->data_dir,
+		'--publisher-server', $node_p->connstr($db1),
+		'--socketdir', $node_s->host,
+		'--subscriber-port', $node_s->port,
+		'--publication', 'pub1',
+		'--publication', 'pub2',
+		'--subscription', 'sub1',
+		'--subscription', 'sub2',
+		'--database', $db1,
+		'--database', $db2
 	],
-	'run pg_createsubscriber --dry-run on node S');
+	'Ditto above, but this time using command_ok to retain the output');
+
+# Reset 'max_slot_wal_keep_size' to default after the test
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = -1');
+$node_p->reload;
 
 # Check if node S is still a standby
 $node_s->start;
-- 
2.34.1

#26Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Shubham Khanna (#25)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Thu, 16 Jan 2025 at 10:48, Shubham Khanna
<khannashubham1197@gmail.com> wrote:

On Wed, Jan 15, 2025 at 3:28 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Shubham.

Patch v9-0001 LGTM.

Upon reviewing the v9-0001 patch, I noticed that running 'pg perltidy'
was necessary. To ensure the patch adheres to the required coding
standards, I have applied 'pg perltidy' and made the necessary
adjustments.
The attached patch includes these formatting fixes along with the
latest changes.

Hi Shubham,

I reviewed the v10 patch. I have following comments:

1. The documentation is added under the prerequisite section. And the
prerequisite section states that
' There are some prerequisites for
<application>pg_createsubscriber</application> to convert the target server
into a logical replica. If these are not met, an error will be reported.'

But for our case we are giving a warning only during the dry run. So,
I feel we should add the documentation under the 'Warnings' section.

2. Spacing is not consistent with the rest of document:

+   Replication failures can occur if required WAL files are missing. To prevent
+   this, the source server must set <xref
linkend="guc-max-slot-wal-keep-size"/>
+   to <literal>-1</literal> to ensure that required WAL files are not
+   prematurely removed.

For the above added line, we should add a space before each line.

Thanks and Regards,
Shlok Kyal

#27Shubham Khanna
khannashubham1197@gmail.com
In reply to: Shlok Kyal (#26)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Thu, Jan 16, 2025 at 12:12 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

On Thu, 16 Jan 2025 at 10:48, Shubham Khanna
<khannashubham1197@gmail.com> wrote:

On Wed, Jan 15, 2025 at 3:28 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Shubham.

Patch v9-0001 LGTM.

Upon reviewing the v9-0001 patch, I noticed that running 'pg perltidy'
was necessary. To ensure the patch adheres to the required coding
standards, I have applied 'pg perltidy' and made the necessary
adjustments.
The attached patch includes these formatting fixes along with the
latest changes.

Hi Shubham,

I reviewed the v10 patch. I have following comments:

1. The documentation is added under the prerequisite section. And the
prerequisite section states that
' There are some prerequisites for
<application>pg_createsubscriber</application> to convert the target server
into a logical replica. If these are not met, an error will be reported.'

But for our case we are giving a warning only during the dry run. So,
I feel we should add the documentation under the 'Warnings' section.

2. Spacing is not consistent with the rest of document:

+   Replication failures can occur if required WAL files are missing. To prevent
+   this, the source server must set <xref
linkend="guc-max-slot-wal-keep-size"/>
+   to <literal>-1</literal> to ensure that required WAL files are not
+   prematurely removed.

For the above added line, we should add a space before each line.

Fixed the given comments. The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.

Attachments:

v11-0001-Validate-max_slot_wal_keep_size-in-pg_createsubs.patchapplication/octet-stream; name=v11-0001-Validate-max_slot_wal_keep_size-in-pg_createsubs.patchDownload
From 4d1532fd5ffa3715d69049971272d9f4a3e13455 Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Tue, 24 Dec 2024 12:27:27 +0530
Subject: [PATCH v11] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

This patch logs a warning if 'max_slot_wal_keep_size' is not set to '-1',
highlighting a potential risk of required WAL files being deleted on the
publisher, which could disrupt logical replication.
A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.
---
 doc/src/sgml/ref/pg_createsubscriber.sgml     |  7 ++
 src/bin/pg_basebackup/pg_createsubscriber.c   | 20 +++++-
 .../t/040_pg_createsubscriber.pl              | 64 +++++++++++++++----
 3 files changed, 76 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index 26b8e64a4e..d56487fe2c 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -377,6 +377,13 @@ PostgreSQL documentation
     server.  If the target server has a standby, replication will break and a
     fresh standby should be created.
    </para>
+
+   <para>
+    Replication failures can occur if required WAL files are missing. To prevent
+    this, the source server must set
+    <xref linkend="guc-max-slot-wal-keep-size"/> to <literal>-1</literal> to
+    ensure that required WAL files are not prematurely removed.
+   </para>
   </refsect2>
 
   <refsect2>
diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index faf18ccf13..7433c7ae2a 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -849,6 +849,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	char	   *max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -872,6 +873,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	 * - wal_level = logical
 	 * - max_replication_slots >= current + number of dbs to be converted
 	 * - max_wal_senders >= current + number of dbs to be converted
+	 * - max_slot_wal_keep_size = -1 (to prevent deletion of required WAL files)
 	 * -----------------------------------------------------------------------
 	 */
 	res = PQexec(conn,
@@ -880,7 +882,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.current_setting('max_slot_wal_keep_size')");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +898,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = pg_strdup(PQgetvalue(res, 0, 6));
 
 	PQclear(res);
 
@@ -905,6 +909,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: %s",
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +945,18 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+	/*
+	 * Validate 'max_slot_wal_keep_size'. If this parameter is set to a
+	 * non-default value, it may cause replication failures due to required
+	 * WAL files being prematurely removed.
+	 */
+	if (dry_run && (strcmp(max_slot_wal_keep_size, "-1") != 0))
+	{
+		pg_log_warning("publisher requires WAL size must not be restricted");
+		pg_log_warning_hint("Set the configuration parameter \"%s\" to -1 to ensure that required WAL files are not prematurely removed.",
+							"max_slot_wal_keep_size");
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index 5426159fa5..d55fd78009 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -318,24 +318,60 @@ $node_p->safe_psql($db1,
 $node_p->wait_for_replay_catchup($node_s);
 $node_s->stop;
 
-# dry run mode on node S
+# Configure 'max_slot_wal_keep_size = 10MB' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 10MB');
+$node_p->reload;
+
+# dry run mode on node S and verify the warning message for misconfigured
+# 'max_slot_wal_keep_size'
+command_checks_all(
+	[
+		'pg_createsubscriber', '--verbose',
+		'--verbose', '--recovery-timeout',
+		"$PostgreSQL::Test::Utils::timeout_default", '--dry-run',
+		'--pgdata', $node_s->data_dir,
+		'--publisher-server', $node_p->connstr($db1),
+		'--socketdir', $node_s->host,
+		'--subscriber-port', $node_s->port,
+		'--publication', 'pub1',
+		'--publication', 'pub2',
+		'--subscription', 'sub1',
+		'--subscription', 'sub2',
+		'--database', $db1,
+		'--database', $db2
+	],
+	0,
+	[qr/./],
+	[
+		qr/pg_createsubscriber: warning: publisher requires WAL size must not be restricted/,
+		qr/Set the configuration parameter "max_slot_wal_keep_size" to -1 to ensure that required WAL files are not prematurely removed./,
+	],
+	'Validate warning for misconfigured max_slot_wal_keep_size on the publisher'
+);
+
+# And, same again to see the output...
 command_ok(
 	[
 		'pg_createsubscriber', '--verbose',
-		'--recovery-timeout', "$PostgreSQL::Test::Utils::timeout_default",
-		'--dry-run', '--pgdata',
-		$node_s->data_dir, '--publisher-server',
-		$node_p->connstr($db1), '--socketdir',
-		$node_s->host, '--subscriber-port',
-		$node_s->port, '--publication',
-		'pub1', '--publication',
-		'pub2', '--subscription',
-		'sub1', '--subscription',
-		'sub2', '--database',
-		$db1, '--database',
-		$db2
+		'--verbose', '--recovery-timeout',
+		"$PostgreSQL::Test::Utils::timeout_default", '--dry-run',
+		'--pgdata', $node_s->data_dir,
+		'--publisher-server', $node_p->connstr($db1),
+		'--socketdir', $node_s->host,
+		'--subscriber-port', $node_s->port,
+		'--publication', 'pub1',
+		'--publication', 'pub2',
+		'--subscription', 'sub1',
+		'--subscription', 'sub2',
+		'--database', $db1,
+		'--database', $db2
 	],
-	'run pg_createsubscriber --dry-run on node S');
+	'Ditto above, but this time using command_ok to retain the output');
+
+# Reset 'max_slot_wal_keep_size' to default after the test
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = -1');
+$node_p->reload;
 
 # Check if node S is still a standby
 $node_s->start;
-- 
2.41.0.windows.3

#28Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Shubham Khanna (#27)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Thu, 16 Jan 2025 at 12:34, Shubham Khanna
<khannashubham1197@gmail.com> wrote:

On Thu, Jan 16, 2025 at 12:12 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

On Thu, 16 Jan 2025 at 10:48, Shubham Khanna
<khannashubham1197@gmail.com> wrote:

On Wed, Jan 15, 2025 at 3:28 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Shubham.

Patch v9-0001 LGTM.

Upon reviewing the v9-0001 patch, I noticed that running 'pg perltidy'
was necessary. To ensure the patch adheres to the required coding
standards, I have applied 'pg perltidy' and made the necessary
adjustments.
The attached patch includes these formatting fixes along with the
latest changes.

Hi Shubham,

I reviewed the v10 patch. I have following comments:

1. The documentation is added under the prerequisite section. And the
prerequisite section states that
' There are some prerequisites for
<application>pg_createsubscriber</application> to convert the target server
into a logical replica. If these are not met, an error will be reported.'

But for our case we are giving a warning only during the dry run. So,
I feel we should add the documentation under the 'Warnings' section.

2. Spacing is not consistent with the rest of document:

+   Replication failures can occur if required WAL files are missing. To prevent
+   this, the source server must set <xref
linkend="guc-max-slot-wal-keep-size"/>
+   to <literal>-1</literal> to ensure that required WAL files are not
+   prematurely removed.

For the above added line, we should add a space before each line.

Fixed the given comments. The attached patch contains the suggested changes.

Hi Shubham,

I reviewed the v11 patch and it looks good to me.

Thanks and Regards,
Shlok Kyal

#29Peter Smith
smithpb2250@gmail.com
In reply to: Shubham Khanna (#27)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

Patch v11-0001 LGTM.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#30Shubham Khanna
khannashubham1197@gmail.com
In reply to: Peter Smith (#29)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Fri, Jan 17, 2025 at 3:39 AM Peter Smith <smithpb2250@gmail.com> wrote:

Patch v11-0001 LGTM.

======

Following the recent comments on Patch v11-0001, the patch was not
applied to the HEAD. I have addressed the feedback and prepared a
rebased version to incorporate the necessary adjustments.
Please find the updated patch.

Thanks and regards,
Shubham Khanna.

Attachments:

v12-0001-Validate-max_slot_wal_keep_size-in-pg_createsubs.patchapplication/octet-stream; name=v12-0001-Validate-max_slot_wal_keep_size-in-pg_createsubs.patchDownload
From 80a8e6d61f955f94a08f7f6cb443fc44035cd45c Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Wed, 22 Jan 2025 16:32:58 +0530
Subject: [PATCH v12] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

This patch logs a warning if 'max_slot_wal_keep_size' is not set to '-1',
highlighting a potential risk of required WAL files being deleted on the
publisher, which could disrupt logical replication.
A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.
---
 doc/src/sgml/ref/pg_createsubscriber.sgml     |  7 ++
 src/bin/pg_basebackup/pg_createsubscriber.c   | 20 +++++-
 .../t/040_pg_createsubscriber.pl              | 67 ++++++++++++++-----
 3 files changed, 77 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index 26b8e64a4e..d56487fe2c 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -377,6 +377,13 @@ PostgreSQL documentation
     server.  If the target server has a standby, replication will break and a
     fresh standby should be created.
    </para>
+
+   <para>
+    Replication failures can occur if required WAL files are missing. To prevent
+    this, the source server must set
+    <xref linkend="guc-max-slot-wal-keep-size"/> to <literal>-1</literal> to
+    ensure that required WAL files are not prematurely removed.
+   </para>
   </refsect2>
 
   <refsect2>
diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index faf18ccf13..7433c7ae2a 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -849,6 +849,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	char	   *max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -872,6 +873,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	 * - wal_level = logical
 	 * - max_replication_slots >= current + number of dbs to be converted
 	 * - max_wal_senders >= current + number of dbs to be converted
+	 * - max_slot_wal_keep_size = -1 (to prevent deletion of required WAL files)
 	 * -----------------------------------------------------------------------
 	 */
 	res = PQexec(conn,
@@ -880,7 +882,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.current_setting('max_slot_wal_keep_size')");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +898,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = pg_strdup(PQgetvalue(res, 0, 6));
 
 	PQclear(res);
 
@@ -905,6 +909,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: %s",
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +945,18 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+	/*
+	 * Validate 'max_slot_wal_keep_size'. If this parameter is set to a
+	 * non-default value, it may cause replication failures due to required
+	 * WAL files being prematurely removed.
+	 */
+	if (dry_run && (strcmp(max_slot_wal_keep_size, "-1") != 0))
+	{
+		pg_log_warning("publisher requires WAL size must not be restricted");
+		pg_log_warning_hint("Set the configuration parameter \"%s\" to -1 to ensure that required WAL files are not prematurely removed.",
+							"max_slot_wal_keep_size");
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index c8dbdb7e9b..c52ae8a40a 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -331,25 +331,60 @@ $node_p->safe_psql($db1,
 $node_p->wait_for_replay_catchup($node_s);
 $node_s->stop;
 
-# dry run mode on node S
+# Configure 'max_slot_wal_keep_size = 10MB' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 10MB');
+$node_p->reload;
+
+# dry run mode on node S and verify the warning message for misconfigured
+# 'max_slot_wal_keep_size'
+command_checks_all(
+	[
+		'pg_createsubscriber', '--verbose',
+		'--verbose', '--recovery-timeout',
+		"$PostgreSQL::Test::Utils::timeout_default", '--dry-run',
+		'--pgdata', $node_s->data_dir,
+		'--publisher-server', $node_p->connstr($db1),
+		'--socketdir', $node_s->host,
+		'--subscriber-port', $node_s->port,
+		'--publication', 'pub1',
+		'--publication', 'pub2',
+		'--subscription', 'sub1',
+		'--subscription', 'sub2',
+		'--database', $db1,
+		'--database', $db2
+	],
+	0,
+	[qr/./],
+	[
+		qr/pg_createsubscriber: warning: publisher requires WAL size must not be restricted/,
+		qr/Set the configuration parameter "max_slot_wal_keep_size" to -1 to ensure that required WAL files are not prematurely removed./,
+	],
+	'Validate warning for misconfigured max_slot_wal_keep_size on the publisher'
+);
+
+# And, same again to see the output...
 command_ok(
 	[
-		'pg_createsubscriber',
-		'--verbose',
-		'--dry-run',
-		'--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
-		'--pgdata' => $node_s->data_dir,
-		'--publisher-server' => $node_p->connstr($db1),
-		'--socketdir' => $node_s->host,
-		'--subscriber-port' => $node_s->port,
-		'--publication' => 'pub1',
-		'--publication' => 'pub2',
-		'--subscription' => 'sub1',
-		'--subscription' => 'sub2',
-		'--database' => $db1,
-		'--database' => $db2,
+		'pg_createsubscriber', '--verbose',
+		'--verbose', '--recovery-timeout',
+		"$PostgreSQL::Test::Utils::timeout_default", '--dry-run',
+		'--pgdata', $node_s->data_dir,
+		'--publisher-server', $node_p->connstr($db1),
+		'--socketdir', $node_s->host,
+		'--subscriber-port', $node_s->port,
+		'--publication', 'pub1',
+		'--publication', 'pub2',
+		'--subscription', 'sub1',
+		'--subscription', 'sub2',
+		'--database', $db1,
+		'--database', $db2
 	],
-	'run pg_createsubscriber --dry-run on node S');
+	'Ditto above, but this time using command_ok to retain the output');
+
+# Reset 'max_slot_wal_keep_size' to default after the test
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = -1');
+$node_p->reload;
 
 # Check if node S is still a standby
 $node_s->start;
-- 
2.41.0.windows.3

#31Peter Smith
smithpb2250@gmail.com
In reply to: Shubham Khanna (#30)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

Hi Shubham.

The v12-0001 works OK, but your added TAP tests of this patch do not
conform to the new style of the fat-comma parameter passing since the
recent commit [1]https://github.com/postgres/postgres/commit/ce1b0f9da03e1cebc293f60b378079b22bd7cc0f, so you'll need to fix them so they do...

======
src/bin/pg_basebackup/t/040_pg_createsubscriber.pl

1.
+command_checks_all(
+ [
+ 'pg_createsubscriber', '--verbose',
+ '--verbose', '--recovery-timeout',
+ "$PostgreSQL::Test::Utils::timeout_default", '--dry-run',
+ '--pgdata', $node_s->data_dir,
+ '--publisher-server', $node_p->connstr($db1),
+ '--socketdir', $node_s->host,
+ '--subscriber-port', $node_s->port,
+ '--publication', 'pub1',
+ '--publication', 'pub2',
+ '--subscription', 'sub1',
+ '--subscription', 'sub2',
+ '--database', $db1,
+ '--database', $db2
+ ],

1a.
Fix the formatting. e.g. Look at master for example what new style
parameters look like and make yours look the same.

~

1b.
Here I think we don't really any --verbose at all because AFAIK
command_checks_all is not producing any output we can look at anyhow.

~~~

2.
+# And, same again to see the output...
 command_ok(
  [
- 'pg_createsubscriber',
- '--verbose',
- '--dry-run',
- '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
- '--pgdata' => $node_s->data_dir,
- '--publisher-server' => $node_p->connstr($db1),
- '--socketdir' => $node_s->host,
- '--subscriber-port' => $node_s->port,
- '--publication' => 'pub1',
- '--publication' => 'pub2',
- '--subscription' => 'sub1',
- '--subscription' => 'sub2',
- '--database' => $db1,
- '--database' => $db2,
+ 'pg_createsubscriber', '--verbose',
+ '--verbose', '--recovery-timeout',
+ "$PostgreSQL::Test::Utils::timeout_default", '--dry-run',
+ '--pgdata', $node_s->data_dir,
+ '--publisher-server', $node_p->connstr($db1),
+ '--socketdir', $node_s->host,
+ '--subscriber-port', $node_s->port,
+ '--publication', 'pub1',
+ '--publication', 'pub2',
+ '--subscription', 'sub1',
+ '--subscription', 'sub2',
+ '--database', $db1,
+ '--database', $db2
  ],

2a.
+# And, same again to see the output...

Maybe a better comment is more like

# Execute the same command again but this time use 'command_ok' so
that the log files can be inspected.

~

2b.
Fix the formatting. Look at master for example what new style
parameters look like and make yours look the same.

~

2c.
Include some comment (like exists elsewhere in this file) to say
"--verbose is used twice to show more information.", otherwise readers
could be tempted to think the double --verbose is a mistake.

~~~

3.
Rerun perltidy (the correct version) to ensure the formatting is
stable and good.

======
[1]: https://github.com/postgres/postgres/commit/ce1b0f9da03e1cebc293f60b378079b22bd7cc0f

Kind Regards,
Peter Smith.
Fujitsu Australia

#32Shubham Khanna
khannashubham1197@gmail.com
In reply to: Peter Smith (#31)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Thu, Jan 23, 2025 at 12:30 PM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Shubham.

The v12-0001 works OK, but your added TAP tests of this patch do not
conform to the new style of the fat-comma parameter passing since the
recent commit [1], so you'll need to fix them so they do...

======
src/bin/pg_basebackup/t/040_pg_createsubscriber.pl

1.
+command_checks_all(
+ [
+ 'pg_createsubscriber', '--verbose',
+ '--verbose', '--recovery-timeout',
+ "$PostgreSQL::Test::Utils::timeout_default", '--dry-run',
+ '--pgdata', $node_s->data_dir,
+ '--publisher-server', $node_p->connstr($db1),
+ '--socketdir', $node_s->host,
+ '--subscriber-port', $node_s->port,
+ '--publication', 'pub1',
+ '--publication', 'pub2',
+ '--subscription', 'sub1',
+ '--subscription', 'sub2',
+ '--database', $db1,
+ '--database', $db2
+ ],

1a.
Fix the formatting. e.g. Look at master for example what new style
parameters look like and make yours look the same.

~

1b.
Here I think we don't really any --verbose at all because AFAIK
command_checks_all is not producing any output we can look at anyhow.

~~~

2.
+# And, same again to see the output...
command_ok(
[
- 'pg_createsubscriber',
- '--verbose',
- '--dry-run',
- '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
- '--pgdata' => $node_s->data_dir,
- '--publisher-server' => $node_p->connstr($db1),
- '--socketdir' => $node_s->host,
- '--subscriber-port' => $node_s->port,
- '--publication' => 'pub1',
- '--publication' => 'pub2',
- '--subscription' => 'sub1',
- '--subscription' => 'sub2',
- '--database' => $db1,
- '--database' => $db2,
+ 'pg_createsubscriber', '--verbose',
+ '--verbose', '--recovery-timeout',
+ "$PostgreSQL::Test::Utils::timeout_default", '--dry-run',
+ '--pgdata', $node_s->data_dir,
+ '--publisher-server', $node_p->connstr($db1),
+ '--socketdir', $node_s->host,
+ '--subscriber-port', $node_s->port,
+ '--publication', 'pub1',
+ '--publication', 'pub2',
+ '--subscription', 'sub1',
+ '--subscription', 'sub2',
+ '--database', $db1,
+ '--database', $db2
],

2a.
+# And, same again to see the output...

Maybe a better comment is more like

# Execute the same command again but this time use 'command_ok' so
that the log files can be inspected.

~

2b.
Fix the formatting. Look at master for example what new style
parameters look like and make yours look the same.

~

2c.
Include some comment (like exists elsewhere in this file) to say
"--verbose is used twice to show more information.", otherwise readers
could be tempted to think the double --verbose is a mistake.

~~~

3.
Rerun perltidy (the correct version) to ensure the formatting is
stable and good.

======

Fixed the given comments. The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.

Attachments:

v13-0001-Validate-max_slot_wal_keep_size-in-pg_createsubs.patchapplication/octet-stream; name=v13-0001-Validate-max_slot_wal_keep_size-in-pg_createsubs.patchDownload
From 1b79a607d8ecd8d46e2f2be45da2a2d1d0976236 Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Wed, 22 Jan 2025 16:32:58 +0530
Subject: [PATCH v13] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

This patch logs a warning if 'max_slot_wal_keep_size' is not set to '-1',
highlighting a potential risk of required WAL files being deleted on the
publisher, which could disrupt logical replication.
A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.
---
 doc/src/sgml/ref/pg_createsubscriber.sgml     |  7 +++
 src/bin/pg_basebackup/pg_createsubscriber.c   | 20 ++++++++-
 .../t/040_pg_createsubscriber.pl              | 44 +++++++++++++++++--
 3 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index 26b8e64a4e..d56487fe2c 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -377,6 +377,13 @@ PostgreSQL documentation
     server.  If the target server has a standby, replication will break and a
     fresh standby should be created.
    </para>
+
+   <para>
+    Replication failures can occur if required WAL files are missing. To prevent
+    this, the source server must set
+    <xref linkend="guc-max-slot-wal-keep-size"/> to <literal>-1</literal> to
+    ensure that required WAL files are not prematurely removed.
+   </para>
   </refsect2>
 
   <refsect2>
diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index faf18ccf13..7433c7ae2a 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -849,6 +849,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	char	   *max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -872,6 +873,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	 * - wal_level = logical
 	 * - max_replication_slots >= current + number of dbs to be converted
 	 * - max_wal_senders >= current + number of dbs to be converted
+	 * - max_slot_wal_keep_size = -1 (to prevent deletion of required WAL files)
 	 * -----------------------------------------------------------------------
 	 */
 	res = PQexec(conn,
@@ -880,7 +882,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.current_setting('max_slot_wal_keep_size')");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +898,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = pg_strdup(PQgetvalue(res, 0, 6));
 
 	PQclear(res);
 
@@ -905,6 +909,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: %s",
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +945,18 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+	/*
+	 * Validate 'max_slot_wal_keep_size'. If this parameter is set to a
+	 * non-default value, it may cause replication failures due to required
+	 * WAL files being prematurely removed.
+	 */
+	if (dry_run && (strcmp(max_slot_wal_keep_size, "-1") != 0))
+	{
+		pg_log_warning("publisher requires WAL size must not be restricted");
+		pg_log_warning_hint("Set the configuration parameter \"%s\" to -1 to ensure that required WAL files are not prematurely removed.",
+							"max_slot_wal_keep_size");
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index c8dbdb7e9b..5cb9b6c590 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -331,11 +331,45 @@ $node_p->safe_psql($db1,
 $node_p->wait_for_replay_catchup($node_s);
 $node_s->stop;
 
-# dry run mode on node S
+# Configure 'max_slot_wal_keep_size = 10MB' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 10MB');
+$node_p->reload;
+
+# dry run mode on node S and verify the warning message for misconfigured
+# 'max_slot_wal_keep_size'
+command_checks_all(
+	[
+		'pg_createsubscriber',
+		'--dry-run',
+		'--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
+		'--pgdata' => $node_s->data_dir,
+		'--publisher-server' => $node_p->connstr($db1),
+		'--socketdir' => $node_s->host,
+		'--subscriber-port' => $node_s->port,
+		'--publication' => 'pub1',
+		'--publication' => 'pub2',
+		'--subscription' => 'sub1',
+		'--subscription' => 'sub2',
+		'--database' => $db1,
+		'--database' => $db2,
+	],
+	0,
+	[qr/./],
+	[
+		qr/pg_createsubscriber: warning: publisher requires WAL size must not be restricted/,
+		qr/Set the configuration parameter "max_slot_wal_keep_size" to -1 to ensure that required WAL files are not prematurely removed./,
+	],
+	'Validate warning for misconfigured max_slot_wal_keep_size on the publisher'
+);
+
+# Execute the same command again but this time use 'command_ok' so
+# that the log files can be inspected.  --verbose is used twice
+# to show more information.
 command_ok(
 	[
 		'pg_createsubscriber',
-		'--verbose',
+		'--verbose', '--verbose',
 		'--dry-run',
 		'--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
 		'--pgdata' => $node_s->data_dir,
@@ -349,7 +383,11 @@ command_ok(
 		'--database' => $db1,
 		'--database' => $db2,
 	],
-	'run pg_createsubscriber --dry-run on node S');
+	'Ditto above, but this time using command_ok to retain the output');
+
+# Reset 'max_slot_wal_keep_size' to default after the test
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = -1');
+$node_p->reload;
 
 # Check if node S is still a standby
 $node_s->start;
-- 
2.41.0.windows.3

#33Peter Smith
smithpb2250@gmail.com
In reply to: Shubham Khanna (#32)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

Patch v13-0001 LGTM.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#34Amit Kapila
amit.kapila16@gmail.com
In reply to: Shubham Khanna (#32)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Thu, Jan 23, 2025 at 6:42 PM Shubham Khanna
<khannashubham1197@gmail.com> wrote:

Fixed the given comments. The attached patch contains the suggested changes.

I am fine with giving a WARNING for max_slot_wal_keep_size but I don't
think we need a test for each and every setting required for
pg_createsubscriber. This will add time to the tests without adding
much value. So, I don't think this patch deserves an additional test.

--
With Regards,
Amit Kapila.

#35Shubham Khanna
khannashubham1197@gmail.com
In reply to: Amit Kapila (#34)
1 attachment(s)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Mon, Feb 17, 2025 at 4:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jan 23, 2025 at 6:42 PM Shubham Khanna
<khannashubham1197@gmail.com> wrote:

Fixed the given comments. The attached patch contains the suggested changes.

I am fine with giving a WARNING for max_slot_wal_keep_size but I don't
think we need a test for each and every setting required for
pg_createsubscriber. This will add time to the tests without adding
much value. So, I don't think this patch deserves an additional test.

I understand the concern that adding test cases for every required
setting in pg_createsubscriber may unnecessarily increase test
execution time without providing significant additional value.
Based on this, I have removed the additional test cases as they do not
offer substantial benefits to the overall test suite.

The attached Patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.

Attachments:

v14-0001-Validate-max_slot_wal_keep_size-in-pg_createsubs.patchapplication/octet-stream; name=v14-0001-Validate-max_slot_wal_keep_size-in-pg_createsubs.patchDownload
From acac447e5c74598dc6b201be466c426e26a29436 Mon Sep 17 00:00:00 2001
From: Khanna <Shubham.Khanna@fujitsu.com>
Date: Wed, 22 Jan 2025 16:32:58 +0530
Subject: [PATCH v14] Validate-max_slot_wal_keep_size-in-pg_createsubscriber

This patch introduces validation for the 'max_slot_wal_keep_size' GUC in the
'pg_createsubscriber' utility. The utility now checks that the publisher's
'max_slot_wal_keep_size' is set to '-1' during subscription creation.
This ensures proper functioning of logical replication slots.

The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

This patch logs a warning if 'max_slot_wal_keep_size' is not set to '-1',
highlighting a potential risk of required WAL files being deleted on the
publisher, which could disrupt logical replication.
A test case has been added to validate correct warning reporting
when 'max_slot_wal_keep_size' is misconfigured.
---
 doc/src/sgml/ref/pg_createsubscriber.sgml   |  7 +++++++
 src/bin/pg_basebackup/pg_createsubscriber.c | 20 +++++++++++++++++++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index 26b8e64a4e0..d56487fe2ca 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -377,6 +377,13 @@ PostgreSQL documentation
     server.  If the target server has a standby, replication will break and a
     fresh standby should be created.
    </para>
+
+   <para>
+    Replication failures can occur if required WAL files are missing. To prevent
+    this, the source server must set
+    <xref linkend="guc-max-slot-wal-keep-size"/> to <literal>-1</literal> to
+    ensure that required WAL files are not prematurely removed.
+   </para>
   </refsect2>
 
   <refsect2>
diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index 2d881d54f5b..3fe5157fcd1 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -849,6 +849,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	int			max_walsenders;
 	int			cur_walsenders;
 	int			max_prepared_transactions;
+	char	   *max_slot_wal_keep_size;
 
 	pg_log_info("checking settings on publisher");
 
@@ -872,6 +873,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	 * - wal_level = logical
 	 * - max_replication_slots >= current + number of dbs to be converted
 	 * - max_wal_senders >= current + number of dbs to be converted
+	 * - max_slot_wal_keep_size = -1 (to prevent deletion of required WAL files)
 	 * -----------------------------------------------------------------------
 	 */
 	res = PQexec(conn,
@@ -880,7 +882,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 				 " (SELECT count(*) FROM pg_catalog.pg_replication_slots),"
 				 " pg_catalog.current_setting('max_wal_senders'),"
 				 " (SELECT count(*) FROM pg_catalog.pg_stat_activity WHERE backend_type = 'walsender'),"
-				 " pg_catalog.current_setting('max_prepared_transactions')");
+				 " pg_catalog.current_setting('max_prepared_transactions'),"
+				 " pg_catalog.current_setting('max_slot_wal_keep_size')");
 
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
@@ -895,6 +898,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	max_walsenders = atoi(PQgetvalue(res, 0, 3));
 	cur_walsenders = atoi(PQgetvalue(res, 0, 4));
 	max_prepared_transactions = atoi(PQgetvalue(res, 0, 5));
+	max_slot_wal_keep_size = pg_strdup(PQgetvalue(res, 0, 6));
 
 	PQclear(res);
 
@@ -905,6 +909,8 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 	pg_log_debug("publisher: current wal senders: %d", cur_walsenders);
 	pg_log_debug("publisher: max_prepared_transactions: %d",
 				 max_prepared_transactions);
+	pg_log_debug("publisher: max_slot_wal_keep_size: %s",
+				 max_slot_wal_keep_size);
 
 	disconnect_database(conn, false);
 
@@ -939,6 +945,18 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 							  "Prepared transactions will be replicated at COMMIT PREPARED.");
 	}
 
+	/*
+	 * Validate 'max_slot_wal_keep_size'. If this parameter is set to a
+	 * non-default value, it may cause replication failures due to required
+	 * WAL files being prematurely removed.
+	 */
+	if (dry_run && (strcmp(max_slot_wal_keep_size, "-1") != 0))
+	{
+		pg_log_warning("publisher requires WAL size must not be restricted");
+		pg_log_warning_hint("Set the configuration parameter \"%s\" to -1 to ensure that required WAL files are not prematurely removed.",
+							"max_slot_wal_keep_size");
+	}
+
 	pg_free(wal_level);
 
 	if (failed)
-- 
2.41.0.windows.3

#36Amit Kapila
amit.kapila16@gmail.com
In reply to: Shubham Khanna (#35)
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

On Mon, Feb 17, 2025 at 4:23 PM Shubham Khanna
<khannashubham1197@gmail.com> wrote:

The attached Patch contains the suggested changes.

Pushed after making a minor change in the WARNING message.

--
With Regards,
Amit Kapila.