Dropping publication breaks logical replication
Hi Vignesh, Amit,
We encountered a situation where a customer dropped a publication
accidentally and that broke logical replication in an irrecoverable
manner. This is PG 15.3 but the team confirmed that the behaviour is
reproducible with PG 17 as well.
When a WAL sender processes a WAL record recording a change in
publication, it ends up calling LoadPublication() which throws an
error if a publication mentioned in START_REPLICATION command is not
found. The downstream tries to reconnect but the WAL sender again
repeats the same process going in an error loop. Creating the
publication does not help since WAL sender will always encounter the
WAL record dropping the publication first.
There are ways to come out of this situation, but not very clean always
1. Remove publication from subscription, run logical replication till
it passes the point where publication was added, add the publication
back and continue. It's not always possible to know when the
publication was added back and thus it becomes tedious or next to
impossible to apply these steps.
2. Reseeding the replication slot which involves copying all the data
again and not feasible in case of large databases.
3. Skipping the transaction which dropped the publication. This will
work if drop publication was the only thing in that transaction but
not otherwise. Confirming that is tricky and requires some expert
help.
In PG 18 onwards, this behaviour is fixed by throwing a WARNING
instead of an error. In the relevant thread [1] where the fix to PG 18
was discussed, backpatching was also discussed. Back then it was
deferred because of lack of field reports. But we are seeing this
situation now. So maybe it's time to backpatch the fix. Further PG 15
documentation mentions that
https://www.postgresql.org/docs/15/sql-createsubscription.html. So the
users will expect that their logical replication will not be affected
(except for the data published by the publication) if a publication is
dropped or does not exist. So, backpatching the change would make the
behaviour compatible with the documentation.
The backport seems to be straight forward. Please let me know if you
need my help in doing so, if we decide to backport the fix.
--
Best Wishes,
Ashutosh Bapat
On Fri, Aug 1, 2025 at 10:55 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
Hi Vignesh, Amit,
We encountered a situation where a customer dropped a publication
accidentally and that broke logical replication in an irrecoverable
manner. This is PG 15.3 but the team confirmed that the behaviour is
reproducible with PG 17 as well.When a WAL sender processes a WAL record recording a change in
publication, it ends up calling LoadPublication() which throws an
error if a publication mentioned in START_REPLICATION command is not
found. The downstream tries to reconnect but the WAL sender again
repeats the same process going in an error loop. Creating the
publication does not help since WAL sender will always encounter the
WAL record dropping the publication first.There are ways to come out of this situation, but not very clean always
1. Remove publication from subscription, run logical replication till
it passes the point where publication was added, add the publication
back and continue. It's not always possible to know when the
publication was added back and thus it becomes tedious or next to
impossible to apply these steps.
2. Reseeding the replication slot which involves copying all the data
again and not feasible in case of large databases.
3. Skipping the transaction which dropped the publication. This will
work if drop publication was the only thing in that transaction but
not otherwise. Confirming that is tricky and requires some expert
help.In PG 18 onwards, this behaviour is fixed by throwing a WARNING
instead of an error. In the relevant thread [1] where the fix to PG 18
was discussed, backpatching was also discussed. Back then it was
deferred because of lack of field reports. But we are seeing this
situation now. So maybe it's time to backpatch the fix. Further PG 15
documentation mentions that
https://www.postgresql.org/docs/15/sql-createsubscription.html. So the
users will expect that their logical replication will not be affected
(except for the data published by the publication) if a publication is
dropped or does not exist. So, backpatching the change would make the
behaviour compatible with the documentation.The backport seems to be straight forward. Please let me know if you
need my help in doing so, if we decide to backport the fix.
I think you missed to add the link to the "relevant thread [1] "
--
Regards,
Dilip Kumar
Google
On Fri, Aug 1, 2025 at 11:14 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Aug 1, 2025 at 10:55 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:Hi Vignesh, Amit,
We encountered a situation where a customer dropped a publication
accidentally and that broke logical replication in an irrecoverable
manner. This is PG 15.3 but the team confirmed that the behaviour is
reproducible with PG 17 as well.When a WAL sender processes a WAL record recording a change in
publication, it ends up calling LoadPublication() which throws an
error if a publication mentioned in START_REPLICATION command is not
found. The downstream tries to reconnect but the WAL sender again
repeats the same process going in an error loop. Creating the
publication does not help since WAL sender will always encounter the
WAL record dropping the publication first.There are ways to come out of this situation, but not very clean always
1. Remove publication from subscription, run logical replication till
it passes the point where publication was added, add the publication
back and continue. It's not always possible to know when the
publication was added back and thus it becomes tedious or next to
impossible to apply these steps.
2. Reseeding the replication slot which involves copying all the data
again and not feasible in case of large databases.
3. Skipping the transaction which dropped the publication. This will
work if drop publication was the only thing in that transaction but
not otherwise. Confirming that is tricky and requires some expert
help.In PG 18 onwards, this behaviour is fixed by throwing a WARNING
instead of an error. In the relevant thread [1] where the fix to PG 18
was discussed, backpatching was also discussed. Back then it was
deferred because of lack of field reports. But we are seeing this
situation now. So maybe it's time to backpatch the fix. Further PG 15
documentation mentions that
https://www.postgresql.org/docs/15/sql-createsubscription.html. So the
users will expect that their logical replication will not be affected
(except for the data published by the publication) if a publication is
dropped or does not exist. So, backpatching the change would make the
behaviour compatible with the documentation.The backport seems to be straight forward. Please let me know if you
need my help in doing so, if we decide to backport the fix.I think you missed to add the link to the "relevant thread [1] "
Thanks for noticing it. Here it is
[1]: /messages/by-id/CALDaNm0-n8FGAorM+bTxkzn+AOUyx5=L_XmnvOP6T24+-NcBKg@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
On Fri, Aug 1, 2025 at 10:54 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
Hi Vignesh, Amit,
We encountered a situation where a customer dropped a publication
accidentally and that broke logical replication in an irrecoverable
manner. This is PG 15.3 but the team confirmed that the behaviour is
reproducible with PG 17 as well.When a WAL sender processes a WAL record recording a change in
publication, it ends up calling LoadPublication() which throws an
error if a publication mentioned in START_REPLICATION command is not
found. The downstream tries to reconnect but the WAL sender again
repeats the same process going in an error loop. Creating the
publication does not help since WAL sender will always encounter the
WAL record dropping the publication first.There are ways to come out of this situation, but not very clean always
1. Remove publication from subscription, run logical replication till
it passes the point where publication was added, add the publication
back and continue. It's not always possible to know when the
publication was added back and thus it becomes tedious or next to
impossible to apply these steps.
2. Reseeding the replication slot which involves copying all the data
again and not feasible in case of large databases.
3. Skipping the transaction which dropped the publication. This will
work if drop publication was the only thing in that transaction but
not otherwise. Confirming that is tricky and requires some expert
help.In PG 18 onwards, this behaviour is fixed by throwing a WARNING
instead of an error. In the relevant thread [1] where the fix to PG 18
was discussed, backpatching was also discussed. Back then it was
deferred because of lack of field reports. But we are seeing this
situation now.
Thanks for the report. One more reason we were hesitant to backpatch
was that it is possible that some users may expect replication to stop
in this case as mentioned by Tomas in one of his emails [1]/messages/by-id/dc08add3-10a8-738b-983a-191c7406707b@enterprisedb.com ("See the
para starting with "Imagine you have a subscriber ..." in his email").
We thought, as it could be perceived as a behavior change, so better
to do it as a HEAD only change.
Now, seeing this report, it seems the customer(s) are probably okay to
skip a missing publication and let replication continue. So, we should
consider backpatching this change but it would be better if few more
people can share their opinion on this matter.
[1]: /messages/by-id/dc08add3-10a8-738b-983a-191c7406707b@enterprisedb.com
--
With Regards,
Amit Kapila.
On Fri, Aug 1, 2025 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Aug 1, 2025 at 10:54 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:Hi Vignesh, Amit,
We encountered a situation where a customer dropped a publication
accidentally and that broke logical replication in an irrecoverable
manner. This is PG 15.3 but the team confirmed that the behaviour is
reproducible with PG 17 as well.When a WAL sender processes a WAL record recording a change in
publication, it ends up calling LoadPublication() which throws an
error if a publication mentioned in START_REPLICATION command is not
found. The downstream tries to reconnect but the WAL sender again
repeats the same process going in an error loop. Creating the
publication does not help since WAL sender will always encounter the
WAL record dropping the publication first.There are ways to come out of this situation, but not very clean always
1. Remove publication from subscription, run logical replication till
it passes the point where publication was added, add the publication
back and continue. It's not always possible to know when the
publication was added back and thus it becomes tedious or next to
impossible to apply these steps.
2. Reseeding the replication slot which involves copying all the data
again and not feasible in case of large databases.
3. Skipping the transaction which dropped the publication. This will
work if drop publication was the only thing in that transaction but
not otherwise. Confirming that is tricky and requires some expert
help.In PG 18 onwards, this behaviour is fixed by throwing a WARNING
instead of an error. In the relevant thread [1] where the fix to PG 18
was discussed, backpatching was also discussed. Back then it was
deferred because of lack of field reports. But we are seeing this
situation now.Thanks for the report. One more reason we were hesitant to backpatch
was that it is possible that some users may expect replication to stop
in this case as mentioned by Tomas in one of his emails [1] ("See the
para starting with "Imagine you have a subscriber ..." in his email").
We thought, as it could be perceived as a behavior change, so better
to do it as a HEAD only change.
Yes, that's a valid concern. We have to choose between missing some
changes because of missing publication and an irrecoverable error. The
latter seems more serious. The first is covered by our documentation -
maybe indirectly and we throw a WARNING. So choosing the second seems
a better option. Maybe we could do a better job at documenting this.
I wish we could pass a "missing_ok" flag with START_REPLICATION
command, but we can't do that in the back branches. And we haven't
done that when we committed the fix to PG 18.
Now, seeing this report, it seems the customer(s) are probably okay to
skip a missing publication and let replication continue. So, we should
consider backpatching this change but it would be better if few more
people can share their opinion on this matter.
Including Tomas for his opinion. Who else do you think can provide an
opinion based on experience?
Thinking aloud about what you suggest in [1]/messages/by-id/CAA4eK1K40xhObN1MWO7=rzrJmo+oQ048O8drZ86-F7artVvwQQ@mail.gmail.com in the same thread. The
problem there is, upstream can not access downstream subscription and
has no control over them so it can not avoid dropping a publication
even if it's being used by a subscription. What at most we can do is
not allow dropping a publication being used by a running WAL sender by
locking publication in use somehow. However, even that won't help
much. Assume that a WAL sender disconnects for some other reason,
followed by the publication getting dropped. We end up in the same
situation.
[1]: /messages/by-id/CAA4eK1K40xhObN1MWO7=rzrJmo+oQ048O8drZ86-F7artVvwQQ@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
On Fri, 1 Aug 2025 at 10:54, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
Hi Vignesh, Amit,
We encountered a situation where a customer dropped a publication
accidentally and that broke logical replication in an irrecoverable
manner. This is PG 15.3 but the team confirmed that the behaviour is
reproducible with PG 17 as well.When a WAL sender processes a WAL record recording a change in
publication, it ends up calling LoadPublication() which throws an
error if a publication mentioned in START_REPLICATION command is not
found. The downstream tries to reconnect but the WAL sender again
repeats the same process going in an error loop. Creating the
publication does not help since WAL sender will always encounter the
WAL record dropping the publication first.There are ways to come out of this situation, but not very clean always
1. Remove publication from subscription, run logical replication till
it passes the point where publication was added, add the publication
back and continue. It's not always possible to know when the
publication was added back and thus it becomes tedious or next to
impossible to apply these steps.
2. Reseeding the replication slot which involves copying all the data
again and not feasible in case of large databases.
3. Skipping the transaction which dropped the publication. This will
work if drop publication was the only thing in that transaction but
not otherwise. Confirming that is tricky and requires some expert
help.In PG 18 onwards, this behaviour is fixed by throwing a WARNING
instead of an error. In the relevant thread [1] where the fix to PG 18
was discussed, backpatching was also discussed. Back then it was
deferred because of lack of field reports. But we are seeing this
situation now. So maybe it's time to backpatch the fix. Further PG 15
documentation mentions that
https://www.postgresql.org/docs/15/sql-createsubscription.html. So the
users will expect that their logical replication will not be affected
(except for the data published by the publication) if a publication is
dropped or does not exist. So, backpatching the change would make the
behaviour compatible with the documentation.The backport seems to be straight forward. Please let me know if you
need my help in doing so, if we decide to backport the fix.
Now that this has been reported on the back branches, we should
consider whether it's appropriate to backport the fix. Here are the
patches prepared for the back branches.
Regards,
Vignesh
Attachments:
v1_PG13-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION.patchtext/x-patch; charset=US-ASCII; name=v1_PG13-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION.patchDownload
From fa8d5b256e8d7ca1428695115aad115fbec6f11c Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 2 Aug 2025 18:46:49 +0530
Subject: [PATCH v1_PG13] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 ++++++-
src/test/subscription/t/001_rep_changes.pl | 48 ++++++++++++++++++++-
2 files changed, 61 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 3d98d60d6c4..b3bd2b92f58 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -642,6 +642,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -652,9 +657,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl
index c60ef1c4f52..8c33ae51f4b 100644
--- a/src/test/subscription/t/001_rep_changes.pl
+++ b/src/test/subscription/t/001_rep_changes.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 27;
+use Test::More tests => 28;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -390,6 +390,52 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*) FROM pg_replication_origin");
is($result, qq(0), 'check replication origin was dropped on subscriber');
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub1 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub1 CONNECTION '$publisher_connstr' PUBLICATION tap_pub1"
+);
+
+# Wait for initial table sync to finish
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub1');
+
+$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub1");
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub1/,
+ $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub1 FOR TABLE tab_3");
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(1)");
+
+$node_publisher->wait_for_catchup('tap_sub1');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab_3");
+is($result, qq(1),
+ 'check that the incremental data is replicated after the publication is created'
+);
+
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub1");
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
v1_PG14-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchtext/x-patch; charset=US-ASCII; name=v1_PG14-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchDownload
From 5d5e45a48ecbaf9dba05dd757a82ac98e35f0057 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 1 Aug 2025 22:26:06 +0530
Subject: [PATCH v1_PG14] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 +++++-
src/test/subscription/t/021_alter_sub_pub.pl | 57 +++++++++++++++++++-
2 files changed, 69 insertions(+), 4 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index a81215cff86..f463e4d6d60 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -841,6 +841,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -851,9 +856,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/021_alter_sub_pub.pl b/src/test/subscription/t/021_alter_sub_pub.pl
index 4c59d44e33f..9cc93caf7e9 100644
--- a/src/test/subscription/t/021_alter_sub_pub.pl
+++ b/src/test/subscription/t/021_alter_sub_pub.pl
@@ -1,12 +1,14 @@
# Copyright (c) 2021, PostgreSQL Global Development Group
-# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION
+# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION and
+# ensures that creating a publication associated with a subscription at a later
+# point of time does not break logical replication.
use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 3;
+use Test::More tests => 4;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -81,6 +83,57 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*), min(a), max(a) FROM tab_1");
is($result, qq(20|1|10), 'check initial data is copied to subscriber');
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+my $oldpid = $node_publisher->safe_psql('postgres',
+ "SELECT pid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+);
+
+# Set the subscription with a missing publication
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ADD PUBLICATION tap_pub_3");
+
+# Wait for the walsender to restart after altering the subscription
+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+ )
+ or die
+ "Timed out while waiting for apply worker to restart after altering the subscription";
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres',"INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub_3/, $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION");
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(2)");
+
+$node_publisher->wait_for_catchup('tap_sub');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT * FROM tab_3");
+is($result, qq(1
+2), 'check that the incremental data is replicated after the publication is created');
+
# shutdown
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
v1_PG15-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchtext/x-patch; charset=US-ASCII; name=v1_PG15-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchDownload
From f0e05a438b89e86e6424c461aff6d2de618bd917 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 1 Aug 2025 15:28:46 +0530
Subject: [PATCH v1_PG15] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 +++++-
src/test/subscription/t/024_add_drop_pub.pl | 55 ++++++++++++++++++++-
2 files changed, 68 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 64f62de635f..9e427beed4f 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1773,6 +1773,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -1783,9 +1788,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/024_add_drop_pub.pl b/src/test/subscription/t/024_add_drop_pub.pl
index eaf47e66f1a..b39feaf3119 100644
--- a/src/test/subscription/t/024_add_drop_pub.pl
+++ b/src/test/subscription/t/024_add_drop_pub.pl
@@ -1,7 +1,9 @@
# Copyright (c) 2021-2022, PostgreSQL Global Development Group
-# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION
+# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION and
+# ensures that creating a publication associated with a subscription at a later
+# point of time does not break logical replication.
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
@@ -80,6 +82,57 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*), min(a), max(a) FROM tab_1");
is($result, qq(20|1|10), 'check initial data is copied to subscriber');
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+my $oldpid = $node_publisher->safe_psql('postgres',
+ "SELECT pid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+);
+
+# Set the subscription with a missing publication
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub SET PUBLICATION tap_pub_3");
+
+# Wait for the walsender to restart after altering the subscription
+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+ )
+ or die
+ "Timed out while waiting for apply worker to restart after altering the subscription";
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres',"INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub_3/, $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION");
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(2)");
+
+$node_publisher->wait_for_catchup('tap_sub');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT * FROM tab_3");
+is($result, qq(1
+2), 'check that the incremental data is replicated after the publication is created');
+
# shutdown
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
v1_PG17-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchtext/x-patch; charset=US-ASCII; name=v1_PG17-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchDownload
From 3d874dc02ccadb04723d4451da6da43bd1ed4d14 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 1 Aug 2025 14:51:26 +0530
Subject: [PATCH v1_PG17] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 +++++-
src/test/subscription/t/024_add_drop_pub.pl | 55 ++++++++++++++++++++-
2 files changed, 68 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 99518c6b6dd..fa0fb915f15 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1741,6 +1741,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -1751,9 +1756,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/024_add_drop_pub.pl b/src/test/subscription/t/024_add_drop_pub.pl
index c0d7ffcb6b8..1ea7d4611da 100644
--- a/src/test/subscription/t/024_add_drop_pub.pl
+++ b/src/test/subscription/t/024_add_drop_pub.pl
@@ -1,7 +1,9 @@
# Copyright (c) 2021-2024, PostgreSQL Global Development Group
-# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION
+# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION and
+# ensures that creating a publication associated with a subscription at a later
+# point of time does not break logical replication.
use strict;
use warnings FATAL => 'all';
use PostgreSQL::Test::Cluster;
@@ -80,6 +82,57 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*), min(a), max(a) FROM tab_1");
is($result, qq(20|1|10), 'check initial data is copied to subscriber');
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+my $oldpid = $node_publisher->safe_psql('postgres',
+ "SELECT pid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+);
+
+# Set the subscription with a missing publication
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub SET PUBLICATION tap_pub_3");
+
+# Wait for the walsender to restart after altering the subscription
+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+ )
+ or die
+ "Timed out while waiting for apply worker to restart after altering the subscription";
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres',"INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub_3/, $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION");
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(2)");
+
+$node_publisher->wait_for_catchup('tap_sub');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT * FROM tab_3");
+is($result, qq(1
+2), 'check that the incremental data is replicated after the publication is created');
+
# shutdown
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
v1_PG16-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchtext/x-patch; charset=US-ASCII; name=v1_PG16-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchDownload
From 910b8d66e50050574cfb98a53d6a1ed8f00ac6ec Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 1 Aug 2025 15:25:53 +0530
Subject: [PATCH v1_PG16] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 +++++-
src/test/subscription/t/024_add_drop_pub.pl | 55 ++++++++++++++++++++-
2 files changed, 68 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 32b74bb4752..d83da138081 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1737,6 +1737,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -1747,9 +1752,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/024_add_drop_pub.pl b/src/test/subscription/t/024_add_drop_pub.pl
index 8614b1b5b34..33ecfe7a415 100644
--- a/src/test/subscription/t/024_add_drop_pub.pl
+++ b/src/test/subscription/t/024_add_drop_pub.pl
@@ -1,7 +1,9 @@
# Copyright (c) 2021-2023, PostgreSQL Global Development Group
-# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION
+# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION and
+# ensures that creating a publication associated with a subscription at a later
+# point of time does not break logical replication.
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
@@ -80,6 +82,57 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*), min(a), max(a) FROM tab_1");
is($result, qq(20|1|10), 'check initial data is copied to subscriber');
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+my $oldpid = $node_publisher->safe_psql('postgres',
+ "SELECT pid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+);
+
+# Set the subscription with a missing publication
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub SET PUBLICATION tap_pub_3");
+
+# Wait for the walsender to restart after altering the subscription
+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+ )
+ or die
+ "Timed out while waiting for apply worker to restart after altering the subscription";
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres',"INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub_3/, $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION");
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(2)");
+
+$node_publisher->wait_for_catchup('tap_sub');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT * FROM tab_3");
+is($result, qq(1
+2), 'check that the incremental data is replicated after the publication is created');
+
# shutdown
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
Hi Vignesh,
Thanks for the patches.
On Sat, Aug 2, 2025 at 7:10 PM vignesh C <vignesh21@gmail.com> wrote:
The backport seems to be straight forward. Please let me know if you
need my help in doing so, if we decide to backport the fix.Now that this has been reported on the back branches, we should
consider whether it's appropriate to backport the fix. Here are the
patches prepared for the back branches.
PG14 and + patches do not test that DROP PUBLICATION does not disrupt
the publication. I think we need to test that as well.
PG13 tests DROP PUBLICATION OTOH. That's good. I think it has a race
condition because +my $offset = -s $node_publisher->logfile; is
executed after dropping the publication. If some background change
triggers publication validation before capturing the file offset, we
might miss the WARNING and the test will fail. Instead capturing
offset before dropping publication may be safer - the publication
exists till it dropped, so the log file cannot have WARNING in there
when offset is captured.
--
Best Wishes,
Ashutosh Bapat
On Mon, 4 Aug 2025 at 09:47, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
Hi Vignesh,
Thanks for the patches.On Sat, Aug 2, 2025 at 7:10 PM vignesh C <vignesh21@gmail.com> wrote:
The backport seems to be straight forward. Please let me know if you
need my help in doing so, if we decide to backport the fix.Now that this has been reported on the back branches, we should
consider whether it's appropriate to backport the fix. Here are the
patches prepared for the back branches.PG14 and + patches do not test that DROP PUBLICATION does not disrupt
the publication. I think we need to test that as well.
Currently, the test across all branches except PG13 is the same test
used in the master branch. For PG13, since there was no existing
subscription, I modified the test slightly to accommodate that. If I
handle the comment you suggest, the test in master and the backbranch
will be different. Should we keep the test similar to the master or is
it ok to address your above comment and keep it different?
PG13 tests DROP PUBLICATION OTOH. That's good. I think it has a race
condition because +my $offset = -s $node_publisher->logfile; is
executed after dropping the publication. If some background change
triggers publication validation before capturing the file offset, we
might miss the WARNING and the test will fail. Instead capturing
offset before dropping publication may be safer - the publication
exists till it dropped, so the log file cannot have WARNING in there
when offset is captured.
I will handle this in the next version.
Regards,
Vignesh
On Mon, 4 Aug 2025 at 16:08, vignesh C <vignesh21@gmail.com> wrote:
On Mon, 4 Aug 2025 at 09:47, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:Hi Vignesh,
Thanks for the patches.On Sat, Aug 2, 2025 at 7:10 PM vignesh C <vignesh21@gmail.com> wrote:
The backport seems to be straight forward. Please let me know if you
need my help in doing so, if we decide to backport the fix.Now that this has been reported on the back branches, we should
consider whether it's appropriate to backport the fix. Here are the
patches prepared for the back branches.PG14 and + patches do not test that DROP PUBLICATION does not disrupt
the publication. I think we need to test that as well.Currently, the test across all branches except PG13 is the same test
used in the master branch. For PG13, since there was no existing
subscription, I modified the test slightly to accommodate that. If I
handle the comment you suggest, the test in master and the backbranch
will be different. Should we keep the test similar to the master or is
it ok to address your above comment and keep it different?PG13 tests DROP PUBLICATION OTOH. That's good. I think it has a race
condition because +my $offset = -s $node_publisher->logfile; is
executed after dropping the publication. If some background change
triggers publication validation before capturing the file offset, we
might miss the WARNING and the test will fail. Instead capturing
offset before dropping publication may be safer - the publication
exists till it dropped, so the log file cannot have WARNING in there
when offset is captured.I will handle this in the next version.
This is addressed in the attached patch. Only the PG13 branch patch is
updated, there is no change in other branch patches.
Regards,
Vignesh
Attachments:
v1_PG17-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchtext/x-patch; charset=US-ASCII; name=v1_PG17-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchDownload
From 3d874dc02ccadb04723d4451da6da43bd1ed4d14 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 1 Aug 2025 14:51:26 +0530
Subject: [PATCH v1_PG17] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 +++++-
src/test/subscription/t/024_add_drop_pub.pl | 55 ++++++++++++++++++++-
2 files changed, 68 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 99518c6b6dd..fa0fb915f15 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1741,6 +1741,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -1751,9 +1756,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/024_add_drop_pub.pl b/src/test/subscription/t/024_add_drop_pub.pl
index c0d7ffcb6b8..1ea7d4611da 100644
--- a/src/test/subscription/t/024_add_drop_pub.pl
+++ b/src/test/subscription/t/024_add_drop_pub.pl
@@ -1,7 +1,9 @@
# Copyright (c) 2021-2024, PostgreSQL Global Development Group
-# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION
+# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION and
+# ensures that creating a publication associated with a subscription at a later
+# point of time does not break logical replication.
use strict;
use warnings FATAL => 'all';
use PostgreSQL::Test::Cluster;
@@ -80,6 +82,57 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*), min(a), max(a) FROM tab_1");
is($result, qq(20|1|10), 'check initial data is copied to subscriber');
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+my $oldpid = $node_publisher->safe_psql('postgres',
+ "SELECT pid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+);
+
+# Set the subscription with a missing publication
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub SET PUBLICATION tap_pub_3");
+
+# Wait for the walsender to restart after altering the subscription
+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+ )
+ or die
+ "Timed out while waiting for apply worker to restart after altering the subscription";
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres',"INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub_3/, $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION");
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(2)");
+
+$node_publisher->wait_for_catchup('tap_sub');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT * FROM tab_3");
+is($result, qq(1
+2), 'check that the incremental data is replicated after the publication is created');
+
# shutdown
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
v1_PG15-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchtext/x-patch; charset=US-ASCII; name=v1_PG15-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchDownload
From f0e05a438b89e86e6424c461aff6d2de618bd917 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 1 Aug 2025 15:28:46 +0530
Subject: [PATCH v1_PG15] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 +++++-
src/test/subscription/t/024_add_drop_pub.pl | 55 ++++++++++++++++++++-
2 files changed, 68 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 64f62de635f..9e427beed4f 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1773,6 +1773,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -1783,9 +1788,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/024_add_drop_pub.pl b/src/test/subscription/t/024_add_drop_pub.pl
index eaf47e66f1a..b39feaf3119 100644
--- a/src/test/subscription/t/024_add_drop_pub.pl
+++ b/src/test/subscription/t/024_add_drop_pub.pl
@@ -1,7 +1,9 @@
# Copyright (c) 2021-2022, PostgreSQL Global Development Group
-# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION
+# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION and
+# ensures that creating a publication associated with a subscription at a later
+# point of time does not break logical replication.
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
@@ -80,6 +82,57 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*), min(a), max(a) FROM tab_1");
is($result, qq(20|1|10), 'check initial data is copied to subscriber');
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+my $oldpid = $node_publisher->safe_psql('postgres',
+ "SELECT pid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+);
+
+# Set the subscription with a missing publication
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub SET PUBLICATION tap_pub_3");
+
+# Wait for the walsender to restart after altering the subscription
+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+ )
+ or die
+ "Timed out while waiting for apply worker to restart after altering the subscription";
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres',"INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub_3/, $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION");
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(2)");
+
+$node_publisher->wait_for_catchup('tap_sub');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT * FROM tab_3");
+is($result, qq(1
+2), 'check that the incremental data is replicated after the publication is created');
+
# shutdown
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
v1_PG14-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchtext/x-patch; charset=US-ASCII; name=v1_PG14-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchDownload
From 5d5e45a48ecbaf9dba05dd757a82ac98e35f0057 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 1 Aug 2025 22:26:06 +0530
Subject: [PATCH v1_PG14] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 +++++-
src/test/subscription/t/021_alter_sub_pub.pl | 57 +++++++++++++++++++-
2 files changed, 69 insertions(+), 4 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index a81215cff86..f463e4d6d60 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -841,6 +841,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -851,9 +856,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/021_alter_sub_pub.pl b/src/test/subscription/t/021_alter_sub_pub.pl
index 4c59d44e33f..9cc93caf7e9 100644
--- a/src/test/subscription/t/021_alter_sub_pub.pl
+++ b/src/test/subscription/t/021_alter_sub_pub.pl
@@ -1,12 +1,14 @@
# Copyright (c) 2021, PostgreSQL Global Development Group
-# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION
+# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION and
+# ensures that creating a publication associated with a subscription at a later
+# point of time does not break logical replication.
use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 3;
+use Test::More tests => 4;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -81,6 +83,57 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*), min(a), max(a) FROM tab_1");
is($result, qq(20|1|10), 'check initial data is copied to subscriber');
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+my $oldpid = $node_publisher->safe_psql('postgres',
+ "SELECT pid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+);
+
+# Set the subscription with a missing publication
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ADD PUBLICATION tap_pub_3");
+
+# Wait for the walsender to restart after altering the subscription
+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+ )
+ or die
+ "Timed out while waiting for apply worker to restart after altering the subscription";
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres',"INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub_3/, $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION");
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(2)");
+
+$node_publisher->wait_for_catchup('tap_sub');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT * FROM tab_3");
+is($result, qq(1
+2), 'check that the incremental data is replicated after the publication is created');
+
# shutdown
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
v1_PG16-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchtext/x-patch; charset=US-ASCII; name=v1_PG16-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchDownload
From 910b8d66e50050574cfb98a53d6a1ed8f00ac6ec Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 1 Aug 2025 15:25:53 +0530
Subject: [PATCH v1_PG16] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 +++++-
src/test/subscription/t/024_add_drop_pub.pl | 55 ++++++++++++++++++++-
2 files changed, 68 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 32b74bb4752..d83da138081 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1737,6 +1737,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -1747,9 +1752,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/024_add_drop_pub.pl b/src/test/subscription/t/024_add_drop_pub.pl
index 8614b1b5b34..33ecfe7a415 100644
--- a/src/test/subscription/t/024_add_drop_pub.pl
+++ b/src/test/subscription/t/024_add_drop_pub.pl
@@ -1,7 +1,9 @@
# Copyright (c) 2021-2023, PostgreSQL Global Development Group
-# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION
+# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION and
+# ensures that creating a publication associated with a subscription at a later
+# point of time does not break logical replication.
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
@@ -80,6 +82,57 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*), min(a), max(a) FROM tab_1");
is($result, qq(20|1|10), 'check initial data is copied to subscriber');
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+my $oldpid = $node_publisher->safe_psql('postgres',
+ "SELECT pid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+);
+
+# Set the subscription with a missing publication
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub SET PUBLICATION tap_pub_3");
+
+# Wait for the walsender to restart after altering the subscription
+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name = 'tap_sub' AND state = 'streaming';"
+ )
+ or die
+ "Timed out while waiting for apply worker to restart after altering the subscription";
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres',"INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub_3/, $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION");
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(2)");
+
+$node_publisher->wait_for_catchup('tap_sub');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT * FROM tab_3");
+is($result, qq(1
+2), 'check that the incremental data is replicated after the publication is created');
+
# shutdown
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
v2_PG13-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION.patchtext/x-patch; charset=US-ASCII; name=v2_PG13-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION.patchDownload
From 29ff09285362319a6938d33d14d0e84a5a778608 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 2 Aug 2025 18:46:49 +0530
Subject: [PATCH v2_PG13] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 ++++++-
src/test/subscription/t/001_rep_changes.pl | 47 ++++++++++++++++++++-
2 files changed, 60 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 3d98d60d6c4..b3bd2b92f58 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -642,6 +642,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -652,9 +657,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl
index c60ef1c4f52..311c3e0cb45 100644
--- a/src/test/subscription/t/001_rep_changes.pl
+++ b/src/test/subscription/t/001_rep_changes.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 27;
+use Test::More tests => 28;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -390,6 +390,51 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*) FROM pg_replication_origin");
is($result, qq(0), 'check replication origin was dropped on subscriber');
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub1 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub1 CONNECTION '$publisher_connstr' PUBLICATION tap_pub1"
+);
+
+# Wait for initial table sync to finish
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub1');
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub1");
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub1/,
+ $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub1 FOR TABLE tab_3");
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(1)");
+
+$node_publisher->wait_for_catchup('tap_sub1');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab_3");
+is($result, qq(1),
+ 'check that the incremental data is replicated after the publication is created'
+);
+
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub1");
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
On Mon, Aug 4, 2025 at 4:08 PM vignesh C <vignesh21@gmail.com> wrote:
On Mon, 4 Aug 2025 at 09:47, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:Hi Vignesh,
Thanks for the patches.On Sat, Aug 2, 2025 at 7:10 PM vignesh C <vignesh21@gmail.com> wrote:
The backport seems to be straight forward. Please let me know if you
need my help in doing so, if we decide to backport the fix.Now that this has been reported on the back branches, we should
consider whether it's appropriate to backport the fix. Here are the
patches prepared for the back branches.PG14 and + patches do not test that DROP PUBLICATION does not disrupt
the publication. I think we need to test that as well.Currently, the test across all branches except PG13 is the same test
used in the master branch. For PG13, since there was no existing
subscription, I modified the test slightly to accommodate that. If I
handle the comment you suggest, the test in master and the backbranch
will be different. Should we keep the test similar to the master or is
it ok to address your above comment and keep it different?
IMO we should modify the test on master as well and either backpatch
both commits or backpatch after combining those two commits.
--
Best Wishes,
Ashutosh Bapat
On Fri, Aug 1, 2025 at 5:06 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Fri, Aug 1, 2025 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Now, seeing this report, it seems the customer(s) are probably okay to
skip a missing publication and let replication continue. So, we should
consider backpatching this change but it would be better if few more
people can share their opinion on this matter.Including Tomas for his opinion. Who else do you think can provide an
opinion based on experience?
I don't have any particular names in mind but Dilip and Sawada-San
names are listed as reviewers in the commit [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7c99dc587a010a0c40d72a0e435111ca7a371c02, so it would be good
to see what are their thoughts on this.
Please note that this behavior is from the time logical replication
was introduced, so we need to be a bit careful in changing the
behavior in backbranches.
--
With Regards,
Amit Kapila.
On Tue, Aug 5, 2025 at 9:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Aug 1, 2025 at 5:06 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:On Fri, Aug 1, 2025 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Now, seeing this report, it seems the customer(s) are probably okay to
skip a missing publication and let replication continue. So, we should
consider backpatching this change but it would be better if few more
people can share their opinion on this matter.Including Tomas for his opinion. Who else do you think can provide an
opinion based on experience?I don't have any particular names in mind but Dilip and Sawada-San
names are listed as reviewers in the commit [1], so it would be good
to see what are their thoughts on this.Please note that this behavior is from the time logical replication
was introduced, so we need to be a bit careful in changing the
behavior in backbranches.
I believe we should backpatch this fix. The old behavior doesn't seem
intentional, and IMHO users might not be relying on that behavior, but
that's just my thought and someone can come across a real world use
case where a user might be depending on that behavior? Although we
initially didn't backpatch it because it changed existing behavior and
hadn't received any complaints, a recent complaint suggests that it's
now better to improve the back branches as well.
--
Regards,
Dilip Kumar
Google
On Mon, 4 Aug 2025 at 09:47, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
Hi Vignesh,
Thanks for the patches.On Sat, Aug 2, 2025 at 7:10 PM vignesh C <vignesh21@gmail.com> wrote:
The backport seems to be straight forward. Please let me know if you
need my help in doing so, if we decide to backport the fix.Now that this has been reported on the back branches, we should
consider whether it's appropriate to backport the fix. Here are the
patches prepared for the back branches.PG14 and + patches do not test that DROP PUBLICATION does not disrupt
the publication. I think we need to test that as well.
The attached v3 version patch has the changes for the same.
Regards,
Vignesh
Attachments:
v3_PG15-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchapplication/octet-stream; name=v3_PG15-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchDownload
From c3ddf0daacc43a39653c627d2bf6d822efefc27e Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 1 Aug 2025 15:28:46 +0530
Subject: [PATCH v3_PG15] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 ++++++-
src/test/subscription/t/024_add_drop_pub.pl | 49 ++++++++++++++++++++-
2 files changed, 62 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 64f62de635f..9e427beed4f 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1773,6 +1773,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -1783,9 +1788,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/024_add_drop_pub.pl b/src/test/subscription/t/024_add_drop_pub.pl
index eaf47e66f1a..2aa21c9cb0d 100644
--- a/src/test/subscription/t/024_add_drop_pub.pl
+++ b/src/test/subscription/t/024_add_drop_pub.pl
@@ -1,7 +1,9 @@
# Copyright (c) 2021-2022, PostgreSQL Global Development Group
-# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION
+# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION and
+# ensures that creating a publication associated with a subscription at a later
+# point of time does not break logical replication.
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
@@ -80,6 +82,51 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*), min(a), max(a) FROM tab_1");
is($result, qq(20|1|10), 'check initial data is copied to subscriber');
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub_3"
+);
+
+# Wait for initial table sync to finish
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_3");
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub_3/,
+ $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(2)");
+
+$node_publisher->wait_for_catchup('tap_sub');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab_3");
+is($result, qq(2),
+ 'check that the incremental data is replicated after the publication is created'
+);
+
# shutdown
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
v3_PG17-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchapplication/octet-stream; name=v3_PG17-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchDownload
From 5a537d1cc1c474bbaccc948bc716c3fc809add34 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 1 Aug 2025 14:51:26 +0530
Subject: [PATCH v3_PG17] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 ++++++-
src/test/subscription/t/024_add_drop_pub.pl | 49 ++++++++++++++++++++-
2 files changed, 62 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 99518c6b6dd..fa0fb915f15 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1741,6 +1741,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -1751,9 +1756,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/024_add_drop_pub.pl b/src/test/subscription/t/024_add_drop_pub.pl
index c0d7ffcb6b8..6694a3cf60d 100644
--- a/src/test/subscription/t/024_add_drop_pub.pl
+++ b/src/test/subscription/t/024_add_drop_pub.pl
@@ -1,7 +1,9 @@
# Copyright (c) 2021-2024, PostgreSQL Global Development Group
-# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION
+# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION and
+# ensures that creating a publication associated with a subscription at a later
+# point of time does not break logical replication.
use strict;
use warnings FATAL => 'all';
use PostgreSQL::Test::Cluster;
@@ -80,6 +82,51 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*), min(a), max(a) FROM tab_1");
is($result, qq(20|1|10), 'check initial data is copied to subscriber');
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub_3"
+);
+
+# Wait for initial table sync to finish
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_3");
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub_3/,
+ $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(2)");
+
+$node_publisher->wait_for_catchup('tap_sub');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab_3");
+is($result, qq(2),
+ 'check that the incremental data is replicated after the publication is created'
+);
+
# shutdown
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
v3_PG13-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION.patchapplication/octet-stream; name=v3_PG13-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION.patchDownload
From d18abc404f80df221ca1e0a44f80847c43851d1f Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 2 Aug 2025 18:46:49 +0530
Subject: [PATCH v3_PG13] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 ++++++-
src/test/subscription/t/001_rep_changes.pl | 47 ++++++++++++++++++++-
2 files changed, 60 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 3d98d60d6c4..b3bd2b92f58 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -642,6 +642,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -652,9 +657,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl
index c60ef1c4f52..99f05897bbf 100644
--- a/src/test/subscription/t/001_rep_changes.pl
+++ b/src/test/subscription/t/001_rep_changes.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 27;
+use Test::More tests => 28;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -390,6 +390,51 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*) FROM pg_replication_origin");
is($result, qq(0), 'check replication origin was dropped on subscriber');
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub1 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub1 CONNECTION '$publisher_connstr' PUBLICATION tap_pub1"
+);
+
+# Wait for initial table sync to finish
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub1');
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub1");
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub1/,
+ $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub1 FOR TABLE tab_3");
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(2)");
+
+$node_publisher->wait_for_catchup('tap_sub1');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab_3");
+is($result, qq(2),
+ 'check that the incremental data is replicated after the publication is created'
+);
+
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub1");
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
v3_PG16-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchapplication/octet-stream; name=v3_PG16-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchDownload
From c0e734988024bcf6001d52259a197a3e2df385e4 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 1 Aug 2025 15:25:53 +0530
Subject: [PATCH v3_PG16] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 ++++++-
src/test/subscription/t/024_add_drop_pub.pl | 49 ++++++++++++++++++++-
2 files changed, 62 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 32b74bb4752..d83da138081 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1737,6 +1737,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -1747,9 +1752,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/024_add_drop_pub.pl b/src/test/subscription/t/024_add_drop_pub.pl
index 8614b1b5b34..5266fe30172 100644
--- a/src/test/subscription/t/024_add_drop_pub.pl
+++ b/src/test/subscription/t/024_add_drop_pub.pl
@@ -1,7 +1,9 @@
# Copyright (c) 2021-2023, PostgreSQL Global Development Group
-# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION
+# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION and
+# ensures that creating a publication associated with a subscription at a later
+# point of time does not break logical replication.
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
@@ -80,6 +82,51 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*), min(a), max(a) FROM tab_1");
is($result, qq(20|1|10), 'check initial data is copied to subscriber');
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub_3"
+);
+
+# Wait for initial table sync to finish
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_3");
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub_3/,
+ $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(2)");
+
+$node_publisher->wait_for_catchup('tap_sub');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab_3");
+is($result, qq(2),
+ 'check that the incremental data is replicated after the publication is created'
+);
+
# shutdown
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
v3_PG14-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchapplication/octet-stream; name=v3_PG14-0001-Fix-ALTER-SUBSCRIPTION-.-SET-PUBLICATION-.-c.patchDownload
From 5729539bde15f1810b01590610fa4b2b2f01c4db Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 1 Aug 2025 22:26:06 +0530
Subject: [PATCH v3_PG14] Fix ALTER SUBSCRIPTION ... SET PUBLICATION ...
command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR: publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.
We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.
---
src/backend/replication/pgoutput/pgoutput.c | 16 +++++-
src/test/subscription/t/021_alter_sub_pub.pl | 51 +++++++++++++++++++-
2 files changed, 63 insertions(+), 4 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index a81215cff86..f463e4d6d60 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -841,6 +841,11 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
/*
* Load publications from the list of publication names.
+ *
+ * Here, we skip the publications that don't exist yet. This will allow us
+ * to silently continue the replication in the absence of a missing publication.
+ * This is required because we allow the users to create publications after they
+ * have specified the required publications at the time of replication start.
*/
static List *
LoadPublications(List *pubnames)
@@ -851,9 +856,16 @@ LoadPublications(List *pubnames)
foreach(lc, pubnames)
{
char *pubname = (char *) lfirst(lc);
- Publication *pub = GetPublicationByName(pubname, false);
+ Publication *pub = GetPublicationByName(pubname, true);
- result = lappend(result, pub);
+ if (pub)
+ result = lappend(result, pub);
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipped loading publication: %s", pubname),
+ errdetail("The publication does not exist at this point in the WAL."),
+ errhint("Create the publication if it does not exist."));
}
return result;
diff --git a/src/test/subscription/t/021_alter_sub_pub.pl b/src/test/subscription/t/021_alter_sub_pub.pl
index 4c59d44e33f..c4d0b99a66a 100644
--- a/src/test/subscription/t/021_alter_sub_pub.pl
+++ b/src/test/subscription/t/021_alter_sub_pub.pl
@@ -1,12 +1,14 @@
# Copyright (c) 2021, PostgreSQL Global Development Group
-# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION
+# This test checks behaviour of ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION and
+# ensures that creating a publication associated with a subscription at a later
+# point of time does not break logical replication.
use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 3;
+use Test::More tests => 4;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -81,6 +83,51 @@ $result = $node_subscriber->safe_psql('postgres',
"SELECT count(*), min(a), max(a) FROM tab_1");
is($result, qq(20|1|10), 'check initial data is copied to subscriber');
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# Ensure that setting a missing publication to the subscription does not
+# disrupt existing logical replication. Instead, it should log a warning
+# while allowing replication to continue. Additionally, verify that replication
+# resumes after the missing publication is created for the publication table.
+
+# Create table on publisher and subscriber
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_3 (a int)");
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub_3"
+);
+
+# Wait for initial table sync to finish
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
+
+my $offset = -s $node_publisher->logfile;
+
+$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_3");
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(1)");
+
+# Verify that a warning is logged.
+$node_publisher->wait_for_log(
+ qr/WARNING: ( [A-Z0-9]+:)? skipped loading publication: tap_pub_3/,
+ $offset);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_3");
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_3 values(2)");
+
+$node_publisher->wait_for_catchup('tap_sub');
+
+# Verify that the insert operation gets replicated to subscriber after
+# publication is created.
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab_3");
+is($result, qq(2),
+ 'check that the incremental data is replicated after the publication is created'
+);
+
# shutdown
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.43.0
On Tue, Aug 5, 2025 at 9:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Aug 1, 2025 at 5:06 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:On Fri, Aug 1, 2025 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Now, seeing this report, it seems the customer(s) are probably okay to
skip a missing publication and let replication continue. So, we should
consider backpatching this change but it would be better if few more
people can share their opinion on this matter.Including Tomas for his opinion. Who else do you think can provide an
opinion based on experience?I don't have any particular names in mind but Dilip and Sawada-San
names are listed as reviewers in the commit [1], so it would be good
to see what are their thoughts on this.Please note that this behavior is from the time logical replication
was introduced, so we need to be a bit careful in changing the
behavior in backbranches.
Agreed.
Only Dilip has expressed an opinion so far. Haven't heard from others,
so can't guess what their opinions are.
If we are ok backpatching it, I will review Vignesh's patches thoroughly.
--
Best Wishes,
Ashutosh Bapat
On Fri, Aug 8, 2025 at 5:19 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Tue, Aug 5, 2025 at 9:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Aug 1, 2025 at 5:06 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:On Fri, Aug 1, 2025 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Now, seeing this report, it seems the customer(s) are probably okay to
skip a missing publication and let replication continue. So, we should
consider backpatching this change but it would be better if few more
people can share their opinion on this matter.Including Tomas for his opinion. Who else do you think can provide an
opinion based on experience?I don't have any particular names in mind but Dilip and Sawada-San
names are listed as reviewers in the commit [1], so it would be good
to see what are their thoughts on this.Please note that this behavior is from the time logical replication
was introduced, so we need to be a bit careful in changing the
behavior in backbranches.Agreed.
Only Dilip has expressed an opinion so far. Haven't heard from others,
so can't guess what their opinions are.
Yeah, let's wait for a few more days. Even if we decide to backpatch
it, let's target the next minor release.
--
With Regards,
Amit Kapila.
On Fri, Aug 8, 2025 at 5:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Aug 8, 2025 at 5:19 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:On Tue, Aug 5, 2025 at 9:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Aug 1, 2025 at 5:06 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:On Fri, Aug 1, 2025 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Now, seeing this report, it seems the customer(s) are probably okay to
skip a missing publication and let replication continue. So, we should
consider backpatching this change but it would be better if few more
people can share their opinion on this matter.Including Tomas for his opinion. Who else do you think can provide an
opinion based on experience?I don't have any particular names in mind but Dilip and Sawada-San
names are listed as reviewers in the commit [1], so it would be good
to see what are their thoughts on this.Please note that this behavior is from the time logical replication
was introduced, so we need to be a bit careful in changing the
behavior in backbranches.Agreed.
Only Dilip has expressed an opinion so far. Haven't heard from others,
so can't guess what their opinions are.Yeah, let's wait for a few more days. Even if we decide to backpatch
it, let's target the next minor release.
I'm personally hesitant to backpatch this change. I'm not sure if
there are any users who aware of this behavior and depend on it, but
it seems to me that for users who update to a new minor version having
this change, the problem will simply change from that replication
stops due to missing publications to that replication can continue but
they will almost silently lose some changes (users often don't see
warnings in server logs). I guess dealing with the latter problem
would be more difficult.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com