How is this possible "publication does not exist"

Started by Dave Cramerabout 6 years ago15 messages
#1Dave Cramer
davecramer@gmail.com

The publication exists but for some reason the function can't find it

SELECT * FROM pg_logical_slot_get_binary_changes('debezium', NULL,
NULL,'proto_version','1','publication_names','dbz_publication');
ERROR: publication "dbz_publication" does not exist
CONTEXT: slot "debezium", output plugin "pgoutput", in the change
callback, associated LSN 0/307D8E8
postgres=# select * from pg_publication;
pubname | pubowner | puballtables | pubinsert | pubupdate |
pubdelete | pubtruncate
-----------------+----------+--------------+-----------+-----------+-----------+-------------
dbz_publication | 10 | t | t | t | t
| t
(1 row)

postgres=# SELECT * FROM pg_logical_slot_get_binary_changes('debezium',
NULL, NULL,'proto_version','1','publication_names','dbz_publication');
ERROR: publication "dbz_publication" does not exist
CONTEXT: slot "debezium", output plugin "pgoutput", in the change
callback, associated LSN 0/307D8E8

Dave Cramer

#2Dave Cramer
davecramer@gmail.com
In reply to: Dave Cramer (#1)
Re: How is this possible "publication does not exist"

On Thu, 19 Dec 2019 at 11:59, Dave Cramer <davecramer@gmail.com> wrote:

The publication exists but for some reason the function can't find it

SELECT * FROM pg_logical_slot_get_binary_changes('debezium', NULL,
NULL,'proto_version','1','publication_names','dbz_publication');
ERROR: publication "dbz_publication" does not exist
CONTEXT: slot "debezium", output plugin "pgoutput", in the change
callback, associated LSN 0/307D8E8
postgres=# select * from pg_publication;
pubname | pubowner | puballtables | pubinsert | pubupdate |
pubdelete | pubtruncate

-----------------+----------+--------------+-----------+-----------+-----------+-------------
dbz_publication | 10 | t | t | t | t
| t
(1 row)

postgres=# SELECT * FROM pg_logical_slot_get_binary_changes('debezium',
NULL, NULL,'proto_version','1','publication_names','dbz_publication');
ERROR: publication "dbz_publication" does not exist
CONTEXT: slot "debezium", output plugin "pgoutput", in the change
callback, associated LSN 0/307D8E8

It seems that if you drop the publication on an existing slot it needs to
be recreated. Is this expected behaviour

drop publication dbz_publication ;
DROP PUBLICATION
postgres=# create publication dbz_publication for all tables;
CREATE PUBLICATION
postgres=# SELECT * FROM pg_logical_slot_get_binary_changes('debezium',
NULL, NULL,'proto_version','1','publication_names','dbz_publication');
ERROR: publication "dbz_publication" does not exist
CONTEXT: slot "debezium", output plugin "pgoutput", in the change
callback, associated LSN 0/4324180

Dave Cramer

#3Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Dave Cramer (#2)
Re: How is this possible "publication does not exist"

On 2019-12-19 19:15, Dave Cramer wrote:

It seems that if you drop the publication on an existing slot it needs
to be recreated. Is this expected behaviour

A publication is not associated with a slot. Only a subscription is
associated with a slot.

drop publication dbz_publication ;
DROP PUBLICATION
postgres=# create publication dbz_publication for all tables;
CREATE PUBLICATION
postgres=# SELECT * FROM pg_logical_slot_get_binary_changes('debezium',
NULL, NULL,'proto_version','1','publication_names','dbz_publication');
ERROR:  publication "dbz_publication" does not exist
CONTEXT:  slot "debezium", output plugin "pgoutput", in the change
callback, associated LSN 0/4324180

This must be something particular to Debezium.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Peter Eisentraut (#3)
Re: How is this possible "publication does not exist"

On Thu, Dec 19, 2019 at 07:19:56PM +0100, Peter Eisentraut wrote:

On 2019-12-19 19:15, Dave Cramer wrote:

It seems that if you drop the publication on an existing slot it
needs to be recreated. Is this expected behaviour

A publication is not associated with a slot. Only a subscription is
associated with a slot.

drop publication dbz_publication ;
DROP PUBLICATION
postgres=# create publication dbz_publication for all tables;
CREATE PUBLICATION
postgres=# SELECT * FROM
pg_logical_slot_get_binary_changes('debezium', NULL,
NULL,'proto_version','1','publication_names','dbz_publication');
ERROR: �publication "dbz_publication" does not exist
CONTEXT: �slot "debezium", output plugin "pgoutput", in the change
callback, associated LSN 0/4324180

This must be something particular to Debezium.

Yeah, I don't see this error message anywhere in our sources on 11 or
12, so perhaps debezium does something funny? It's not clear to me
where, though, as this suggests it uses the pgoutput module.

While trying to reproduce this I however ran into a related issue with
pgoutput/pg_logical_slot_get_binary_changes. If you call the function
repeatedly (~10x) you'll get an error like this:

FATAL: out of relcache_callback_list slots
CONTEXT: slot "slot", output plugin "pgoutput", in the startup callback
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

The reason is very simple - each call executes pgoutput_startup, which
does CacheRegisterRelcacheCallback in init_rel_sync_cache. And we do
this on each pg_logical_slot_get_binary_changes() call and never remove
the callbacks, so we simply run out of MAX_RELCACHE_CALLBACKS slots.

Not sure if this is a known issue/behavior, but it seems a bit annoying
and possibly related to the issue reported by Dave.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In reply to: Peter Eisentraut (#3)
Re: How is this possible "publication does not exist"

On Thu, 19 Dec 2019 19:19:56 +0100
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 2019-12-19 19:15, Dave Cramer wrote:

It seems that if you drop the publication on an existing slot it needs
to be recreated. Is this expected behaviour

A publication is not associated with a slot. Only a subscription is
associated with a slot.

Couldn't it be the same scenario I reported back in october? See:

Subject: Logical replication dead but synching
Date: Thu, 10 Oct 2019 11:57:52 +0200

#6Marco Slot
marco@citusdata.com
In reply to: Tomas Vondra (#4)
Re: How is this possible "publication does not exist"

I've been running into a similar issue and am a little puzzled by it,
especially since it survives restarts.

On Fri, Dec 20, 2019 at 2:39 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Yeah, I don't see this error message anywhere in our sources on 11 or
12, so perhaps debezium does something funny? It's not clear to me
where, though, as this suggests it uses the pgoutput module.

The error message comes from GetPublicationByName and the context is
added by output_plugin_error_callback in logical.c. Stack trace of
where the error occurs below.

# SELECT * FROM pg_publication;
pubname | pubowner | puballtables | pubinsert | pubupdate |
pubdelete | pubtruncate
----------------+----------+--------------+-----------+-----------+-----------+-------------
migration_pub | 10 | f | t | t | t | t
(1 row)

# SELECT * FROM pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | temporary
| active | active_pid | xmin | catalog_xmin | restart_lsn |
confirmed_flush_lsn
----------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
migration_slot | pgoutput | logical | 13121 | postgres | f
| f | | | 17153 | 0/CDFC840 | 0/CDFC878
(1 row)

# SELECT * FROM pg_logical_slot_get_binary_changes('migration_slot',
NULL, NULL,'proto_version','1','publication_names','migration_pub');
ERROR: publication "migration_pub" does not exist
CONTEXT: slot "migration_slot", output plugin "pgoutput", in the
change callback, associated LSN 0/CDFC878

#0 errstart (elevel=elevel@entry=20,
filename=filename@entry=0x5581958a6c70 "pg_publication.c",
lineno=lineno@entry=401,
funcname=funcname@entry=0x5581958a6ea0 <__func__.24991>
"GetPublicationByName", domain=domain@entry=0x0) at elog.c:251
#1 0x00005581954771e5 in GetPublicationByName (pubname=0x558196d107a0
"migration_pub", missing_ok=missing_ok@entry=false) at
pg_publication.c:401
#2 0x00007f77ba1cd704 in LoadPublications (pubnames=<optimized out>)
at pgoutput.c:467
#3 0x00007f77ba1cd7e3 in get_rel_sync_entry
(data=data@entry=0x558196cedee8, relid=<optimized out>) at
pgoutput.c:559
#4 0x00007f77ba1cdb52 in pgoutput_change (ctx=0x558196d7b4f8,
txn=<optimized out>, relation=0x7f77ba1e67c8, change=0x558196cdbab8)
at pgoutput.c:315
#5 0x000055819566a2e6 in change_cb_wrapper (cache=<optimized out>,
txn=<optimized out>, relation=<optimized out>, change=<optimized out>)
at logical.c:747
#6 0x0000558195675785 in ReorderBufferCommit (rb=0x558196d35d38,
xid=xid@entry=17153, commit_lsn=215994160, end_lsn=<optimized out>,
commit_time=commit_time@entry=662061745906576,
origin_id=origin_id@entry=0, origin_lsn=0) at reorderbuffer.c:1592
#7 0x0000558195667407 in DecodeCommit (ctx=ctx@entry=0x558196d7b4f8,
buf=buf@entry=0x7ffd61faae60, parsed=parsed@entry=0x7ffd61faacf0,
xid=17153) at decode.c:641
#8 0x00005581956675a0 in DecodeXactOp (ctx=0x558196d7b4f8,
buf=buf@entry=0x7ffd61faae60) at decode.c:249
#9 0x00005581956684cb in LogicalDecodingProcessRecord
(ctx=ctx@entry=0x558196d7b4f8, record=<optimized out>) at decode.c:117
#10 0x000055819566c108 in pg_logical_slot_get_changes_guts
(fcinfo=0x7ffd61fab120, confirm=confirm@entry=true,
binary=binary@entry=true) at logicalfuncs.c:309
#11 0x000055819566c35d in pg_logical_slot_get_binary_changes
(fcinfo=<optimized out>) at logicalfuncs.c:391

cheers,
Marco

#7Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Marco Slot (#6)
Re: How is this possible "publication does not exist"

On 12/24/20 12:50 PM, Marco Slot wrote:

I've been running into a similar issue and am a little puzzled by it,
especially since it survives restarts.

Interesting. Which PostgreSQL version are you using? Any idea how to
reproduce it? Were there any failures right before the issue appeared?

I wonder if this might be a case of index corruption. Can you try
forcing an index scan on pg_publication?

SET enable_seqscan = false;
SET enable_bitmapscan = off;
SELECT * FROM pg_publication WHERE pubname = 'migration_pub';

Also, it might be helpful to know why get_rel_sync_entry ended up
calling LoadPublications - did we just create the entry?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Tomas Vondra (#4)
1 attachment(s)
Re: How is this possible "publication does not exist"

20 дек. 2019 г., в 06:39, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а):

While trying to reproduce this I however ran into a related issue with
pgoutput/pg_logical_slot_get_binary_changes. If you call the function
repeatedly (~10x) you'll get an error like this:

FATAL: out of relcache_callback_list slots
CONTEXT: slot "slot", output plugin "pgoutput", in the startup callback
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

The reason is very simple - each call executes pgoutput_startup, which
does CacheRegisterRelcacheCallback in init_rel_sync_cache. And we do
this on each pg_logical_slot_get_binary_changes() call and never remove
the callbacks, so we simply run out of MAX_RELCACHE_CALLBACKS slots.

Not sure if this is a known issue/behavior, but it seems a bit annoying
and possibly related to the issue reported by Dave.

Sorry for bumping old thread.
I was involved in troubleshooting logical replication recently. And found out that it sometimes has a really annoying error reporting.
While source of the problem was allegedly in low max_replication_slots, users were getting only this error about relcache_callback_list.

Maybe we could fix this particular error by deduplicating callbacks?

Best regards, Andrey Borodin.

Attachments:

0001-Avoid-duplication-in-relcache-and-syscache-callbacks.patchapplication/octet-stream; name=0001-Avoid-duplication-in-relcache-and-syscache-callbacks.patch; x-unix-mode=0644Download
From 5ee9ae963e3a6143975f543761f51b1bc0dc5de8 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Fri, 23 Jul 2021 16:00:31 +0500
Subject: [PATCH] Avoid duplication in relcache and syscache callbacks

---
 src/backend/utils/cache/inval.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index dcfd9e8389..692ec8f04f 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -1430,17 +1430,32 @@ CacheInvalidateRelmap(Oid databaseId)
  * Yes, there's a possibility of a false match to zero, but it doesn't seem
  * worth troubling over, especially since most of the current callees just
  * flush all cached state anyway.
+ * Each callback\arg will we called only once, even if it was registered
+ * many times.
  */
 void
 CacheRegisterSyscacheCallback(int cacheid,
 							  SyscacheCallbackFunction func,
 							  Datum arg)
 {
+	int i;
 	if (cacheid < 0 || cacheid >= SysCacheSize)
 		elog(FATAL, "invalid cache ID: %d", cacheid);
 	if (syscache_callback_count >= MAX_SYSCACHE_CALLBACKS)
 		elog(FATAL, "out of syscache_callback_list slots");
 
+	for (i = 0; i < syscache_callback_count; i++)
+	{
+		if (syscache_callback_list[i].id == cacheid &&
+			syscache_callback_list[i].function == func &&
+			syscache_callback_list[i].arg == arg)
+			{
+				elog(DEBUG1, "Duplicate registartion of syscache callback");
+				/* Already registered */
+				return;
+			}
+	}
+
 	if (syscache_callback_links[cacheid] == 0)
 	{
 		/* first callback for this cache */
@@ -1472,14 +1487,28 @@ CacheRegisterSyscacheCallback(int cacheid,
  *
  * NOTE: InvalidOid will be passed if a cache reset request is received.
  * In this case the called routines should flush all cached state.
+ * Each callback\arg will we called only once, even if it was registered
+ * many times.
  */
 void
 CacheRegisterRelcacheCallback(RelcacheCallbackFunction func,
 							  Datum arg)
 {
+	int i;
 	if (relcache_callback_count >= MAX_RELCACHE_CALLBACKS)
 		elog(FATAL, "out of relcache_callback_list slots");
 
+	for (i = 0; i < relcache_callback_count; i++)
+	{
+		if (relcache_callback_list[i].function == func &&
+			relcache_callback_list[i].arg == arg)
+			{
+				/* Already registered */
+				elog(DEBUG1, "Duplicate registartion of relcache callback");
+				return;
+			}
+	}
+
 	relcache_callback_list[relcache_callback_count].function = func;
 	relcache_callback_list[relcache_callback_count].arg = arg;
 
-- 
2.24.3 (Apple Git-128)

#9Dave Cramer
davecramer@gmail.com
In reply to: Andrey Borodin (#8)
Re: How is this possible "publication does not exist"

Reviving this thread

2021-08-10 19:05:09.096 UTC [3738] LOG: logical replication apply
worker for subscription "sub_mycluster_alltables" has started
2021-08-10 19:05:09.107 UTC [3739] LOG: logical replication table
synchronization worker for subscription "sub_mycluster_alltables",
table "t_random" has started
2021-08-10 19:05:12.222 UTC [3739] LOG: logical replication table
synchronization worker for subscription "sub_mycluster_alltables",
table "t_random" has finished
2021-08-10 19:05:14.806 UTC [3738] ERROR: could not receive data from
WAL stream: ERROR: publication "sub_mycluster_alltables" does not
exist
CONTEXT: slot "sub_mycluster_alltables", output plugin
"pgoutput", in the change callback, associated LSN 0/4015DF0
2021-08-10 19:05:14.811 UTC [175] LOG: background worker "logical
replication worker" (PID 3738) exited with exit code 1

select * from pg_publication;
-[ RECORD 1 ]+------------------------
oid | 16415
pubname | sub_mycluster_alltables
pubowner | 10
puballtables | t
pubinsert | t
pubupdate | t
pubdelete | t
pubtruncate | t

select * from pg_replication_slots;
-[ RECORD 1 ]-------+--------------------------------
slot_name | mycluster_cjvq_68cf55677c_6vgcf
plugin |
slot_type | physical
datoid |
database |
temporary | f
active | t
active_pid | 433
xmin |
catalog_xmin |
restart_lsn | 0/D000000
confirmed_flush_lsn |
-[ RECORD 2 ]-------+--------------------------------
slot_name | sub_mycluster_alltables
plugin | pgoutput
slot_type | logical
datoid | 16395
database | mycluster
temporary | f
active | t
active_pid | 8799
xmin |
catalog_xmin | 500
restart_lsn | 0/40011C0

confirmed_flush_lsn | 0/40011C0

I'm at a loss as to where to even look at this point.

Dave

#10Amit Kapila
amit.kapila16@gmail.com
In reply to: Dave Cramer (#9)
Re: How is this possible "publication does not exist"

On Wed, Aug 11, 2021 at 1:18 AM Dave Cramer <davecramer@gmail.com> wrote:

Reviving this thread

2021-08-10 19:05:09.096 UTC [3738] LOG: logical replication apply worker for subscription "sub_mycluster_alltables" has started
2021-08-10 19:05:09.107 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables", table "t_random" has started
2021-08-10 19:05:12.222 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables", table "t_random" has finished
2021-08-10 19:05:14.806 UTC [3738] ERROR: could not receive data from WAL stream: ERROR: publication "sub_mycluster_alltables" does not exist
CONTEXT: slot "sub_mycluster_alltables", output plugin "pgoutput", in the change callback, associated LSN 0/4015DF0
2021-08-10 19:05:14.811 UTC [175] LOG: background worker "logical replication worker" (PID 3738) exited with exit code 1

select * from pg_publication;
-[ RECORD 1 ]+------------------------
oid | 16415
pubname | sub_mycluster_alltables
pubowner | 10
puballtables | t
pubinsert | t
pubupdate | t
pubdelete | t
pubtruncate | t

By any chance, did you dropped and recreated this publication as
mentioned in your first email? If so, I think this can happen because
of our use of historical snapshots to consult system catalogs.

--
With Regards,
Amit Kapila.

#11Dave Cramer
davecramer@gmail.com
In reply to: Amit Kapila (#10)
Re: How is this possible "publication does not exist"

On Wed, 11 Aug 2021 at 07:24, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 11, 2021 at 1:18 AM Dave Cramer <davecramer@gmail.com> wrote:

Reviving this thread

2021-08-10 19:05:09.096 UTC [3738] LOG: logical replication apply

worker for subscription "sub_mycluster_alltables" has started

2021-08-10 19:05:09.107 UTC [3739] LOG: logical replication table

synchronization worker for subscription "sub_mycluster_alltables", table
"t_random" has started

2021-08-10 19:05:12.222 UTC [3739] LOG: logical replication table

synchronization worker for subscription "sub_mycluster_alltables", table
"t_random" has finished

2021-08-10 19:05:14.806 UTC [3738] ERROR: could not receive data from

WAL stream: ERROR: publication "sub_mycluster_alltables" does not exist

CONTEXT: slot "sub_mycluster_alltables", output plugin

"pgoutput", in the change callback, associated LSN 0/4015DF0

2021-08-10 19:05:14.811 UTC [175] LOG: background worker "logical

replication worker" (PID 3738) exited with exit code 1

select * from pg_publication;
-[ RECORD 1 ]+------------------------
oid | 16415
pubname | sub_mycluster_alltables
pubowner | 10
puballtables | t
pubinsert | t
pubupdate | t
pubdelete | t
pubtruncate | t

By any chance, did you dropped and recreated this publication as
mentioned in your first email? If so, I think this can happen because
of our use of historical snapshots to consult system catalogs.

In this case, no.

I am suspecting this error comes from pgoutput though.

Dave

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Dave Cramer (#11)
Re: How is this possible "publication does not exist"

On Wed, Aug 11, 2021 at 4:57 PM Dave Cramer <davecramer@gmail.com> wrote:

On Wed, 11 Aug 2021 at 07:24, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 11, 2021 at 1:18 AM Dave Cramer <davecramer@gmail.com> wrote:

Reviving this thread

2021-08-10 19:05:09.096 UTC [3738] LOG: logical replication apply worker for subscription "sub_mycluster_alltables" has started
2021-08-10 19:05:09.107 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables", table "t_random" has started
2021-08-10 19:05:12.222 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables", table "t_random" has finished
2021-08-10 19:05:14.806 UTC [3738] ERROR: could not receive data from WAL stream: ERROR: publication "sub_mycluster_alltables" does not exist
CONTEXT: slot "sub_mycluster_alltables", output plugin "pgoutput", in the change callback, associated LSN 0/4015DF0
2021-08-10 19:05:14.811 UTC [175] LOG: background worker "logical replication worker" (PID 3738) exited with exit code 1

select * from pg_publication;
-[ RECORD 1 ]+------------------------
oid | 16415
pubname | sub_mycluster_alltables
pubowner | 10
puballtables | t
pubinsert | t
pubupdate | t
pubdelete | t
pubtruncate | t

By any chance, did you dropped and recreated this publication as
mentioned in your first email? If so, I think this can happen because
of our use of historical snapshots to consult system catalogs.

In this case, no.

I am suspecting this error comes from pgoutput though.

I think it is and the context is generated via
output_plugin_error_callback. Is this reproducible for you and if so,
can you share a test case or some steps to reproduce this? Does this
work and suddenly start giving errors or it happens the very first
time you tried to set up publication/subscription? I think some more
details are required about your setup and steps to analyze this
problem. You might want to check publication-side logs but not sure if
get any better clue there.

--
With Regards,
Amit Kapila.

#13Dave Cramer
davecramer@gmail.com
In reply to: Amit Kapila (#12)
Re: How is this possible "publication does not exist"

On Wed, 11 Aug 2021 at 07:37, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 11, 2021 at 4:57 PM Dave Cramer <davecramer@gmail.com> wrote:

On Wed, 11 Aug 2021 at 07:24, Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Wed, Aug 11, 2021 at 1:18 AM Dave Cramer <davecramer@gmail.com>

wrote:

Reviving this thread

2021-08-10 19:05:09.096 UTC [3738] LOG: logical replication apply

worker for subscription "sub_mycluster_alltables" has started

2021-08-10 19:05:09.107 UTC [3739] LOG: logical replication table

synchronization worker for subscription "sub_mycluster_alltables", table
"t_random" has started

2021-08-10 19:05:12.222 UTC [3739] LOG: logical replication table

synchronization worker for subscription "sub_mycluster_alltables", table
"t_random" has finished

2021-08-10 19:05:14.806 UTC [3738] ERROR: could not receive data

from WAL stream: ERROR: publication "sub_mycluster_alltables" does not
exist

CONTEXT: slot "sub_mycluster_alltables", output plugin

"pgoutput", in the change callback, associated LSN 0/4015DF0

2021-08-10 19:05:14.811 UTC [175] LOG: background worker "logical

replication worker" (PID 3738) exited with exit code 1

select * from pg_publication;
-[ RECORD 1 ]+------------------------
oid | 16415
pubname | sub_mycluster_alltables
pubowner | 10
puballtables | t
pubinsert | t
pubupdate | t
pubdelete | t
pubtruncate | t

By any chance, did you dropped and recreated this publication as
mentioned in your first email? If so, I think this can happen because
of our use of historical snapshots to consult system catalogs.

In this case, no.

I am suspecting this error comes from pgoutput though.

I think it is and the context is generated via
output_plugin_error_callback. Is this reproducible for you and if so,
can you share a test case or some steps to reproduce this? Does this
work and suddenly start giving errors or it happens the very first
time you tried to set up publication/subscription? I think some more
details are required about your setup and steps to analyze this
problem. You might want to check publication-side logs but not sure if
get any better clue there.

In this case I am the messenger. I will try to get a repeatable test case.

Dave

#14Marco Slot
marco.slot@gmail.com
In reply to: Amit Kapila (#12)
Re: How is this possible "publication does not exist"

On Wed, Aug 11, 2021 at 1:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think it is and the context is generated via
output_plugin_error_callback. Is this reproducible for you and if so,
can you share a test case or some steps to reproduce this? Does this
work and suddenly start giving errors or it happens the very first
time you tried to set up publication/subscription? I think some more
details are required about your setup and steps to analyze this
problem. You might want to check publication-side logs but not sure if
get any better clue there.

This seems to regularly reproduce the issue on PostgreSQL 14.4:

drop subscription if exists local_sub;
drop publication if exists local_pub;
drop table if exists local;

select pg_create_logical_replication_slot('test','pgoutput');
create table local (x int, y int);
insert into local values (1,2);
create publication local_pub for table local;
create subscription local_sub connection 'host=localhost port=5432'
publication local_pub with (create_slot = false, slot_name = 'test',
copy_data = false);

The log on the publisher then repeatedly shows:
2022-08-04 10:46:56.140 CEST [12785] ERROR: publication "local_pub"
does not exist
2022-08-04 10:46:56.140 CEST [12785] CONTEXT: slot "test", output
plugin "pgoutput", in the change callback, associated LSN 8/6C01A270
2022-08-04 10:46:56.140 CEST [12785] STATEMENT: START_REPLICATION
SLOT "test" LOGICAL 0/0 (proto_version '2', publication_names
'"local_pub"')

(fails in the same way when setting up the subscription on a different node)

The local_pub does appear in pg_publication, but it seems a bit like
the change_cb is using an old snapshot when reading from the catalog
in GetPublicationByName.

cheers,
Marco

#15Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Marco Slot (#14)
Re: How is this possible "publication does not exist"

At Thu, 4 Aug 2022 10:56:45 +0200, Marco Slot <marco.slot@gmail.com> wrote in

On Wed, Aug 11, 2021 at 1:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think it is and the context is generated via
output_plugin_error_callback. Is this reproducible for you and if so,
can you share a test case or some steps to reproduce this? Does this
work and suddenly start giving errors or it happens the very first
time you tried to set up publication/subscription? I think some more
details are required about your setup and steps to analyze this
problem. You might want to check publication-side logs but not sure if
get any better clue there.

This seems to regularly reproduce the issue on PostgreSQL 14.4:

drop subscription if exists local_sub;
drop publication if exists local_pub;
drop table if exists local;

select pg_create_logical_replication_slot('test','pgoutput');
create table local (x int, y int);
insert into local values (1,2);
create publication local_pub for table local;
create subscription local_sub connection 'host=localhost port=5432'
publication local_pub with (create_slot = false, slot_name = 'test',
copy_data = false);

The local_pub does appear in pg_publication, but it seems a bit like
the change_cb is using an old snapshot when reading from the catalog
in GetPublicationByName.

Yes. So the slot should be created *after* the publication is
created. A discussion [1]/messages/by-id/CAA4eK1LwQAEPJMTwVe3UYODeNMkK2QHf-WZF5aXp5ZcjDRcrUA@mail.gmail.com was held on allowing missing publications
in such a case but it has not been rached a conclusion..

[1]: /messages/by-id/CAA4eK1LwQAEPJMTwVe3UYODeNMkK2QHf-WZF5aXp5ZcjDRcrUA@mail.gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center