Logical replication breaks: "unexpected duplicate for tablespace 0, relfilenode 2774069304"

Started by Kouber Saparev · 4 messages · list: general
#1 Kouber Saparev
kouber@gmail.com

We are using logical replication in a quite busy environment. On the
publisher side, temporary tables are created and dropped all the time (due
to some Zend Entity Framework extension "optimisation"), which heavily
bloats the system catalogs (among other things).

At some point, all of our logical replication subscribers/replication slots
drop because of this error:

*"could not receive data from WAL stream: ERROR: unexpected duplicate for
tablespace 0, relfilenode 2774069304"*

The table with this relfilenode is not even included in any of our
publications. I've found a similar issue described before [1], so I was
wondering whether that patch has been applied. Our subscriber database is
PostgreSQL 16.1 and the publisher is PostgreSQL 15.4.
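One way to map the relfilenode from such an error back to a relation name is PostgreSQL's pg_filenode_relation(tablespace, filenode), run on the publisher in the published database; tablespace 0 means the database's default tablespace. The Python sketch below only extracts the numbers from the error text and builds that query. It assumes the message has exactly the shape quoted above, and the helper names parse_duplicate_error and filenode_lookup_sql are hypothetical. If no relation in the current database uses that filenode anymore (for instance, a temporary table that has since been dropped), the query returns NULL.

```python
import re
from typing import Optional, Tuple

# Matches the error text quoted in this thread (assumed format), e.g.
# "unexpected duplicate for tablespace 0, relfilenode 2774069304"
DUPLICATE_RE = re.compile(
    r"unexpected duplicate for tablespace (\d+), relfilenode (\d+)"
)


def parse_duplicate_error(message: str) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return (tablespace_oid, relfilenode) or None."""
    match = DUPLICATE_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def filenode_lookup_sql(tablespace_oid: int, relfilenode: int) -> str:
    """Hypothetical helper: build a query mapping the filenode to a relation.

    pg_filenode_relation() accepts 0 for the default tablespace and returns
    NULL when no relation in the current database uses the filenode.
    """
    return f"SELECT pg_filenode_relation({tablespace_oid}, {relfilenode});"


if __name__ == "__main__":
    err = ("could not receive data from WAL stream: ERROR: unexpected "
           "duplicate for tablespace 0, relfilenode 2774069304")
    parsed = parse_duplicate_error(err)
    if parsed is not None:
        print(filenode_lookup_sql(*parsed))
```

Run against the error in this thread, the script prints a query that can be pasted into psql on the publisher.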

What quick solution would fix the replication? Repack of the table? Reload
of the database? Killing some backends?

We rely heavily on this feature in a production environment and cannot just
leave the subscriber side out of sync.

Regards,
--
Kouber Saparev

[1]: https://postgrespro.com/list/thread-id/2597009#TYCPR01MB83731ADE7FD7C7CF5D335BCEEDE99@TYCPR01MB8373.jpnprd01.prod.outlook.com

#2 Michael Paquier
michael@paquier.xyz
In reply to: Kouber Saparev (#1)
Re: Logical replication breaks: "unexpected duplicate for tablespace 0, relfilenode 2774069304"

On Fri, Dec 22, 2023 at 10:55:24AM +0200, Kouber Saparev wrote:

> The table for this file node is not even included in any of the
> publications we have. I've found a similar issue described [1] before, so I
> was wondering whether this patch is applied? Our subscriber database is
> PostgreSQL 16.1 and the publisher - PostgreSQL 15.4.

Or just use this link to the community archives, based on the
Message-ID:
/messages/by-id/TYCPR01MB83731ADE7FD7C7CF5D335BCEEDE99@TYCPR01MB8373.jpnprd01.prod.outlook.com

> What quick solution would fix the replication? Repack of the table? Reload
> of the database? Killing some backends?

There may be something you could do as a short-term solution, but it
does not solve the actual root of the problem, because the error you
are seeing is not something users should be able to face.

The first problem that we have here is that we've lost track of the
patch proposed, so I have added a CF entry for now:
https://commitfest.postgresql.org/46/4720/
--
Michael

#3 Kouber Saparev
kouber@gmail.com
In reply to: Michael Paquier (#2)
Re: Logical replication breaks: "unexpected duplicate for tablespace 0, relfilenode 2774069304"

On Sun, 24.12.2023 at 3:37, Michael Paquier <michael@paquier.xyz> wrote:

>> What quick solution would fix the replication? Repack of the table? Reload
>> of the database? Killing some backends?
>
> There may be something you could do as a short-term solution, but it
> does not solve the actual root of the problem, because the error you
> are seeing is not something users should be able to face.

We need an action plan for when this happens again (which might be in
the middle of the night, etc.), i.e. how to rebuild and resync our
logically replicated tables. Simply re-enabling the subscription does not
work: the same error reappears, so we have to drop all the slots, recreate
them, and deal with the resyncing. If a repack (or something else) on the
publisher side would let us re-enable the subscription easily, without
dropping the slots, it would at least save us from this prolonged
desync/downtime situation for now.
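The drop-and-resync procedure described above might be scripted roughly as follows. This is a minimal Python sketch that only generates the SQL statements in order; it does not work around the underlying bug. All names (subscription, slot, publication, connection string, tables) are hypothetical placeholders. Emptying the subscriber's tables before the initial copy is an assumption added here so that copy_data = true does not collide with rows that are already present.

```python
from typing import List, Sequence, Tuple


def quote_ident(name: str) -> str:
    """Quote an SQL identifier: wrap in double quotes, double any inside."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal: wrap in single quotes, double any inside."""
    return "'" + value.replace("'", "''") + "'"


def resync_plan(subscription: str, slot: str, publication: str,
                conninfo: str,
                tables: Sequence[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (node, statement) pairs for a drop-and-resync of one subscription.

    Hypothetical helper: every argument is an operator-supplied placeholder,
    and `tables` holds (schema, table) pairs for the subscribed tables.
    """
    if not tables:
        raise ValueError("at least one subscribed table is required")
    sub = quote_ident(subscription)
    qualified = ", ".join(
        f"{quote_ident(schema)}.{quote_ident(table)}" for schema, table in tables
    )
    return [
        # Stop the apply worker so the publisher's slot is no longer in use.
        ("subscriber", f"ALTER SUBSCRIPTION {sub} DISABLE;"),
        # Detach the slot so DROP SUBSCRIPTION does not try to drop it remotely.
        ("subscriber", f"ALTER SUBSCRIPTION {sub} SET (slot_name = NONE);"),
        ("subscriber", f"DROP SUBSCRIPTION {sub};"),
        # Remove the broken slot on the publisher.
        ("publisher",
         f"SELECT pg_drop_replication_slot({quote_literal(slot)});"),
        # Assumption: empty local copies so the initial copy has no duplicates.
        ("subscriber", f"TRUNCATE {qualified};"),
        # Recreate the subscription; copy_data = true re-copies every table.
        ("subscriber",
         f"CREATE SUBSCRIPTION {sub} CONNECTION {quote_literal(conninfo)} "
         f"PUBLICATION {quote_ident(publication)} "
         f"WITH (copy_data = true, slot_name = {quote_literal(slot)});"),
    ]


if __name__ == "__main__":
    plan = resync_plan("orders_sub", "orders_sub", "orders_pub",
                       "host=publisher dbname=app",
                       [("public", "orders"), ("public", "order_items")])
    for node, sql in plan:
        print(f"-- on {node}\n{sql}")
```

Generating the statements rather than executing them keeps the sketch reviewable before a night-time incident. In practice, the slot may still be briefly active after DISABLE, so pg_drop_replication_slot() can need a retry.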

> The first problem that we have here is that we've lost track of the
> patch proposed, so I have added a CF entry for now:
> https://commitfest.postgresql.org/46/4720/

Thank you. Is there a bug report or should we file one? It looks like
something that compromises the reliability of the logical replication as a
whole.

--
Kouber

#4 Michael Paquier
michael@paquier.xyz
In reply to: Kouber Saparev (#3)
Re: Logical replication breaks: "unexpected duplicate for tablespace 0, relfilenode 2774069304"

On Thu, Dec 28, 2023 at 02:03:12PM +0200, Kouber Saparev wrote:

>> The first problem that we have here is that we've lost track of the
>> patch proposed, so I have added a CF entry for now:
>> https://commitfest.postgresql.org/46/4720/
>
> Thank you. Is there a bug report or should we file one? It looks like
> something that compromises the reliability of the logical replication as a
> whole.

Having a CF entry means that it is already tracked, so there is no need
to do more here for the moment. The next step would be to look at the
proposed patch in more detail and work on fixing the issue.
--
Michael