Replication slot is not able to sync up

Started by Suraj Kharage, 10 months ago, 40 messages

suraj.kharage@enterprisedb.com

10 months ago

Hi,

I noticed the following behaviour, where a replication slot cannot be synced
if any catalog changes happen after its creation.
I get the LOG below when trying to sync replication slots using the
pg_sync_replication_slots() function.
The newly created slot does not appear on the standby after this LOG:

2025-05-23 07:57:12.453 IST [4178805] LOG: could not synchronize
replication slot "failover_slot" because remote slot precedes local slot
2025-05-23 07:57:12.453 IST [4178805] DETAIL: The remote slot has LSN
0/B000060 and catalog xmin 764, but the local slot has LSN 0/B000060 and
catalog xmin 765.
2025-05-23 07:57:12.453 IST [4178805] STATEMENT: SELECT
pg_sync_replication_slots();
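The DETAIL line packs two (LSN, catalog xmin) pairs into one sentence. As a reading aid, here is a small hypothetical Python helper (not part of PostgreSQL) that pulls the pairs apart. It assumes the usual X/Y text form of an LSN, where X and Y are the high and low 32 bits in hex. It shows that both LSNs are identical and only the catalog xmin differs.

```python
import re
from typing import Tuple

# Matches each "LSN X/Y and catalog xmin N" pair in the DETAIL text.
PAIR_RE = re.compile(r"LSN ([0-9A-Fa-f]+)/([0-9A-Fa-f]+) and catalog xmin (\d+)")


def lsn_to_int(hi: str, lo: str) -> int:
    """Combine the two hex halves of an LSN such as 0/B000060 into one 64-bit value."""
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_detail(detail: str) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((remote_lsn, remote_xmin), (local_lsn, local_xmin))."""
    pairs = PAIR_RE.findall(detail)
    if len(pairs) != 2:
        raise ValueError("expected a remote and a local LSN/xmin pair")
    (rhi, rlo, rx), (lhi, llo, lx) = pairs
    return (lsn_to_int(rhi, rlo), int(rx)), (lsn_to_int(lhi, llo), int(lx))


detail = ("The remote slot has LSN 0/B000060 and catalog xmin 764, "
          "but the local slot has LSN 0/B000060 and catalog xmin 765.")
(remote_lsn, remote_xmin), (local_lsn, local_xmin) = parse_detail(detail)
print(remote_lsn == local_lsn, remote_xmin, local_xmin)  # True 764 765
```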

Below is the test case I tried on the latest master branch:
=========
- Create the primary with the following setting and start the server.
wal_level = logical

- Create the physical slot on Primary.
SELECT pg_create_physical_replication_slot('slot1');

- Set up the standby using pg_basebackup.
bin/pg_basebackup -D data1 -p 5418 -d "dbname=postgres" -R

- Configure the standby with:
primary_slot_name = 'slot1'
hot_standby_feedback = on
port = 5419

-- Start the standby.

-- Connect to Primary and create a logical replication slot.
SELECT pg_create_logical_replication_slot('failover_slot', 'pgoutput',
false, false, true);

-- Perform some catalog changes. e.g.:
create table abc(id int);
postgres@4179034=#select xmin from pg_class where relname='abc';
xmin
------
764
(1 row)

-- Connect to the standby and try to sync the replication slots.
SELECT pg_sync_replication_slots();

In the logfile, I can see the LOG below:
2025-05-23 07:57:12.453 IST [4178805] LOG: could not synchronize
replication slot "failover_slot" because remote slot precedes local slot
2025-05-23 07:57:12.453 IST [4178805] DETAIL: The remote slot has LSN
0/B000060 and catalog xmin 764, but the local slot has LSN 0/B000060 and
catalog xmin 765.
2025-05-23 07:57:12.453 IST [4178805] STATEMENT: SELECT
pg_sync_replication_slots();

select xmin,* from pg_replication_slots ;
no rows

Is there any way to sync the replication slot when catalog changes
have been made after its creation?
--

Thanks & Regards,
Suraj kharage,

enterprisedb.com <https://www.enterprisedb.com/>

Amit Kapila

amit.kapila16@gmail.com

10 months ago

In reply to: Suraj Kharage (#1)

Re: Replication slot is not able to sync up

On Fri, May 23, 2025 at 9:57 AM Suraj Kharage <
suraj.kharage@enterprisedb.com> wrote:

Hi,

I noticed the following behaviour, where a replication slot cannot be synced
if any catalog changes happen after its creation.
I get the LOG below when trying to sync replication slots using the
pg_sync_replication_slots() function.
The newly created slot does not appear on the standby after this LOG:

2025-05-23 07:57:12.453 IST [4178805] LOG: could not synchronize
replication slot "failover_slot" because remote slot precedes local slot
2025-05-23 07:57:12.453 IST [4178805] DETAIL: The remote slot has LSN
0/B000060 and catalog xmin 764, but the local slot has LSN 0/B000060 and
catalog xmin 765.
2025-05-23 07:57:12.453 IST [4178805] STATEMENT: SELECT
pg_sync_replication_slots();

Below is the test case I tried on the latest master branch:
=========
- Create the primary with the following setting and start the server.
wal_level = logical

- Create the physical slot on Primary.
SELECT pg_create_physical_replication_slot('slot1');

- Set up the standby using pg_basebackup.
bin/pg_basebackup -D data1 -p 5418 -d "dbname=postgres" -R

- Configure the standby with:
primary_slot_name = 'slot1'
hot_standby_feedback = on
port = 5419

-- Start the standby.

-- Connect to Primary and create a logical replication slot.
SELECT pg_create_logical_replication_slot('failover_slot', 'pgoutput',
false, false, true);

postgres@4177929=#select xmin,* from pg_replication_slots ;
 xmin |   slot_name   |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase | two_phase_at |          inactive_since          | conflicting | invalidation_reason | failover | synced
------+---------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------+--------------+----------------------------------+-------------+---------------------+----------+--------
  765 | slot1         |          | physical  |        |          | f         | t      |    4177898 |  765 |              | 0/B018B00   |                     | reserved   |               | f         |              |                                  |             |                     | f        | f
      | failover_slot | pgoutput | logical   |      5 | postgres | f         | f      |            |      |          764 | 0/B000060   | 0/B000098           | reserved   |               | f         |              | 2025-05-23 07:55:31.277584+05:30 | f           |                     | t        | f
(2 rows)

-- Perform some catalog changes. e.g.:
create table abc(id int);
postgres@4179034=#select xmin from pg_class where relname='abc';
xmin
------
764
(1 row)

-- Connect to the standby and try to sync the replication slots.
SELECT pg_sync_replication_slots();

In the logfile, I can see the LOG below:
2025-05-23 07:57:12.453 IST [4178805] LOG: could not synchronize
replication slot "failover_slot" because remote slot precedes local slot
2025-05-23 07:57:12.453 IST [4178805] DETAIL: The remote slot has LSN
0/B000060 and catalog xmin 764, but the local slot has LSN 0/B000060 and
catalog xmin 765.
2025-05-23 07:57:12.453 IST [4178805] STATEMENT: SELECT
pg_sync_replication_slots();

select xmin,* from pg_replication_slots ;
no rows

Primary -
postgres@4179034=#select xmin,* from pg_replication_slots ;
 xmin |   slot_name   |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase | two_phase_at |          inactive_since          | conflicting | invalidation_reason | failover | synced
------+---------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------+--------------+----------------------------------+-------------+---------------------+----------+--------
  765 | slot1         |          | physical  |        |          | f         | t      |    4177898 |  765 |              | 0/B018C08   |                     | reserved   |               | f         |              |                                  |             |                     | f        | f
      | failover_slot | pgoutput | logical   |      5 | postgres | f         | f      |            |      |          764 | 0/B000060   | 0/B000098           | reserved   |               | f         |              | 2025-05-23 07:55:31.277584+05:30 | f           |                     | t        | f
(2 rows)
=========

Is there any way to sync the replication slot when catalog changes
have been made after its creation?

The remote_slot (slot on primary) should be advanced before you invoke
sync_slot. Can you do pg_logical_slot_get_changes() API before performing
sync? You can check the xmin of the logical slot after get_changes to
ensure that xmin has moved to 765 in your case.
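A simplified Python model of the check and of the suggested workaround. It assumes, as we read the LOG message, that the standby skips the sync when the primary's slot needs older WAL (restart_lsn) or older catalog rows (catalog_xmin) than the standby's local slot; the names are illustrative and XID wraparound is ignored. Advancing the primary's slot until its catalog_xmin reaches 765 flips the result. For a pgoutput slot like the one in this report, the textual pg_logical_slot_get_changes() will likely refuse pgoutput's binary output, so pg_replication_slot_advance() may be the simpler way to advance it.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    restart_lsn: int   # oldest WAL position the slot still needs
    catalog_xmin: int  # oldest XID whose catalog rows the slot still needs


def remote_precedes_local(remote: Slot, local: Slot) -> bool:
    """Simplified check: True if the primary's slot needs older WAL or catalog rows
    than the standby's local slot can guarantee. Plain integer comparison only."""
    return (remote.restart_lsn < local.restart_lsn
            or remote.catalog_xmin < local.catalog_xmin)


# Values from the report: same LSN, catalog_xmin 764 on the primary vs 765 on the standby.
local = Slot(restart_lsn=0xB000060, catalog_xmin=765)
remote_before = Slot(restart_lsn=0xB000060, catalog_xmin=764)
print(remote_precedes_local(remote_before, local))  # True -> sync is skipped

# After the primary's slot is advanced, its catalog_xmin is expected to reach 765
# (restart_lsn kept unchanged here for simplicity).
remote_after = Slot(restart_lsn=0xB000060, catalog_xmin=765)
print(remote_precedes_local(remote_after, local))  # False -> sync can proceed
```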

--
With Regards,
Amit Kapila.

Robert Haas

robertmhaas@gmail.com

10 months ago

In reply to: Amit Kapila (#2)

Re: Replication slot is not able to sync up

On Fri, May 23, 2025 at 12:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

The remote_slot (slot on primary) should be advanced before you invoke sync_slot. Can you do pg_logical_slot_get_changes() API before performing sync? You can check the xmin of the logical slot after get_changes to ensure that xmin has moved to 765 in your case.

I'm fairly dismayed by this example. I hope I'm misunderstanding
something, because otherwise I have difficulty understanding how we
thought it was OK to ship this feature in this condition.

At the moment that pg_sync_replication_slots() is executed, a slot
named failover_slot exists on only one of the two servers. How can you
justify emitting an error message complaining that "remote slot
precedes local slot"? There's only one slot! I understand that, under
the hood, we probably created an additional slot on the standby and
then tried to fast-forward it, and this error occurred in the second
step. But a user shouldn't have to understand those kinds of internal
implementation details to make sense of the error message. If the
problem is that we're not able to create a slot on the standby at an
old enough LSN or XID position to permit its use with the
corresponding slot on the master, it should be reported that way.

It also seems like having to execute a manual step like
pg_logical_slot_get_changes() in order for things to work is really
missing the point of the feature. I mean, it seems like the intention
of the feature was that someone can just periodically call
pg_sync_replication_slots() on each standby and the right things will
happen -- creating slots or fast-forwarding them or dropping them, as
required. But if that sometimes requires manual fiddling like having
to consume changes from a slot then basically the feature just doesn't
work, because now the user will have to somehow understand when that
is required and what they need to do to fix it. This doesn't even seem
like a particularly obscure case.

To be honest, even after spending quite a bit of time on this, I still
don't really understand what's happening with the xmins here. Just
after creating the logical slot on the primary, it has xmin 764 on one
slot and xmin 765 on the other, and I don't understand why that's the
case, nor why the extra DDL command is needed to trigger the problem.
But I also can't shake the feeling that I shouldn't *need* to
understand that stuff to use the feature. Isn't that the whole point?

--
Robert Haas
EDB: http://www.enterprisedb.com

Amit Kapila

amit.kapila16@gmail.com

10 months ago

In reply to: Robert Haas (#3)

Re: Replication slot is not able to sync up

On Fri, May 23, 2025 at 11:25 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, May 23, 2025 at 12:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

The remote_slot (slot on primary) should be advanced before you invoke sync_slot. Can you do pg_logical_slot_get_changes() API before performing sync? You can check the xmin of the logical slot after get_changes to ensure that xmin has moved to 765 in your case.

I'm fairly dismayed by this example. I hope I'm misunderstanding
something, because otherwise I have difficulty understanding how we
thought it was OK to ship this feature in this condition.

At the moment that pg_sync_replication_slots() is executed, a slot
named failover_slot exists on only one of the two servers. How can you
justify emitting an error message complaining that "remote slot
precedes local slot"? There's only one slot! I understand that, under
the hood, we probably created an additional slot on the standby and
then tried to fast-forward it, and this error occurred in the second
step. But a user shouldn't have to understand those kinds of internal
implementation details to make sense of the error message.

Fair point.

If the problem is that we're not able to create a slot on the standby at an
old enough LSN or XID position to permit its use with the
corresponding slot on the master, it should be reported that way.

That is the case, and we should improve the LOG message. However, let
me first explain what is going on here. This happens because
the DDL is replicated before the pg_sync_replication_slots() call, so
the slot created locally on the standby acquires a later xmin (765)
than the slot on the master (764). So, we can't sync in
that particular sync cycle, because otherwise we can't guarantee that the
required rows will be present on the standby later when one tries to
use the slot.
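"Later" here is not a plain integer comparison: XIDs are 32-bit and wrap around, so PostgreSQL orders normal XIDs circularly (TransactionIdPrecedes() in the C source). The Python model below is a sketch of that rule for illustration only; it assumes, as in PostgreSQL, that XIDs 0 to 2 are reserved special values compared with plain ordering.

```python
FIRST_NORMAL_XID = 3  # XIDs 0-2 are reserved special values


def xid_precedes(id1: int, id2: int) -> bool:
    """Python model of TransactionIdPrecedes() for 32-bit XIDs.

    Normal XIDs are compared modulo 2**32: id1 precedes id2 if the signed
    32-bit difference (id1 - id2) is negative. Special XIDs use plain ordering.
    """
    if id1 < FIRST_NORMAL_XID or id2 < FIRST_NORMAL_XID:
        return id1 < id2
    diff = (id1 - id2) & 0xFFFFFFFF  # wrap to an unsigned 32-bit value
    if diff >= 0x80000000:           # reinterpret as a signed int32
        diff -= 0x100000000
    return diff < 0


print(xid_precedes(764, 765))         # True: the primary's catalog_xmin is older
print(xid_precedes(765, 764))         # False
print(xid_precedes(4294967295, 10))   # True: 2**32 - 1 precedes 10 across wraparound
```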

IIUC, users will use this feature where the master (publisher) and
subscriber nodes are doing logical replication, and we want to keep a
copy of the corresponding logical slot on the physical standby, so that
if the master goes down, the subscriber can continue logical
replication from the physical standby. In such a setup, users won't
need to bother with such LOGs, because even if we are not able to sync
the logical slot in a particular sync cycle and the LOG appears, we
should be able to sync in the next cycle.

In the case presented here, the logical slot is expected to keep
forwarding, and in the consecutive sync cycle, the sync should be
successful. Users using logical decoding APIs should also be aware
that if, for some reason, the logical slot is not moving forward,
the master/publisher node will start accumulating dead rows and WAL,
which can create bigger problems.

--
With Regards,
Amit Kapila.

shveta malik

shveta.malik@gmail.com

10 months ago

In reply to: Amit Kapila (#4)

Re: Replication slot is not able to sync up

On Sat, May 24, 2025 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

If the problem is that we're not able to create a slot on the standby at an
old enough LSN or XID position to permit its use with the
corresponding slot on the master, it should be reported that way.

That is the case, and we should improve the LOG message.

Agree that log messages need improvement. Please find the patch
attached for the same. I also intend to update the docs in this area
for users to understand this feature better, and will work on that
soon.

thanks
Shveta

shveta malik

shveta.malik@gmail.com

10 months ago

In reply to: shveta malik (#5)

Re: Replication slot is not able to sync up

On Mon, May 26, 2025 at 12:02 PM shveta malik <shveta.malik@gmail.com> wrote:

Agree that log messages need improvement. Please find the patch
attached for the same. I also intend to update the docs in this area
for users to understand this feature better, and will work on that
soon.

PFA the patch with doc changes as well. The doc explains the need for
pg_logical_slot_get_changes() for a particular scenario.

Also attached is a script that shows how this setup works. When the
replication slot is being actively consumed on the primary, we do not
observe that particular LOG ("could not synchronize replication slot ...")
on the standby, and synchronization proceeds without any manual intervention.

Thanks Nisha for the script.

thanks
Shveta

Masahiko Sawada

sawada.mshk@gmail.com

10 months ago

In reply to: Amit Kapila (#4)

Re: Replication slot is not able to sync up

On Fri, May 23, 2025 at 10:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

In the case presented here, the logical slot is expected to keep
forwarding, and in the consecutive sync cycle, the sync should be
successful. Users using logical decoding APIs should also be aware
that if, for some reason, the logical slot is not moving forward,
the master/publisher node will start accumulating dead rows and WAL,
which can create bigger problems.

I've tried this case and am concerned that the slot synchronization
using pg_sync_replication_slots() would never succeed while the
primary keeps getting write transactions. Even if the user manually
consumes changes on the primary, the primary server keeps advancing
its XID in the meanwhile. On the standby, we ensure that the
TransamVariables->nextXid is beyond the XID of the WAL record that it's
going to apply so the xmin horizon calculated by
GetOldestSafeDecodingTransactionId() ends up always being higher than
the slot's catalog_xmin on the primary. We get the log message "could
not synchronize replication slot "s" because remote slot precedes
local slot" and clean up the slot on the standby at the end of
pg_sync_replication_slots().
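The effect described above can be shown with a toy Python simulation. Everything in it is an illustrative assumption rather than PostgreSQL code: the standby's safe decoding horizon is taken to be the nextXid it has replayed, and the primary's slot is assumed to trail nextXid by a fixed number of XIDs. Because each call starts from a fresh horizon, a consumer that is even slightly behind never catches up.

```python
from typing import Optional


def api_sync_attempts(calls: int, xids_per_call: int, consumer_lag: int) -> Optional[int]:
    """Toy model of repeated pg_sync_replication_slots() calls on a busy primary.

    Illustrative assumptions: the standby's safe decoding horizon equals the
    nextXid it has replayed, and the primary slot's catalog_xmin trails nextXid
    by consumer_lag XIDs. Returns the attempt that succeeded, or None.
    """
    next_xid = 765
    for attempt in range(1, calls + 1):
        next_xid += xids_per_call                      # primary keeps committing
        remote_catalog_xmin = next_xid - consumer_lag
        local_catalog_xmin = next_xid                  # fresh temporary slot, current horizon
        if remote_catalog_xmin >= local_catalog_xmin:
            return attempt                             # slot would be persisted as synced
        # Sync skipped: the temporary slot is dropped when the call returns,
        # so the next call starts over from a newer horizon.
    return None


print(api_sync_attempts(100, 10, 35))  # None: never syncs while the consumer lags
print(api_sync_attempts(100, 10, 0))   # 1: a fully caught-up slot syncs at once
```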

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Zhijie Hou (Fujitsu)

houzj.fnst@fujitsu.com

10 months ago

In reply to: Masahiko Sawada (#7)

RE: Replication slot is not able to sync up

On Wed, May 28, 2025 at 2:09 AM Masahiko Sawada wrote:

On Fri, May 23, 2025 at 10:07 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:

In the case presented here, the logical slot is expected to keep
forwarding, and in the consecutive sync cycle, the sync should be
successful. Users using logical decoding APIs should also be aware
that if, for some reason, the logical slot is not moving forward,
the master/publisher node will start accumulating dead rows and WAL,
which can create bigger problems.

I've tried this case and am concerned that the slot synchronization using
pg_sync_replication_slots() would never succeed while the primary keeps
getting write transactions. Even if the user manually consumes changes on the
primary, the primary server keeps advancing its XID in the meanwhile. On the
standby, we ensure that the
TransamVariables->nextXid is beyond the XID of the WAL record that it's
going to apply so the xmin horizon calculated by
GetOldestSafeDecodingTransactionId() ends up always being higher than the
slot's catalog_xmin on the primary. We get the log message "could not
synchronize replication slot "s" because remote slot precedes local slot" and
clean up the slot on the standby at the end of pg_sync_replication_slots().

I think the issue occurs because unlike the slotsync worker, the SQL API
removes temporary slots when the function ends, so it cannot hold back the
standby's catalog_xmin. If transactions on the primary keep advancing xids, the
source slot's catalog_xmin on the primary fails to catch up with the standby's
nextXid, causing sync failure.

We chose this behavior because we could not predict when (or if) the SQL
function might be executed again, and the creating session might persist after
promotion. Without automatic cleanup, this could lead to temporary slots being
retained for a longer time.

This only affects the initial sync when creating a new slot on the standby.
Once the slot exists, the standby's catalog_xmin stabilizes, preventing the
issue in subsequent syncs.
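For contrast, here is a toy Python model of the slotsync worker under the same load, assuming (as explained above) that the worker keeps its temporary slot between cycles: the local catalog_xmin chosen on the first attempt stays fixed while the primary's slot keeps advancing, so the initial sync succeeds within a few cycles. The function name and parameters are hypothetical.

```python
from typing import Optional


def worker_sync_cycles(max_cycles: int, xids_per_cycle: int, consumer_lag: int) -> Optional[int]:
    """Toy model of the slotsync worker's first sync of a slot on a busy primary.

    Illustrative assumption: the worker keeps its temporary slot across cycles,
    so the local catalog_xmin picked on the first attempt stays fixed.
    Returns the cycle in which the sync succeeded, or None.
    """
    next_xid = 765
    local_catalog_xmin = next_xid                       # fixed when the temporary slot is created
    for cycle in range(1, max_cycles + 1):
        next_xid += xids_per_cycle                      # primary keeps committing
        remote_catalog_xmin = next_xid - consumer_lag   # consumer advances, but lags
        if remote_catalog_xmin >= local_catalog_xmin:
            return cycle                                # safe: persist the synced slot
        # Sync skipped: the temporary slot is kept and retried next cycle.
    return None


print(worker_sync_cycles(100, 10, 35))  # 4
```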

I think the SQL API was mainly intended for testing and debugging purposes
where controlled sync operations are useful. For production use, the slotsync
worker (with sync_replication_slots=on) is recommended because it automatically
handles this problem and requires minimal manual intervention. But to avoid
confusion, I think we should clearly document this distinction.

Best Regards,
Hou zj

Masahiko Sawada

sawada.mshk@gmail.com

10 months ago

In reply to: Zhijie Hou (Fujitsu) (#8)

Re: Replication slot is not able to sync up

On Tue, May 27, 2025 at 9:15 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Wed, May 28, 2025 at 2:09 AM Masahiko Sawada wrote:

On Fri, May 23, 2025 at 10:07 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:

In the case presented here, the logical slot is expected to keep
forwarding, and in the consecutive sync cycle, the sync should be
successful. Users using logical decoding APIs should also be aware
that if, for some reason, the logical slot is not moving forward,
the master/publisher node will start accumulating dead rows and WAL,
which can create bigger problems.

I've tried this case and am concerned that the slot synchronization using
pg_sync_replication_slots() would never succeed while the primary keeps
getting write transactions. Even if the user manually consumes changes on the
primary, the primary server keeps advancing its XID in the meanwhile. On the
standby, we ensure that the
TransamVariables->nextXid is beyond the XID of the WAL record that it's
going to apply so the xmin horizon calculated by
GetOldestSafeDecodingTransactionId() ends up always being higher than the
slot's catalog_xmin on the primary. We get the log message "could not
synchronize replication slot "s" because remote slot precedes local slot" and
clean up the slot on the standby at the end of pg_sync_replication_slots().

I think the issue occurs because unlike the slotsync worker, the SQL API
removes temporary slots when the function ends, so it cannot hold back the
standby's catalog_xmin. If transactions on the primary keep advancing xids, the
source slot's catalog_xmin on the primary fails to catch up with the standby's
nextXid, causing sync failure.

Agreed with this analysis.

This only affects the initial sync when creating a new slot on the standby.
Once the slot exists, the standby's catalog_xmin stabilizes, preventing the
issue in subsequent syncs.

Right. I think this is an area where we can improve, if there is a
real use case.

I think the SQL API was mainly intended for testing and debugging purposes
where controlled sync operations are useful. For production use, the slotsync
worker (with sync_replication_slots=on) is recommended because it automatically
handles this problem and requires minimal manual intervention. But to avoid
confusion, I think we should clearly document this distinction.

I didn't know it was intended for testing and debugging purposes, so
clarifying it in the documentation would be a good idea. Also, I
agree that using the slotsync worker is the primary usage of this
feature. I'm interested in whether there is a use case where the SQL
API is more preferable. If there is, we can improve the SQL API part,
especially the first synchronization part, for v19 or later.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#10

shveta malik

shveta.malik@gmail.com

10 months ago

In reply to: Masahiko Sawada (#9)

Re: Replication slot is not able to sync up

On Wed, May 28, 2025 at 11:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I didn't know it was intended for testing and debugging purposes, so
clarifying it in the documentation would be a good idea.

I have added the suggested docs in v3.

thanks
Shveta

#11

Robert Haas

robertmhaas@gmail.com

10 months ago

In reply to: Zhijie Hou (Fujitsu) (#8)

Re: Replication slot is not able to sync up

On Wed, May 28, 2025 at 12:15 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

I think the SQL API was mainly intended for testing and debugging purposes
where controlled sync operations are useful. For production use, the slotsync
worker (with sync_replication_slots=on) is recommended because it automatically
handles this problem and requires minimal manual intervention. But to avoid
confusion, I think we should clearly document this distinction.

If this analysis is correct, this should never have been committed, at
least not in this form. When we ship something, it needs to work.
Testing and debugging facilities are best placed in src/test/modules
or in contrib; if for some reason they really need to be in
src/backend, then they had better be clearly documented as such.

What really annoys me about this is that the function gives every
superficial impression of being something you could actually use. Why
wouldn't a user believe that if they periodically connect and run
pg_sync_replication_slots(), things will be OK? I can certainly
imagine a user *wanting* that to work. I'd like that to work. But it
seems like either it's impossible for some reason that isn't clear to
me, and we just went ahead and shipped it in a non-working state
anyway, or it is possible to make it work and we didn't do the
necessary engineering before something got committed. Either way,
that's really disappointing.

I think the issue occurs because unlike the slotsync worker, the SQL API
removes temporary slots when the function ends, so it cannot hold back the
standby's catalog_xmin. If transactions on the primary keep advancing xids, the
source slot's catalog_xmin on the primary fails to catch up with the standby's
nextXid, causing sync failure.

I still don't understand how this problem arises in the first place.
It seems like you're describing a situation where we need to prevent
the standby from getting ahead of the primary, but that should be
impossible by definition.

--
Robert Haas
EDB: http://www.enterprisedb.com

#12

Amit Kapila

amit.kapila16@gmail.com

9 months ago

In reply to: Robert Haas (#11)

Re: Replication slot is not able to sync up

On Thu, May 29, 2025 at 6:01 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, May 28, 2025 at 12:15 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

I think the SQL API was mainly intended for testing and debugging purposes
where controlled sync operations are useful. For production use, the slotsync
worker (with sync_replication_slots=on) is recommended because it automatically
handles this problem and requires minimal manual intervention. But to avoid
confusion, I think we should clearly document this distinction.

If this analysis is correct, this should never have been committed, at
least not in this form. When we ship something, it needs to work.
Testing and debugging facilities are best placed in src/test/modules
or in contrib; if for some reason they really need to be in
src/backend, then they had better be clearly documented as such.

What really annoys me about this is that the function gives every
superficial impression of being something you could actually use. Why
wouldn't a user believe that if they periodically connect and run
pg_sync_replication_slots(), things will be OK? I can certainly
imagine a user *wanting* that to work. I'd like that to work. But it
seems like either it's impossible for some reason that isn't clear to
me, and we just went ahead and shipped it in a non-working state
anyway, or it is possible to make it work and we didn't do the
necessary engineering before something got committed. Either way,
that's really disappointing.

I think the issue occurs because unlike the slotsync worker, the SQL API
removes temporary slots when the function ends, so it cannot hold back the
standby's catalog_xmin. If transactions on the primary keep advancing xids, the
source slot's catalog_xmin on the primary fails to catch up with the standby's
nextXid, causing sync failure.

I still don't understand how this problem arises in the first place.
It seems like you're describing a situation where we need to prevent
the standby from getting ahead of the primary, but that should be
impossible by definition.

The reason is that we do not allow creating a synced slot if the
required WAL or catalog rows for this slot have been removed or are at
risk of removal. We achieve this as follows: during the first
sync_slot call, either via the slotsync worker or the API, we create a
temporary slot on the standby whose xmin (catalog_xmin) points to the
safest possible xmin on the standby, as computed by
GetOldestSafeDecodingTransactionId(), and whose WAL position (restart_lsn)
points to the oldest WAL present on the standby. Now, if the source slot's
(the slot on the primary) corresponding location/xmin are prior to the
location/xmin on the standby, then we can't sync the slot immediately,
because there is no guarantee that the required resources (WAL/catalog rows)
will be available when we try to use the synced slot after promotion. The
slotsync worker keeps retrying to sync the slot and eventually succeeds
once the source slot's values are safe to sync to the standby. With the
API, we didn't implement this retry logic, which is why we see the
behaviour currently reported. Note that once the first sync is successful,
subsequent syncs, even via the API, should work similarly to the worker.

I agree that the current usefulness of the API is limited: one can use
it in a controlled environment (e.g., when the first sync happens
before other operations on the primary), to debug this functionality,
or to write tests. It is not clear to me why someone would not use the
built-in functionality to sync slots and prefer this API. But going
forward (as we see people would like to use this API to sync slots),
it is not that difficult to improve this API to match its behaviour
with the built-in worker for initial/first sync.

I see that we separately document functions [1] used for
development/debug, and this API could be documented in that way.

[1]: https://www.postgresql.org/docs/current/functions-textsearch.html#TEXTSEARCH-FUNCTIONS-DEBUG-TABLE

--
With Regards,
Amit Kapila.

#13

Zhijie Hou (Fujitsu)

houzj.fnst@fujitsu.com

9 months ago

In reply to: Masahiko Sawada (#7)

RE: Replication slot is not able to sync up

On Wed, May 28, 2025 at 2:09 AM Masahiko Sawada wrote:

On Fri, May 23, 2025 at 10:07 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:

In the case presented here, the logical slot is expected to keep
forwarding, and in the consecutive sync cycle, the sync should be
successful. Users using logical decoding APIs should also be aware
that if, for some reason, the logical slot is not moving forward,
the master/publisher node will start accumulating dead rows and WAL,
which can create bigger problems.

I've tried this case and am concerned that the slot synchronization using
pg_sync_replication_slots() would never succeed while the primary keeps
getting write transactions. Even if the user manually consumes changes on the
primary, the primary server keeps advancing its XID in the meanwhile. On the
standby, we ensure that the
TransamVariables->nextXid is beyond the XID of the WAL record that it's
going to apply so the xmin horizon calculated by
GetOldestSafeDecodingTransactionId() ends up always being higher than the
slot's catalog_xmin on the primary. We get the log message "could not
synchronize replication slot "s" because remote slot precedes local slot" and
clean up the slot on the standby at the end of pg_sync_replication_slots().

To improve this workload scenario, we can modify pg_sync_replication_slots() to
wait for the primary slot to advance to a suitable position before completing
synchronization and removing the temporary slot. This would allow the sync to
complete as soon as the primary slot advances, whether through
pg_logical_xx_get_changes() or other ways.

I've created a POC (attached) that currently waits indefinitely for the remote
slot to catch up. We could later add a timeout parameter to control maximum
wait time if this approach seems acceptable.

I tested that, when pgbench TPC-B is running on the primary, calling
pg_sync_replication_slots() on the standby correctly blocks until I advance the
primary slot position by calling pg_logical_xx_get_changes().
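The POC itself is in the attachment and is not reproduced here. As a rough illustration of the proposed shape only, here is a hypothetical Python polling loop with the timeout mentioned above; fetch_remote_catalog_xmin stands in for however the standby would re-read the primary's slot, and none of these names come from the patch.

```python
import time
from typing import Callable


def wait_for_remote_slot(fetch_remote_catalog_xmin: Callable[[], int],
                         local_catalog_xmin: int,
                         timeout_s: float,
                         poll_interval_s: float = 0.01) -> bool:
    """Hypothetical sketch: keep the temporary slot and poll until the primary's
    slot no longer precedes it, giving up after timeout_s seconds.
    Uses plain integer comparison and ignores XID wraparound."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_remote_catalog_xmin() >= local_catalog_xmin:
            return True    # safe to persist the synced slot
        if time.monotonic() >= deadline:
            return False   # timed out; the caller would drop the temporary slot
        time.sleep(poll_interval_s)


# Simulated readings of the primary slot's catalog_xmin as its consumer advances.
readings = iter([760, 762, 764, 765, 770])
print(wait_for_remote_slot(lambda: next(readings), 765, timeout_s=1.0))  # True
print(wait_for_remote_slot(lambda: 700, 765, timeout_s=0.05))            # False
```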

If the basic idea sounds reasonable, then I can start a separate
thread to extend this API. Thoughts?

Best Regards,
Hou zj

#14

Amul Sul

sulamul@gmail.com

9 months ago

In reply to: Zhijie Hou (Fujitsu) (#13)

Re: Replication slot is not able to sync up

On Fri, May 30, 2025 at 3:38 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Wed, May 28, 2025 at 2:09 AM Masahiko Sawada wrote:

On Fri, May 23, 2025 at 10:07 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:

In the case presented here, the logical slot is expected to keep
forwarding, and in the consecutive sync cycle, the sync should be
successful. Users using logical decoding APIs should also be aware
that if, for some reason, the logical slot is not moving forward,
the master/publisher node will start accumulating dead rows and WAL,
which can create bigger problems.

I've tried this case and am concerned that the slot synchronization using
pg_sync_replication_slots() would never succeed while the primary keeps
getting write transactions. Even if the user manually consumes changes on the
primary, the primary server keeps advancing its XID in the meanwhile. On the
standby, we ensure that the
TransamVariables->nextXid is beyond the XID of the WAL record that it's
going to apply so the xmin horizon calculated by
GetOldestSafeDecodingTransactionId() ends up always being higher than the
slot's catalog_xmin on the primary. We get the log message "could not
synchronize replication slot "s" because remote slot precedes local slot" and
clean up the slot on the standby at the end of pg_sync_replication_slots().

To improve this workload scenario, we can modify pg_sync_replication_slots() to
wait for the primary slot to advance to a suitable position before completing
synchronization and removing the temporary slot. This would allow the sync to
complete as soon as the primary slot advances, whether through
pg_logical_xx_get_changes() or other ways.

I've created a POC (attached) that currently waits indefinitely for the remote
slot to catch up. We could later add a timeout parameter to control maximum
wait time if this approach seems acceptable.

Quick question -- due to my limited understanding of this area: why
can't we perform an action similar to pg_logical_slot_get_changes()
implicitly from pg_sync_replication_slots()? Would there be any
implications of doing so?

Regards,
Amul

#15

Amit Kapila

amit.kapila16@gmail.com

9 months ago

In reply to: Amul Sul (#14)

Re: Replication slot is not able to sync up

On Fri, May 30, 2025 at 4:05 PM Amul Sul <sulamul@gmail.com> wrote:

Quick question -- due to my limited understanding of this area: why
can't we perform an action similar to pg_logical_slot_get_changes()
implicitly from pg_sync_replication_slots()? Would there be any
implications of doing so?

Yes, there would be implications if we did it that way. It would mean
that the consumer of the slot would not get to process those changes
(for which the sync_slot API has done the get_changes) and send them to
the client. Consider a publisher-subscriber and physical standby setup.
In this setup, the subscriber creates a logical slot corresponding to
the subscription on the publisher. Now, the publisher processes changes
and sends them to the subscriber; then the slot is advanced (both its
xmin and WAL locations) once the corresponding changes are sent to the
client.

If we allow pg_sync_replication_slots() to do
pg_logical_slot_get_changes() or an equivalent in some way, then we may
end up advancing the slot without sending the changes to the subscriber,
which would be data loss from the subscriber's point of view.

I have explained in terms of built-in logical replication, but the
external plugins using these APIs (pg_logical_*) should be doing
something similar to process the changes and advance the slot.
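The data-loss argument can be illustrated with a toy model, assuming decoding for a slot always resumes after its confirmed position and the slot's real consumer advances it only after sending changes. ModelSlot, send_and_advance() and consume_without_sending() are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical model: decoding for this slot resumes after confirmed_lsn. */
typedef struct
{
    XLogRecPtr confirmed_lsn;
} ModelSlot;

/*
 * The slot's real consumer (e.g. a walsender serving a subscriber): send
 * every change beyond the slot's position, advancing the slot only after
 * sending. change_lsns must be sorted ascending. Returns the count sent.
 */
static size_t
send_and_advance(ModelSlot *slot, const XLogRecPtr *change_lsns, size_t n)
{
    size_t sent = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (change_lsns[i] <= slot->confirmed_lsn)
            continue;           /* already consumed: never decoded again */
        /* ... the change would be transmitted to the client here ... */
        sent++;
        slot->confirmed_lsn = change_lsns[i];
    }
    return sent;
}

/*
 * What an implicit get_changes() inside the sync function would amount
 * to: the slot moves forward, but nothing reaches the subscriber.
 */
static void
consume_without_sending(ModelSlot *slot, XLogRecPtr upto)
{
    if (upto > slot->confirmed_lsn)
        slot->confirmed_lsn = upto;
}
```

With changes at LSNs 100, 200 and 300, a slot left to its consumer delivers all three, while a slot first consumed up to 200 by someone else delivers only the change at 300; the first two never reach the subscriber.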

Does this answer your question and make sense to you?

--
With Regards,
Amit Kapila.

#16

Amul Sul

sulamul@gmail.com

9 months ago

In reply to: Amit Kapila (#15)

Re: Replication slot is not able to sync up

On Fri, May 30, 2025 at 4:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, May 30, 2025 at 4:05 PM Amul Sul <sulamul@gmail.com> wrote:

Quick question -- due to my limited understanding of this area: why
can't we perform an action similar to pg_logical_slot_get_changes()
implicitly from pg_sync_replication_slots()? Would there be any
implications of doing so?

Yes, there would be implications if we did it that way. It would mean
that the consumer of the slot would not get to process those changes
(for which the sync_slot API has done the get_changes) and send them to
the client. Consider a publisher-subscriber and physical standby setup.
In this setup, the subscriber creates a logical slot corresponding to
the subscription on the publisher. Now, the publisher processes changes
and sends them to the subscriber; then the slot is advanced (both its
xmin and WAL locations) once the corresponding changes are sent to the
client.

If we allow pg_sync_replication_slots() to do
pg_logical_slot_get_changes() or an equivalent in some way, then we may
end up advancing the slot without sending the changes to the subscriber,
which would be data loss from the subscriber's point of view.

I have explained in terms of built-in logical replication, but the
external plugins using these APIs (pg_logical_*) should be doing
something similar to process the changes and advance the slot.

Does this answer your question and make sense to you?

Yes, understood. Thank you!

Regards,
Amul

#17

Robert Haas

robertmhaas@gmail.com

9 months ago

In reply to: Zhijie Hou (Fujitsu) (#13)

Re: Replication slot is not able to sync up

On Fri, May 30, 2025 at 6:08 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

To improve this workload scenario, we can modify pg_sync_replication_slots() to
wait for the primary slot to advance to a suitable position before completing
synchronization and removing the temporary slot. This would allow the sync to
complete as soon as the primary slot advances, whether through
pg_logical_xx_get_changes() or other ways.

My understanding of this area is limited, but this sounds potentially
promising to me. The current approach seems very timing-dependent.
Depending on the state of the primary vs. the state of the standby, a
call to pg_sync_replication_slots() may either create a slot or fail
to do so. A call at a slightly earlier or later time might have had a
different result. IIUC, this proposal would make different results due
to minor timing variations less probable.

--
Robert Haas
EDB: http://www.enterprisedb.com

#18

Amit Kapila

amit.kapila16@gmail.com

9 months ago

In reply to: shveta malik (#10)

Re: Replication slot is not able to sync up

On Thu, May 29, 2025 at 8:39 AM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, May 28, 2025 at 11:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I didn't know it was intended for testing and debugging purposes, so
clarifying it in the documentation would be a good idea.

I have added the suggested docs in v3.

- errmsg("could not synchronize replication slot \"%s\"", remote_slot->name),
- errdetail("Logical decoding could not find consistent point from local slot's LSN %X/%X.",
+ errmsg("could not synchronize replication slot \"%s\" to prevent data loss", remote_slot->name),
+ errdetail("Standby does not have enough data to decode WALs at LSN %X/%X.",
    LSN_FORMAT_ARGS(slot->data.restart_lsn)));

I find that the errdetail is not clear about the current state, which is
that we can't yet build a consistent snapshot on the standby to allow
decoding. Would it be better to have an errdetail like "Standby could
not build a consistent snapshot to decode WALs at LSN %X/%X."?
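For readers decoding the %X/%X placeholders in these messages: an LSN is a 64-bit WAL position printed as its high and low 32-bit halves in hex. The sketch below assumes that is what LSN_FORMAT_ARGS() supplies to the format string; format_lsn() and parse_lsn() are hypothetical helpers, not PostgreSQL functions.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

/* Print an LSN as "high/low" 32-bit halves in uppercase hex, e.g. 0/B000060. */
static int
format_lsn(XLogRecPtr lsn, char *buf, size_t len)
{
    return snprintf(buf, len, "%X/%X",
                    (unsigned int) (lsn >> 32), (unsigned int) lsn);
}

/* Parse "high/low" text back into a 64-bit LSN; returns 1 on success, else 0. */
static int
parse_lsn(const char *text, XLogRecPtr *out)
{
    unsigned int hi, lo;

    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return 0;
    *out = ((XLogRecPtr) hi << 32) | lo;
    return 1;
}
```

For example, 0/B000060 from the original report is the 64-bit value 0xB000060, and 1/0 would be 0x100000000.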

--
With Regards,
Amit Kapila.

#19

Zhijie Hou (Fujitsu)

houzj.fnst@fujitsu.com

9 months ago

In reply to: shveta malik (#10)

RE: Replication slot is not able to sync up

On Thu, May 29, 2025 at 11:09 AM shveta malik wrote:

On Wed, May 28, 2025 at 11:56 AM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

I didn't know it was intended for testing and debugging purposes, so
clarifying it in the documentation would be a good idea.

I have added the suggested docs in v3.

Thanks for updating the patch.

I have a few suggestions for the document from a user's perspective.

... , one
condition must be met. The logical replication slot on primary must be advanced
to such a catalog change position (catalog_xmin) and WAL's LSN (restart_lsn) for
which sufficient data is retained on the corresponding standby server.

The term "catalog change position" might not be very easy for some readers
to grasp. Would it be clearer to phrase it as follows?

"The logical replication slot on the primary must reach a state where the WALs
and system catalog rows retained by the slot are also present on the
corresponding standby server."

If the primary slot is still lagging behind and synchronization is attempted
for the first time, then to prevent the data loss as explained, persistence
and synchronization of newly created slot will be skipped, and the following
log message may appear on standby.

The phrase "lagging behind" typically refers to the standby, which can be a bit
confusing. I understand that users can infer the meaning from the surrounding
context, but would it be easier to understand with a more detailed description
like the one below?

"If the WALs and system catalog rows retained by the slot on the primary have
already been purged from the standby server, ..."

3.
<programlisting>
LOG: could not synchronize replication slot "failover_slot" to prevent data loss
DETAIL: The remote slot needs WAL at LSN 0/3003F28 and catalog xmin 754, but the standby has LSN 0/3003F28 and catalog xmin 766.
</programlisting>

It seems that one space is missing between "LOG:" and the message; the server
log output puts two spaces after the severity label.

Best Regards,
Hou zj

#20

shveta malik

shveta.malik@gmail.com

9 months ago

In reply to: Zhijie Hou (Fujitsu) (#19)

Re: Replication slot is not able to sync up

On Tue, Jun 10, 2025 at 3:20 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote: