Improve conflict detection when replication origins are reused

Started by Nisha Moond, 11 days ago, 8 messages, hackers

nisha.moond412@gmail.com

11 days ago

Hi hackers,

While reviewing the issue reported at [1] and the proposed solutions
at [2], I noticed a related problem: false negative conflict detection
when a 'ReplOriginId' gets reused.

In logical replication, conflict detection relies on the tuple’s
replication origin ('roident'). The problem is that if a subscription
is dropped and a new subscription later reuses the same origin ID, the
apply worker may incorrectly treat incoming changes as “its own”
changes and skip conflict detection.

A simple example:
1. Create subscription sub1 with 'roident = 1'
2. Replicate some rows into table 't1'
3. Drop 'sub1'
4. Create another subscription 'sub2'
5. `sub2` reuses 'roident = 1'
6. New updates arrive for rows previously written by 'sub1'
At this point, conflict detection sees:
    tuple_origin == current_origin

and incorrectly assumes the row was written by the current
subscription instance, so no 'update_origin_differs' conflict is
raised.

This may look harmless in this simple setup, but it becomes
problematic if the new subscription is connected to a different
publisher, because real conflicts can then be silently missed.
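
The six steps above can be modelled in a few lines. This is only a toy sketch, not PostgreSQL code: `Row`, `origins_differ`, and the integer origin IDs are hypothetical names chosen for illustration, and the real check in the apply worker involves more than a single comparison.

```python
from dataclasses import dataclass


@dataclass
class Row:
    value: int
    origin: int  # roident recorded for the transaction that last wrote the row


def origins_differ(row: Row, applying_origin: int) -> bool:
    """Simplified model of the existing check: flag only when origins differ."""
    return row.origin != applying_origin


# Steps 1-2: sub1 holds roident 1 and replicates a row into t1.
row = Row(value=10, origin=1)

# Steps 3-5: sub1 is dropped and sub2 is handed the freed roident 1.
sub2_origin = 1

# Step 6: an update arrives through sub2 for the row that sub1 wrote.
conflict_raised = origins_differ(row, sub2_origin)
print("conflict raised:", conflict_raised)
```

Because the ID alone carries no notion of which subscription instance owned it, the comparison cannot tell sub1 and sub2 apart.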

I explored two possible approaches to solve this:

Approach 1. Zero out old origin IDs in commit_ts data when dropping a
subscription
----------------------
- When a subscription is dropped and its replication origin becomes
free, scan all 'commit_ts' SLRU entries and replace that old origin ID
with 'InvalidRepOriginId (0)'.
- So rows previously written by the old subscription would no longer
appear to belong to any active replication origin.
- A new subscription reusing the same 'roident' will always conflict
with origin '0'.
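
A rough model of what Approach 1 does, assuming the commit_ts data can be pictured as a mapping from transaction ID to (commit time, origin). The dict and the function name `clear_dropped_origin` are illustrative stand-ins, not the patch's actual code.

```python
INVALID_ORIGIN = 0  # stands in for InvalidRepOriginId


def clear_dropped_origin(commit_ts, dropped_origin):
    """Rewrite every entry tagged with dropped_origin; return how many changed.

    commit_ts is a toy stand-in for the SLRU: {xid: (commit_time, origin)}.
    """
    rewritten = 0
    for xid, (commit_time, origin) in list(commit_ts.items()):
        if origin == dropped_origin:
            commit_ts[xid] = (commit_time, INVALID_ORIGIN)
            rewritten += 1
    return rewritten


commit_ts = {100: (1000, 1), 101: (1001, 2), 102: (1002, 1)}
changed = clear_dropped_origin(commit_ts, dropped_origin=1)
print(changed, commit_ts)
```

Note that the loop visits every entry, whether or not it belongs to the dropped origin, which is where the cost on large systems comes from.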

Pros:
- Fixes the stale-origin problem completely and may also help solve
the tablesync-origin issue discussed in [1]
- No additional checks needed during conflict detection

Cons:
- Requires scanning the entire 'commit_ts' SLRU during DROP
SUBSCRIPTION, so it can become very expensive on large systems
- Not crash-safe in the current patch:
  - if the server crashes midway, some entries may still contain the
    old origin ID
  - after restart, reused origins can again lead to missed conflicts
- Making this fully crash-safe would likely require WAL logging or
recovery-time reprocessing.

Approach 2. Store replication origin creation time
----------------------
- Add a creation timestamp for each replication origin
- During conflict check:
    if tuple_origin != current_origin
        -> existing behavior
    if tuple_origin == current_origin
        -> compare tuple commit timestamp with origin creation time
           if tuple_commit_ts <= origin_creation_time
               -> treat as an origin reuse case and raise conflict
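
The check described above might look roughly like the following. This is a minimal sketch under stated assumptions: timestamps are plain integers, "existing behavior" for differing origins is reduced to "report a conflict", and `should_raise_conflict` is a hypothetical name rather than a function from the patch.

```python
def should_raise_conflict(tuple_origin, tuple_commit_ts,
                          current_origin, origin_creation_time):
    """Return True when the simplified model would report a conflict.

    Timestamps are plain integers (think microseconds) for illustration.
    """
    if tuple_origin != current_origin:
        return True  # existing behavior: a different origin wrote the row
    # Same ID: a row committed at or before this origin was created must
    # have been written by an earlier owner of the ID.
    return tuple_commit_ts <= origin_creation_time


origin_created_at = 5_000
# Row committed before the origin existed: reused ID, conflict reported.
print(should_raise_conflict(1, 4_000, 1, origin_created_at))
# Row committed after creation: written by this origin, no conflict.
print(should_raise_conflict(1, 6_000, 1, origin_created_at))
# Row committed in the same microsecond: flagged because of the <= comparison.
print(should_raise_conflict(1, 5_000, 1, origin_created_at))
```

The last call shows the boundary case: with `<=`, a commit at exactly the creation time is treated as reuse.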

Pros:
- No additional processing during DROP SUBSCRIPTION
- Lightweight runtime check (just one timestamp comparison)
- Naturally crash-safe since origin creation is WAL-logged already

Cons:
- Requires a catalog schema change
- The <= comparison can produce false-positive conflicts for rows
committed at the exact same microsecond as origin creation
- May require additional handling for upgraded origins

IMO, the second approach currently looks more practical because it
avoids the heavy SLRU scan and crash-recovery complexity.

Attached:
- Patch for approach 1
- Patch for approach 2
- A TAP test reproducing the issue

Note: The patches have been manually tested for the reported issue, but
not yet for performance or additional edge cases.

Feedback and suggestions are welcome.

[1]: /messages/by-id/CALDaNm3Y6Y4Mub6QC8fZKnNy5jZspELQYCoQF_FL2Zwzweu=og@mail.gmail.com
[2]: /messages/by-id/CAA4eK1LxGXR7jOAKh0B8N362S-Q3b6GhBxxcV_HxUaicEPq5Cg@mail.gmail.com

--
Thanks,
Nisha

shveta malik

shveta.malik@gmail.com

10 days ago

In reply to: Nisha Moond (#1)

Re: Improve conflict detection when replication origins are reused

On Thu, May 14, 2026 at 8:35 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

Hi hackers,

While reviewing the issue reported at [1] and the proposed solutions
at [2], I noticed a related problem: false negative conflict detection
when a 'ReplOriginId' gets reused.

In logical replication, conflict detection relies on the tuple’s
replication origin ('roident'). The problem is that if a subscription
is dropped and a new subscription later reuses the same origin ID, the
apply worker may incorrectly treat incoming changes as “its own”
changes and skip conflict detection.

A simple example:
1. Create subscription sub1 with 'roident = 1'
2. Replicate some rows into table 't1'
3. Drop 'sub1'
4. Create another subscription 'sub2'
5. `sub2` reuses 'roident = 1'
6. New updates arrive for rows previously written by 'sub1'
At this point, conflict detection sees:
tuple_origin == current_origin

and incorrectly assumes the row was written by the current
subscription instance, so no 'update_origin_differs' conflict is
raised.

I agree with the problem statement. I will prioritize the review soon.

shveta malik

shveta.malik@gmail.com

10 days ago

In reply to: shveta malik (#2)

Re: Improve conflict detection when replication origins are reused

On Fri, May 15, 2026 at 8:56 AM shveta malik <shveta.malik@gmail.com> wrote:

On Thu, May 14, 2026 at 8:35 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

Hi hackers,

While reviewing the issue reported at [1] and the proposed solutions
at [2], I noticed a related problem: false negative conflict detection
when a 'ReplOriginId' gets reused.

In logical replication, conflict detection relies on the tuple’s
replication origin ('roident'). The problem is that if a subscription
is dropped and a new subscription later reuses the same origin ID, the
apply worker may incorrectly treat incoming changes as “its own”
changes and skip conflict detection.

A simple example:
1. Create subscription sub1 with 'roident = 1'
2. Replicate some rows into table 't1'
3. Drop 'sub1'
4. Create another subscription 'sub2'
5. `sub2` reuses 'roident = 1'
6. New updates arrive for rows previously written by 'sub1'
At this point, conflict detection sees:
tuple_origin == current_origin

and incorrectly assumes the row was written by the current
subscription instance, so no 'update_origin_differs' conflict is
raised.

I agree with the problem statement. I will prioritize the review soon.

Nisha, I think we will get the same problem in another scenario too:

create pub1-server1
create pub1-server2
create sub1-server3; subscribing to pub1-server1

--On both server1 and server2, insert same set of rows:
insert into tab1 values (10), (20), (30);

Sub1 (server3) will get the rows from server1.
Now alter sub1 to connect to server2 (you will have to create slot
manually on server2)
SELECT pg_create_logical_replication_slot('sub1', 'pgoutput', false,
false, false);

--Now perform the update on server2:
update tab1 set i=11 where i=10;

The subscriber on server3 will receive the update from server2 and will
update the row originally inserted by server1 without raising
update_origin_differs.

Can you please confirm if my understanding of the problem statement is
correct and if the scenario above will also result in a similar
situation? IIUC, in such a case, the proposed solutions may not work
directly and will need to be further evolved. I will think more once
you confirm my understanding.

thanks
Shveta

Nisha Moond

nisha.moond412@gmail.com

10 days ago

In reply to: shveta malik (#3)

Re: Improve conflict detection when replication origins are reused

On Fri, May 15, 2026 at 3:27 PM shveta malik <shveta.malik@gmail.com> wrote:

Nisha, I think we will get the same problem in another scenario too:

create pub1-server1
create pub1-server2
create sub1-server3; subscribing to pub1-server1

--On both server1 and server2, insert same set of rows:
insert into tab1 values (10), (20), (30);

Sub1 (server3) will get the rows from server1.
Now alter sub1 to connect to server2 (you will have to create slot
manually on server2)
SELECT pg_create_logical_replication_slot('sub1', 'pgoutput', false,
false, false);

--Now perform the update on server2:
update tab1 set i=11 where i=10;

The subscriber on server3 will receive the update from server2 and will
update the row originally inserted by server1 without raising
update_origin_differs.

Can you please confirm if my understanding of the problem statement is
correct and if the scenario above will also result in a similar
situation? IIUC, in such a case, the proposed solutions may not work
directly and will need to be further evolved. I will think more once
you confirm my understanding.

I agree that the above scenario will not raise a conflict, and I think
that is expected with the current replication model, which tracks
which subscription stream applied a row, not which publisher server it
originally came from.

With the existing replication model, we can also see the opposite
scenario of what you mentioned: if two subscriptions replicate the
same table from the same publisher, update_origin_differs conflicts
can still be raised even though both changes come from the same
source. This again shows that origin identity today is effectively
tied to the subscription stream, not the publisher server.
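
Both scenarios can be put side by side in a toy model. The sketch assumes only what is stated above, namely that the origin ID follows the subscription and not the publisher. The helper names and the dict layout are hypothetical.

```python
from itertools import count

_next_origin = count(1)


def create_subscription(name, publisher):
    """Each subscription gets its own origin ID, regardless of publisher."""
    return {"name": name, "publisher": publisher, "origin": next(_next_origin)}


def origins_differ(row_origin, sub):
    return row_origin != sub["origin"]


# Shveta's scenario: one subscription, repointed from server1 to server2.
sub1 = create_subscription("sub1", "server1")
row_origin = sub1["origin"]      # row inserted via sub1 from server1
sub1["publisher"] = "server2"    # connection changed, origin ID unchanged
repointed = origins_differ(row_origin, sub1)

# Opposite scenario: two subscriptions to the same publisher.
sub_a = create_subscription("sub_a", "server1")
sub_b = create_subscription("sub_b", "server1")
same_publisher = origins_differ(sub_a["origin"], sub_b)

print(repointed, same_publisher)
```

In this model the repointed subscription sees no difference in origin, while the two subscriptions to one publisher do.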

If we want conflict detection based on publisher identity, that would
require a different model altogether, closer to systems like
BDR/pglogical, which track global node identities across the
replication chain.

So for now, I think the above scenario is outside the scope of the
current subscription-level origin tracking design.

Thoughts?

--
Thanks,
Nisha

shveta malik

shveta.malik@gmail.com

6 days ago

In reply to: Nisha Moond (#4)

Re: Improve conflict detection when replication origins are reused

On Fri, May 15, 2026 at 4:45 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

On Fri, May 15, 2026 at 3:27 PM shveta malik <shveta.malik@gmail.com> wrote:

Nisha, I think we will get the same problem in another scenario too:

create pub1-server1
create pub1-server2
create sub1-server3; subscribing to pub1-server1

--On both server1 and server2, insert same set of rows:
insert into tab1 values (10), (20), (30);

Sub1 (server3) will get the rows from server1.
Now alter sub1 to connect to server2 (you will have to create slot
manually on server2)
SELECT pg_create_logical_replication_slot('sub1', 'pgoutput', false,
false, false);

--Now perform the update on server2:
update tab1 set i=11 where i=10;

The subscriber on server3 will receive the update from server2 and will
update the row originally inserted by server1 without raising
update_origin_differs.

Can you please confirm if my understanding of the problem statement is
correct and if the scenario above will also result in a similar
situation? IIUC, in such a case, the proposed solutions may not work
directly and will need to be further evolved. I will think more once
you confirm my understanding.

I agree that the above scenario will not raise a conflict, and I think
that is expected with the current replication model, which tracks
which subscription stream applied a row, not which publisher server it
originally came from.

With the existing replication model, we can also see the opposite
scenario of what you mentioned: if two subscriptions replicate the
same table from the same publisher, update_origin_differs conflicts
can still be raised even though both changes come from the same
source. This again shows that origin identity today is effectively
tied to the subscription stream, not the publisher server.

Yes, I agree. Thanks for the details.

If we want conflict detection based on publisher identity, that would
require a different model altogether, closer to systems like
BDR/pglogical, which track global node identities across the
replication chain.

So for now, I think the above scenario is outside the scope of the
current subscription-level origin tracking design.

Yes, looks like so.

thanks
Shveta

shveta malik

shveta.malik@gmail.com

6 days ago

In reply to: Nisha Moond (#1)

Re: Improve conflict detection when replication origins are reused

On Thu, May 14, 2026 at 8:35 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

Hi hackers,

While reviewing the issue reported at [1] and the proposed solutions
at [2], I noticed a related problem: false negative conflict detection
when a 'ReplOriginId' gets reused.

In logical replication, conflict detection relies on the tuple’s
replication origin ('roident'). The problem is that if a subscription
is dropped and a new subscription later reuses the same origin ID, the
apply worker may incorrectly treat incoming changes as “its own”
changes and skip conflict detection.

A simple example:
1. Create subscription sub1 with 'roident = 1'
2. Replicate some rows into table 't1'
3. Drop 'sub1'
4. Create another subscription 'sub2'
5. `sub2` reuses 'roident = 1'
6. New updates arrive for rows previously written by 'sub1'
At this point, conflict detection sees:
tuple_origin == current_origin

and incorrectly assumes the row was written by the current
subscription instance, so no 'update_origin_differs' conflict is
raised.

This may look harmless in this simple setup, but it becomes
problematic if the new subscription is connected to a different
publisher, because real conflicts can then be silently missed.

I explored two possible approaches to solve this:

Approach 1. Zero out old origin IDs in commit_ts data when dropping a
subscription
----------------------
- When a subscription is dropped and its replication origin becomes
free, scan all 'commit_ts' SLRU entries and replace that old origin ID
with 'InvalidRepOriginId (0)'.
- So rows previously written by the old subscription would no longer
appear to belong to any active replication origin.
- A new subscription reusing the same 'roident' will always conflict
with origin '0'.

Pros:
- Fixes the stale-origin problem completely and may also help solve
the tablesync-origin issue discussed in [1]
- No additional checks needed during conflict detection

Cons:
- Requires scanning the entire 'commit_ts' SLRU during DROP
SUBSCRIPTION, so it can become very expensive on large systems
- Not crash-safe currently(patch):
- if the server crashes midway, some entries may still contain the
old origin ID
- after restart, reused origins can again lead to missed conflicts
- Making this fully crash-safe would likely require WAL logging or
recovery-time reprocessing.

Approach 2. Store replication origin creation time
----------------------
- Add a creation timestamp for each replication origin
- During conflict check:
if tuple_origin != current_origin
-> existing behavior
if tuple_origin == current_origin
-> compare tuple commit timestamp with origin creation time
if tuple_commit_ts <= origin_creation_time
-> treat as an origin reuse case and raise conflict

Pros:
-------
- No additional processing during DROP SUBSCRIPTION
- Lightweight runtime check (just one timestamp comparison)
- Naturally crash-safe since origin creation is WAL-logged already

Cons:
- Requires a catalog schema change
- The <= comparison can produce false-positive conflicts for rows
committed at the exact same microsecond as origin creation
- May require additional handling for upgraded origins

IMO, the second approach currently looks more practical because it
avoids the heavy SLRU scan and crash-recovery complexity.

I find Approach 2 the most practical. I explored other ideas, but none
seems reliable enough, or worth the effort, for this use case.
A few ideas I considered are:

1) We could modify replorigin_create to exhaust the full range of IDs
sequentially before reusing them. But this is not a reliable solution.
It would make the bug much harder to hit, but a busy system could
still eventually exhaust the 2-byte limit of 65K IDs, after which the
problem may reappear.

2) Using LSN matching instead of timestamps. To completely eliminate
the edge case where a timestamp comparison yields a false positive, we
could track the origin_creation_lsn and compare it against the tuple's
commit LSN. IIUC, it would require extending commit_ts to include
8 bytes of commit LSN, which might not be a good idea. So this idea may
also not be desirable unless there is an existing way to extract
the commit LSN (which I am not aware of) without extending the commit_ts
structure?
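
To illustrate why idea 1 only postpones the problem, here is a small sketch of a sequential allocator that wraps around. It assumes usable IDs run from 1 to 65534 (a 2-byte ID with the top value reserved). `SequentialAllocator` is a hypothetical class and does not reflect how replorigin_create is written.

```python
MAX_ORIGIN_ID = 0xFFFF - 1  # assumed highest usable 2-byte origin ID


class SequentialAllocator:
    """Hand out IDs in increasing order, wrapping only after the range ends."""

    def __init__(self):
        self.next_id = 1
        self.in_use = set()

    def allocate(self):
        for _ in range(MAX_ORIGIN_ID):
            candidate = self.next_id
            self.next_id = candidate % MAX_ORIGIN_ID + 1
            if candidate not in self.in_use:
                self.in_use.add(candidate)
                return candidate
        raise RuntimeError("no free replication origin IDs")

    def release(self, origin_id):
        self.in_use.discard(origin_id)


alloc = SequentialAllocator()
first = alloc.allocate()
alloc.release(first)

# Create and drop enough origins to walk through the rest of the range.
for _ in range(MAX_ORIGIN_ID - 1):
    alloc.release(alloc.allocate())

reused = alloc.allocate()
print(first, reused)
```

After one full pass over the range, the allocator hands out the first ID again, so stale commit_ts entries for that ID become ambiguous just as before.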

thanks
Shveta

Nisha Moond

nisha.moond412@gmail.com

6 days ago

In reply to: shveta malik (#6)

Re: Improve conflict detection when replication origins are reused

On Tue, May 19, 2026 at 2:52 PM shveta malik <shveta.malik@gmail.com> wrote:

I find Approach 2 the most practical. I explored other ideas but none
seem completely reliable or worth the effort to justify this use-case.
A few ideas I considered are:

1) We could modify replorigin_create to exhaust the full range of IDs
sequentially before reusing them. But this is not a reliable solution.
It would make the bug much harder to hit, but a busy system could
still eventually exhaust the 2-byte limit of 65K IDs, after which the
problem may reappear.

2) Using LSN Matching instead of timestamp. To completely eliminate
the edge case where a timestamp results in a false-positive case, we
could track the origin_creation_lsn and compare it against the tuple's
commit LSN. IIUC, it would require extending commit_ts to include
8-byte of commit-lsn which might not be a good idea. So this idea may
also not be desirable unless there is an existing way to extract
commit-lsn (which I am not aware of) without extending the commit-ts
structure?

Using LSN is a good idea. I looked through the code a bit, and
extending `commit_ts` seems like the only option. I also could not
find anything existing from which we can extract the commit LSN of a
tuple while applying a change.
Every heap page has pd_lsn (accessible via PageGetLSN(page)), which
stores the LSN of the most recent WAL record that modified the page.
But this doesn't help, as there is no correlation to a specific
tuple's xmin.
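
For a rough sense of what extending `commit_ts` would cost, the arithmetic below assumes that each entry currently holds an 8-byte timestamp plus a 2-byte origin ID with no padding, and that pages are 8192 bytes. Both figures are assumptions for illustration and should be checked against the source before relying on them.

```python
import struct

BLCKSZ = 8192  # assumed default block size

# Assumed current entry: 8-byte commit timestamp + 2-byte origin ID, unpadded.
current_entry = struct.calcsize("<qH")
# Hypothetical extended entry with an extra 8-byte commit LSN.
extended_entry = struct.calcsize("<qHQ")

# Entry size in bytes, then whole entries that fit on one page.
print(current_entry, BLCKSZ // current_entry)
print(extended_entry, BLCKSZ // extended_entry)
```

Under these assumptions, each page would hold a little over half as many entries once the LSN is added.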

--
Thanks,
Nisha

shveta malik

shveta.malik@gmail.com

5 days ago

In reply to: Nisha Moond (#7)

Re: Improve conflict detection when replication origins are reused

On Tue, May 19, 2026 at 7:08 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

On Tue, May 19, 2026 at 2:52 PM shveta malik <shveta.malik@gmail.com> wrote:

I find Approach 2 the most practical. I explored other ideas but none
seem completely reliable or worth the effort to justify this use-case.
A few ideas I considered are:

1) We could modify replorigin_create to exhaust the full range of IDs
sequentially before reusing them. But this is not a reliable solution.
It would make the bug much harder to hit, but a busy system could
still eventually exhaust the 2-byte limit of 65K IDs, after which the
problem may reappear.

2) Using LSN Matching instead of timestamp. To completely eliminate
the edge case where a timestamp results in a false-positive case, we
could track the origin_creation_lsn and compare it against the tuple's
commit LSN. IIUC, it would require extending commit_ts to include
8-byte of commit-lsn which might not be a good idea. So this idea may
also not be desirable unless there is an existing way to extract
commit-lsn (which I am not aware of) without extending the commit-ts
structure?

Using LSN is a good idea. I looked through the code a bit, and
extending `commit_ts` seems like the only option. I also could not
find anything existing from which we can extract the commit LSN of a
tuple while applying a change.
Every heap page has pd_lsn (accessible via PageGetLSN(page)), which
stores the LSN of the most recent WAL record that modified the page.
But this doesn't help, as there is no correlation to a specific
tuple's xmin.

I could not find any existing way to get the commit LSN either. We have
TransactionIdGetCommitLSN(), but it does not return the exact commit LSN.

thanks
Shveta
