Conflict Detection and Resolution

Started by shveta malik · almost 2 years ago · 183 messages
#1 shveta malik
shveta.malik@gmail.com

Hello hackers,

Please find the proposal for Conflict Detection and Resolution (CDR)
for Logical replication.
<Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
below details.>

Introduction
================
When a node is subscribed to multiple providers, or when local
writes happen on a subscriber, conflicts can arise for the incoming
changes. CDR is the mechanism to automatically detect and resolve
these conflicts depending on the application and configuration.
CDR is not applicable to the initial table sync: if conflicting data
already exists locally in the table, the table sync worker will fail.
Please find below the details of CDR in the apply worker for INSERT,
UPDATE, and DELETE operations:

INSERT
================
To resolve an INSERT conflict on the subscriber, it is important to
find the conflicting row (if any) before we attempt an insertion. The
index search preference for this will be:
- First, check for a replica identity (RI) index.
- If not found, check for the primary key (PK) index.
- If not found, check for unique indexes (individual ones or those
added by unique constraints).
- If no unique index is found either, skip CDR.

Note: if no RI index, PK, or unique index is found but
REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
The reason is that even though a row can be identified with
REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
rows. Hence, we should not go for conflict detection in such a case.

In the case of replica identity ‘nothing’, and in the absence of any
suitable index (as defined above), CDR will be skipped for INSERT.
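For illustration, the search preference above can be sketched as a small function. This is a hypothetical model, not PostgreSQL internals; `TableInfo`, `pick_conflict_index`, and their fields are invented names:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TableInfo:
    # Hypothetical summary of the target table's indexes.
    replica_identity_index: Optional[str] = None
    primary_key_index: Optional[str] = None
    unique_indexes: list = field(default_factory=list)

def pick_conflict_index(table: TableInfo) -> Optional[str]:
    """Return the index to probe for an INSERT conflict, or None to skip CDR."""
    if table.replica_identity_index:      # 1. replica identity (RI) index
        return table.replica_identity_index
    if table.primary_key_index:           # 2. primary key (PK) index
        return table.primary_key_index
    if table.unique_indexes:              # 3. any unique index
        return table.unique_indexes[0]
    return None                           # 4. no suitable index: skip CDR
```

Note that REPLICA IDENTITY FULL does not appear here at all: with no unique index of any kind, the function returns None and CDR is skipped, as described above.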

Conflict Type:
----------------
insert_exists: A conflict is detected when the table has the same
value for a key column as the new value in the incoming row.

Conflict Resolution
----------------
a) latest_timestamp_wins: The change with later commit timestamp wins.
b) earliest_timestamp_wins: The change with earlier commit timestamp wins.
c) apply: Always apply the remote change.
d) skip: Remote change is skipped.
e) error: Error out on conflict. Replication is stopped, manual
action is needed.

The change will be converted to an 'UPDATE' and applied if the
decision is in favor of applying the remote change.

It is important to have commit timestamp info available on the
subscriber when latest_timestamp_wins or earliest_timestamp_wins is
chosen as the resolution method. Thus ‘track_commit_timestamp’ must be
enabled on the subscriber; in its absence, configuring the said
timestamp-based resolution methods will result in an error.

Note: If the user has chosen latest_timestamp_wins or
earliest_timestamp_wins and the remote and local timestamps are the
same, the tie is broken by system identifier: the change from the node
with the higher system identifier wins. This ensures that the same
change is picked on all the nodes.
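The timestamp comparison plus the system-identifier tie-break can be sketched as follows (an illustrative model only; the function and argument names are invented):

```python
def resolve_by_timestamp(local_ts, remote_ts, local_sysid, remote_sysid,
                         latest_wins=True):
    """Return 'remote' if the remote change wins, else 'local'.

    Commit timestamps are compared first; equal timestamps are broken
    by the system identifier (higher wins), so every node
    deterministically picks the same winner.
    """
    if remote_ts != local_ts:
        remote_wins = (remote_ts > local_ts) if latest_wins else (remote_ts < local_ts)
        return "remote" if remote_wins else "local"
    # Tie on commit timestamp: higher system identifier wins.
    return "remote" if remote_sysid > local_sysid else "local"
```

The same comparison serves both resolvers: `latest_wins=True` models latest_timestamp_wins, `latest_wins=False` models earliest_timestamp_wins.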

UPDATE
================

Conflict Detection Method:
--------------------------------
Origin conflict detection: The ‘origin’ info is used to detect
conflicts; it can be obtained from the commit timestamp generated for
the incoming txn at the source node. To compare the remote origin with
the local origin, we must have origin information for local txns as
well, which can be obtained from the commit timestamp after enabling
‘track_commit_timestamp’ locally.
One drawback here is that the ‘origin’ information cannot be obtained
once the row is frozen and the commit-timestamp info is removed by
VACUUM. For a frozen row, conflicts cannot be raised, and thus the
incoming changes will be applied in all cases.

Conflict Types:
----------------
a) update_differ: The origin of the incoming update's key row differs
from that of the local row, i.e., the row has already been updated
locally or by a different node.
b) update_missing: No row with the same value as the incoming update's
key exists. The remote node is trying to update a row which does not
exist locally.
c) update_deleted: No row with the same value as the incoming update's
key exists; the row has already been deleted. This conflict type is
generated only if the deleted row is still detectable, i.e., it has
not been removed by VACUUM yet. If the row has already been removed by
VACUUM, this conflict cannot be detected; it will be detected as
update_missing and will follow the default or configured resolver of
update_missing.
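The distinction between the three conflict types can be sketched as follows (a hypothetical model; the names are invented, and in reality the detectability of a deleted row depends on whether VACUUM has run):

```python
def classify_update_conflict(local_row, local_origin, remote_origin,
                             deleted_row_detectable):
    """Classify an incoming UPDATE against the local state.

    local_row:              the matching local tuple, or None if absent
    deleted_row_detectable: True if a deleted matching row is still
                            visible, i.e. not yet removed by VACUUM
    Returns the conflict type, or None when there is no conflict.
    """
    if local_row is not None:
        if local_origin != remote_origin:
            return "update_differ"   # already updated locally/by another node
        return None                  # same origin: apply normally
    if deleted_row_detectable:
        return "update_deleted"      # row deleted but still detectable
    return "update_missing"          # row absent, or deletion erased by VACUUM
```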

Conflict Resolutions:
----------------
a) latest_timestamp_wins: The change with later commit timestamp
wins. Can be used for ‘update_differ’.
b) earliest_timestamp_wins: The change with earlier commit
timestamp wins. Can be used for ‘update_differ’.
c) apply: The remote change is always applied. Can be used for
‘update_differ’.
d) apply_or_skip: Remote change is converted to INSERT and is
applied. If the complete row cannot be constructed from the info
provided by the publisher, then the change is skipped. Can be used for
‘update_missing’ or ‘update_deleted’.
e) apply_or_error: Remote change is converted to INSERT and is
applied. If the complete row cannot be constructed from the info
provided by the publisher, then error is raised. Can be used for
‘update_missing’ or ‘update_deleted’.
f) skip: Remote change is skipped and local one is retained. Can be
used for any conflict type.
g) error: Error out on conflict. Replication is stopped, manual
action is needed. Can be used for any conflict type.
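The apply_or_skip / apply_or_error pair for a missing (or deleted) row can be sketched as follows (illustrative only; `full_row` stands for the complete row reconstructed from publisher-provided info, when that is possible):

```python
def resolve_missing_row(resolver, full_row):
    """Handle update_missing/update_deleted for the 'apply_or_*' resolvers.

    full_row: the complete row reconstructed from the publisher's info,
              or None if it could not be constructed.
    Returns an action tuple; raises for apply_or_error with no row.
    """
    if full_row is not None:
        return ("insert", full_row)   # convert the UPDATE into an INSERT
    if resolver == "apply_or_skip":
        return ("skip", None)         # cannot build the row: skip the change
    if resolver == "apply_or_error":
        raise RuntimeError("complete row could not be constructed")
    raise ValueError(f"unexpected resolver: {resolver}")
```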

To support UPDATE CDR, the presence of either a replica identity index
or a primary key is required on the target node. UPDATE CDR will not
be supported in the absence of a replica identity index or primary key
even though REPLICA IDENTITY FULL is set. Please refer to "UPDATE" in
the "Noteworthy Scenarios" section in [1] for further details.

DELETE
================
Conflict Type:
----------------
delete_missing: An incoming delete is trying to delete a row on a
target node which does not exist.

Conflict Resolutions:
----------------
a) error: Error out on conflict. Replication is stopped, manual
action is needed.
b) skip: The remote change is skipped.

Configuring Conflict Resolution:
------------------------------------------------
There are two parts when it comes to configuring CDR:

a) Enabling/Disabling conflict detection.
b) Configuring conflict resolvers for different conflict types.

Users can sometimes create multiple subscriptions on the same node,
subscribing to different tables, to improve replication performance by
starting multiple apply workers. If the tables in one subscription are
less likely to cause conflicts, the user may want conflict detection
disabled for that subscription to avoid detection latency while
enabling it for other subscriptions. This creates a requirement to
make ‘conflict detection’ configurable per subscription, while the
conflict resolver configuration can remain global: all subscriptions
which opt for ‘conflict detection’ will follow the global conflict
resolver configuration.

To implement the above, subscription commands will be changed to take
one more parameter, 'conflict_resolution=on/off'; the default will be OFF.

To configure global resolvers, a new DDL command will be introduced:

CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
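The split between per-subscription detection and a global resolver table can be sketched as follows (a hypothetical model of the proposed configuration, not actual syntax or catalogs; the resolver choices shown are arbitrary examples):

```python
# Hypothetical global resolver configuration, as would be set by the
# proposed CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
# command, one entry per conflict type.
GLOBAL_RESOLVERS = {
    "insert_exists": "latest_timestamp_wins",
    "update_differ": "latest_timestamp_wins",
    "update_missing": "apply_or_skip",
    "update_deleted": "apply_or_skip",
    "delete_missing": "skip",
}

def resolver_for(subscription_detection_enabled, conflict_type):
    """Per-subscription flag gates detection; resolvers remain global."""
    if not subscription_detection_enabled:
        return None  # detection off for this subscription: never raised
    return GLOBAL_RESOLVERS[conflict_type]
```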

-------------------------

Apart from the above three main operations and resolver configuration,
there are more conflict types like primary-key updates, multiple
unique constraints etc and some special scenarios to be considered.
Complete design details can be found in [1].

[1]: https://wiki.postgresql.org/wiki/Conflict_Detection_and_Resolution

thanks
Shveta

#2 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: shveta malik (#1)
Re: Conflict Detection and Resolution

On 5/23/24 08:36, shveta malik wrote:

Hello hackers,

Please find the proposal for Conflict Detection and Resolution (CDR)
for Logical replication.
<Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
below details.>

Introduction
================
In case the node is subscribed to multiple providers, or when local
writes happen on a subscriber, conflicts can arise for the incoming
changes. CDR is the mechanism to automatically detect and resolve
these conflicts depending on the application and configurations.
CDR is not applicable for the initial table sync. If locally, there
exists conflicting data on the table, the table sync worker will fail.
Please find the details on CDR in apply worker for INSERT, UPDATE and
DELETE operations:

Which architecture are you aiming for? Here you talk about multiple
providers, but the wiki page mentions active-active. I'm not sure how
much this matters, but it might.

Also, what kind of consistency do you expect from this? Because none
of these simple conflict resolution methods can give you the regular
consistency models we're used to, AFAICS.

INSERT
================
To resolve INSERT conflict on subscriber, it is important to find out
the conflicting row (if any) before we attempt an insertion. The
indexes or search preference for the same will be:
First check for replica identity (RI) index.
- if not found, check for the primary key (PK) index.
- if not found, then check for unique indexes (individual ones or
added by unique constraints)
- if unique index also not found, skip CDR

Note: if no RI index, PK, or unique index is found but
REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
The reason being that even though a row can be identified with
REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
rows. Hence, we should not go for conflict detection in such a case.

It's not clear to me why REPLICA_IDENTITY_FULL would mean the table is
allowed to have duplicate values. It just means the upstream is sending
the whole original row; there can still be a PK/UNIQUE index on both the
publisher and subscriber.

In case of replica identity ‘nothing’ and in absence of any suitable
index (as defined above), CDR will be skipped for INSERT.

Conflict Type:
----------------
insert_exists: A conflict is detected when the table has the same
value for a key column as the new value in the incoming row.

Conflict Resolution
----------------
a) latest_timestamp_wins: The change with later commit timestamp wins.
b) earliest_timestamp_wins: The change with earlier commit timestamp wins.
c) apply: Always apply the remote change.
d) skip: Remote change is skipped.
e) error: Error out on conflict. Replication is stopped, manual
action is needed.

Why not have some support for user-defined conflict resolution
methods, allowing more complex stuff (e.g. merging the rows in
some way, perhaps even with datatype-specific behavior)?

The change will be converted to 'UPDATE' and applied if the decision
is in favor of applying remote change.

It is important to have commit timestamp info available on subscriber
when latest_timestamp_wins or earliest_timestamp_wins method is chosen
as resolution method. Thus ‘track_commit_timestamp’ must be enabled
on subscriber, in absence of which, configuring the said
timestamp-based resolution methods will result in error.

Note: If the user has chosen the latest or earliest_timestamp_wins,
and the remote and local timestamps are the same, then it will go by
system identifier. The change with a higher system identifier will
win. This will ensure that the same change is picked on all the nodes.

How is this going to deal with the fact that commit LSN and timestamps
may not correlate perfectly? That is, commits may happen with LSN1 <
LSN2 but with T1 > T2.

UPDATE
================

Conflict Detection Method:
--------------------------------
Origin conflict detection: The ‘origin’ info is used to detect
conflict which can be obtained from commit-timestamp generated for
incoming txn at the source node. To compare remote’s origin with the
local’s origin, we must have origin information for local txns as well
which can be obtained from commit-timestamp after enabling
‘track_commit_timestamp’ locally.
The one drawback here is the ‘origin’ information cannot be obtained
once the row is frozen and the commit-timestamp info is removed by
vacuum. For a frozen row, conflicts cannot be raised, and thus the
incoming changes will be applied in all the cases.

Conflict Types:
----------------
a) update_differ: The origin of an incoming update's key row differs
from the local row i.e.; the row has already been updated locally or
by different nodes.
b) update_missing: The row with the same value as that incoming
update's key does not exist. Remote is trying to update a row which
does not exist locally.
c) update_deleted: The row with the same value as that incoming
update's key does not exist. The row is already deleted. This conflict
type is generated only if the deleted row is still detectable i.e., it
is not removed by VACUUM yet. If the row is removed by VACUUM already,
it cannot detect this conflict. It will detect it as update_missing
and will follow the default or configured resolver of update_missing
itself.

I don't understand why update_missing and update_deleted should be
different, especially considering the distinction is not detected
reliably. And even if we happen to find the row, the associated TOAST
data may have already been removed. So why would this matter?

Conflict Resolutions:
----------------
a) latest_timestamp_wins: The change with later commit timestamp
wins. Can be used for ‘update_differ’.
b) earliest_timestamp_wins: The change with earlier commit
timestamp wins. Can be used for ‘update_differ’.
c) apply: The remote change is always applied. Can be used for
‘update_differ’.
d) apply_or_skip: Remote change is converted to INSERT and is
applied. If the complete row cannot be constructed from the info
provided by the publisher, then the change is skipped. Can be used for
‘update_missing’ or ‘update_deleted’.
e) apply_or_error: Remote change is converted to INSERT and is
applied. If the complete row cannot be constructed from the info
provided by the publisher, then error is raised. Can be used for
‘update_missing’ or ‘update_deleted’.
f) skip: Remote change is skipped and local one is retained. Can be
used for any conflict type.
g) error: Error out of conflict. Replication is stopped, manual
action is needed. Can be used for any conflict type.

To support UPDATE CDR, the presence of either replica identity Index
or primary key is required on target node. Update CDR will not be
supported in absence of replica identity index or primary key even
though REPLICA IDENTITY FULL is set. Please refer to "UPDATE" in
"Noteworthy Scenarios" section in [1] for further details.

DELETE
================
Conflict Type:
----------------
delete_missing: An incoming delete is trying to delete a row on a
target node which does not exist.

Conflict Resolutions:
----------------
a) error : Error out on conflict. Replication is stopped, manual
action is needed.
b) skip : The remote change is skipped.

Configuring Conflict Resolution:
------------------------------------------------
There are two parts when it comes to configuring CDR:

a) Enabling/Disabling conflict detection.
b) Configuring conflict resolvers for different conflict types.

Users can sometimes create multiple subscriptions on the same node,
subscribing to different tables to improve replication performance by
starting multiple apply workers. If the tables in one subscription are
less likely to cause conflict, then it is possible that user may want
conflict detection disabled for that subscription to avoid detection
latency while enabling it for other subscriptions. This generates a
requirement to make ‘conflict detection’ configurable per
subscription. While the conflict resolver configuration can remain
global. All the subscriptions which opt for ‘conflict detection’ will
follow global conflict resolver configuration.

To implement the above, subscription commands will be changed to have
one more parameter 'conflict_resolution=on/off', default will be OFF.

To configure global resolvers, new DDL command will be introduced:

CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>

I very much doubt we want a single global conflict resolver, or even one
resolver per subscription. It seems like a very table-specific thing.

Also, doesn't this whole design ignore the concurrency between
publishers? Isn't this problematic considering the commit timestamps
may go backwards (for a given publisher), which means the conflict
resolution is not deterministic (as it depends on how exactly they
interleave)?

-------------------------

Apart from the above three main operations and resolver configuration,
there are more conflict types like primary-key updates, multiple
unique constraints etc and some special scenarios to be considered.
Complete design details can be found in [1].

[1]: https://wiki.postgresql.org/wiki/Conflict_Detection_and_Resolution

Hmmm, not sure it's good to have a "complete" design on the wiki, with
only a subset posted to the mailing list. I haven't compared what the
differences are, though.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3 shveta malik
shveta.malik@gmail.com
In reply to: Tomas Vondra (#2)
Re: Conflict Detection and Resolution

On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/23/24 08:36, shveta malik wrote:

Hello hackers,

Please find the proposal for Conflict Detection and Resolution (CDR)
for Logical replication.
<Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
below details.>

Introduction
================
In case the node is subscribed to multiple providers, or when local
writes happen on a subscriber, conflicts can arise for the incoming
changes. CDR is the mechanism to automatically detect and resolve
these conflicts depending on the application and configurations.
CDR is not applicable for the initial table sync. If locally, there
exists conflicting data on the table, the table sync worker will fail.
Please find the details on CDR in apply worker for INSERT, UPDATE and
DELETE operations:

Which architecture are you aiming for? Here you talk about multiple
providers, but the wiki page mentions active-active. I'm not sure how
much this matters, but it might.

Currently, we are working on the multiple-providers case, but ideally
it should work for active-active as well. During further discussion
and the implementation phase, if we find cases which will not work in
a straightforward way for active-active, then our primary focus will
remain to first implement it for the multiple-providers architecture.

Also, what kind of consistency you expect from this? Because none of
these simple conflict resolution methods can give you the regular
consistency models we're used to, AFAICS.

Could you please explain a little bit more about this?

INSERT
================
To resolve INSERT conflict on subscriber, it is important to find out
the conflicting row (if any) before we attempt an insertion. The
indexes or search preference for the same will be:
First check for replica identity (RI) index.
- if not found, check for the primary key (PK) index.
- if not found, then check for unique indexes (individual ones or
added by unique constraints)
- if unique index also not found, skip CDR

Note: if no RI index, PK, or unique index is found but
REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
The reason being that even though a row can be identified with
REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
rows. Hence, we should not go for conflict detection in such a case.

It's not clear to me why would REPLICA_IDENTITY_FULL mean the table is
allowed to have duplicate values? It just means the upstream is sending
the whole original row, there can still be a PK/UNIQUE index on both the
publisher and subscriber.

Yes, right. Sorry for the confusion. I meant the same, i.e. in the
absence of an 'RI index, PK, or unique index', tables can have
duplicates. So even in the presence of a replica identity (FULL in
this case) but in the absence of a unique/primary index, CDR will be
skipped for INSERT.

In case of replica identity ‘nothing’ and in absence of any suitable
index (as defined above), CDR will be skipped for INSERT.

Conflict Type:
----------------
insert_exists: A conflict is detected when the table has the same
value for a key column as the new value in the incoming row.

Conflict Resolution
----------------
a) latest_timestamp_wins: The change with later commit timestamp wins.
b) earliest_timestamp_wins: The change with earlier commit timestamp wins.
c) apply: Always apply the remote change.
d) skip: Remote change is skipped.
e) error: Error out on conflict. Replication is stopped, manual
action is needed.

Why not to have some support for user-defined conflict resolution
methods, allowing to do more complex stuff (e.g. merging the rows in
some way, perhaps even with datatype-specific behavior)?

Initially, for the sake of simplicity, we are targeting support for
built-in resolvers. But we have a plan to work on user-defined
resolvers as well. We shall propose that separately.

The change will be converted to 'UPDATE' and applied if the decision
is in favor of applying remote change.

It is important to have commit timestamp info available on subscriber
when latest_timestamp_wins or earliest_timestamp_wins method is chosen
as resolution method. Thus ‘track_commit_timestamp’ must be enabled
on subscriber, in absence of which, configuring the said
timestamp-based resolution methods will result in error.

Note: If the user has chosen the latest or earliest_timestamp_wins,
and the remote and local timestamps are the same, then it will go by
system identifier. The change with a higher system identifier will
win. This will ensure that the same change is picked on all the nodes.

How is this going to deal with the fact that commit LSN and timestamps
may not correlate perfectly? That is, commits may happen with LSN1 <
LSN2 but with T1 > T2.

Are you pointing to the issue where a session/txn has taken its
'xactStopTimestamp' earlier but is delayed in inserting its record
into the XLOG, while another session/txn which took its timestamp
slightly later succeeded in inserting its record into the XLOG sooner
than the first session, making LSNs and timestamps out of sync? Going
by this scenario, the commit timestamp may not be reflective of the
actual commit order, and thus timestamp-based resolvers may take wrong
decisions. Or do you mean something else?

If this is the problem you are referring to, then I think this needs a
fix on the publisher side. Let me think more about it. Kindly let me
know if you have ideas on how to tackle it.

UPDATE
================

Conflict Detection Method:
--------------------------------
Origin conflict detection: The ‘origin’ info is used to detect
conflict which can be obtained from commit-timestamp generated for
incoming txn at the source node. To compare remote’s origin with the
local’s origin, we must have origin information for local txns as well
which can be obtained from commit-timestamp after enabling
‘track_commit_timestamp’ locally.
The one drawback here is the ‘origin’ information cannot be obtained
once the row is frozen and the commit-timestamp info is removed by
vacuum. For a frozen row, conflicts cannot be raised, and thus the
incoming changes will be applied in all the cases.

Conflict Types:
----------------
a) update_differ: The origin of an incoming update's key row differs
from the local row i.e.; the row has already been updated locally or
by different nodes.
b) update_missing: The row with the same value as that incoming
update's key does not exist. Remote is trying to update a row which
does not exist locally.
c) update_deleted: The row with the same value as that incoming
update's key does not exist. The row is already deleted. This conflict
type is generated only if the deleted row is still detectable i.e., it
is not removed by VACUUM yet. If the row is removed by VACUUM already,
it cannot detect this conflict. It will detect it as update_missing
and will follow the default or configured resolver of update_missing
itself.

I don't understand the why should update_missing or update_deleted be
different, especially considering it's not detected reliably. And also
that even if we happen to find the row the associated TOAST data may
have already been removed. So why would this matter?

Here, we are trying to tackle the case where the row was 'recently'
deleted, i.e. a concurrent UPDATE and DELETE on pub and sub. The user
may want to opt for a different resolution in such a case, as against
the one where the corresponding row was not even present in the first
place. The case where the row was deleted long back may not fall into
this category, as there is a higher chance that it has been removed by
vacuum, and it can be considered equivalent to the update_missing
case.

Regarding the "TOAST column" for deleted-row cases, we may need to dig
more. Thanks for bringing up this case. Let me analyze it further.

Conflict Resolutions:
----------------
a) latest_timestamp_wins: The change with later commit timestamp
wins. Can be used for ‘update_differ’.
b) earliest_timestamp_wins: The change with earlier commit
timestamp wins. Can be used for ‘update_differ’.
c) apply: The remote change is always applied. Can be used for
‘update_differ’.
d) apply_or_skip: Remote change is converted to INSERT and is
applied. If the complete row cannot be constructed from the info
provided by the publisher, then the change is skipped. Can be used for
‘update_missing’ or ‘update_deleted’.
e) apply_or_error: Remote change is converted to INSERT and is
applied. If the complete row cannot be constructed from the info
provided by the publisher, then error is raised. Can be used for
‘update_missing’ or ‘update_deleted’.
f) skip: Remote change is skipped and local one is retained. Can be
used for any conflict type.
g) error: Error out of conflict. Replication is stopped, manual
action is needed. Can be used for any conflict type.

To support UPDATE CDR, the presence of either replica identity Index
or primary key is required on target node. Update CDR will not be
supported in absence of replica identity index or primary key even
though REPLICA IDENTITY FULL is set. Please refer to "UPDATE" in
"Noteworthy Scenarios" section in [1] for further details.

DELETE
================
Conflict Type:
----------------
delete_missing: An incoming delete is trying to delete a row on a
target node which does not exist.

Conflict Resolutions:
----------------
a) error : Error out on conflict. Replication is stopped, manual
action is needed.
b) skip : The remote change is skipped.

Configuring Conflict Resolution:
------------------------------------------------
There are two parts when it comes to configuring CDR:

a) Enabling/Disabling conflict detection.
b) Configuring conflict resolvers for different conflict types.

Users can sometimes create multiple subscriptions on the same node,
subscribing to different tables to improve replication performance by
starting multiple apply workers. If the tables in one subscription are
less likely to cause conflict, then it is possible that user may want
conflict detection disabled for that subscription to avoid detection
latency while enabling it for other subscriptions. This generates a
requirement to make ‘conflict detection’ configurable per
subscription. While the conflict resolver configuration can remain
global. All the subscriptions which opt for ‘conflict detection’ will
follow global conflict resolver configuration.

To implement the above, subscription commands will be changed to have
one more parameter 'conflict_resolution=on/off', default will be OFF.

To configure global resolvers, new DDL command will be introduced:

CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>

I very much doubt we want a single global conflict resolver, or even one
resolver per subscription. It seems like a very table-specific thing.

We thought about this as well. We feel that even if we go for
table-based or subscription-based resolver configuration, there may be
use cases where the user is not interested in configuring resolvers
for each table and thus may want global ones. Therefore, we should
provide a way for users to do global configuration, and thus we
started with the global one. I have noted your point here and would
also like to know the opinion of others. We are open to discussion: we
can either opt for one of these two options (global or table), or opt
for both a global and a table/subscription-based one.

Also, doesn't all this whole design ignore the concurrency between
publishers? Isn't this problematic considering the commit timestamps may
go backwards (for a given publisher), which means the conflict
resolution is not deterministic (as it depends on how exactly it
interleaves)?

-------------------------

Apart from the above three main operations and resolver configuration,
there are more conflict types like primary-key updates, multiple
unique constraints etc and some special scenarios to be considered.
Complete design details can be found in [1].

[1]: https://wiki.postgresql.org/wiki/Conflict_Detection_and_Resolution

Hmmm, not sure it's good to have a "complete" design on wiki, and only
some subset posted to the mailing list. I haven't compared what the
differences are, though.

It would have been difficult to mention all the details in email
(including examples and corner scenarios), and thus we thought it
would be better to document everything on the wiki page for the time
being. We can keep discussing the design and all the scenarios on a
need basis (before the implementation phase of that part), and thus
eventually everything will come to email on hackers. With our first
patch, we plan to provide everything in a README as well.

thanks
Shveta

#4 Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#3)
Re: Conflict Detection and Resolution

On Mon, May 27, 2024 at 11:19 AM shveta malik <shveta.malik@gmail.com> wrote:

On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/23/24 08:36, shveta malik wrote:

Hello hackers,

Please find the proposal for Conflict Detection and Resolution (CDR)
for Logical replication.
<Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
below details.>

Introduction
================
In case the node is subscribed to multiple providers, or when local
writes happen on a subscriber, conflicts can arise for the incoming
changes. CDR is the mechanism to automatically detect and resolve
these conflicts depending on the application and configurations.
CDR is not applicable for the initial table sync. If locally, there
exists conflicting data on the table, the table sync worker will fail.
Please find the details on CDR in apply worker for INSERT, UPDATE and
DELETE operations:

Which architecture are you aiming for? Here you talk about multiple
providers, but the wiki page mentions active-active. I'm not sure how
much this matters, but it might.

Currently, we are working for multi providers case but ideally it
should work for active-active also. During further discussion and
implementation phase, if we find that, there are cases which will not
work in straight-forward way for active-active, then our primary focus
will remain to first implement it for multiple providers architecture.

Also, what kind of consistency you expect from this? Because none of
these simple conflict resolution methods can give you the regular
consistency models we're used to, AFAICS.

Can you please explain a little bit more on this.

INSERT
================
To resolve INSERT conflict on subscriber, it is important to find out
the conflicting row (if any) before we attempt an insertion. The
indexes or search preference for the same will be:
First check for replica identity (RI) index.
- if not found, check for the primary key (PK) index.
- if not found, then check for unique indexes (individual ones or
added by unique constraints)
- if unique index also not found, skip CDR
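The preference order above amounts to a simple fallback chain. A hypothetical sketch (illustrative Python, not PostgreSQL source; the dict layout and function name are assumptions for illustration only):

```python
# Sketch of the index-preference order described above for locating a
# conflicting row before attempting an INSERT. Purely illustrative.

def pick_conflict_index(indexes):
    """Return the index to use for conflict lookup, or None to skip CDR.

    `indexes` is a list of dicts like {"name": ..., "kind": ...} where
    kind is one of "replica_identity", "primary_key", "unique".
    """
    for kind in ("replica_identity", "primary_key", "unique"):
        for idx in indexes:
            if idx["kind"] == kind:
                return idx
    return None  # no suitable index: skip conflict detection for INSERT

# Example: a table with only a unique constraint falls through to it.
idxs = [{"name": "t1_email_key", "kind": "unique"}]
print(pick_conflict_index(idxs))  # the unique index is chosen
```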

Note: if no RI index, PK, or unique index is found but
REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
The reason being that even though a row can be identified with
REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
rows. Hence, we should not go for conflict detection in such a case.

It's not clear to me why would REPLICA_IDENTITY_FULL mean the table is
allowed to have duplicate values? It just means the upstream is sending
the whole original row, there can still be a PK/UNIQUE index on both the
publisher and subscriber.

Yes, right. Sorry for the confusion. I meant the same, i.e., in the absence
of an RI index, PK, or unique index, tables can have duplicates. So even
in the presence of a replica identity (FULL in this case) but in the absence
of a unique/primary index, CDR will be skipped for INSERT.

In case of replica identity ‘nothing’ and in absence of any suitable
index (as defined above), CDR will be skipped for INSERT.

Conflict Type:
----------------
insert_exists: A conflict is detected when the table has the same
value for a key column as the new value in the incoming row.

Conflict Resolution
----------------
a) latest_timestamp_wins: The change with later commit timestamp wins.
b) earliest_timestamp_wins: The change with earlier commit timestamp wins.
c) apply: Always apply the remote change.
d) skip: Remote change is skipped.
e) error: Error out on conflict. Replication is stopped, manual
action is needed.

Why not to have some support for user-defined conflict resolution
methods, allowing to do more complex stuff (e.g. merging the rows in
some way, perhaps even with datatype-specific behavior)?

Initially, for the sake of simplicity, we are targeting to support
built-in resolvers. But we have a plan to work on user-defined
resolvers as well. We shall propose that separately.

The change will be converted to 'UPDATE' and applied if the decision
is in favor of applying remote change.

It is important to have commit timestamp info available on subscriber
when latest_timestamp_wins or earliest_timestamp_wins method is chosen
as resolution method. Thus ‘track_commit_timestamp’ must be enabled
on subscriber, in absence of which, configuring the said
timestamp-based resolution methods will result in error.

Note: If the user has chosen the latest or earliest_timestamp_wins,
and the remote and local timestamps are the same, then it will go by
system identifier. The change with a higher system identifier will
win. This will ensure that the same change is picked on all the nodes.
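A rough sketch of the comparison logic just described (illustrative Python, not the actual implementation; the tuple layout and function name are assumptions):

```python
# Each change is modeled as a (commit_timestamp, system_identifier) pair.
# Ties on timestamp are broken by the higher system identifier, so every
# node converges on the same row version.

def remote_change_wins(local, remote, earliest=False):
    l_ts, l_id = local
    r_ts, r_id = remote
    if r_ts != l_ts:
        return (r_ts < l_ts) if earliest else (r_ts > l_ts)
    # Equal commit timestamps: tie-break by system identifier.
    return r_id > l_id

print(remote_change_wins((100, 7), (100, 9)))  # True: tie broken by id
```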

How is this going to deal with the fact that commit LSN and timestamps
may not correlate perfectly? That is, commits may happen with LSN1 <
LSN2 but with T1 > T2.

Are you pointing to the issue where a session/txn has taken
'xactStopTimestamp' timestamp earlier but is delayed to insert record
in XLOG, while another session/txn which has taken timestamp slightly
later succeeded to insert the record IN XLOG sooner than the session1,
making LSN and Timestamps out of sync? Going by this scenario, the
commit-timestamp may not be reflective of actual commits and thus
timestamp-based resolvers may take wrong decisions. Or do you mean
something else?

If this is the problem you are referring to, then I think this needs a
fix at the publisher side. Let me think more about it. Kindly let me
know if you have ideas on how to tackle it.

UPDATE
================

Conflict Detection Method:
--------------------------------
Origin conflict detection: The ‘origin’ info is used to detect
conflict which can be obtained from commit-timestamp generated for
incoming txn at the source node. To compare remote’s origin with the
local’s origin, we must have origin information for local txns as well
which can be obtained from commit-timestamp after enabling
‘track_commit_timestamp’ locally.
The one drawback here is the ‘origin’ information cannot be obtained
once the row is frozen and the commit-timestamp info is removed by
vacuum. For a frozen row, conflicts cannot be raised, and thus the
incoming changes will be applied in all the cases.

Conflict Types:
----------------
a) update_differ: The origin of an incoming update's key row differs
from the local row, i.e., the row has already been updated locally or
by a different node.
b) update_missing: The row with the same value as that incoming
update's key does not exist. Remote is trying to update a row which
does not exist locally.
c) update_deleted: The row with the same value as that incoming
update's key does not exist. The row is already deleted. This conflict
type is generated only if the deleted row is still detectable i.e., it
is not removed by VACUUM yet. If the row is removed by VACUUM already,
it cannot detect this conflict. It will detect it as update_missing
and will follow the default or configured resolver of update_missing
itself.

I don't understand why update_missing and update_deleted should be
different, especially considering it's not detected reliably. And also
that even if we happen to find the row the associated TOAST data may
have already been removed. So why would this matter?

Here, we are trying to tackle the case where the row is 'recently'
deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
want to opt for a different resolution in such a case as against the
one where the corresponding row was not even present in the first
place. The case where the row was deleted long back may not fall into
this category as there are higher chances that they have been removed
by vacuum and can be considered equivalent to the update_missing
case.

Regarding "TOAST column" for deleted row cases, we may need to dig
more. Thanks for bringing this case. Let me analyze more here.

I tested a simple case with a table with one TOAST column and found
that when a tuple with a TOAST column is deleted, both the tuple and
corresponding pg_toast entries are marked as ‘deleted’ (dead) but not
removed immediately. The main tuple and respective pg_toast entry are
permanently deleted only during vacuum. First, the main table’s dead
tuples are vacuumed, followed by the secondary TOAST relation ones (if
available).
Please let us know if you have a specific scenario in mind where the
TOAST column data is deleted immediately upon ‘delete’ operation,
rather than during vacuum, which we are missing.

Thanks,
Nisha

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#2)
Re: Conflict Detection and Resolution

On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/23/24 08:36, shveta malik wrote:

Conflict Resolution
----------------
a) latest_timestamp_wins: The change with later commit timestamp wins.
b) earliest_timestamp_wins: The change with earlier commit timestamp wins.
c) apply: Always apply the remote change.
d) skip: Remote change is skipped.
e) error: Error out on conflict. Replication is stopped, manual
action is needed.

Why not to have some support for user-defined conflict resolution
methods, allowing to do more complex stuff (e.g. merging the rows in
some way, perhaps even with datatype-specific behavior)?

The change will be converted to 'UPDATE' and applied if the decision
is in favor of applying remote change.

It is important to have commit timestamp info available on subscriber
when latest_timestamp_wins or earliest_timestamp_wins method is chosen
as resolution method. Thus ‘track_commit_timestamp’ must be enabled
on subscriber, in absence of which, configuring the said
timestamp-based resolution methods will result in error.

Note: If the user has chosen the latest or earliest_timestamp_wins,
and the remote and local timestamps are the same, then it will go by
system identifier. The change with a higher system identifier will
win. This will ensure that the same change is picked on all the nodes.

How is this going to deal with the fact that commit LSN and timestamps
may not correlate perfectly? That is, commits may happen with LSN1 <
LSN2 but with T1 > T2.

One of the possible scenarios discussed at pgconf.dev with Tomas for
this was as follows:

Say there are two publisher nodes PN1, PN2, and subscriber node SN3.
The logical replication is configured such that a subscription on SN3
has publications from both PN1 and PN2. For example, SN3 (sub) -> PN1,
PN2 (p1, p2)

Now, on PN1, we have the following operations that update the same row:

T1
Update-1 on table t1 at LSN1 (1000) on time (200)

T2
Update-2 on table t1 at LSN2 (2000) on time (100)

Then in parallel, we have the following operation on node PN2 that
updates the same row as Update-1, and Update-2 on node PN1.

T3
Update-3 on table t1 at LSN(1500) on time (150)

In theory, we can have a different state on subscribers depending on
the order of updates arriving at SN3 which shouldn't happen. Say, the
order in which they reach SN3 is: Update-1, Update-2, Update-3 then
the final row we have is by Update-3 considering we have configured
last_update_wins as a conflict resolution method. Now, consider the
other order: Update-1, Update-3, Update-2, in this case, the final
row will be by Update-2 because when we try to apply Update-3, it will
generate a conflict and as per the resolution method
(last_update_wins) we need to retain Update-1.
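The divergence described above can be reproduced with a toy simulation (illustrative Python, not replication code), under the assumption that update_differ conflicts are raised only when the incoming change's origin differs from the origin that last modified the local row, same-origin changes apply unconditionally, and conflicts are resolved by last_update_wins:

```python
# Each change is (name, origin, commit_timestamp). A change applies if the
# row is empty, came from the same origin as the local version, or wins the
# last_update_wins comparison; otherwise the local version is retained.

def apply(order):
    row = None
    for change in order:
        name, origin, ts = change
        if row is None or row[1] == origin or ts > row[2]:
            row = change
    return row[0]

u1 = ("Update-1", "PN1", 200)
u2 = ("Update-2", "PN1", 100)
u3 = ("Update-3", "PN2", 150)
print(apply([u1, u2, u3]))  # Update-3
print(apply([u1, u3, u2]))  # Update-2: a different final state
```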

On further thinking, the operations on node PN1 as defined above
seem impossible because one of the updates needs to wait for the other
to write a commit record. So the commits may happen with LSN1 < LSN2
but with T1 > T2, but they can't be on the same row due to locks. So,
the order of apply should still be consistent. Am I missing
something?

--
With Regards,
Amit Kapila.

#6Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#3)
Re: Conflict Detection and Resolution

On Mon, May 27, 2024 at 11:19 AM shveta malik <shveta.malik@gmail.com> wrote:

On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Conflict Resolution
----------------
a) latest_timestamp_wins: The change with later commit timestamp wins.
b) earliest_timestamp_wins: The change with earlier commit timestamp wins.

Can you share the use case of "earliest_timestamp_wins" resolution
method? It seems after the initial update on the local node, it will
never allow remote update to succeed which sounds a bit odd. Jan has
shared this and similar concerns about this resolution method, so I
have added him to the email as well.

Conflict Types:
----------------
a) update_differ: The origin of an incoming update's key row differs
from the local row, i.e., the row has already been updated locally or
by a different node.
b) update_missing: The row with the same value as that incoming
update's key does not exist. Remote is trying to update a row which
does not exist locally.
c) update_deleted: The row with the same value as that incoming
update's key does not exist. The row is already deleted. This conflict
type is generated only if the deleted row is still detectable i.e., it
is not removed by VACUUM yet. If the row is removed by VACUUM already,
it cannot detect this conflict. It will detect it as update_missing
and will follow the default or configured resolver of update_missing
itself.

I don't understand why update_missing and update_deleted should be
different, especially considering it's not detected reliably. And also
that even if we happen to find the row the associated TOAST data may
have already been removed. So why would this matter?

Here, we are trying to tackle the case where the row is 'recently'
deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
want to opt for a different resolution in such a case as against the
one where the corresponding row was not even present in the first
place. The case where the row was deleted long back may not fall into
this category as there are higher chances that they have been removed
by vacuum and can be considered equivalent to the update_missing
case.

I think to make 'update_deleted' work, we need another scan with a
different snapshot type to find the recently deleted row. I don't know
if it is a good idea to scan the index twice with different snapshots,
so for the sake of simplicity, can we consider 'update_deleted' the same
as 'update_missing'? If we think it is an important case to consider
then we can try to accomplish this once we finalize the
design/implementation of other resolution methods.

To implement the above, subscription commands will be changed to have
one more parameter 'conflict_resolution=on/off', default will be OFF.

To configure global resolvers, new DDL command will be introduced:

CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>

I very much doubt we want a single global conflict resolver, or even one
resolver per subscription. It seems like a very table-specific thing.

+1 to make it a table-level configuration, but we probably need
something at the global level as well, so that if users don't define
anything at the table level, the global-level configuration will be
used by default.

Also, doesn't all this whole design ignore the concurrency between
publishers? Isn't this problematic considering the commit timestamps may
go backwards (for a given publisher), which means the conflict
resolution is not deterministic (as it depends on how exactly it
interleaves)?

I am not able to imagine the cases you are worried about. Can you
please be specific? Is it similar to the case I described in
yesterday's email [1]?

[1]: /messages/by-id/CAA4eK1JTMiBOoGqkt=aLPLU8Rs45ihbLhXaGHsz8XC76+OG3+Q@mail.gmail.com

--
With Regards,
Amit Kapila.

#7shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#6)
Re: Conflict Detection and Resolution

On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Conflict Resolution
----------------
a) latest_timestamp_wins: The change with later commit timestamp wins.
b) earliest_timestamp_wins: The change with earlier commit timestamp wins.

Can you share the use case of "earliest_timestamp_wins" resolution
method? It seems after the initial update on the local node, it will
never allow remote update to succeed which sounds a bit odd. Jan has
shared this and similar concerns about this resolution method, so I
have added him to the email as well.

I do not have an exact scenario for this. But I feel that if two nodes are
concurrently inserting different data against a primary key, then some
users may prefer to retain the row that was inserted earlier. It is no
different from latest_timestamp_wins: it totally depends upon what kind of
application and requirements the user may have, based on which he may
discard the later-arriving rows (especially for the INSERT case).

Conflict Types:
----------------
a) update_differ: The origin of an incoming update's key row differs
from the local row i.e.; the row has already been updated locally or
by different nodes.
b) update_missing: The row with the same value as that incoming
update's key does not exist. Remote is trying to update a row which
does not exist locally.
c) update_deleted: The row with the same value as that incoming
update's key does not exist. The row is already deleted. This conflict
type is generated only if the deleted row is still detectable i.e., it
is not removed by VACUUM yet. If the row is removed by VACUUM already,
it cannot detect this conflict. It will detect it as update_missing
and will follow the default or configured resolver of update_missing
itself.

I don't understand why update_missing and update_deleted should be
different, especially considering it's not detected reliably. And also
that even if we happen to find the row the associated TOAST data may
have already been removed. So why would this matter?

Here, we are trying to tackle the case where the row is 'recently'
deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
want to opt for a different resolution in such a case as against the
one where the corresponding row was not even present in the first
place. The case where the row was deleted long back may not fall into
this category as there are higher chances that they have been removed
by vacuum and can be considered equivalent to the update_missing
case.

I think to make 'update_deleted' work, we need another scan with a
different snapshot type to find the recently deleted row. I don't know
if it is a good idea to scan the index twice with different snapshots,
so for the sake of simplicity, can we consider 'update_deleted' the same
as 'update_missing'? If we think it is an important case to consider
then we can try to accomplish this once we finalize the
design/implementation of other resolution methods.

I think it is important for scenarios when data is being updated and
deleted concurrently. But yes, I agree that implementation may have
some performance hit for this case. We can tackle this scenario at a
later stage.

To implement the above, subscription commands will be changed to have
one more parameter 'conflict_resolution=on/off', default will be OFF.

To configure global resolvers, new DDL command will be introduced:

CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>

I very much doubt we want a single global conflict resolver, or even one
resolver per subscription. It seems like a very table-specific thing.

+1 to make it a table-level configuration, but we probably need
something at the global level as well, so that if users don't define
anything at the table level, the global-level configuration will be
used by default.

Also, doesn't all this whole design ignore the concurrency between
publishers? Isn't this problematic considering the commit timestamps may
go backwards (for a given publisher), which means the conflict
resolution is not deterministic (as it depends on how exactly it
interleaves)?

I am not able to imagine the cases you are worried about. Can you
please be specific? Is it similar to the case I described in
yesterday's email [1]?

[1]: /messages/by-id/CAA4eK1JTMiBOoGqkt=aLPLU8Rs45ihbLhXaGHsz8XC76+OG3+Q@mail.gmail.com

thanks
Shveta

#8Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: shveta malik (#7)
RE: Conflict Detection and Resolution

Hi,

This time at PGconf.dev [1], we had some discussions regarding this
project. The proposed approach is to split the work into two main
components. The first part focuses on conflict detection, which aims to
identify and report conflicts in logical replication. This feature will
enable users to monitor the unexpected conflicts that may occur. The
second part involves the actual conflict resolution. Here, we will provide
built-in resolutions for each conflict and allow users to choose which
resolution will be used for which conflict (as described in the initial
email of this thread).

Of course, we are open to alternative ideas and suggestions, and the
strategy above can be changed based on ongoing discussions and feedback
received.

Here is the patch for the first part of the work, which adds a new
parameter detect_conflict for the CREATE and ALTER SUBSCRIPTION commands.
This new parameter will decide whether the subscription will perform
conflict detection. By default, conflict detection will be off for a
subscription.

When conflict detection is enabled, additional logging is triggered in the
following conflict scenarios:

* Updating a row that was previously modified by another origin.
* The tuple to be updated is not found.
* The tuple to be deleted is not found.

While there exist other conflict types in logical replication, such as an
incoming insert conflicting with an existing row due to a primary key or
unique index, these cases already result in constraint violation errors.
Therefore, additional conflict detection for these cases is currently
omitted to minimize potential overhead. However, the pre-detection for
conflict in these error cases is still essential to support automatic
conflict resolution in the future.

[1]: https://2024.pgconf.dev/

Best Regards,
Hou zj

Attachments:

v1-0001-Detect-update-and-delete-conflict-in-logical-repl.patch
#9Dilip Kumar
dilipbalaut@gmail.com
In reply to: shveta malik (#7)
Re: Conflict Detection and Resolution

On Wed, Jun 5, 2024 at 9:12 AM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Conflict Resolution
----------------
a) latest_timestamp_wins: The change with later commit timestamp wins.
b) earliest_timestamp_wins: The change with earlier commit timestamp wins.

Can you share the use case of "earliest_timestamp_wins" resolution
method? It seems after the initial update on the local node, it will
never allow remote update to succeed which sounds a bit odd. Jan has
shared this and similar concerns about this resolution method, so I
have added him to the email as well.

I do not have an exact scenario for this. But I feel that if two nodes are
concurrently inserting different data against a primary key, then some
users may prefer to retain the row that was inserted earlier. It is no
different from latest_timestamp_wins: it totally depends upon what kind of
application and requirements the user may have, based on which he may
discard the later-arriving rows (especially for the INSERT case).

I haven't read the complete design yet, but have we discussed how we
plan to deal with clock drift if we use timestamp-based conflict
resolution? For example, a user might insert something conflicting on
node1 first and then on node2. However, due to clock drift, the
timestamp from node2 might appear earlier. In this case, if we choose
"earliest timestamp wins," we would keep the changes from node2.

I haven't fully considered if this would cause any problems, but users
might detect this issue. For instance, a client machine might send a
change to node1 first and then, upon confirmation, send it to node2.
If the clocks on node1 and node2 are not synchronized, the changes
might appear in a different order. Does this seem like a potential
problem?
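The concern can be shown with a toy example (illustrative Python, hypothetical clock values): the client writes to node1 first and node2 second, but node2's clock runs behind, so its commit timestamp looks earlier.

```python
# earliest_timestamp_wins keeps the change with the smallest commit
# timestamp, which under clock skew may be the change that actually
# happened later in real time.

def earliest_timestamp_wins(changes):
    return min(changes, key=lambda c: c[1])[0]

node1_change = ("insert-sent-to-node1-first", 1000)
node2_change = ("insert-sent-to-node2-second", 995)  # clock 10 units slow

print(earliest_timestamp_wins([node1_change, node2_change]))
# keeps the node2 change, the one that actually happened later
```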

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#10Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#6)
Re: Conflict Detection and Resolution

On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Can you share the use case of "earliest_timestamp_wins" resolution
method? It seems after the initial update on the local node, it will
never allow remote update to succeed which sounds a bit odd. Jan has
shared this and similar concerns about this resolution method, so I
have added him to the email as well.

I can not think of a use case exactly in this context but it's very
common to have such a use case while designing a distributed
application with multiple clients. For example, when we are doing git
push concurrently from multiple clients it is expected that the
earliest commit wins.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#10)
Re: Conflict Detection and Resolution

On Wed, Jun 5, 2024 at 7:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Can you share the use case of "earliest_timestamp_wins" resolution
method? It seems after the initial update on the local node, it will
never allow remote update to succeed which sounds a bit odd. Jan has
shared this and similar concerns about this resolution method, so I
have added him to the email as well.

I can not think of a use case exactly in this context but it's very
common to have such a use case while designing a distributed
application with multiple clients. For example, when we are doing git
push concurrently from multiple clients it is expected that the
earliest commit wins.

Okay, I think it mostly boils down to something like what Shveta
mentioned where Inserts for a primary key can use
"earliest_timestamp_wins" resolution method [1]. So, it seems useful
to support this method as well.

[1]: /messages/by-id/CAJpy0uC4riK8e6hQt8jcU+nXYmRRjnbFEapYNbmxVYjENxTw2g@mail.gmail.com

--
With Regards,
Amit Kapila.

#12Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#11)
Re: Conflict Detection and Resolution

On Thu, Jun 6, 2024 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 5, 2024 at 7:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Can you share the use case of "earliest_timestamp_wins" resolution
method? It seems after the initial update on the local node, it will
never allow remote update to succeed which sounds a bit odd. Jan has
shared this and similar concerns about this resolution method, so I
have added him to the email as well.

I can not think of a use case exactly in this context but it's very
common to have such a use case while designing a distributed
application with multiple clients. For example, when we are doing git
push concurrently from multiple clients it is expected that the
earliest commit wins.

Okay, I think it mostly boils down to something like what Shveta
mentioned where Inserts for a primary key can use
"earliest_timestamp_wins" resolution method [1]. So, it seems useful
to support this method as well.

Correct, but we still need to think about how to make it work
correctly in the presence of a clock skew as I mentioned in one of my
previous emails.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#13Nisha Moond
nisha.moond412@gmail.com
In reply to: Dilip Kumar (#10)
Re: Conflict Detection and Resolution

On Wed, Jun 5, 2024 at 7:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Can you share the use case of "earliest_timestamp_wins" resolution
method? It seems after the initial update on the local node, it will
never allow remote update to succeed which sounds a bit odd. Jan has
shared this and similar concerns about this resolution method, so I
have added him to the email as well.

I can not think of a use case exactly in this context but it's very
common to have such a use case while designing a distributed
application with multiple clients. For example, when we are doing git
push concurrently from multiple clients it is expected that the
earliest commit wins.

Here are more use cases for the "earliest_timestamp_wins" resolution method:
1) Applications where the record of the first occurrence of an event is
important. For example, in sensor-based applications like earthquake
detection systems, capturing the first seismic wave's time is crucial.
2) Scheduling systems, like appointment booking, prioritize the
earliest request when handling concurrent ones.
3) Contexts where maintaining chronological order is important -
a) Social media platforms display comments ensuring that the
earliest ones are visible first.
b) Financial transaction processing systems rely on timestamps to
prioritize the processing of transactions, ensuring that the earliest
transaction is handled first.

--
Thanks,
Nisha

#14Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Nisha Moond (#13)
Re: Conflict Detection and Resolution

On Thu, Jun 6, 2024 at 5:16 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Here are more use cases for the "earliest_timestamp_wins" resolution method:
1) Applications where the record of the first occurrence of an event is
important. For example, in sensor-based applications like earthquake
detection systems, capturing the first seismic wave's time is crucial.
2) Scheduling systems, like appointment booking, prioritize the
earliest request when handling concurrent ones.
3) Contexts where maintaining chronological order is important -
a) Social media platforms display comments ensuring that the
earliest ones are visible first.
b) Financial transaction processing systems rely on timestamps to
prioritize the processing of transactions, ensuring that the earliest
transaction is handled first.

Thanks for sharing examples. However, these scenarios would be handled by
the application and not during replication. What we are discussing here is
the timestamp when a row was updated/inserted/deleted (or rather when the
transaction that updated row committed/became visible) and not a DML on
column which is of type timestamp. Some implementations use a hidden
timestamp column but that's different from a user column which captures
timestamp of (say) an event. The conflict resolution will be based on the
timestamp when that column's value was recorded in the database which may
be different from the value of the column itself.

If we use the transaction commit timestamp as the basis for resolution, a
transaction where multiple rows conflict may end up with different rows
affected by that transaction being resolved differently. Say three
transactions T1, T2, and T3 on separate origins, with timestamps t1, t2, and
t3 respectively, change rows (r1, r2), (r2, r3), and (r1, r4) respectively.
Changes to r1 and r2 will conflict. Let's say T2 and T3 are applied first
and then T1 is applied. If t2 < t1 < t3, r1 will end up with version of T3
and r2 will end up with version of T1 after applying all the three
transactions. Would that introduce an inconsistency between r1 and r2?

--
Best Wishes,
Ashutosh Bapat

#15Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: shveta malik (#3)
Re: Conflict Detection and Resolution

On 5/27/24 07:48, shveta malik wrote:

On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/23/24 08:36, shveta malik wrote:

Hello hackers,

Please find the proposal for Conflict Detection and Resolution (CDR)
for Logical replication.
<Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
below details.>

Introduction
================
In case the node is subscribed to multiple providers, or when local
writes happen on a subscriber, conflicts can arise for the incoming
changes. CDR is the mechanism to automatically detect and resolve
these conflicts depending on the application and configurations.
CDR is not applicable for the initial table sync. If locally, there
exists conflicting data on the table, the table sync worker will fail.
Please find the details on CDR in apply worker for INSERT, UPDATE and
DELETE operations:

Which architecture are you aiming for? Here you talk about multiple
providers, but the wiki page mentions active-active. I'm not sure how
much this matters, but it might.

Currently, we are working on the multiple-providers case, but ideally it
should work for active-active as well. During further discussion and the
implementation phase, if we find cases that will not work in a
straightforward way for active-active, then our primary focus will
remain to first implement it for the multiple-providers architecture.

Also, what kind of consistency you expect from this? Because none of
these simple conflict resolution methods can give you the regular
consistency models we're used to, AFAICS.

Can you please explain a little bit more on this.

I was referring to the well established consistency models / isolation
levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what
guarantees the application developer can expect, what anomalies can
happen, etc.

I don't think any such isolation level can be implemented with simple
conflict resolution methods like last-update-wins etc. For example,
consider an active-active where both nodes do

UPDATE accounts SET balance=balance+1000 WHERE id=1

This will inevitably lead to a conflict, and while the last-update-wins
resolves this "consistently" on both nodes (e.g. ending with the same
result), it's essentially a lost update.

This is a very simplistic example of course, I recall there are various
more complex examples involving foreign keys, multi-table transactions,
constraints, etc. But in principle it's a manifestation of the same
inherent limitation of conflict detection and resolution etc.

Similarly, I believe this affects not just active-active, but also the
case where one node aggregates data from multiple publishers. Maybe not
to the same extent / it might be fine for that use case, but you said
the end goal is to use this for active-active. So I'm wondering what's
the plan, there.

If I'm writing an application for active-active using this conflict
handling, what assumptions can I make? Can I just do stuff as if on
a single node, or do I need to be super conscious about the zillion ways
things can misbehave in a distributed system?

My personal opinion is that the closer this will be to the regular
consistency levels, the better. If past experience taught me anything,
it's very hard to predict how distributed systems with eventual
consistency behave, and even harder to actually test the application in
such environment.

In any case, if there are any differences compared to the usual
behavior, it needs to be very clearly explained in the docs.

INSERT
================
To resolve an INSERT conflict on the subscriber, it is important to
find the conflicting row (if any) before we attempt an insertion. The
index search preference for this will be:
- First, check for the replica identity (RI) index.
- if not found, check for the primary key (PK) index.
- if not found, then check for unique indexes (individual ones or
added by unique constraints)
- if unique index also not found, skip CDR

Note: if no RI index, PK, or unique index is found but
REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
The reason is that even though a row can be identified with
REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
rows. Hence, we should not go for conflict detection in such a case.
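The search preference above can be sketched as a small selection routine (an illustrative Python sketch, not the actual C implementation; all names are hypothetical):

```python
def choose_conflict_index(replica_identity_index, primary_key, unique_indexes):
    """Pick the index used to look up a conflicting row before INSERT.

    Preference: replica identity (RI) index > primary key (PK) > any
    unique index.  Returns None when no suitable index exists, in which
    case CDR is skipped for INSERT -- even if REPLICA IDENTITY FULL is
    set, since such tables may contain duplicate rows.
    """
    if replica_identity_index is not None:
        return replica_identity_index
    if primary_key is not None:
        return primary_key
    if unique_indexes:
        return unique_indexes[0]
    return None  # no RI/PK/unique index: skip CDR
```

For example, a table with only a unique constraint would fall through to the third branch, while a table with none of these would skip conflict detection entirely.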

It's not clear to me why would REPLICA_IDENTITY_FULL mean the table is
allowed to have duplicate values? It just means the upstream is sending
the whole original row, there can still be a PK/UNIQUE index on both the
publisher and subscriber.

Yes, right. Sorry for the confusion. I meant the same, i.e., in the
absence of an 'RI index, PK, or unique index', tables can have
duplicates. So even in the presence of a replica identity (FULL in
this case) but in the absence of a unique/primary index, CDR will be
skipped for INSERT.

In case of replica identity ‘nothing’ and in absence of any suitable
index (as defined above), CDR will be skipped for INSERT.

Conflict Type:
----------------
insert_exists: A conflict is detected when the table has the same
value for a key column as the new value in the incoming row.

Conflict Resolution
----------------
a) latest_timestamp_wins: The change with later commit timestamp wins.
b) earliest_timestamp_wins: The change with earlier commit timestamp wins.
c) apply: Always apply the remote change.
d) skip: Remote change is skipped.
e) error: Error out on conflict. Replication is stopped, manual
action is needed.

Why not have some support for user-defined conflict resolution
methods, allowing users to do more complex stuff (e.g. merging the
rows in some way, perhaps even with datatype-specific behavior)?

Initially, for the sake of simplicity, we are targeting to support
built-in resolvers. But we have a plan to work on user-defined
resolvers as well. We shall propose that separately.

The change will be converted to 'UPDATE' and applied if the decision
is in favor of applying remote change.

It is important to have commit timestamp info available on subscriber
when latest_timestamp_wins or earliest_timestamp_wins method is chosen
as resolution method. Thus ‘track_commit_timestamp’ must be enabled
on subscriber, in absence of which, configuring the said
timestamp-based resolution methods will result in error.

Note: If the user has chosen the latest or earliest_timestamp_wins,
and the remote and local timestamps are the same, then it will go by
system identifier. The change with a higher system identifier will
win. This will ensure that the same change is picked on all the nodes.
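A minimal sketch of the timestamp comparison with the system-identifier tie-break (illustrative only; the function and parameter names are assumptions, not proposed API):

```python
def remote_change_wins(resolver, local_ts, local_sysid, remote_ts, remote_sysid):
    """Decide whether the remote change is applied, for the two
    timestamp-based resolvers.  On equal timestamps the change from the
    node with the higher system identifier wins, so every node picks
    the same version of the row."""
    if remote_ts != local_ts:
        if resolver == "latest_timestamp_wins":
            return remote_ts > local_ts
        if resolver == "earliest_timestamp_wins":
            return remote_ts < local_ts
        raise ValueError(f"unknown resolver: {resolver}")
    return remote_sysid > local_sysid  # deterministic tie-break
```

The tie-break matters precisely because two nodes can commit with identical timestamps; without it, the two directions of replication could settle on different rows.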

How is this going to deal with the fact that commit LSN and timestamps
may not correlate perfectly? That is, commits may happen with LSN1 <
LSN2 but with T1 > T2.

Are you pointing to the issue where a session/txn has taken its
'xactStopTimestamp' timestamp earlier but is delayed in inserting its
record into XLOG, while another session/txn which took its timestamp
slightly later succeeded in inserting its record into XLOG sooner than
session 1, making LSNs and timestamps out of sync? Going by this
scenario, the commit timestamp may not be reflective of actual commits
and thus timestamp-based resolvers may take wrong decisions. Or do you
mean something else?

If this is the problem you are referring to, then I think this needs a
fix at the publisher side. Let me think more about it. Kindly let me
know if you have ideas on how to tackle it.

Yes, this is the issue I'm talking about. We're acquiring the timestamp
when not holding the lock to reserve space in WAL, so the LSN and the
commit LSN may not actually correlate.

Consider this example I discussed with Amit last week:

node A:

XACT1: UPDATE t SET v = 1; LSN1 / T1

XACT2: UPDATE t SET v = 2; LSN2 / T2

node B

XACT3: UPDATE t SET v = 3; LSN3 / T3

And assume LSN1 < LSN2, T1 > T2 (i.e. the commit timestamp inversion),
and T2 < T3 < T1. Now consider that the messages may arrive in different
orders, due to async replication. Unfortunately, this would lead to
different results of the conflict resolution:

XACT1 - XACT2 - XACT3 => v=3 (T3 wins)

XACT3 - XACT1 - XACT2 => v=2 (T2 wins)
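Assuming such an inversion could occur, the divergence is easy to reproduce with a toy apply loop in which same-origin changes are applied in arrival (LSN) order and cross-origin conflicts are resolved by latest_timestamp_wins (all names and numbers here are illustrative):

```python
def apply_changes(order):
    """Toy apply loop for the XACT1/XACT2/XACT3 example.

    Each change carries (origin, value, commit_ts), with the assumed
    inversion T2 < T3 < T1 while LSN1 < LSN2 on node A."""
    changes = {
        "XACT1": ("A", 1, 300),  # T1 (inverted: later than T2)
        "XACT2": ("A", 2, 100),  # T2
        "XACT3": ("B", 3, 200),  # T3
    }
    row = None  # (origin, value, commit_ts) of the local row
    for name in order:
        origin, value, ts = changes[name]
        if row is None or row[0] == origin:
            row = (origin, value, ts)  # same origin: applied in LSN order
        elif ts > row[2]:              # update_differ: latest wins
            row = (origin, value, ts)
    return row[1]

# The two arrival orders end with different values:
# ["XACT1", "XACT2", "XACT3"] -> 3, while ["XACT3", "XACT1", "XACT2"] -> 2
```

The key mechanic is that XACT2 follows XACT1 from the same origin, so it is applied without conflict resolution, which is what lets the two arrival orders diverge.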

Now, I realize there's a flaw in this example - the (T1 > T2) inversion
can't actually happen, because these transactions have a dependency, and
thus won't commit concurrently. XACT1 will complete the commit, because
XACT2 starts to commit. And with monotonic clock (which is a requirement
for any timestamp-based resolution), that should guarantee (T1 < T2).

However, I doubt this is sufficient to declare victory. It's more likely
that there still are problems, but the examples are likely more complex
(changes to multiple tables, etc.).

I vaguely remember there were more issues with timestamp inversion, but
those might have been related to parallel apply etc.

UPDATE
================

Conflict Detection Method:
--------------------------------
Origin conflict detection: The ‘origin’ info, which can be obtained
from the commit timestamp generated for the incoming txn at the source
node, is used to detect conflicts. To compare the remote’s origin with
the local’s origin, we must have origin information for local txns as
well, which can be obtained from the commit timestamp after enabling
‘track_commit_timestamp’ locally.
The one drawback here is the ‘origin’ information cannot be obtained
once the row is frozen and the commit-timestamp info is removed by
vacuum. For a frozen row, conflicts cannot be raised, and thus the
incoming changes will be applied in all the cases.

Conflict Types:
----------------
a) update_differ: The origin of an incoming update's key row differs
from the local row's origin, i.e., the row has already been updated
locally or by a different node.
b) update_missing: The row with the same value as that incoming
update's key does not exist. Remote is trying to update a row which
does not exist locally.
c) update_deleted: The row with the same value as that incoming
update's key does not exist. The row is already deleted. This conflict
type is generated only if the deleted row is still detectable i.e., it
is not removed by VACUUM yet. If the row is removed by VACUUM already,
it cannot detect this conflict. It will detect it as update_missing
and will follow the default or configured resolver of update_missing
itself.
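The three conflict types can be sketched as a classification step (illustrative Python, assuming the apply worker can still see the dead tuple in the update_deleted case; once VACUUM removes it, the same situation classifies as update_missing):

```python
def classify_update_conflict(local_row, remote_origin):
    """Classify an incoming UPDATE against the local table state.

    local_row is None when no live or dead tuple matches the key,
    otherwise a dict with 'origin' and 'deleted' fields.  A row already
    removed by VACUUM shows up as local_row is None, i.e. as
    update_missing -- update_deleted is only detectable before that.
    """
    if local_row is None:
        return "update_missing"
    if local_row["deleted"]:
        return "update_deleted"
    if local_row["origin"] != remote_origin:
        return "update_differ"
    return None  # same origin: no conflict, apply the change
```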

I don't understand why update_missing and update_deleted should be
different, especially considering it's not detected reliably. And also
that even if we happen to find the row the associated TOAST data may
have already been removed. So why would this matter?

Here, we are trying to tackle the case where the row is 'recently'
deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
want to opt for a different resolution in such a case as against the
one where the corresponding row was not even present in the first
place. The case where the row was deleted long back may not fall into
this category as there are higher chances that they have been removed
by vacuum and can be considered equivalent to the update_missing
case.

My point is that if we can't detect the difference reliably, it's not
very useful. Consider this example:

Node A:

T1: INSERT INTO t (id, value) VALUES (1,1);

T2: DELETE FROM t WHERE id = 1;

Node B:

T3: UPDATE t SET value = 2 WHERE id = 1;

The "correct" order of received messages on a third node is T1-T3-T2.
But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues
and so on. For T1-T2-T3 the right decision is to discard the update,
while for T3-T1-T2 it's to either apply the update or wait for the
INSERT to arrive.

But if we misdetect the situation, we either end up with a row that
shouldn't be there or lose an update.

Regarding "TOAST column" for deleted row cases, we may need to dig
more. Thanks for bringing this case. Let me analyze more here.

Conflict Resolutions:
----------------
a) latest_timestamp_wins: The change with later commit timestamp
wins. Can be used for ‘update_differ’.
b) earliest_timestamp_wins: The change with earlier commit
timestamp wins. Can be used for ‘update_differ’.
c) apply: The remote change is always applied. Can be used for
‘update_differ’.
d) apply_or_skip: Remote change is converted to INSERT and is
applied. If the complete row cannot be constructed from the info
provided by the publisher, then the change is skipped. Can be used for
‘update_missing’ or ‘update_deleted’.
e) apply_or_error: Remote change is converted to INSERT and is
applied. If the complete row cannot be constructed from the info
provided by the publisher, then error is raised. Can be used for
‘update_missing’ or ‘update_deleted’.
f) skip: Remote change is skipped and local one is retained. Can be
used for any conflict type.
    g) error: Error out on conflict. Replication is stopped, manual
action is needed. Can be used for any conflict type.
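A minimal sketch of the apply_or_skip and apply_or_error behavior for update_missing/update_deleted (illustrative; whether the publisher sent a complete row actually depends on REPLICA IDENTITY and the published column list, which is simplified here to a membership check):

```python
def resolve_missing_update(resolver, remote_new_row, table_columns):
    """Convert the UPDATE to an INSERT if the publisher sent values for
    every column; otherwise skip the change (apply_or_skip) or raise so
    replication stops for manual action (apply_or_error)."""
    complete = all(col in remote_new_row for col in table_columns)
    if complete:
        return ("insert", remote_new_row)
    if resolver == "apply_or_skip":
        return ("skip", None)
    if resolver == "apply_or_error":
        raise RuntimeError("cannot construct complete row; manual action needed")
    raise ValueError(f"unknown resolver: {resolver}")
```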

To support UPDATE CDR, the presence of either a replica identity index
or a primary key is required on the target node. UPDATE CDR will not be
supported in the absence of a replica identity index or primary key
even though REPLICA IDENTITY FULL is set. Please refer to "UPDATE" in
the "Noteworthy Scenarios" section in [1] for further details.

DELETE
================
Conflict Type:
----------------
delete_missing: An incoming delete is trying to delete a row on a
target node which does not exist.

Conflict Resolutions:
----------------
a) error: Error out on conflict. Replication is stopped, manual
action is needed.
b) skip: The remote change is skipped.

Configuring Conflict Resolution:
------------------------------------------------
There are two parts when it comes to configuring CDR:

a) Enabling/Disabling conflict detection.
b) Configuring conflict resolvers for different conflict types.

Users can sometimes create multiple subscriptions on the same node,
subscribing to different tables to improve replication performance by
starting multiple apply workers. If the tables in one subscription are
less likely to cause conflicts, then the user may want conflict
detection disabled for that subscription to avoid detection latency
while enabling it for other subscriptions. This creates a requirement
to make ‘conflict detection’ configurable per subscription, while the
conflict resolver configuration can remain global. All the
subscriptions which opt for ‘conflict detection’ will follow the
global conflict resolver configuration.

To implement the above, subscription commands will be changed to have
one more parameter 'conflict_resolution=on/off', default will be OFF.

To configure global resolvers, a new DDL command will be introduced:

CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
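One way to picture the proposed global configuration is a per-conflict-type mapping validated against the resolvers each type admits, per the lists above (an illustrative sketch of the proposal, not actual syntax or code):

```python
# Which resolvers are valid for which conflict type, per the proposal.
VALID_RESOLVERS = {
    "insert_exists":  {"latest_timestamp_wins", "earliest_timestamp_wins",
                       "apply", "skip", "error"},
    "update_differ":  {"latest_timestamp_wins", "earliest_timestamp_wins",
                       "apply", "skip", "error"},
    "update_missing": {"apply_or_skip", "apply_or_error", "skip", "error"},
    "update_deleted": {"apply_or_skip", "apply_or_error", "skip", "error"},
    "delete_missing": {"skip", "error"},
}

def set_conflict_resolver(config, conflict_type, resolver):
    """Emulate CONFLICT RESOLVER ON <conflict_type> IS <resolver>."""
    if resolver not in VALID_RESOLVERS[conflict_type]:
        raise ValueError(f"{resolver} is not valid for {conflict_type}")
    config[conflict_type] = resolver
    return config
```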

I very much doubt we want a single global conflict resolver, or even one
resolver per subscription. It seems like a very table-specific thing.

We thought about this too. We feel that even if we go for table-based
or subscription-based resolver configuration, there may be use cases
where the user is not interested in configuring resolvers for each
table and may want to give global ones instead, so we should provide a
way for users to do global configuration; that is why we started with
the global one. I have noted your point here and would also like to
know the opinion of others. We are open to discussion. We can either
opt for one of these 2 options (global or table) or we can opt for
both the global and the table/subscription-based one.

I have no problem with a default / global conflict handler, as long as
there's a way to override this per table. This is especially important
for cases with custom conflict handler at table / column level.

Also, doesn't this whole design ignore the concurrency between
publishers? Isn't this problematic considering the commit timestamps may
go backwards (for a given publisher), which means the conflict
resolution is not deterministic (as it depends on how exactly it
interleaves)?

-------------------------

Apart from the above three main operations and resolver configuration,
there are more conflict types, like primary-key updates, multiple
unique constraints, etc., and some special scenarios to be considered.
Complete design details can be found in [1].

[1]: https://wiki.postgresql.org/wiki/Conflict_Detection_and_Resolution

Hmmm, not sure it's good to have a "complete" design on wiki, and only
some subset posted to the mailing list. I haven't compared what the
differences are, though.

It would have been difficult to mention all the details in the email
(including examples and corner scenarios), and thus we thought it
would be better to document everything in the wiki page for the time
being. We can keep discussing the design and all the scenarios on a
need basis (before the implementation phase of that part), so
eventually everything will come to hackers via email. With our first
patch, we plan to provide everything in a README as well.

The challenge with having this on wiki is that it's unlikely people will
notice any changes made to the wiki.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Nisha Moond (#4)
Re: Conflict Detection and Resolution

On 5/28/24 11:17, Nisha Moond wrote:

On Mon, May 27, 2024 at 11:19 AM shveta malik <shveta.malik@gmail.com> wrote:

On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

...

I don't understand why update_missing and update_deleted should be
different, especially considering it's not detected reliably. And also
that even if we happen to find the row the associated TOAST data may
have already been removed. So why would this matter?

Here, we are trying to tackle the case where the row is 'recently'
deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
want to opt for a different resolution in such a case as against the
one where the corresponding row was not even present in the first
place. The case where the row was deleted long back may not fall into
this category as there are higher chances that they have been removed
by vacuum and can be considered equivalent to the update_missing
case.

Regarding "TOAST column" for deleted row cases, we may need to dig
more. Thanks for bringing this case. Let me analyze more here.

I tested a simple case with a table with one TOAST column and found
that when a tuple with a TOAST column is deleted, both the tuple and
corresponding pg_toast entries are marked as ‘deleted’ (dead) but not
removed immediately. The main tuple and respective pg_toast entry are
permanently deleted only during vacuum. First, the main table’s dead
tuples are vacuumed, followed by the secondary TOAST relation ones (if
available).
Please let us know if you have a specific scenario in mind where the
TOAST column data is deleted immediately upon ‘delete’ operation,
rather than during vacuum, which we are missing.

I'm pretty sure you can vacuum the TOAST table directly, which means
you'll end up with a deleted tuple with TOAST pointers, but with the
TOAST entries already gone.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Amit Kapila (#5)
Re: Conflict Detection and Resolution

On 6/3/24 09:30, Amit Kapila wrote:

On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/23/24 08:36, shveta malik wrote:

Conflict Resolution
----------------
a) latest_timestamp_wins: The change with later commit timestamp wins.
b) earliest_timestamp_wins: The change with earlier commit timestamp wins.
c) apply: Always apply the remote change.
d) skip: Remote change is skipped.
e) error: Error out on conflict. Replication is stopped, manual
action is needed.

Why not have some support for user-defined conflict resolution
methods, allowing users to do more complex stuff (e.g. merging the
rows in some way, perhaps even with datatype-specific behavior)?

The change will be converted to 'UPDATE' and applied if the decision
is in favor of applying remote change.

It is important to have commit timestamp info available on subscriber
when latest_timestamp_wins or earliest_timestamp_wins method is chosen
as resolution method. Thus ‘track_commit_timestamp’ must be enabled
on subscriber, in absence of which, configuring the said
timestamp-based resolution methods will result in error.

Note: If the user has chosen the latest or earliest_timestamp_wins,
and the remote and local timestamps are the same, then it will go by
system identifier. The change with a higher system identifier will
win. This will ensure that the same change is picked on all the nodes.

How is this going to deal with the fact that commit LSN and timestamps
may not correlate perfectly? That is, commits may happen with LSN1 <
LSN2 but with T1 > T2.

One of the possible scenarios discussed at pgconf.dev with Tomas for
this was as follows:

Say there are two publisher nodes PN1, PN2, and subscriber node SN3.
The logical replication is configured such that a subscription on SN3
has publications from both PN1 and PN2. For example, SN3 (sub) -> PN1,
PN2 (p1, p2)

Now, on PN1, we have the following operations that update the same row:

T1
Update-1 on table t1 at LSN1 (1000) on time (200)

T2
Update-2 on table t1 at LSN2 (2000) on time (100)

Then in parallel, we have the following operation on node PN2 that
updates the same row as Update-1, and Update-2 on node PN1.

T3
Update-3 on table t1 at LSN(1500) on time (150)

By theory, we can have a different state on subscribers depending on
the order of updates arriving at SN3, which shouldn't happen. Say the
order in which they reach SN3 is: Update-1, Update-2, Update-3; then
the final row we have is from Update-3, considering we have configured
last_update_wins as the conflict resolution method. Now, consider the
other order: Update-1, Update-3, Update-2. In this case, the final
row will be from Update-2 because when we try to apply Update-3, it
will generate a conflict, and as per the resolution method
(last_update_wins) we need to retain Update-1.
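The two orders above can be checked with a toy apply loop in which same-origin changes are applied in arrival order and cross-origin conflicts use last_update_wins (illustrative only; origins and times are taken from the example):

```python
def final_row(arrival_order):
    """Toy apply of the PN1/PN2 example on SN3.

    Each update carries (origin, commit_time); Update-2 commits after
    Update-1 on PN1 (LSN2 > LSN1) but with an earlier timestamp."""
    updates = {
        "Update-1": ("PN1", 200),
        "Update-2": ("PN1", 100),
        "Update-3": ("PN2", 150),
    }
    row = None  # (winning_update, origin, commit_time)
    for name in arrival_order:
        origin, t = updates[name]
        if row is None or row[1] == origin:
            row = (name, origin, t)  # same origin: applied in order
        elif t > row[2]:             # update_differ: last_update_wins
            row = (name, origin, t)
    return row[0]
```

Running both arrival orders reproduces the divergent final rows described above.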

On further thinking, the operations on node PN1 as defined above
seem impossible because one of the updates needs to wait for the other
to write a commit record. So the commits may happen with LSN1 < LSN2
but with T1 > T2, but they can't be on the same row due to row locks.
So, the order of apply should still be consistent. Am I missing
something?

Sorry, I should have read your message before responding a couple
minutes ago. I think you're right this exact example can't happen, due
to the dependency between transactions.

But as I wrote, I'm not quite convinced this means there are not other
issues with this way of resolving conflicts. It's more likely a more
complex scenario is required.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#14)
Re: Conflict Detection and Resolution

On Fri, Jun 7, 2024 at 5:39 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Jun 6, 2024 at 5:16 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Here are more use cases of the "earliest_timestamp_wins" resolution method:
1) Applications where the record of first occurrence of an event is
important. For example, sensor based applications like earthquake
detection systems, capturing the first seismic wave's time is crucial.
2) Scheduling systems, like appointment booking, prioritize the
earliest request when handling concurrent ones.
3) In contexts where maintaining chronological order is important -
a) Social media platforms display comments ensuring that the
earliest ones are visible first.
      b) Financial transaction processing systems rely on timestamps to
prioritize the processing of transactions, ensuring that the earliest
transaction is handled first.

Thanks for sharing examples. However, these scenarios would be handled by the application and not during replication. What we are discussing here is the timestamp when a row was updated/inserted/deleted (or rather when the transaction that updated row committed/became visible) and not a DML on column which is of type timestamp. Some implementations use a hidden timestamp column but that's different from a user column which captures timestamp of (say) an event. The conflict resolution will be based on the timestamp when that column's value was recorded in the database which may be different from the value of the column itself.

It depends on how these operations are performed. For example, the
appointment booking system could be prioritized via a transaction
updating a row with columns emp_name, emp_id, reserved, time_slot.
Now, if two employees at different geographical locations try to book
the calendar, the earlier transaction will win.

If we use the transaction commit timestamp as basis for resolution, a transaction where multiple rows conflict may end up with different rows affected by that transaction being resolved differently. Say three transactions T1, T2 and T3 on separate origins with timestamps t1, t2, and t3 respectively changed rows r1, r2 and r2, r3 and r1, r4 respectively. Changes to r1 and r2 will conflict. Let's say T2 and T3 are applied first and then T1 is applied. If t2 < t1 < t3, r1 will end up with version of T3 and r2 will end up with version of T1 after applying all the three transactions.

Are you describing the results based on latest_timestamp_wins? If so,
then it is correct. OTOH, if the user has configured
"earliest_timestamp_wins" resolution method, then we should end up
with a version of r1 from T1 because t1 < t3. Also, due to the same
reason, we should have version r2 from T2.

Would that introduce an inconsistency between r1 and r2?

As per my understanding, this shouldn't be an inconsistency. Won't it
be true even when the transactions are performed on a single node with
the same timing?
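The per-row outcomes discussed above can be checked with a short per-row simulation (illustrative; each row remembers the commit timestamp of its last accepted change, and conflicts are resolved row by row):

```python
def resolve_rows(resolver, txns, apply_order):
    """Apply conflicting multi-row transactions row by row.

    txns maps a transaction name to (commit_ts, set_of_rows_it_changes).
    Returns which transaction's version each row ends up with."""
    state = {}  # row -> (winning_txn, commit_ts)
    for name in apply_order:
        ts, rows = txns[name]
        for r in rows:
            if r not in state:
                state[r] = (name, ts)
            elif resolver == "latest_timestamp_wins" and ts > state[r][1]:
                state[r] = (name, ts)
            elif resolver == "earliest_timestamp_wins" and ts < state[r][1]:
                state[r] = (name, ts)
    return {r: v[0] for r, v in state.items()}

# The example: t2 < t1 < t3, applied in order T2, T3, T1.
txns = {"T1": (2, {"r1", "r2"}),
        "T2": (1, {"r2", "r3"}),
        "T3": (3, {"r1", "r4"})}
```

With latest_timestamp_wins this yields r1 from T3 and r2 from T1 (the mixed outcome Ashutosh describes); with earliest_timestamp_wins it yields r1 from T1 and r2 from T2, matching the reply above.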

--
With Regards,
Amit Kapila.

#19Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#15)
Re: Conflict Detection and Resolution

On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/27/24 07:48, shveta malik wrote:

On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Which architecture are you aiming for? Here you talk about multiple
providers, but the wiki page mentions active-active. I'm not sure how
much this matters, but it might.

Currently, we are working on the multiple-providers case, but ideally
it should work for active-active also. During further discussion and
the implementation phase, if we find that there are cases which will
not work in a straightforward way for active-active, then our primary
focus will remain on first implementing it for the multiple-providers
architecture.

Also, what kind of consistency do you expect from this? Because none
of these simple conflict resolution methods can give you the regular
consistency models we're used to, AFAICS.

Can you please explain a little bit more on this.

I was referring to the well established consistency models / isolation
levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what
guarantees the application developer can expect, what anomalies can
happen, etc.

I don't think any such isolation level can be implemented with a simple
conflict resolution methods like last-update-wins etc. For example,
consider an active-active where both nodes do

UPDATE accounts SET balance=balance+1000 WHERE id=1

This will inevitably lead to a conflict, and while the last-update-wins
resolves this "consistently" on both nodes (e.g. ending with the same
result), it's essentially a lost update.

The idea to solve such conflicts is using the delta apply technique
where the delta from both sides will be applied to the respective
columns. We do plan to target this as a separate patch. Now, if the
basic conflict resolution and delta apply both can't go in one
release, we shall document such cases clearly to avoid misuse of the
feature.

This is a very simplistic example of course, I recall there are various
more complex examples involving foreign keys, multi-table transactions,
constraints, etc. But in principle it's a manifestation of the same
inherent limitation of conflict detection and resolution etc.

Similarly, I believe this affects not just active-active, but also the
case where one node aggregates data from multiple publishers. Maybe not
to the same extent / it might be fine for that use case,

I am not sure how much it is a problem for general logical replication
solution but we do intend to work on solving such problems in
step-wise manner. Trying to attempt everything in one patch doesn't
seem advisable to me.

but you said

the end goal is to use this for active-active. So I'm wondering what's
the plan, there.

I think at this stage we are not ready for active-active because
leaving aside this feature we need many other features like
replication of all commands/objects (DDL replication, replicate large
objects, etc.), Global sequences, some sort of global two_phase
transaction management for data consistency, etc. So, it would be
better to consider logical replication cases intending to extend it
for active-active when we have other required pieces.

If I'm writing an application for active-active using this conflict
handling, what assumptions can I make? Can I just do stuff as if on
a single node, or do I need to be super conscious about the zillion ways
things can misbehave in a distributed system?

My personal opinion is that the closer this will be to the regular
consistency levels, the better. If past experience taught me anything,
it's very hard to predict how distributed systems with eventual
consistency behave, and even harder to actually test the application in
such environment.

I don't think in any way this can enable users to start writing
applications for active-active workloads. For something like what you
are saying, we probably need a global transaction manager (or a global
two_pc) as well to allow transactions to behave as they are on
single-node or achieve similar consistency levels. With such
transaction management, we can allow transactions to commit on a node
only when it doesn't lead to a conflict on the peer node.

In any case, if there are any differences compared to the usual
behavior, it needs to be very clearly explained in the docs.

I agree that docs should be clear about the cases that this can and
can't support.

How is this going to deal with the fact that commit LSN and timestamps
may not correlate perfectly? That is, commits may happen with LSN1 <
LSN2 but with T1 > T2.

Are you pointing to the issue where a session/txn has taken its
'xactStopTimestamp' timestamp earlier but is delayed in inserting its
record into XLOG, while another session/txn which took its timestamp
slightly later succeeded in inserting its record into XLOG sooner than
session 1, making LSNs and timestamps out of sync? Going by this
scenario, the commit timestamp may not be reflective of actual commits
and thus timestamp-based resolvers may take wrong decisions. Or do you
mean something else?

If this is the problem you are referring to, then I think this needs a
fix at the publisher side. Let me think more about it. Kindly let me
know if you have ideas on how to tackle it.

Yes, this is the issue I'm talking about. We're acquiring the timestamp
when not holding the lock to reserve space in WAL, so the LSN and the
commit LSN may not actually correlate.

Consider this example I discussed with Amit last week:

node A:

XACT1: UPDATE t SET v = 1; LSN1 / T1

XACT2: UPDATE t SET v = 2; LSN2 / T2

node B

XACT3: UPDATE t SET v = 3; LSN3 / T3

And assume LSN1 < LSN2, T1 > T2 (i.e. the commit timestamp inversion),
and T2 < T3 < T1. Now consider that the messages may arrive in different
orders, due to async replication. Unfortunately, this would lead to
different results of the conflict resolution:

XACT1 - XACT2 - XACT3 => v=3 (T3 wins)

XACT3 - XACT1 - XACT2 => v=2 (T2 wins)

Now, I realize there's a flaw in this example - the (T1 > T2) inversion
can't actually happen, because these transactions have a dependency, and
thus won't commit concurrently. XACT1 will complete the commit, because
XACT2 starts to commit. And with monotonic clock (which is a requirement
for any timestamp-based resolution), that should guarantee (T1 < T2).

However, I doubt this is sufficient to declare victory. It's more likely
that there still are problems, but the examples are likely more complex
(changes to multiple tables, etc.).

Fair enough, I think we need to analyze this more to find actual
problems or in some way try to prove that there is no problem.

I vaguely remember there were more issues with timestamp inversion, but
those might have been related to parallel apply etc.

Okay, so considering there are problems due to timestamp inversion, I
think the solution to that problem would probably be somehow
generating the commit LSN and timestamp in order. I don't have a
solution at this stage but will think more about both the actual
problem and the solution. In the meantime, if you get a chance to
refer to the place where you have seen such a problem, please try to
share it with us. It would be helpful.

--
With Regards,
Amit Kapila.

#20shveta malik
shveta.malik@gmail.com
In reply to: Tomas Vondra (#15)
Re: Conflict Detection and Resolution

On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

UPDATE
================

Conflict Detection Method:
--------------------------------
Origin conflict detection: The ‘origin’ info, which can be obtained
from the commit timestamp generated for the incoming txn at the source
node, is used to detect conflicts. To compare the remote’s origin with
the local’s origin, we must have origin information for local txns as
well, which can be obtained from the commit timestamp after enabling
‘track_commit_timestamp’ locally.
The one drawback here is the ‘origin’ information cannot be obtained
once the row is frozen and the commit-timestamp info is removed by
vacuum. For a frozen row, conflicts cannot be raised, and thus the
incoming changes will be applied in all the cases.

Conflict Types:
----------------
a) update_differ: The origin of an incoming update's key row differs
from the local row's origin, i.e., the row has already been updated
locally or by a different node.
b) update_missing: The row with the same value as that incoming
update's key does not exist. Remote is trying to update a row which
does not exist locally.
c) update_deleted: The row with the same value as that incoming
update's key does not exist. The row is already deleted. This conflict
type is generated only if the deleted row is still detectable i.e., it
is not removed by VACUUM yet. If the row is removed by VACUUM already,
it cannot detect this conflict. It will detect it as update_missing
and will follow the default or configured resolver of update_missing
itself.
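The taxonomy above can be expressed as a small classification function.
This is a hedged sketch, not the proposed implementation: `local_row`,
`dead_row_detectable`, and `same_origin` are hypothetical stand-ins for
the results of the index lookup, the check for a not-yet-vacuumed dead
tuple, and the origin comparison.

```python
def classify_update_conflict(local_row, dead_row_detectable, same_origin):
    """Map an incoming UPDATE to one of the conflict types (or None)."""
    if local_row is not None:
        # Row exists locally; conflict only if the last change
        # came from a different origin.
        return None if same_origin else "update_differ"
    if dead_row_detectable:
        # Row was deleted recently and the dead tuple is still visible.
        return "update_deleted"
    # Row never existed, or was already removed by VACUUM.
    return "update_missing"
```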

I don't understand the why should update_missing or update_deleted be
different, especially considering it's not detected reliably. And also
that even if we happen to find the row the associated TOAST data may
have already been removed. So why would this matter?

Here, we are trying to tackle the case where the row was 'recently'
deleted, i.e., a concurrent UPDATE and DELETE on pub and sub. The user
may want to opt for a different resolution in such a case, as opposed
to the one where the corresponding row was never present in the first
place. Rows deleted long ago may not fall into this category, as there
is a high chance they have already been removed by vacuum, and they
can be considered equivalent to the update_missing case.

My point is that if we can't detect the difference reliably, it's not
very useful. Consider this example:

Node A:

T1: INSERT INTO t (id, value) VALUES (1,1);

T2: DELETE FROM t WHERE id = 1;

Node B:

T3: UPDATE t SET value = 2 WHERE id = 1;

The "correct" order of received messages on a third node is T1-T3-T2.
But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues
and so on. For T1-T2-T3 the right decision is to discard the update,
while for T3-T1-T2 it's to wait for the INSERT to arrive.

But if we misdetect the situation, we either end up with a row that
shouldn't be there, or losing an update.

Doesn't the above example indicate that 'update_deleted' should also
be considered a necessary conflict type? Please see the possibilities
of conflicts in all three cases:

The "correct" order of receiving messages on node C (as suggested
above) is T1-T3-T2 (case1)
----------
T1 will insert the row.
T3 will have an update_differ conflict; latest_timestamp_wins or apply
will apply it, while earliest_timestamp_wins or skip will skip it.
T2 will delete the row (irrespective of whether the update happened or not).
End Result: No Data.

T1-T2-T3
----------
T1 will insert the row.
T2 will delete the row.
T3 will hit the update_deleted conflict. If it is 'update_deleted',
chances are that the resolver set here is 'skip' (the default is also
'skip' in this case).

If vacuum has already deleted that row (or if we don't support the
'update_deleted' conflict), it will be an 'update_missing' conflict.
In that case, the user may end up inserting the row if the chosen
resolver favors apply (which seems an obvious choice for the
'update_missing' conflict; the default is also 'apply_or_skip').

End result:
Row inserted with 'update_missing'.
Row correctly skipped with 'update_deleted' (assuming the obvious
choice of 'skip' for the update_deleted case).

So it seems that with the 'update_deleted' conflict, there is a higher
chance of making the right decision here (which is to discard the
update), as 'update_deleted' conveys the correct info to the user.
'update_missing', OTOH, does not convey the correct info, and the user
may end up inserting the data by choosing apply-favoring resolvers for
'update_missing'. Again, we get the benefit of 'update_deleted' for
*recently* deleted rows only.

T3-T1-T2
----------
T3 may end up inserting the record if the resolver favors 'apply' and
all the columns are received from the remote.
T1 will have an 'insert_exists' conflict and thus may either overwrite
the 'updated' values or leave the data as is (based on whether the
resolver favors apply or not).
T2 will end up deleting it.
End Result: No Data.

I feel that for the second case (and similar cases), 'update_deleted'
serves as a better conflict type.
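The second ordering (T1-T2-T3) can be replayed as a toy simulation to
show how the end state diverges. This is a hypothetical sketch assuming
the resolver choices discussed above: 'skip' for update_deleted and an
apply-favoring resolver for update_missing.

```python
def replay_t1_t2_t3(deleted_row_detectable: bool) -> dict:
    """Replay T1 (INSERT), T2 (DELETE), T3 (UPDATE) on a toy table."""
    table = {}
    table[1] = {"value": 1}      # T1: INSERT INTO t (id, value) VALUES (1,1)
    del table[1]                 # T2: DELETE FROM t WHERE id = 1
    # T3: UPDATE t SET value = 2 WHERE id = 1 -- row no longer exists
    if deleted_row_detectable:
        pass                     # update_deleted -> skip (update discarded)
    else:
        table[1] = {"value": 2}  # update_missing -> apply (row resurrected)
    return table

# replay_t1_t2_t3(True)  -> {}                  (correct: no data)
# replay_t1_t2_t3(False) -> {1: {'value': 2}}   (a row that shouldn't exist)
```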

thanks
Shveta

#21shveta malik
shveta.malik@gmail.com
In reply to: Tomas Vondra (#16)
#22Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Amit Kapila (#19)
#23Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: shveta malik (#20)
#24shveta malik
shveta.malik@gmail.com
In reply to: Tomas Vondra (#23)
#25Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#18)
#26Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: shveta malik (#24)
#27Dilip Kumar
dilipbalaut@gmail.com
In reply to: Tomas Vondra (#26)
#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#22)
#29shveta malik
shveta.malik@gmail.com
In reply to: Tomas Vondra (#26)
#30Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Dilip Kumar (#27)
#31Dilip Kumar
dilipbalaut@gmail.com
In reply to: Tomas Vondra (#30)
#32Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#8)
#33Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#32)
#34Peter Eisentraut
peter_e@gmx.net
In reply to: shveta malik (#1)
#35Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#17)
#36Jonathan S. Katz
jkatz@postgresql.org
In reply to: Amit Kapila (#33)
#37Robert Haas
robertmhaas@gmail.com
In reply to: shveta malik (#1)
#38Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#35)
#39Amit Kapila
amit.kapila16@gmail.com
In reply to: Jonathan S. Katz (#36)
#40Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#37)
#41Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Peter Eisentraut (#34)
#42Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Amit Kapila (#40)
#43Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Dilip Kumar (#31)
#44Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#42)
#45Dilip Kumar
dilipbalaut@gmail.com
In reply to: Tomas Vondra (#43)
#46Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#27)
#47Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#44)
#48Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#32)
#49Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#46)
#50Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#49)
#51Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#47)
#52Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#51)
#53Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#52)
#54Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#53)
#55shveta malik
shveta.malik@gmail.com
In reply to: Dilip Kumar (#50)
#56Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#48)
#57Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#25)
#58Dilip Kumar
dilipbalaut@gmail.com
In reply to: shveta malik (#55)
#59shveta malik
shveta.malik@gmail.com
In reply to: Dilip Kumar (#58)
#60Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#57)
#61Dilip Kumar
dilipbalaut@gmail.com
In reply to: shveta malik (#59)
#62Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#60)
#63Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#50)
#64Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#62)
#65Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#64)
#66shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#65)
#67Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#66)
#68shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#67)
#69Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#68)
#70Nisha Moond
nisha.moond412@gmail.com
In reply to: Amit Kapila (#69)
#71shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#69)
#72shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#70)
#73shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#72)
#74Masahiko Sawada
sawada.mshk@gmail.com
In reply to: shveta malik (#1)
#75Ajin Cherian
itsajin@gmail.com
In reply to: Nisha Moond (#70)
#76Masahiko Sawada
sawada.mshk@gmail.com
In reply to: shveta malik (#71)
#77Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#74)
#78Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#76)
#79shveta malik
shveta.malik@gmail.com
In reply to: Masahiko Sawada (#74)
#80shveta malik
shveta.malik@gmail.com
In reply to: Dilip Kumar (#58)
#81Dilip Kumar
dilipbalaut@gmail.com
In reply to: shveta malik (#80)
#82shveta malik
shveta.malik@gmail.com
In reply to: Dilip Kumar (#81)
#83Dilip Kumar
dilipbalaut@gmail.com
In reply to: shveta malik (#82)
#84Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#83)
#85Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#84)
#86Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#85)
#87shveta malik
shveta.malik@gmail.com
In reply to: Dilip Kumar (#83)
#88Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#86)
#89shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#86)
#90Dilip Kumar
dilipbalaut@gmail.com
In reply to: shveta malik (#87)
#91Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#88)
#92Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#91)
#93shveta malik
shveta.malik@gmail.com
In reply to: Dilip Kumar (#90)
#94Dilip Kumar
dilipbalaut@gmail.com
In reply to: shveta malik (#93)
#95shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#84)
#96Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#78)
#97Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#92)
#98Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#97)
#99Nisha Moond
nisha.moond412@gmail.com
In reply to: Dilip Kumar (#98)
#100Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#98)
#101Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#100)
#102Nisha Moond
nisha.moond412@gmail.com
In reply to: Ajin Cherian (#75)
#103Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Nisha Moond (#102)
#104Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#103)
#105shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#102)
#106shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#105)
#107Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#106)
#108shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#107)
#109Dilip Kumar
dilipbalaut@gmail.com
In reply to: Ajin Cherian (#107)
#110shveta malik
shveta.malik@gmail.com
In reply to: Dilip Kumar (#109)
#111Dilip Kumar
dilipbalaut@gmail.com
In reply to: shveta malik (#110)
#112Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#108)
#113Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#1)
#114shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#113)
#115shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#114)
#116Peter Smith
smithpb2250@gmail.com
In reply to: shveta malik (#114)
#117shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#113)
#118shveta malik
shveta.malik@gmail.com
In reply to: Peter Smith (#116)
#119Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#114)
#120Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#114)
#121Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#116)
#122shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#120)
#123Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#122)
#124Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#117)
#125Nisha Moond
nisha.moond412@gmail.com
In reply to: Amit Kapila (#120)
#126Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#115)
#127shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#123)
#128Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#127)
#129shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#128)
#130shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#129)
#131shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#130)
#132Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#115)
#133Nisha Moond
nisha.moond412@gmail.com
In reply to: Amit Kapila (#132)
#134Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#115)
#135Nisha Moond
nisha.moond412@gmail.com
In reply to: Amit Kapila (#120)
#136Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#129)
#137shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#136)
#138Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#130)
#139vignesh C
vignesh21@gmail.com
In reply to: Nisha Moond (#133)
#140vignesh C
vignesh21@gmail.com
In reply to: Nisha Moond (#133)
#141Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#131)
#142Nisha Moond
nisha.moond412@gmail.com
In reply to: Amit Kapila (#132)
#143shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#141)
#144Nisha Moond
nisha.moond412@gmail.com
In reply to: Ajin Cherian (#141)
#145shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#143)
#146Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#143)
#147vignesh C
vignesh21@gmail.com
In reply to: Ajin Cherian (#146)
#148vignesh C
vignesh21@gmail.com
In reply to: Ajin Cherian (#146)
#149vignesh C
vignesh21@gmail.com
In reply to: Ajin Cherian (#146)
#150Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: vignesh C (#149)
#151Nisha Moond
nisha.moond412@gmail.com
In reply to: vignesh C (#148)
#152shveta malik
shveta.malik@gmail.com
In reply to: vignesh C (#149)
#153Nisha Moond
nisha.moond412@gmail.com
In reply to: Nisha Moond (#151)
#154Ajin Cherian
itsajin@gmail.com
In reply to: vignesh C (#147)
#155Nisha Moond
nisha.moond412@gmail.com
In reply to: Nisha Moond (#151)
#156shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#151)
#157shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#156)
#158Peter Smith
smithpb2250@gmail.com
In reply to: Nisha Moond (#151)
#159shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#157)
#160shveta malik
shveta.malik@gmail.com
In reply to: Peter Smith (#158)
#161Peter Smith
smithpb2250@gmail.com
In reply to: shveta malik (#160)
#162shveta malik
shveta.malik@gmail.com
In reply to: Peter Smith (#161)
#163shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#159)
#164Peter Smith
smithpb2250@gmail.com
In reply to: shveta malik (#162)
#165shveta malik
shveta.malik@gmail.com
In reply to: Peter Smith (#164)
#166shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#165)
#167shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#166)
#168Peter Smith
smithpb2250@gmail.com
In reply to: Peter Smith (#158)
#169Ajin Cherian
itsajin@gmail.com
In reply to: Peter Smith (#168)
#170Nisha Moond
nisha.moond412@gmail.com
In reply to: Peter Smith (#158)
#171Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#162)
#172Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#165)
#173shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#172)
#174shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#173)
#175Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#174)
#176Peter Smith
smithpb2250@gmail.com
In reply to: Nisha Moond (#175)
#177Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: shveta malik (#174)
#178Ajin Cherian
itsajin@gmail.com
In reply to: Peter Smith (#176)
#179Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#177)
#180shveta malik
shveta.malik@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#177)
#181Diego Fronza
diego.fronza@percona.com
In reply to: shveta malik (#180)
#182Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Diego Fronza (#181)
#183Amit Kapila
amit.kapila16@gmail.com
In reply to: Michail Nikolaev (#182)