Question on error code selection in conflict detection

Started by Dilip Kumar9 months ago15 messages

dilipbalaut@gmail.com

9 months ago

Hi,

I was reviewing the code for conflict reporting and became curious
about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
typically signifies a serialization failure within a transaction under
serializable isolation, so its use here for a different type of
conflict seems somewhat out of place. I did notice its use in other
contexts for recovery conflicts in physical replication, which also
struck me as a bit unusual.

Given these observations, I'm wondering if it would be more
appropriate to introduce a new, more specific error code for this
purpose?

static int
errcode_apply_conflict(ConflictType type)
{
switch (type)
{
case CT_INSERT_EXISTS:
case CT_UPDATE_EXISTS:
case CT_MULTIPLE_UNIQUE_CONFLICTS:
return errcode(ERRCODE_UNIQUE_VIOLATION);
case CT_UPDATE_ORIGIN_DIFFERS:
case CT_UPDATE_MISSING:
case CT_DELETE_ORIGIN_DIFFERS:
case CT_DELETE_MISSING:
return errcode(ERRCODE_T_R_SERIALIZATION_FAILURE);
}

Assert(false);
return 0; /* silence compiler warning */
}

--
Regards,
Dilip Kumar
Google

Robert Haas

robertmhaas@gmail.com

9 months ago

In reply to: Dilip Kumar (#1)

Re: Question on error code selection in conflict detection

On Mon, Jun 9, 2025 at 9:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I was reviewing the code for conflict reporting and became curious
about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
typically signifies a serialization failure within a transaction under
serializable isolation, so its use here for a different type of
conflict seems somewhat out of place. I did notice its use in other
contexts for recovery conflicts in physical replication, which also
struck me as a bit unusual.

Given these observations, I'm wondering if it would be more
appropriate to introduce a new, more specific error code for this
purpose?

Makes sense to me. I'm not sure if we should use a new error code or
some other existing one, but conflating other things with serializable
failures seems like a bad plan.

--
Robert Haas
EDB: http://www.enterprisedb.com

Amit Kapila

amit.kapila16@gmail.com

9 months ago

In reply to: Dilip Kumar (#1)

Re: Question on error code selection in conflict detection

On Mon, Jun 9, 2025 at 7:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I was reviewing the code for conflict reporting and became curious
about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
typically signifies a serialization failure within a transaction under
serializable isolation, so its use here for a different type of
conflict seems somewhat out of place. I did notice its use in other
contexts for recovery conflicts in physical replication, which also
struck me as a bit unusual.

Given these observations, I'm wondering if it would be more
appropriate to introduce a new, more specific error code for this
purpose?

Can we instead try to use other suitable existing error codes?

CT_UPDATE_ORIGIN_DIFFERS, CT_DELETE_ORIGIN_DIFFERS →
ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION (27000)
These represent cases where the row exists but differs from the
expected state, conceptually similar to a triggered data change
invalidating the operation.

I have also considered using ERRCODE_TRIGGERED_ACTION_EXCEPTION for
the above, but that sounds to be fit for a generic error that occurs
during the execution of a triggered action (e.g., a BEFORE or AFTER
trigger).

CT_UPDATE_MISSING, CT_DELETE_MISSING → ERRCODE_NO_DATA_FOUND (02000)
These are straightforward cases where the target row is missing,
aligning well with the standard meaning of 02000.

I don't have good ideas on the cases for physical replication, as
those seem quite different; we can consider those separately.

Thoughts?

--
With Regards,
Amit Kapila.

Dilip Kumar

dilipbalaut@gmail.com

9 months ago

In reply to: Robert Haas (#2)

Re: Question on error code selection in conflict detection

On Tue, Jun 10, 2025 at 1:25 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jun 9, 2025 at 9:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I was reviewing the code for conflict reporting and became curious
about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
typically signifies a serialization failure within a transaction under
serializable isolation, so its use here for a different type of
conflict seems somewhat out of place. I did notice its use in other
contexts for recovery conflicts in physical replication, which also
struck me as a bit unusual.

Given these observations, I'm wondering if it would be more
appropriate to introduce a new, more specific error code for this
purpose?

Makes sense to me. I'm not sure if we should use a new error code or
some other existing one, but conflating other things with serializable
failures seems like a bad plan.

Yeah we may use existing as well if we find some appropriate error codes.

--
Regards,
Dilip Kumar
Google

Dilip Kumar

dilipbalaut@gmail.com

9 months ago

In reply to: Amit Kapila (#3)

Re: Question on error code selection in conflict detection

On Tue, Jun 10, 2025 at 11:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 9, 2025 at 7:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I was reviewing the code for conflict reporting and became curious
about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
typically signifies a serialization failure within a transaction under
serializable isolation, so its use here for a different type of
conflict seems somewhat out of place. I did notice its use in other
contexts for recovery conflicts in physical replication, which also
struck me as a bit unusual.

Given these observations, I'm wondering if it would be more
appropriate to introduce a new, more specific error code for this
purpose?

Can we instead try to use other suitable existing error codes?

Yeah we can try to do that as well.

CT_UPDATE_ORIGIN_DIFFERS, CT_DELETE_ORIGIN_DIFFERS →
ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION (27000)
These represent cases where the row exists but differs from the
expected state, conceptually similar to a triggered data change
invalidating the operation.

Yeah this looks much better than what we already have.

I have also considered using ERRCODE_TRIGGERED_ACTION_EXCEPTION for
the above, but that sounds to be fit for a generic error that occurs
during the execution of a triggered action (e.g., a BEFORE or AFTER
trigger).

Right

CT_UPDATE_MISSING, CT_DELETE_MISSING → ERRCODE_NO_DATA_FOUND (02000)
These are straightforward cases where the target row is missing,
aligning well with the standard meaning of 02000.

Yeah this looks good.

I don't have good ideas on the cases for physical replication, as
those seem quite different; we can consider those separately.

Yeah we can do that separately, maybe I put more thought on that and
send my proposal.

Thoughts?

Okay I will put more thought about the proposed error code and also
see what others have to say and if we have a consensus I can provide
the patch.

--
Regards,
Dilip Kumar
Google

Dilip Kumar

dilipbalaut@gmail.com

9 months ago

In reply to: Dilip Kumar (#5)

Re: Question on error code selection in conflict detection

On Tue, Jun 10, 2025 at 12:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 10, 2025 at 11:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 9, 2025 at 7:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I was reviewing the code for conflict reporting and became curious
about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
typically signifies a serialization failure within a transaction under
serializable isolation, so its use here for a different type of
conflict seems somewhat out of place. I did notice its use in other
contexts for recovery conflicts in physical replication, which also
struck me as a bit unusual.

Given these observations, I'm wondering if it would be more
appropriate to introduce a new, more specific error code for this
purpose?

Can we instead try to use other suitable existing error codes?

Yeah we can try to do that as well.

CT_UPDATE_ORIGIN_DIFFERS, CT_DELETE_ORIGIN_DIFFERS →
ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION (27000)
These represent cases where the row exists but differs from the
expected state, conceptually similar to a triggered data change
invalidating the operation.

Yeah this looks much better than what we already have.

I have also considered using ERRCODE_TRIGGERED_ACTION_EXCEPTION for
the above, but that sounds to be fit for a generic error that occurs
during the execution of a triggered action (e.g., a BEFORE or AFTER
trigger).

Right

CT_UPDATE_MISSING, CT_DELETE_MISSING → ERRCODE_NO_DATA_FOUND (02000)
These are straightforward cases where the target row is missing,
aligning well with the standard meaning of 02000.

Yeah this looks good.

I don't have good ideas on the cases for physical replication, as
those seem quite different; we can consider those separately.

Yeah we can do that separately, maybe I put more thought on that and
send my proposal.

Thoughts?

Okay I will put more thought about the proposed error code and also
see what others have to say and if we have a consensus I can provide
the patch.

After reviewing other error codes, these appear to be the most
relevant in this context. PFA patch for the same, I will analyze the
error code in physical replication recovery conflict and propose what
is relevant there in a separate patch.

--
Regards,
Dilip Kumar
Google

shveta malik

shveta.malik@gmail.com

9 months ago

In reply to: Dilip Kumar (#6)

Re: Question on error code selection in conflict detection

On Wed, Jun 11, 2025 at 11:05 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 10, 2025 at 12:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 10, 2025 at 11:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 9, 2025 at 7:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I was reviewing the code for conflict reporting and became curious
about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
typically signifies a serialization failure within a transaction under
serializable isolation, so its use here for a different type of
conflict seems somewhat out of place. I did notice its use in other
contexts for recovery conflicts in physical replication, which also
struck me as a bit unusual.

Given these observations, I'm wondering if it would be more
appropriate to introduce a new, more specific error code for this
purpose?

Can we instead try to use other suitable existing error codes?

Yeah we can try to do that as well.

CT_UPDATE_ORIGIN_DIFFERS, CT_DELETE_ORIGIN_DIFFERS →
ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION (27000)
These represent cases where the row exists but differs from the
expected state, conceptually similar to a triggered data change
invalidating the operation.

Yeah this looks much better than what we already have.

I have also considered using ERRCODE_TRIGGERED_ACTION_EXCEPTION for
the above, but that sounds to be fit for a generic error that occurs
during the execution of a triggered action (e.g., a BEFORE or AFTER
trigger).

Right

CT_UPDATE_MISSING, CT_DELETE_MISSING → ERRCODE_NO_DATA_FOUND (02000)
These are straightforward cases where the target row is missing,
aligning well with the standard meaning of 02000.

Yeah this looks good.

I don't have good ideas on the cases for physical replication, as
those seem quite different; we can consider those separately.

Yeah we can do that separately, maybe I put more thought on that and
send my proposal.

Thoughts?

Okay I will put more thought about the proposed error code and also
see what others have to say and if we have a consensus I can provide
the patch.

After reviewing other error codes, these appear to be the most
relevant in this context.

+1. On digging deep, among existing codes, these are the most relevant ones.

PFA patch for the same, I will analyze the
error code in physical replication recovery conflict and propose what
is relevant there in a separate patch.

The patch looks good.

thanks
Shveta

Robert Haas

robertmhaas@gmail.com

9 months ago

In reply to: Amit Kapila (#3)

Re: Question on error code selection in conflict detection

On Tue, Jun 10, 2025 at 2:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Can we instead try to use other suitable existing error codes?

Why?

I mean, I'm not 100% against using existing error codes, but I feel
like we might be distorting the meanings of the existing error codes.
If we used new error codes, then people could test for those and know
that they would get exactly these conditions and nothing else.

--
Robert Haas
EDB: http://www.enterprisedb.com

Dilip Kumar

dilipbalaut@gmail.com

9 months ago

In reply to: Robert Haas (#8)

Re: Question on error code selection in conflict detection

On Wed, Jun 11, 2025 at 7:33 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jun 10, 2025 at 2:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Can we instead try to use other suitable existing error codes?

Why?

I mean, I'm not 100% against using existing error codes, but I feel
like we might be distorting the meanings of the existing error codes.
If we used new error codes, then people could test for those and know
that they would get exactly these conditions and nothing else.

To enhance the clarity and specificity of our error reporting,
particularly for logical replication conflicts, I suggest we consider
defining a dedicated class of error codes, much like we have for FDWs.
IMHO this would be a more future-proof approach, given the potential
for many new conflict detection types in the future.

--
Regards,
Dilip Kumar
Google

#10

Amit Kapila

amit.kapila16@gmail.com

9 months ago

In reply to: Robert Haas (#8)

Re: Question on error code selection in conflict detection

On Wed, Jun 11, 2025 at 7:33 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jun 10, 2025 at 2:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Can we instead try to use other suitable existing error codes?

Why?

I mean, I'm not 100% against using existing error codes, but I feel
like we might be distorting the meanings of the existing error codes.
If we used new error codes, then people could test for those and know
that they would get exactly these conditions and nothing else.

I am okay with introducing a new error code as well. However, for some
cases like UPDATE_MISSING, DELETE_MISSING, the existing code
ERRCODE_NO_DATA_FOUND seems to be an exact match. The LOG message
appears when we don't find the row to be updated or deleted while
applying changes. This can happen if someone deletes the required rows
on the subscriber. This case is similar to unique_key_violation where
we report ERRCODE_UNIQUE_VIOLATION when, during apply, we found the
row with the same key exists (for example, cases like INSERT_EXISTS or
UPDATE_EXISTS). So, I can't think of a reason to use a different
error_code for these cases.

Now, the error_code proposed for the other two cases
UPDATE_ORIGIN_DIFFERS, DELETE_ORIGIN_DIFFERS is debatable. The
situation in these cases is that the row exists but differs from the
expected state (already changed by a different origin). As of now, for
these cases, we LOG the message and update the existing rows. I
thought of using ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION because we
are using it for the case where the tuple to be updated is already
updated/deleted, which is also the case here, but the reason for the
'already update/delete' is different. So, this is not a perfect match,
but it is worth considering.

The other code worth considering is
ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE for UPDATE_ORIGIN_DIFFERS and
DELETE_ORIGIN_DIFFERS cases. We use this error code when we try to
perform some operation, but the required object (say a tuple) is not
in the expected state. Currently, we use it for tuples in the
following cases:

ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("attempted to delete invisible tuple")));

ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("attempted to update invisible tuple")));

ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg_internal("attempted to overwrite invisible tuple")));

ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("tuple to be updated was already modified by an
operation triggered by the current command")));

As mentioned above, the current case under discussion also falls in
the same category, where the tuple to be updated/deleted is not in an
expected state.

--
With Regards,
Amit Kapila.

#11

shveta malik

shveta.malik@gmail.com

9 months ago

In reply to: Amit Kapila (#10)

Re: Question on error code selection in conflict detection

On Thu, Jun 12, 2025 at 12:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 11, 2025 at 7:33 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jun 10, 2025 at 2:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Can we instead try to use other suitable existing error codes?

Why?

I mean, I'm not 100% against using existing error codes, but I feel
like we might be distorting the meanings of the existing error codes.
If we used new error codes, then people could test for those and know
that they would get exactly these conditions and nothing else.

I am okay with introducing a new error code as well. However, for some
cases like UPDATE_MISSING, DELETE_MISSING, the existing code
ERRCODE_NO_DATA_FOUND seems to be an exact match. The LOG message
appears when we don't find the row to be updated or deleted while
applying changes. This can happen if someone deletes the required rows
on the subscriber. This case is similar to unique_key_violation where
we report ERRCODE_UNIQUE_VIOLATION when, during apply, we found the
row with the same key exists (for example, cases like INSERT_EXISTS or
UPDATE_EXISTS). So, I can't think of a reason to use a different
error_code for these cases.

Now, the error_code proposed for the other two cases
UPDATE_ORIGIN_DIFFERS, DELETE_ORIGIN_DIFFERS is debatable. The
situation in these cases is that the row exists but differs from the
expected state (already changed by a different origin). As of now, for
these cases, we LOG the message and update the existing rows. I
thought of using ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION because we
are using it for the case where the tuple to be updated is already
updated/deleted, which is also the case here, but the reason for the
'already update/delete' is different. So, this is not a perfect match,
but it is worth considering.

The other code worth considering is
ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE for UPDATE_ORIGIN_DIFFERS and
DELETE_ORIGIN_DIFFERS cases. We use this error code when we try to
perform some operation, but the required object (say a tuple) is not
in the expected state. Currently, we use it for tuples in the
following cases:

ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("attempted to delete invisible tuple")));

ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("attempted to update invisible tuple")));

ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg_internal("attempted to overwrite invisible tuple")));

ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("tuple to be updated was already modified by an
operation triggered by the current command")));

As mentioned above, the current case under discussion also falls in
the same category, where the tuple to be updated/deleted is not in an
expected state.

We could consider introducing new error codes for origin_differs cases
in the following classes:

1) Class 55 — Object Not In Prerequisite State.
This class currently includes errors like
object_not_in_prerequisite_state, object_in_use,
unsafe_new_enum_value_usage etc. We can consider adding origin_differs
to this list.

2) Class 42 — Syntax Error or Access Rule Violation. This class
currently includes errors like datatype_mismatch, collation_mismatch,
wrong_object_type etc. We can consider adding origin_mismatch to this
list.

Please note that while Class 42 is a SQL-defined class (as indicated
in errcodes.txt, see [1]# Classes that begin with 0-4 or A-H are defined by the # standard. Within such a class, subclass values defined by the # standard must begin with 0-4 or A-H. To define a new error code, # ensure that it is either in an "implementation-defined class" (it # begins with 5-9 or I-Z), or its subclass falls outside the range of # error codes that could be present in future versions of the # standard (i.e. the subclass value begins with 5-9 or I-Z).), it does allow for implementation-defined
additions. So adding origin_mismatch should be acceptable under those
provisions.

Thoughts?

[1]: # Classes that begin with 0-4 or A-H are defined by the # standard. Within such a class, subclass values defined by the # standard must begin with 0-4 or A-H. To define a new error code, # ensure that it is either in an "implementation-defined class" (it # begins with 5-9 or I-Z), or its subclass falls outside the range of # error codes that could be present in future versions of the # standard (i.e. the subclass value begins with 5-9 or I-Z).
# Classes that begin with 0-4 or A-H are defined by the
# standard. Within such a class, subclass values defined by the
# standard must begin with 0-4 or A-H. To define a new error code,
# ensure that it is either in an "implementation-defined class" (it
# begins with 5-9 or I-Z), or its subclass falls outside the range of
# error codes that could be present in future versions of the
# standard (i.e. the subclass value begins with 5-9 or I-Z).

thanks
Shveta

#12

Robert Haas

robertmhaas@gmail.com

9 months ago

In reply to: Amit Kapila (#10)

Re: Question on error code selection in conflict detection

On Thu, Jun 12, 2025 at 3:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

cases like UPDATE_MISSING, DELETE_MISSING, the existing code
ERRCODE_NO_DATA_FOUND seems to be an exact match. The LOG message
appears when we don't find the row to be updated or deleted while
applying changes. This can happen if someone deletes the required rows
on the subscriber. This case is similar to unique_key_violation where
we report ERRCODE_UNIQUE_VIOLATION when, during apply, we found the
row with the same key exists (for example, cases like INSERT_EXISTS or
UPDATE_EXISTS). So, I can't think of a reason to use a different
error_code for these cases.

Well, ERRCODE_NO_DATA_FOUND is "Class P0 - PL/pgSQL Error," and it
normally occurs when STRICT was used to say that SELECT INTO should
return exactly one row. This is a completely different part of the
system and a completely different situation. I see that one use of
ERRCODE_NO_DATA_FOUND has also found its way into tablecmds.c, but
that is probably also a mistake that should be fixed.

Now, the error_code proposed for the other two cases
UPDATE_ORIGIN_DIFFERS, DELETE_ORIGIN_DIFFERS is debatable. The
situation in these cases is that the row exists but differs from the
expected state (already changed by a different origin). As of now, for
these cases, we LOG the message and update the existing rows. I
thought of using ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION because we
are using it for the case where the tuple to be updated is already
updated/deleted, which is also the case here, but the reason for the
'already update/delete' is different. So, this is not a perfect match,
but it is worth considering.

This error is currently used for situations involving triggers, which
may be why it has TRIGGERED in the name. This situation is different.

The other code worth considering is
ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE for UPDATE_ORIGIN_DIFFERS and
DELETE_ORIGIN_DIFFERS cases. We use this error code when we try to
perform some operation, but the required object (say a tuple) is not
in the expected state. Currently, we use it for tuples in the
following cases:

The point is that we shouldn't just consider the name of the error
code. We should consider more broadly: what is the error code
category? How is the error code currently used and is this consistent?
People want to be able to filter logs by error code to find the events
that they care about, and if you just lump a bunch of quasi-related
things under one error code because technically the wording of the
error code would fit the situation, then they can't do that.

--
Robert Haas
EDB: http://www.enterprisedb.com

#13

Dilip Kumar

dilipbalaut@gmail.com

9 months ago

In reply to: Dilip Kumar (#9)

Re: Question on error code selection in conflict detection

On Wed, Jun 11, 2025 at 8:02 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Jun 11, 2025 at 7:33 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jun 10, 2025 at 2:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Can we instead try to use other suitable existing error codes?

Why?

I mean, I'm not 100% against using existing error codes, but I feel
like we might be distorting the meanings of the existing error codes.
If we used new error codes, then people could test for those and know
that they would get exactly these conditions and nothing else.

To enhance the clarity and specificity of our error reporting,
particularly for logical replication conflicts, I suggest we consider
defining a dedicated class of error codes, much like we have for FDWs.
IMHO this would be a more future-proof approach, given the potential
for many new conflict detection types in the future.

So here is my proposal for adding this new class of error codes and
also the error codes as listed below. I have also changed the same in
the patch. Let me know your thoughts?

Section: Class LC - Logical replication conflict logging Error

# (PostgreSQL-specific error class)
LC001 E ERRCODE_L_R_APPLY_CONFLICT_UNIQUE_KEY_CONFLICT
unique_key_conflict
LC002 E ERRCODE_L_R_APPLY_CONFLICT_TARGET_ROW_MISSING
target_row_missing
LC003 E ERRCODE_L_R_APPLY_CONFLICT_ORIGIN_DIFFER
target_row_origin_differ

--
Regards,
Dilip Kumar
Google

#14

Amit Kapila

amit.kapila16@gmail.com

9 months ago

In reply to: Dilip Kumar (#1)

Re: Question on error code selection in conflict detection

On Mon, Jun 9, 2025 at 7:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I was reviewing the code for conflict reporting and became curious
about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
typically signifies a serialization failure within a transaction under
serializable isolation, so its use here for a different type of
conflict seems somewhat out of place. I did notice its use in other
contexts for recovery conflicts in physical replication, which also
struck me as a bit unusual.

I examined the usage of ERRCODE_T_R_SERIALIZATION_FAILURE in the code,
and it appears that, apart from logical and recovery conflicts, it is
also used in places other than the cases under the serializable
isolation level. For example,

if (ItemPointerIndicatesMovedPartitions(tid))
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("tuple to be locked was already moved to another partition due
to concurrent update")));

if (ItemPointerIndicatesMovedPartitions(&context->tmfd.ctid))
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("tuple to be merged was already moved to another partition due
to concurrent update")));

Then it is also used in ExecCheckTupleVisible and execReplication.c
without serializable isolation. Seeing this, it seems SERIALIZATION in
the errod_code refers to the case where the transaction could not be
serialized due to concurrent access conflicts. Now, we typically use
it for serializable isolation, but its use at other places where
transactions attempt to modify the same data simultaneously doesn't
seem unreasonable.

Given these observations, I'm wondering if it would be more
appropriate to introduce a new, more specific error code for this
purpose?

If we want to have separate errorcode(s) for logical and recovery
conflicts, one possibility is to add new codes in Class 55 (Object Not
In Prerequisite State (class borrowed from DB2)).

ERRCODE_CONFLICT_CANT_ACCESS: all recovery conflicts, delete_missing,
and update_missing
ERRCODE_CONFLICT_CANT_MODIFY : update_origin_differ and delete_origin_differ

The reason I thought class 55 is suitable because it has other codes
like ERRCODE_CANT_CHANGE_RUNTIME_PARAM,
ERRCODE_UNSAFE_NEW_ENUM_VALUE_USAGE,
ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE, which to me appear to have
some similarity. I have checked the same class in DB2 [1]https://www.ibm.com/docs/en/db2-for-zos/12.0.0?topic=codes-sqlstate-values-common-error#db2z_sqlstatevalues__classcode55 and it had a
few related codes like:
55019 The object is in an invalid state for the operation.
55079 The operation cannot be performed because the XML column is not
in the versioning format.

[1]: https://www.ibm.com/docs/en/db2-for-zos/12.0.0?topic=codes-sqlstate-values-common-error#db2z_sqlstatevalues__classcode55

--
With Regards,
Amit Kapila.

#15

Amit Kapila

amit.kapila16@gmail.com

9 months ago

In reply to: Robert Haas (#12)

Re: Question on error code selection in conflict detection

On Thu, Jun 12, 2025 at 6:56 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 12, 2025 at 3:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

cases like UPDATE_MISSING, DELETE_MISSING, the existing code
ERRCODE_NO_DATA_FOUND seems to be an exact match. The LOG message
appears when we don't find the row to be updated or deleted while
applying changes. This can happen if someone deletes the required rows
on the subscriber. This case is similar to unique_key_violation where
we report ERRCODE_UNIQUE_VIOLATION when, during apply, we found the
row with the same key exists (for example, cases like INSERT_EXISTS or
UPDATE_EXISTS). So, I can't think of a reason to use a different
error_code for these cases.

Well, ERRCODE_NO_DATA_FOUND is "Class P0 - PL/pgSQL Error," and it
normally occurs when STRICT was used to say that SELECT INTO should
return exactly one row. This is a completely different part of the
system and a completely different situation. I see that one use of
ERRCODE_NO_DATA_FOUND has also found its way into tablecmds.c, but
that is probably also a mistake that should be fixed.

Right, and I agree we should work on changing the use of error code
ERRCODE_NO_DATA_FOUND in tablecmds.c.

--
With Regards,
Amit Kapila.

Question on error code selection in conflict detection

Attachments:

Attachments: