Error code for "terminating connection due to conflict with recovery"
While looking at the backend code, I realized that error code for
"terminating connection due to conflict with recovery" is
ERRCODE_ADMIN_SHUTDOWN.
I thought the error code is for somewhat a human interruption, such as
shutdown command issued by pg_ctl. Is the usage of the error code
appropreate?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Sat, Jan 8, 2011 at 9:52 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
While looking at the backend code, I realized that error code for
"terminating connection due to conflict with recovery" is
ERRCODE_ADMIN_SHUTDOWN.I thought the error code is for somewhat a human interruption, such as
shutdown command issued by pg_ctl. Is the usage of the error code
appropreate?
That doesn't sound right to me. I'd have thought something in class 40.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Jan 8, 2011 at 9:52 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
While looking at the backend code, I realized that error code for
"terminating connection due to conflict with recovery" is
ERRCODE_ADMIN_SHUTDOWN.I thought the error code is for somewhat a human interruption, such as
shutdown command issued by pg_ctl. Is the usage of the error code
appropreate?That doesn't sound right to me. I'd have thought something in class 40.
What about:
40004 CONFLICT WITH RECOVERY conflict_with_recovery
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Tue, Jan 11, 2011 at 1:30 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
On Sat, Jan 8, 2011 at 9:52 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
While looking at the backend code, I realized that error code for
"terminating connection due to conflict with recovery" is
ERRCODE_ADMIN_SHUTDOWN.I thought the error code is for somewhat a human interruption, such as
shutdown command issued by pg_ctl. Is the usage of the error code
appropreate?That doesn't sound right to me. I'd have thought something in class 40.
What about:
40004 CONFLICT WITH RECOVERY conflict_with_recovery
We should respect the following convention, from errcodes.h:
* The convention is that new error codes defined by PostgreSQL in a
* class defined by the standard have a subclass value that begins
* with 'P'. In addition, error codes defined by PostgreSQL clients
* (such as ecpg) have a class value that begins with 'Y'.
And don't forget there are three places where the new error code would
need to be added.
Otherwise, +1.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
That doesn't sound right to me. I'd have thought something in class 40.
What about:
40004 CONFLICT WITH RECOVERY conflict_with_recovery
We should respect the following convention, from errcodes.h:
* The convention is that new error codes defined by PostgreSQL in a
* class defined by the standard have a subclass value that begins
* with 'P'. In addition, error codes defined by PostgreSQL clients
* (such as ecpg) have a class value that begins with 'Y'.And don't forget there are three places where the new error code would
need to be added.Otherwise, +1.
Ok. Here is the patch for this. I use 40P02, instead of 40004.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Attachments:
sr_error_code.patchtext/x-patch; charset=us-asciiDownload+12-2
On Thu, Jan 13, 2011 at 2:13 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
Ok. Here is the patch for this. I use 40P02, instead of 40004.
Please add this to the currently open CommitFest:
https://commitfest.postgresql.org/action/commitfest_view/open
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jan 13, 2011 at 2:13 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
Ok. Here is the patch for this. I use 40P02, instead of 40004.
Please add this to the currently open CommitFest:
https://commitfest.postgresql.org/action/commitfest_view/open
Done. Comments are welcome. Unless there's objection, I will commit it
this weekend.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tatsuo Ishii <ishii@postgresql.org> writes:
Please add this to the currently open CommitFest:
https://commitfest.postgresql.org/action/commitfest_view/open
Done. Comments are welcome. Unless there's objection, I will commit it
this weekend.
If you're expecting anyone to actually *review* it during the CF,
that's a bit premature.
regards, tom lane
Tatsuo Ishii <ishii@postgresql.org> writes:
Please add this to the currently open CommitFest:
https://commitfest.postgresql.org/action/commitfest_view/openDone. Comments are welcome. Unless there's objection, I will commit it
this weekend.If you're expecting anyone to actually *review* it during the CF,
that's a bit premature.
No problem to wait for longer. I will wait by the end of January for
the present.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Thu, Jan 13, 2011 at 7:31 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
Tatsuo Ishii <ishii@postgresql.org> writes:
Please add this to the currently open CommitFest:
https://commitfest.postgresql.org/action/commitfest_view/openDone. Comments are welcome. Unless there's objection, I will commit it
this weekend.If you're expecting anyone to actually *review* it during the CF,
that's a bit premature.No problem to wait for longer. I will wait by the end of January for
the present.
Review:
The only possible point of concern I see here is the naming of the C
identifier. Everything else in class 40 uses ERRCODE_T_R_whatever,
with T_R standing for transaction rollback. It's not obvious to me
that that convention has any real value, but perhaps we ought to
follow it here for the sake of consistency?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Review:
The only possible point of concern I see here is the naming of the C
identifier. Everything else in class 40 uses ERRCODE_T_R_whatever,
with T_R standing for transaction rollback. It's not obvious to me
that that convention has any real value, but perhaps we ought to
follow it here for the sake of consistency?
Yeah. Actually at first I used "T_R" convention. After a few seconds
thought, I realized that "T_R" is not appropreate by the same reason
you feel. Possible other argument might be "Terminating connection
always involves transaction rollback. So using T_R is ok". I'm not
sure this argument is reasonable enough though.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Fri, Jan 14, 2011 at 10:53 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
Review:
The only possible point of concern I see here is the naming of the C
identifier. Everything else in class 40 uses ERRCODE_T_R_whatever,
with T_R standing for transaction rollback. It's not obvious to me
that that convention has any real value, but perhaps we ought to
follow it here for the sake of consistency?Yeah. Actually at first I used "T_R" convention. After a few seconds
thought, I realized that "T_R" is not appropreate by the same reason
you feel. Possible other argument might be "Terminating connection
always involves transaction rollback. So using T_R is ok". I'm not
sure this argument is reasonable enough though.
Looking at this a bit more carefully, I notice that there are two
cases when a recovery conflict occurs:
- we cancel the currently running statement, or
- we kill the whole connection
Should those use the same error code, or different ones? This patch
doesn't appear to update all the places where recovery conflicts can
occur, which is probably not ideal.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Tatsuo Ishii <ishii@postgresql.org> writes:
Review:
The only possible point of concern I see here is the naming of the C
identifier. Everything else in class 40 uses ERRCODE_T_R_whatever,
with T_R standing for transaction rollback. It's not obvious to me
that that convention has any real value, but perhaps we ought to
follow it here for the sake of consistency?
Yeah. Actually at first I used "T_R" convention. After a few seconds
thought, I realized that "T_R" is not appropreate by the same reason
you feel. Possible other argument might be "Terminating connection
always involves transaction rollback. So using T_R is ok". I'm not
sure this argument is reasonable enough though.
This is not only a matter of some macro name or other. According to the
SQL standard, class 40 itself is defined as "transaction rollback".
If the error condition can't reasonably be regarded as a subcase of
that, you're making a bad choice of SQLSTATE code.
regards, tom lane
On Fri, 2011-01-14 at 12:04 -0500, Tom Lane wrote:
Tatsuo Ishii <ishii@postgresql.org> writes:
Review:
The only possible point of concern I see here is the naming of the C
identifier. Everything else in class 40 uses ERRCODE_T_R_whatever,
with T_R standing for transaction rollback. It's not obvious to me
that that convention has any real value, but perhaps we ought to
follow it here for the sake of consistency?Yeah. Actually at first I used "T_R" convention. After a few seconds
thought, I realized that "T_R" is not appropreate by the same reason
you feel. Possible other argument might be "Terminating connection
always involves transaction rollback. So using T_R is ok". I'm not
sure this argument is reasonable enough though.This is not only a matter of some macro name or other. According to the
SQL standard, class 40 itself is defined as "transaction rollback".
If the error condition can't reasonably be regarded as a subcase of
that, you're making a bad choice of SQLSTATE code.
This whole thing is confused. No change is appropriate here at all.
We issue ERRCODE_T_R_SERIALIZATION_FAILURE almost all of the time for
recovery conflicts.
We issue ERRCODE_ADMIN_SHUTDOWN only if the conflict is non-retryable,
which occurs if someone drops the database that the user was connected
to when they get kicked off. That code exists specifically to inform the
user that they *cannot* reconnect. So pgpool should not be trying to
trap that error and reconnect.
So the whole basis for the existence of this patch is moot.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Fri, Jan 14, 2011 at 12:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
This whole thing is confused. No change is appropriate here at all.
We issue ERRCODE_T_R_SERIALIZATION_FAILURE almost all of the time for
recovery conflicts.We issue ERRCODE_ADMIN_SHUTDOWN only if the conflict is non-retryable,
which occurs if someone drops the database that the user was connected
to when they get kicked off. That code exists specifically to inform the
user that they *cannot* reconnect. So pgpool should not be trying to
trap that error and reconnect.
CheckRecoveryConflictDeadlock() uses ERRCODE_QUERY_CANCELLED.
ProcessInterrupts() sometimes uses ERRCODE_T_R_SERIALIZATION_FAILURE
and sometimes uses ERRCODE_ADMIN_SHUTDOWN. It seems to me that it
wouldn't be a bad thing to be a bit more consistent, and perhaps to
have dedicated error codes for recovery conflicts. This bit strikes
me as particularly strange:
else if (RecoveryConflictPending && RecoveryConflictRetryable)
{
pgstat_report_recovery_conflict(RecoveryConflictReason);
ereport(FATAL,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("terminating connection due to
conflict with recovery"),
errdetail_recovery_conflict()));
}
else if (RecoveryConflictPending)
{
pgstat_report_recovery_conflict(RecoveryConflictReason);
ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating connection due to
conflict with recovery"),
errdetail_recovery_conflict()));
}
That's the same error message at the same severity level with two
different SQLSTATEs depending on RecoveryConflictRetryable. Seems a
bit cryptic.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jan 14, 2011 at 1:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jan 14, 2011 at 12:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
This whole thing is confused. No change is appropriate here at all.
We issue ERRCODE_T_R_SERIALIZATION_FAILURE almost all of the time for
recovery conflicts.We issue ERRCODE_ADMIN_SHUTDOWN only if the conflict is non-retryable,
which occurs if someone drops the database that the user was connected
to when they get kicked off. That code exists specifically to inform the
user that they *cannot* reconnect. So pgpool should not be trying to
trap that error and reconnect.CheckRecoveryConflictDeadlock() uses ERRCODE_QUERY_CANCELLED.
ProcessInterrupts() sometimes uses ERRCODE_T_R_SERIALIZATION_FAILURE
and sometimes uses ERRCODE_ADMIN_SHUTDOWN. It seems to me that it
wouldn't be a bad thing to be a bit more consistent, and perhaps to
have dedicated error codes for recovery conflicts. This bit strikes
me as particularly strange:else if (RecoveryConflictPending && RecoveryConflictRetryable)
{
pgstat_report_recovery_conflict(RecoveryConflictReason);
ereport(FATAL,(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("terminating connection due to
conflict with recovery"),
errdetail_recovery_conflict()));
}
else if (RecoveryConflictPending)
{
pgstat_report_recovery_conflict(RecoveryConflictReason);
ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating connection due to
conflict with recovery"),
errdetail_recovery_conflict()));
}That's the same error message at the same severity level with two
different SQLSTATEs depending on RecoveryConflictRetryable. Seems a
bit cryptic.
So what we do want to do about this?
I'm pretty well convinced that we should NOT be issuing
ERRCODE_ADMIN_SHUTDOWN for a recovery conflict, but that could be
fixed by a trivial simplification of the code posted above, without
introducing any new error code.
I'd also be in favor of changing the one that uses
ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as
the former might be taken to imply active user intervention, and for
consistency.
It's no longer clear to me that we actually need a new error code for
this - using the same one everywhere seems like it might be
sufficient, unless someone wants to make an argument why it isn't.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jan 14, 2011 at 1:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jan 14, 2011 at 12:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
This whole thing is confused. No change is appropriate here at all.
We issue ERRCODE_T_R_SERIALIZATION_FAILURE almost all of the time for
recovery conflicts.We issue ERRCODE_ADMIN_SHUTDOWN only if the conflict is non-retryable,
which occurs if someone drops the database that the user was connected
to when they get kicked off. That code exists specifically to inform the
user that they *cannot* reconnect. So pgpool should not be trying to
trap that error and reconnect.CheckRecoveryConflictDeadlock() uses ERRCODE_QUERY_CANCELLED.
ProcessInterrupts() sometimes uses ERRCODE_T_R_SERIALIZATION_FAILURE
and sometimes uses ERRCODE_ADMIN_SHUTDOWN. It seems to me that it
wouldn't be a bad thing to be a bit more consistent, and perhaps to
have dedicated error codes for recovery conflicts. This bit strikes
me as particularly strange:else if (RecoveryConflictPending && RecoveryConflictRetryable)
{
pgstat_report_recovery_conflict(RecoveryConflictReason);
ereport(FATAL,(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("terminating connection due to
conflict with recovery"),
errdetail_recovery_conflict()));
}
else if (RecoveryConflictPending)
{
pgstat_report_recovery_conflict(RecoveryConflictReason);
ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating connection due to
conflict with recovery"),
errdetail_recovery_conflict()));
}That's the same error message at the same severity level with two
different SQLSTATEs depending on RecoveryConflictRetryable. Seems a
bit cryptic.So what we do want to do about this?
I'm pretty well convinced that we should NOT be issuing
ERRCODE_ADMIN_SHUTDOWN for a recovery conflict, but that could be
fixed by a trivial simplification of the code posted above, without
introducing any new error code.
I agree with ERRCODE_ADMIN_SHUTDOWN should not be used for a recovery
conflict. And if your proposal does not need to introduce new error
code, I also agree with not inventing new error code.
I'd also be in favor of changing the one that uses
ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as
the former might be taken to imply active user intervention, and for
consistency.
+1.
Show quoted text
It's no longer clear to me that we actually need a new error code for
this - using the same one everywhere seems like it might be
sufficient, unless someone wants to make an argument why it isn't.--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, 2011-01-21 at 13:49 +0900, Tatsuo Ishii wrote:
I'm pretty well convinced that we should NOT be issuing
ERRCODE_ADMIN_SHUTDOWN for a recovery conflict, but that could be
fixed by a trivial simplification of the code posted above, without
introducing any new error code.I agree with ERRCODE_ADMIN_SHUTDOWN should not be used for a recovery
conflict. And if your proposal does not need to introduce new error
code, I also agree with not inventing new error code.
I'd also be in favor of changing the one that uses
ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as
the former might be taken to imply active user intervention, and for
consistency.+1.
We already use ERRCODE_T_R_SERIALIZATION_FAILURE for retryable errors,
which is almost every error. So no change required there.
ERRCODE_ADMIN_SHUTDOWN is used only in situations where we cannot
reconnect or retry because the database we said we wished to connect to
no longer exists. That needs to be a different error code to a normal,
retryable error, so that pgpool can tell the difference between things
it can help with and things it cannot help with.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Jan21, 2011, at 10:16 , Simon Riggs wrote:
On Fri, 2011-01-21 at 13:49 +0900, Tatsuo Ishii wrote:
I'm pretty well convinced that we should NOT be issuing
ERRCODE_ADMIN_SHUTDOWN for a recovery conflict, but that could be
fixed by a trivial simplification of the code posted above, without
introducing any new error code.I agree with ERRCODE_ADMIN_SHUTDOWN should not be used for a recovery
conflict. And if your proposal does not need to introduce new error
code, I also agree with not inventing new error code.I'd also be in favor of changing the one that uses
ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as
the former might be taken to imply active user intervention, and for
consistency.+1.
We already use ERRCODE_T_R_SERIALIZATION_FAILURE for retryable errors,
which is almost every error. So no change required there.ERRCODE_ADMIN_SHUTDOWN is used only in situations where we cannot
reconnect or retry because the database we said we wished to connect to
no longer exists. That needs to be a different error code to a normal,
retryable error, so that pgpool can tell the difference between things
it can help with and things it cannot help with.
Yeah. Clients absolutely need to be able to distinguish transient and
permanent errors. Otherwise, how would a client know when to retry
a transaction (as he needs to in case of a serialization anomaly) and
when to report the error to the user?
ERRCODE_T_R_SERIALIZATION_FAILURE and ERRCODE_T_R_DEADLOCK_DETECTED
are probably both assumed to be transient failure by client aready. So
we should use those two for transient recovery conflicts (i.e. those
which go away if you retry) and something else for the others (like
database dropped)
This'd mean that the code is fine as it is, except that we should
raise ERRCODE_T_R_DEADLOCK_DETECTED instead of ERRCODE_QUERY_CANCELED
in CheckRecoveryConflictDeadlock(). I might be missing something though -
Simon, what were your reasons for using ERRCODE_QUERY_CANCELED there?
best regards,
Florian Pflug
On Fri, 2011-01-21 at 13:09 +0100, Florian Pflug wrote:
I'd also be in favor of changing the one that uses
ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as
the former might be taken to imply active user intervention, and for
consistency.+1.
We already use ERRCODE_T_R_SERIALIZATION_FAILURE for retryable errors,
which is almost every error. So no change required there.ERRCODE_ADMIN_SHUTDOWN is used only in situations where we cannot
reconnect or retry because the database we said we wished to connect to
no longer exists. That needs to be a different error code to a normal,
retryable error, so that pgpool can tell the difference between things
it can help with and things it cannot help with.Yeah. Clients absolutely need to be able to distinguish transient and
permanent errors. Otherwise, how would a client know when to retry
a transaction (as he needs to in case of a serialization anomaly) and
when to report the error to the user?ERRCODE_T_R_SERIALIZATION_FAILURE and ERRCODE_T_R_DEADLOCK_DETECTED
are probably both assumed to be transient failure by client aready. So
we should use those two for transient recovery conflicts (i.e. those
which go away if you retry) and something else for the others (like
database dropped)This'd mean that the code is fine as it is, except that we should
raise ERRCODE_T_R_DEADLOCK_DETECTED instead of ERRCODE_QUERY_CANCELED
in CheckRecoveryConflictDeadlock(). I might be missing something though -
Simon, what were your reasons for using ERRCODE_QUERY_CANCELED there?
Ah, thanks Florian. Now I understand. There are two related issues here.
1. The discussion around ERRCODE_ADMIN_SHUTDOWN is incorrect and the
specific patch should be rejected as is. No changes are required in
ProcessInterrupts(), nor new errcodes.
2. Robert is correct that CheckRecoveryConflictDeadlock() returns
ERRCODE_QUERY_CANCELED. Thanks to Florian for noting that we had
switched away from the original discussion onto another part of the
code, which confused me. I agree the use of ERRCODE_QUERY_CANCELED is a
mistake; CheckRecoveryConflictDeadlock() should return
ERRCODE_T_R_DEADLOCK_DETECTED. This was an omission from my commit of 12
May 2010.
Tatsuo, would you like to modify the patch to correct the issue in
CheckRecoveryConflictDeadlock() ? Or would you prefer me to fix?
This should be backpatched to 9.0.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services