Determine if an error is transient by its error code.
Hello folks,
I'm trying to define a transient fault detection strategy for a client
application when calling a postgres database.
Essentially I want to determine by the error code if it is worth retrying
the call (transient) or if the error was due to a bad query or programmer
error, in which case don't retry.
Going through the codes as posted here
https://www.postgresql.org/docs/9.6/static/errcodes-appendix.html I had a
go at making a list of error codes which may be transient:
53000: insufficient_resources
53100: disk_full
53200: out_of_memory
53300: too_many_connections
53400: configuration_limit_exceeded
57000: operator_intervention
57014: query_canceled
57P01: admin_shutdown
57P02: crash_shutdown
57P03: cannot_connect_now
57P04: database_dropped
58000: system_error
58030: io_error
I am not sure whether these next few should be treated as transient, but I
am guessing so:
55P03: lock_not_available
55006: object_in_use
55000: object_not_in_prerequisite_state
08000: connection_exception
08003: connection_does_not_exist
08006: connection_failure
08001: sqlclient_unable_to_establish_sqlconnection
08004: sqlserver_rejected_establishment_of_sqlconnection
08007: transaction_resolution_unknown
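Here is a minimal Python sketch of the list-based check described above. It assumes the client can read the five-character SQLSTATE from the error; psycopg2, for example, exposes it as the `pgcode` attribute of its exceptions. The set names and the `is_transient` helper are hypothetical. The sets simply mirror the two lists in this message and are not a vetted classification.

```python
from typing import Optional

# Hypothetical sets copied from the candidate lists above; not authoritative.
LIKELY_TRANSIENT = frozenset({
    "53000",  # insufficient_resources
    "53100",  # disk_full
    "53200",  # out_of_memory
    "53300",  # too_many_connections
    "53400",  # configuration_limit_exceeded
    "57000",  # operator_intervention
    "57014",  # query_canceled
    "57P01",  # admin_shutdown
    "57P02",  # crash_shutdown
    "57P03",  # cannot_connect_now
    "57P04",  # database_dropped
    "58000",  # system_error
    "58030",  # io_error
})

UNSURE_BUT_GUESSED_TRANSIENT = frozenset({
    "55P03",  # lock_not_available
    "55006",  # object_in_use
    "55000",  # object_not_in_prerequisite_state
    "08000",  # connection_exception
    "08003",  # connection_does_not_exist
    "08006",  # connection_failure
    "08001",  # sqlclient_unable_to_establish_sqlconnection
    "08004",  # sqlserver_rejected_establishment_of_sqlconnection
    "08007",  # transaction_resolution_unknown
})


def is_transient(sqlstate: Optional[str]) -> bool:
    """Return True if the SQLSTATE appears on either candidate list."""
    if sqlstate is None:
        return False  # no code available, so don't guess
    code = sqlstate.upper()  # SQLSTATEs like 57P01 contain letters
    return code in LIKELY_TRANSIENT or code in UNSURE_BUT_GUESSED_TRANSIENT


print(is_transient("53300"))  # True
print(is_transient("42601"))  # False (syntax_error, a programmer error)
```

A single yes/no answer like this cannot say *how* to retry. The replies below question exactly that simplification.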
Are there any codes listed above where retrying would actually not be
helpful?
Are there any codes that I did not include that I should have?
Thanks,
-Dominick
On 20 March 2017 at 10:26, Dominick O'Dierno <odiernod@gmail.com> wrote:
Hello folks,
I'm trying to define a transient fault detection strategy for a client
application when calling a postgres database.
Essentially I want to determine by the error code if it is worth retrying
the call (transient) or if the error was due to a bad query or programmer
error, in which case don't retry.
Going through the codes as posted here
https://www.postgresql.org/docs/9.6/static/errcodes-appendix.html I had a go
at making a list of error codes which may be transient:
53000: insufficient_resources
53100: disk_full
53200: out_of_memory
53300: too_many_connections
53400: configuration_limit_exceeded
57000: operator_intervention
57014: query_canceled
57P01: admin_shutdown
57P02: crash_shutdown
57P03: cannot_connect_now
57P04: database_dropped
58000: system_error
58030: io_error
Depends on how transient you mean, really.
I/O error, disk full, cannot_connect_now, etc., may or may not require
admin intervention.
I would argue that database_dropped isn't transient. But I guess you
might be re-creating it?
I am not sure whether these next few should be treated as transient, but I
am guessing so:
55P03: lock_not_available
Yeah, I'd say that's transient.
55006: object_in_use
Same.
55000: object_not_in_prerequisite_state
Varies. This can be a bit of a catchall error, encompassing things
that need configuration changes, things that need system state changes
(won't work in recovery, or whatever), and things that will change in a
short span of time.
In general you'll need classes of retry:
* just reissue the query (deadlock retry, etc)
* reconnect and retry
etc.
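One way to sketch these "classes of retry" in Python, again taking the SQLSTATE string as input. The `RetryAction` enum, the `classify` function and the specific code-to-class assignments are hypothetical illustrations; only the code meanings come from the PostgreSQL error-code appendix. 40P01 (deadlock_detected) is included because deadlock retry is named above. Any code in SQLSTATE class 08 (connection exception, the first two characters) triggers a reconnect. The exception is 08007, where the commit outcome is unknown. The catchall 55000 is left alone.

```python
from enum import Enum
from typing import Optional


class RetryAction(Enum):
    REISSUE = "reissue the statement (or the whole transaction, if in one)"
    RECONNECT = "open a new connection, then retry"
    DONT_RETRY = "report the error; retrying is unlikely to help"


# Hypothetical assignments; tune these for your own application.
REISSUE_CODES = frozenset({
    "40P01",  # deadlock_detected
    "55P03",  # lock_not_available
    "55006",  # object_in_use
})

RECONNECT_CODES = frozenset({
    "57P01",  # admin_shutdown: the server ended this session
    "57P02",  # crash_shutdown
    "57P03",  # cannot_connect_now: e.g. server still starting up
})


def classify(sqlstate: Optional[str]) -> RetryAction:
    if sqlstate is None:
        return RetryAction.DONT_RETRY
    code = sqlstate.upper()
    if code == "08007":
        # transaction_resolution_unknown: the COMMIT may have succeeded,
        # so blindly re-running the transaction could apply it twice.
        return RetryAction.DONT_RETRY
    if code in REISSUE_CODES:
        return RetryAction.REISSUE
    # Class 08 is "connection exception": don't reuse the old connection.
    if code in RECONNECT_CODES or code.startswith("08"):
        return RetryAction.RECONNECT
    # Everything else, including catchalls like 55000
    # (object_not_in_prerequisite_state), is not retried automatically.
    return RetryAction.DONT_RETRY
```

Returning a class rather than a boolean lets the caller decide whether the existing connection can be reused before trying again.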
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Craig Ringer <craig@2ndquadrant.com> writes:
On 20 March 2017 at 10:26, Dominick O'Dierno <odiernod@gmail.com> wrote:
Essentially I want to determine by the error code if it is worth retrying
the call (transient) or if the error was due to a bad query or programmer
error, in which case don't retry.
In general you'll need classes of retry:
* just reissue the query (deadlock retry, etc)
* reconnect and retry
Yeah. There's a pretty significant fraction of these where just blindly
repeating the failing query isn't likely to help; the error code is meant
to suggest that the DBA has to fix something, e.g. adjust configuration
limits. I'm also pretty dubious about the value of a blind retry for,
e.g., disk_full.
One you missed that I think *is* supposed to imply "just retry" is
40001 serialization_failure. You have to retry the whole transaction
though, not just one query.
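The key point about 40001 is that the unit of retry is the whole transaction. Below is a minimal sketch under stated assumptions. `PgError` is a hypothetical stand-in exception with a `pgcode` attribute; psycopg2 exceptions have a `pgcode` attribute, but this code does not use psycopg2. The caller also supplies a hypothetical function that runs the entire transaction, from BEGIN through COMMIT, on every call. The attempt limit and backoff values are arbitrary.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

SERIALIZATION_FAILURE = "40001"


class PgError(Exception):
    """Hypothetical stand-in for a driver exception carrying a SQLSTATE."""

    def __init__(self, message: str, pgcode: Optional[str] = None):
        super().__init__(message)
        self.pgcode = pgcode


def run_transaction_with_retry(
    run_whole_transaction: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Re-run the *entire* transaction on serialization_failure (40001).

    `run_whole_transaction` must begin, execute every statement and commit
    on each call; re-running only the failed statement is not enough.
    Any other error, or the final failed attempt, is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return run_whole_transaction()
        except PgError as exc:
            if exc.pgcode != SERIALIZATION_FAILURE or attempt == max_attempts:
                raise
            # Exponential backoff with jitter before starting over.
            sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")


# Usage: a fake transaction that hits two serialization failures first.
calls = []

def flaky_transaction():
    calls.append(1)
    if len(calls) < 3:
        raise PgError("could not serialize access", pgcode="40001")
    return "committed"

print(run_transaction_with_retry(flaky_transaction, sleep=lambda s: None))
# prints "committed" after three calls
```

The injectable `sleep` parameter only exists so the backoff can be skipped in tests.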
regards, tom lane