idle in txn query cancellation
Hi all,
I know it is late in the cycle, but I still think that the current behaviour
of ERRORing during execution of a query but FATALing during IDLE IN
TRANSACTION is very confusing to the user. Especially as you are not even able
to read the reason for getting disconnected because the client doesnt expect
input in that state...
The first patch adds the capability to add a flag to ereport like:
ereport(ERROR | LOG_NO_CLIENT)
Tom earlier suggested using COMERROR but thats just a version of LOG which
doesnt report to the client. The patch makes that to be a synonym of LOG |
LOG_NO_CLIENT.
While its not the most pretty API I dont think its that bad because the
directionality is somewhat a property of the loglevel. Beside it would
generate a lot of useless noise and breakage.
The second patch changes the FATAL during cancelling an idle in txn query into
ERROR | LOG_NO_CLIENT.
To avoid breaking the known state there also may no "ready for query" message
get sent. The patch ensures that by setting and checking a
"silent_error_while_idle" variable.
That way the client will not see that an error occured until the next command
sent but I dont think there is a solution to that in 9.0 timeframe if at all.
The patch only adds that for the recovery conflict path for now.
What do you think? Is it worth applying something like that now? If yes I
would try to test the patch some more (obviously the patch survives the
regression tests, but they do not seem to check the extended query protocol at
all).
One could argue that the LOG_NO_CLIENT flag should be added when a idle
transaction gets terminated by force but I wouldn't bother.
On a related note I would also like to get rid of the restriction that a
normal query cancellation will only be done if no subtransactions are stacked.
But I guess its too late for that? (I have a patch ready, some cleanup would
be needed)
The latter works by:
- adding a explicit error code (which should be done regardless of this
discussion)
- avoiding to catch such error at a few places (plperl, plpython)
- recursively aborting the subtransactions once the mainloop is reached
- relying on the fact that the cancellation signal will get resent
- possibly escalating to a FATAL if nothing happens after a certain number of
tries
Andres
PS: I know I sort-of-promised a patch earlier, but it didn't work out time-
wise.
Attachments:
0002-Dont-FATAL-a-IDLE-IN-TRANSACTIOn-backend-during-conf.patchtext/x-patch; charset=ISO-8859-1; name=0002-Dont-FATAL-a-IDLE-IN-TRANSACTIOn-backend-during-conf.patchDownload
From 2171feda1a91b0e77c1c0788abb3fc3183503b79 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 13 Feb 2010 22:30:41 +0100
Subject: [PATCH 2/2] Dont FATAL a IDLE IN TRANSACTIOn backend during conflict
resolution. Instead avoid sending the error (and a later "read for
query") to the client to avoid confusing the client state.
---
src/backend/tcop/postgres.c | 41 ++++++++++++++++++++++++++++++++---------
1 files changed, 32 insertions(+), 9 deletions(-)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 13734f0..e9c622e 100644
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
*************** static int UseNewLine = 0; /* Use EOF a
*** 176,181 ****
--- 176,187 ----
static bool RecoveryConflictPending = false;
static ProcSignalReason RecoveryConflictReason;
+ /*
+ * Are we disallowed from sending a "ready for query" message right
+ * now because it would confuse the frontend?
+ */
+ bool silent_error_while_idle = false;
+
/* ----------------------------------------------------------------
* decls for routines only used in this file
* ----------------------------------------------------------------
*************** ProcessInterrupts(void)
*** 2917,2934 ****
RecoveryConflictPending = false;
DisableNotifyInterrupt();
DisableCatchupInterrupt();
! if (DoingCommandRead)
! ereport(FATAL,
! (errcode(ERRCODE_ADMIN_SHUTDOWN),
! errmsg("terminating connection due to conflict with recovery"),
! errdetail_recovery_conflict(),
! errhint("In a moment you should be able to reconnect to the"
! " database and repeat your command.")));
! else
ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),
errmsg("canceling statement due to conflict with recovery"),
errdetail_recovery_conflict()));
}
/*
--- 2923,2948 ----
RecoveryConflictPending = false;
DisableNotifyInterrupt();
DisableCatchupInterrupt();
! if (DoingCommandRead){
! /*
! * We cant issue a normal ERROR here because the
! * client doesnt expect the server to send an error at
! * that point.
! * We also may not send a "ready for query"/Z message
! * because that would be unexpected as well.
! */
! silent_error_while_idle = true;
! ereport(ERROR | LOG_NO_CLIENT,
! (errcode(ERRCODE_QUERY_CANCELED),
! errmsg("canceling statement due to conflict with recovery"),
! errdetail_recovery_conflict()));
! }
! else{
ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),
errmsg("canceling statement due to conflict with recovery"),
errdetail_recovery_conflict()));
+ }
}
/*
*************** PostgresMain(int argc, char *argv[], con
*** 3767,3773 ****
* processing of batched messages, and because we don't want to report
* uncommitted updates (that confuses autovacuum).
*/
! if (send_ready_for_query)
{
if (IsAbortedTransactionBlockState())
{
--- 3781,3787 ----
* processing of batched messages, and because we don't want to report
* uncommitted updates (that confuses autovacuum).
*/
! if (send_ready_for_query && !silent_error_while_idle)
{
if (IsAbortedTransactionBlockState())
{
*************** PostgresMain(int argc, char *argv[], con
*** 3792,3797 ****
--- 3806,3820 ----
}
/*
+ * After ReadCommand continued we are sure that the frontend
+ * sent a message and can handle a 'Z' message. We cant set
+ * that after ReadCommand because that will error out when in
+ * an aborted transaction - which we are when
+ * silent_error_while_idle was set.
+ */
+ silent_error_while_idle = false;
+
+ /*
* (2) Allow asynchronous signals to be executed immediately if they
* come in while we are waiting for client input. (This must be
* conditional since we don't want, say, reads on behalf of COPY FROM
--
1.6.5.12.gd65df24
0001-Support-transporting-flags-in-the-elevel-argument-of.patchtext/x-patch; charset=ISO-8859-1; name=0001-Support-transporting-flags-in-the-elevel-argument-of.patchDownload
From 04d726519fbc85ea13bb7919d65400c89e0df031 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 13 Feb 2010 22:21:15 +0100
Subject: [PATCH 1/2] Support transporting flags in the 'elevel' argument of ereport(). The
reason is the wish to support avoiding the error to the client which
was only possible using the COMERROR level - which is not a error but
just a log message.
The only supported flag till now is LOG_NO_CLIENT which does what
COMERROR did for all error levels
---
src/backend/utils/error/elog.c | 6 ++++--
src/include/utils/elog.h | 25 ++++++++++++++++---------
2 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index daa88a3..3ef07ad 100644
*** a/src/backend/utils/error/elog.c
--- b/src/backend/utils/error/elog.c
*************** errstart(int elevel, const char *filenam
*** 213,219 ****
bool output_to_server;
bool output_to_client = false;
int i;
!
/*
* Check some cases in which we want to promote an error into a more
* severe error. None of this logic applies for non-error messages.
--- 213,220 ----
bool output_to_server;
bool output_to_client = false;
int i;
! int eflags = elevel & LOG_FLAG_MASK;
! elevel = elevel & ~LOG_FLAG_MASK;
/*
* Check some cases in which we want to promote an error into a more
* severe error. None of this logic applies for non-error messages.
*************** errstart(int elevel, const char *filenam
*** 273,279 ****
output_to_server = (elevel >= log_min_messages);
/* Determine whether message is enabled for client output */
! if (whereToSendOutput == DestRemote && elevel != COMMERROR)
{
/*
* client_min_messages is honored only after we complete the
--- 274,280 ----
output_to_server = (elevel >= log_min_messages);
/* Determine whether message is enabled for client output */
! if (whereToSendOutput == DestRemote && !(eflags & LOG_NO_CLIENT))
{
/*
* client_min_messages is honored only after we complete the
*************** errstart(int elevel, const char *filenam
*** 332,337 ****
--- 333,339 ----
edata = &errordata[errordata_stack_depth];
MemSet(edata, 0, sizeof(ErrorData));
edata->elevel = elevel;
+ edata->eflags = eflags;
edata->output_to_server = output_to_server;
edata->output_to_client = output_to_client;
edata->filename = filename;
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 2ec3dd3..f3b4439 100644
*** a/src/include/utils/elog.h
--- b/src/include/utils/elog.h
***************
*** 16,21 ****
--- 16,24 ----
#include <setjmp.h>
+ #define LOG_FLAG_MASK ((~0)<<20)
+ #define LOG_NO_CLIENT (2<<30)
+
/* Error level codes */
#define DEBUG5 10 /* Debugging messages, in categories of
* decreasing detail. */
***************
*** 25,54 ****
#define DEBUG1 14 /* used by GUC debug_* variables */
#define LOG 15 /* Server operational messages; sent only to
* server log by default. */
! #define COMMERROR 16 /* Client communication problems; same as LOG
* for server reporting, but never sent to
! * client. */
! #define INFO 17 /* Messages specifically requested by user (eg
* VACUUM VERBOSE output); always sent to
* client regardless of client_min_messages,
* but by default not sent to server log. */
! #define NOTICE 18 /* Helpful messages to users about query
* operation; sent to client and server log by
* default. */
! #define WARNING 19 /* Warnings. NOTICE is for expected messages
* like implicit sequence creation by SERIAL.
* WARNING is for unexpected messages. */
! #define ERROR 20 /* user error - abort transaction; return to
* known state */
/* Save ERROR value in PGERROR so it can be restored when Win32 includes
* modify it. We have to use a constant rather than ERROR because macros
* are expanded only when referenced outside macros.
*/
#ifdef WIN32
! #define PGERROR 20
#endif
! #define FATAL 21 /* fatal error - abort process */
! #define PANIC 22 /* take down the other backends with me */
/* #define DEBUG DEBUG1 */ /* Backward compatibility with pre-7.3 */
--- 28,60 ----
#define DEBUG1 14 /* used by GUC debug_* variables */
#define LOG 15 /* Server operational messages; sent only to
* server log by default. */
! #define COMMERROR (LOG|LOG_NO_CLIENT)/* Client communication problems; same as LOG
* for server reporting, but never sent to
! * client. For backward compatibility this is
! * is available without explicitly specifying
! * LOG_NO_CLIENT.*/
! #define INFO 16 /* Messages specifically requested by user (eg
* VACUUM VERBOSE output); always sent to
* client regardless of client_min_messages,
* but by default not sent to server log. */
! #define NOTICE 17 /* Helpful messages to users about query
* operation; sent to client and server log by
* default. */
! #define WARNING 18 /* Warnings. NOTICE is for expected messages
* like implicit sequence creation by SERIAL.
* WARNING is for unexpected messages. */
! #define ERROR 19 /* user error - abort transaction; return to
* known state */
/* Save ERROR value in PGERROR so it can be restored when Win32 includes
* modify it. We have to use a constant rather than ERROR because macros
* are expanded only when referenced outside macros.
*/
#ifdef WIN32
! #define PGERROR 19
#endif
! #define FATAL 20 /* fatal error - abort process */
! #define PANIC 21 /* take down the other backends with me */
!
/* #define DEBUG DEBUG1 */ /* Backward compatibility with pre-7.3 */
*************** extern PGDLLIMPORT sigjmp_buf *PG_except
*** 291,296 ****
--- 297,303 ----
typedef struct ErrorData
{
int elevel; /* error level */
+ int eflags; /* error flags */
bool output_to_server; /* will report to server log? */
bool output_to_client; /* will report to client? */
bool show_funcname; /* true to force funcname inclusion */
--
1.6.5.12.gd65df24
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
I know it is late in the cycle
No problem here. Thanks for your diligence. Will review.
--
Simon Riggs www.2ndQuadrant.com
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
On a related note I would also like to get rid of the restriction that
a normal query cancellation will only be done if no subtransactions
are stacked.
But I guess its too late for that? (I have a patch ready, some cleanup
would be needed)
The latter works by:
- adding a explicit error code (which should be done regardless of
this
discussion)
- avoiding to catch such error at a few places (plperl, plpython)
- recursively aborting the subtransactions once the mainloop is
reached
- relying on the fact that the cancellation signal will get resent
- possibly escalating to a FATAL if nothing happens after a certain
number of tries
Such an action needs to have a good, clear theoretical explanation with
it to show that the interaction with savepoints is a good one.
I toyed with the idea of a new level between ERROR and FATAL to allow
ERRORs to be handled by savepoints still in all cases.
--
Simon Riggs www.2ndQuadrant.com
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
The first patch adds the capability to add a flag to ereport like:
ereport(ERROR | LOG_NO_CLIENT)
Tom earlier suggested using COMERROR but thats just a version of LOG
which doesnt report to the client. The patch makes that to be a
synonym of LOG | LOG_NO_CLIENT.
While its not the most pretty API I dont think its that bad because
the directionality is somewhat a property of the loglevel. Beside it
would generate a lot of useless noise and breakage.The second patch changes the FATAL during cancelling an idle in txn
query into ERROR | LOG_NO_CLIENT.
To avoid breaking the known state there also may no "ready for query"
message get sent. The patch ensures that by setting and checking a
"silent_error_while_idle" variable.That way the client will not see that an error occured until the next
command sent but I dont think there is a solution to that in 9.0
timeframe if at all.The patch only adds that for the recovery conflict path for now.
What do you think? Is it worth applying something like that now? If
yes I would try to test the patch some more (obviously the patch
survives the regression tests, but they do not seem to check the
extended query protocol at all).
I think that is much better than FATAL. If it works I think we should
apply it for this release.
--
Simon Riggs www.2ndQuadrant.com
On Monday 15 February 2010 09:50:08 Simon Riggs wrote:
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
The first patch adds the capability to add a flag to ereport like:
ereport(ERROR | LOG_NO_CLIENT)
Tom earlier suggested using COMERROR but thats just a version of LOG
which doesnt report to the client. The patch makes that to be a
synonym of LOG | LOG_NO_CLIENT.
While its not the most pretty API I dont think its that bad because
the directionality is somewhat a property of the loglevel. Beside it
would generate a lot of useless noise and breakage.The second patch changes the FATAL during cancelling an idle in txn
query into ERROR | LOG_NO_CLIENT.
To avoid breaking the known state there also may no "ready for query"
message get sent. The patch ensures that by setting and checking a
"silent_error_while_idle" variable.That way the client will not see that an error occured until the next
command sent but I dont think there is a solution to that in 9.0
timeframe if at all.The patch only adds that for the recovery conflict path for now.
What do you think? Is it worth applying something like that now? If
yes I would try to test the patch some more (obviously the patch
survives the regression tests, but they do not seem to check the
extended query protocol at all).I think that is much better than FATAL. If it works I think we should
apply it for this release.
It does work for me at least ;-). I only have marginal testing with the
extended query protocol though and I think the error message needs to get
improved somewhat.
I plan to make testing the extended query protocol easier by making pgbench
able to restart after a such an error (thats why I like the seperate error
code for such cancellations...)
The problem with the error message is, that errdetail_abort() uses MyProc-
recoveryConflictPending which is already unset when the errdetail is used.
Unless you beat me I plan to provide a patch here (havent looked at how to do
so yet though).
Andres
On Monday 15 February 2010 09:47:09 Simon Riggs wrote:
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
On a related note I would also like to get rid of the restriction that
a normal query cancellation will only be done if no subtransactions
are stacked.
But I guess its too late for that? (I have a patch ready, some cleanup
would be needed)
The latter works by:
- adding a explicit error code (which should be done regardless of
this
discussion)
- avoiding to catch such error at a few places (plperl, plpython)
- recursively aborting the subtransactions once the mainloop is
reached
- relying on the fact that the cancellation signal will get resent
- possibly escalating to a FATAL if nothing happens after a certain
number of triesSuch an action needs to have a good, clear theoretical explanation with
it to show that the interaction with savepoints is a good one.
I can provide a bit more explanation. The patch (other thread) already added
some more comments but its definitely good to explain/define some more.
Will post that to the thread with the patch, ok?
I toyed with the idea of a new level between ERROR and FATAL to allow
ERRORs to be handled by savepoints still in all cases.
I have a hard time believing that it will help in that situation. Either you
allow cleaning up process local resources in PG_TRY/PG_TRY in which situation
you cant abort recursively at all places because the catching code block may
very well reference resources associated with that snapshot or you abort the
process in a way that there are no process local resources.
How would the middleway between those work?
Andres
On Sunday 14 February 2010 06:29:45 Simon Riggs wrote:
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
I know it is late in the cycle
No problem here. Thanks for your diligence. Will review.
Got a chance to look at it?
Andres
On Sun, 2010-03-14 at 19:50 +0100, Andres Freund wrote:
On Sunday 14 February 2010 06:29:45 Simon Riggs wrote:
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
I know it is late in the cycle
No problem here. Thanks for your diligence. Will review.
Got a chance to look at it?
I need to spend my time on ensuring we can avoid the cancellation
altogether, so I apologise for not reviewing. That's not a comment on
your work or the possible effectiveness of the patch. Possibly others
have the time to review?
--
Simon Riggs www.2ndQuadrant.com
Hi Simon,
On Sunday 14 March 2010 20:12:00 Simon Riggs wrote:
On Sun, 2010-03-14 at 19:50 +0100, Andres Freund wrote:
On Sunday 14 February 2010 06:29:45 Simon Riggs wrote:
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
I know it is late in the cycle
No problem here. Thanks for your diligence. Will review.
Got a chance to look at it?
I need to spend my time on ensuring we can avoid the cancellation
altogether, so I apologise for not reviewing. That's not a comment on
your work or the possible effectiveness of the patch. Possibly others
have the time to review?
I guess that wont go anywhere before 9.1?
I think at least the error code should be adjusted before 9.0.
Andres