[PATH] Idle in transaction cancellation

Started by Andres Freundabout 15 years ago47 messages
#1Andres Freund
andres@anarazel.de
2 attachment(s)

Hi all.

Here is a proposed patch which enables cancellation of $subject. The
problematic point about doing so is that the client is not expecting any
messages from the server when its in an idle state during an transaction and
that simply suppressing that message is not enough as ready for query messages
would get sent out at unexpected places.

Thus the first patch adds the possibility to add flags to ereport()s error level
to suppress notifying the client.
It also switches the earlier "COMERROR" log level over to this flag
(LOG_NO_CLIENT) with a typedef to support the old method.

The second patch sets up a variable "silent_error_while_idle" when its
cancelling an idle txn which suppresses sending out messages at wrong places
and resets its after having read a command in the simple protocol and after
having read a 'sync' message in the extended protocol.

Currently it does *not* report any special error message to the client if it
starts sending commands in an (unbekownst to it) failed transaction, but just
the normal "25P02: current transaction is aborted..." message.

It shouldn't be hard to add that and I will propose a patch if people would
like it (I personally am not very interested, but I can see people validly
wanting it), but I would like to have some feedback on the patch first.

Greetings,

Andres

Attachments:

0001-Support-transporting-flags-in-the-elevel-argument-of.patchtext/x-patch; charset=UTF-8; name=0001-Support-transporting-flags-in-the-elevel-argument-of.patchDownload
From 150b0bc845f7aab9629cc169a3a73f3338d6ab7f Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 13 Feb 2010 22:21:15 +0100
Subject: [PATCH 1/2] Support transporting flags in the 'elevel' argument of ereport(). The
 reason is the wish to support avoiding the error to the client which
 was only possible using the COMERROR level - which is not a error but
 just a log message.
 The only supported flag till now is LOG_NO_CLIENT which does what
 COMERROR did for all error levels

---
 src/backend/utils/error/elog.c |    6 ++++--
 src/include/utils/elog.h       |   29 ++++++++++++++++++++---------
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e321b99..5fc21b4 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -214,7 +214,8 @@ errstart(int elevel, const char *filename, int lineno,
 	bool		output_to_server;
 	bool		output_to_client = false;
 	int			i;
-
+	int			eflags = elevel & LOG_FLAG_MASK;
+	elevel = elevel & ~LOG_FLAG_MASK;
 	/*
 	 * Check some cases in which we want to promote an error into a more
 	 * severe error.  None of this logic applies for non-error messages.
@@ -274,7 +275,7 @@ errstart(int elevel, const char *filename, int lineno,
 		output_to_server = (elevel >= log_min_messages);
 
 	/* Determine whether message is enabled for client output */
-	if (whereToSendOutput == DestRemote && elevel != COMMERROR)
+	if (whereToSendOutput == DestRemote && !(eflags & LOG_NO_CLIENT))
 	{
 		/*
 		 * client_min_messages is honored only after we complete the
@@ -333,6 +334,7 @@ errstart(int elevel, const char *filename, int lineno,
 	edata = &errordata[errordata_stack_depth];
 	MemSet(edata, 0, sizeof(ErrorData));
 	edata->elevel = elevel;
+	edata->eflags = eflags;
 	edata->output_to_server = output_to_server;
 	edata->output_to_client = output_to_client;
 	edata->filename = filename;
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 92641ba..1853e1d 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -16,6 +16,13 @@
 
 #include <setjmp.h>
 
+#define LOG_FLAG_MASK ((~0)<<20)
+
+/* logging flags */
+#define LOG_NO_CLIENT (2<<30)   /* Don't send to the client. Helpfull
+                                 * if it would confuse the client at
+                                 * that moment */
+
 /* Error level codes */
 #define DEBUG5		10			/* Debugging messages, in categories of
 								 * decreasing detail. */
@@ -25,30 +32,33 @@
 #define DEBUG1		14			/* used by GUC debug_* variables */
 #define LOG			15			/* Server operational messages; sent only to
 								 * server log by default. */
-#define COMMERROR	16			/* Client communication problems; same as LOG
+#define COMMERROR	(LOG|LOG_NO_CLIENT)/* Client communication problems; same as LOG
 								 * for server reporting, but never sent to
-								 * client. */
-#define INFO		17			/* Messages specifically requested by user (eg
+								 * client. For backward compatibility this is
+								 * is available without explicitly specifying
+								 * LOG_NO_CLIENT.*/
+#define INFO		16			/* Messages specifically requested by user (eg
 								 * VACUUM VERBOSE output); always sent to
 								 * client regardless of client_min_messages,
 								 * but by default not sent to server log. */
-#define NOTICE		18			/* Helpful messages to users about query
+#define NOTICE		17			/* Helpful messages to users about query
 								 * operation; sent to client and server log by
 								 * default. */
-#define WARNING		19			/* Warnings.  NOTICE is for expected messages
+#define WARNING		18			/* Warnings.  NOTICE is for expected messages
 								 * like implicit sequence creation by SERIAL.
 								 * WARNING is for unexpected messages. */
-#define ERROR		20			/* user error - abort transaction; return to
+#define ERROR		19			/* user error - abort transaction; return to
 								 * known state */
 /* Save ERROR value in PGERROR so it can be restored when Win32 includes
  * modify it.  We have to use a constant rather than ERROR because macros
  * are expanded only when referenced outside macros.
  */
 #ifdef WIN32
-#define PGERROR		20
+#define PGERROR		19
 #endif
-#define FATAL		21			/* fatal error - abort process */
-#define PANIC		22			/* take down the other backends with me */
+#define FATAL		20			/* fatal error - abort process */
+#define PANIC		21			/* take down the other backends with me */
+
 
  /* #define DEBUG DEBUG1 */	/* Backward compatibility with pre-7.3 */
 
@@ -291,6 +301,7 @@ extern PGDLLIMPORT sigjmp_buf *PG_exception_stack;
 typedef struct ErrorData
 {
 	int			elevel;			/* error level */
+	int			eflags;			/* error flags */
 	bool		output_to_server;		/* will report to server log? */
 	bool		output_to_client;		/* will report to client? */
 	bool		show_funcname;	/* true to force funcname inclusion */
-- 
1.7.3.rc1.5.g73aa2

0002-Idle-transaction-cancellation.patchtext/x-patch; charset=UTF-8; name=0002-Idle-transaction-cancellation.patchDownload
From 6a27a19395df053112616b83c966f4b20869ebc3 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 21 May 2010 20:17:11 +0200
Subject: [PATCH 2/2] Idle transaction cancellation.

---
 src/backend/tcop/postgres.c |   75 +++++++++++++++++++++++++++++++-----------
 1 files changed, 55 insertions(+), 20 deletions(-)

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cba90a9..d0c8696 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -177,6 +177,12 @@ static bool RecoveryConflictPending = false;
 static bool RecoveryConflictRetryable = true;
 static ProcSignalReason RecoveryConflictReason;
 
+/*
+ * Are we disallowed from sending a "ready for query" message right
+ * now because it would confuse the frontend?
+ */
+bool silent_error_while_idle = false;
+
 /* ----------------------------------------------------------------
  *		decls for routines only used in this file
  * ----------------------------------------------------------------
@@ -2877,6 +2883,8 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 void
 ProcessInterrupts(void)
 {
+	int error = ERROR;
+
 	/* OK to accept interrupt now? */
 	if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
 		return;
@@ -2949,18 +2957,24 @@ ProcessInterrupts(void)
 			RecoveryConflictPending = false;
 			DisableNotifyInterrupt();
 			DisableCatchupInterrupt();
-			if (DoingCommandRead)
-				ereport(FATAL,
-						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-						 errmsg("terminating connection due to conflict with recovery"),
-						 errdetail_recovery_conflict(),
-				 errhint("In a moment you should be able to reconnect to the"
-						 " database and repeat your command.")));
-			else
-				ereport(ERROR,
-						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-				 errmsg("canceling statement due to conflict with recovery"),
-						 errdetail_recovery_conflict()));
+
+			if (DoingCommandRead){
+				/*
+				 * We cant issue a normal ERROR here because the
+				 * client doesnt expect the server to send an error at
+				 * that point.
+				 * We also may not send a "ready for query"/Z message
+				 * because that would be unexpected as well.
+				 */
+				silent_error_while_idle = true;
+				error |= LOG_NO_CLIENT;
+
+			}
+
+			ereport(error,
+			        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+			         errmsg("canceling statement due to conflict with recovery"),
+			         errdetail_recovery_conflict()));
 		}
 
 		/*
@@ -2968,15 +2982,18 @@ ProcessInterrupts(void)
 		 * request --- sending an extra error message won't accomplish
 		 * anything.  Otherwise, go ahead and throw the error.
 		 */
-		if (!DoingCommandRead)
+		if (DoingCommandRead)
 		{
-			ImmediateInterruptOK = false;		/* not idle anymore */
-			DisableNotifyInterrupt();
-			DisableCatchupInterrupt();
-			ereport(ERROR,
-					(errcode(ERRCODE_QUERY_CANCELED),
-					 errmsg("canceling statement due to user request")));
+			silent_error_while_idle = true;
+			error |= LOG_NO_CLIENT;
 		}
+
+		ImmediateInterruptOK = false;		/* not idle anymore */
+		DisableNotifyInterrupt();
+		DisableCatchupInterrupt();
+		ereport(error,
+		        (errcode(ERRCODE_QUERY_CANCELED),
+		         errmsg("canceling statement due to user request")));
 	}
 	/* If we get here, do nothing (probably, QueryCancelPending was reset) */
 }
@@ -3794,7 +3811,7 @@ PostgresMain(int argc, char *argv[], const char *username)
 		 * uncommitted updates (that confuses autovacuum).	The notification
 		 * processor wants a call too, if we are not in a transaction block.
 		 */
-		if (send_ready_for_query)
+		if (send_ready_for_query && !silent_error_while_idle)
 		{
 			if (IsAbortedTransactionBlockState())
 			{
@@ -3819,6 +3836,7 @@ PostgresMain(int argc, char *argv[], const char *username)
 			send_ready_for_query = false;
 		}
 
+
 		/*
 		 * (2) Allow asynchronous signals to be executed immediately if they
 		 * come in while we are waiting for client input. (This must be
@@ -3854,6 +3872,23 @@ PostgresMain(int argc, char *argv[], const char *username)
 		if (ignore_till_sync && firstchar != EOF)
 			continue;
 
+		if(silent_error_while_idle && !doing_extended_query_message)
+		{
+			/*
+			 * When using the simple protocol:
+			 * At this point we have processed a ReadCommand which means
+			 * that we got some input from the frontend and thus can start
+			 * spewing errors again
+
+			 * When using the extended protocol:
+			 * doing_extended_query_message is false after a 'sync'
+			 * message - which is exactly the point after which we can
+			 * start sending errors again.
+			 */
+			silent_error_while_idle = false;
+		}
+
+
 		switch (firstchar)
 		{
 			case 'Q':			/* simple query */
-- 
1.7.3.rc1.5.g73aa2

#2Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andres Freund (#1)
Re: [PATH] Idle in transaction cancellation

Andres Freund <andres@anarazel.de> wrote:

Here is a proposed patch which enables cancellation of $subject.

Cool. Some enhancements we'd like to do to Serializable Snapshot
Isolation (SSI), should the base patch make it in, would require
this capability.

Currently it does *not* report any special error message to the
client if it

starts sending commands in an (unbekownst to it) failed
transaction, but just the normal "25P02: current transaction is
aborted..." message.

It shouldn't be hard to add that and I will propose a patch if
people would like it (I personally am not very interested, but I
can see people validly wanting it)

For SSI purposes, it would be highly desirable to be able to set the
SQLSTATE and message generated when the canceled transaction
terminates.

but I would like to have some feedback on the patch first.

I'll take a look when I can, but it may be a couple weeks from now.

-Kevin

#3Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#2)
Re: [PATH] Idle in transaction cancellation

On Tue, Oct 19, 2010 at 10:18 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Andres Freund <andres@anarazel.de> wrote:

Here is a proposed patch which enables cancellation of $subject.

I'll take a look when I can, but it may be a couple weeks from now.

How about adding it to
https://commitfest.postgresql.org/action/commitfest_view/open ?

Seems like a good feature, at least from 20,000 feet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Andres Freund
andres@anarazel.de
In reply to: Kevin Grittner (#2)
3 attachment(s)
Re: [PATCH] V3: Idle in transaction cancellation

Hi,

On Tuesday 19 October 2010 16:18:29 Kevin Grittner wrote:

Andres Freund <andres@anarazel.de> wrote:

Here is a proposed patch which enables cancellation of $subject.

Cool. Some enhancements we'd like to do to Serializable Snapshot
Isolation (SSI), should the base patch make it in, would require
this capability.

Currently it does *not* report any special error message to the
client if it

starts sending commands in an (unbekownst to it) failed
transaction, but just the normal "25P02: current transaction is
aborted..." message.

It shouldn't be hard to add that and I will propose a patch if
people would like it (I personally am not very interested, but I
can see people validly wanting it)

For SSI purposes, it would be highly desirable to be able to set the
SQLSTATE and message generated when the canceled transaction
terminates.

Ok, I implemented that capability, but the patch feels somewhat wrong to me,
so its a separate patch on top the others:

* it intermingles logic between elog.c and postgres.c far too much - requiring
exporting variables which should not get exported. And it would still need
more to allow some sensible Assert()s. I.e. Assert(DoingCommandRead) would
need to be available in elog.c to avoid somebody using the re-throwing
capability out of place....

* You wont see an error if the next command after the IDLE in transaction is a
COMMIT/ROLLBACK. I don’t see any sensible way around that.

* the copied edata lives in TopMemoryContext and gets only reset in the next
reported error, after the re-raised error was reported.

That’s how it looks like to raise one such error right now:

error = ERROR
if (DoingCommandRead)
{
silent_error_while_idle = true;
error |= LOG_NO_CLIENT|LOG_RE_THROW_AFTER_SYNC;
}

...

ereport(error,
(errcode(ERRCODE_QUERY_CANCELED),
errmsg("canceling statement due to user request")));

One could mingle together LOG_NO_CLIENT|LOG_RE_THROW_AFTER_SYNC into a macro
and set silent_error_while_idle in elog.c, but I don’t see many callsites
coming up, so I think its better to be explicit.

Ill set this up for the next commitfest, I don't think I can do much more
without further input.

Andres

Attachments:

0002-Implement-cancellation-of-backends-in-IDLE-IN-TRANSA.patchtext/x-patch; charset=UTF-8; name=0002-Implement-cancellation-of-backends-in-IDLE-IN-TRANSA.patchDownload
From 06541b25fc11a8f17ec401de5a17eeae1bad57d1 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 21 May 2010 20:17:11 +0200
Subject: [PATCH 2/3] Implement cancellation of backends in IDLE IN TRANSACTION state.

Not having support for this was the reason for HS FATALing backends
which had a lock conflict for longer than
max_standby_(archive|standby)_delay as it couldnt cancel them without
them loosing sync with the frontend. As this is not the case anymore
fail the transaction "silently". Possibly one day the protocol can
cope with this...
---
 src/backend/tcop/postgres.c |   80 +++++++++++++++++++++++++++++++------------
 1 files changed, 58 insertions(+), 22 deletions(-)

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cba90a9..505d136 100644
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
*************** static bool RecoveryConflictPending = fa
*** 177,182 ****
--- 177,188 ----
  static bool RecoveryConflictRetryable = true;
  static ProcSignalReason RecoveryConflictReason;
  
+ /*
+  * Are we disallowed from sending a "ready for query" message right
+  * now because it would confuse the frontend?
+  */
+ static bool silent_error_while_idle = false;
+ 
  /* ----------------------------------------------------------------
   *		decls for routines only used in this file
   * ----------------------------------------------------------------
*************** RecoveryConflictInterrupt(ProcSignalReas
*** 2877,2882 ****
--- 2883,2890 ----
  void
  ProcessInterrupts(void)
  {
+ 	int error = ERROR;
+ 
  	/* OK to accept interrupt now? */
  	if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
  		return;
*************** ProcessInterrupts(void)
*** 2949,2966 ****
  			RecoveryConflictPending = false;
  			DisableNotifyInterrupt();
  			DisableCatchupInterrupt();
! 			if (DoingCommandRead)
! 				ereport(FATAL,
! 						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
! 						 errmsg("terminating connection due to conflict with recovery"),
! 						 errdetail_recovery_conflict(),
! 				 errhint("In a moment you should be able to reconnect to the"
! 						 " database and repeat your command.")));
! 			else
! 				ereport(ERROR,
! 						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
! 				 errmsg("canceling statement due to conflict with recovery"),
! 						 errdetail_recovery_conflict()));
  		}
  
  		/*
--- 2957,2980 ----
  			RecoveryConflictPending = false;
  			DisableNotifyInterrupt();
  			DisableCatchupInterrupt();
! 
! 			if (DoingCommandRead){
! 				/*
! 				 * We cant issue a normal ERROR here because the
! 				 * client doesnt expect the server to send an error at
! 				 * that point.
! 				 * We also may not send a "ready for query"/Z message
! 				 * because that would be unexpected as well.
! 				 */
! 				silent_error_while_idle = true;
! 				error |= LOG_NO_CLIENT;
! 
! 			}
! 
! 			ereport(error,
! 			        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
! 			         errmsg("canceling statement due to conflict with recovery"),
! 			         errdetail_recovery_conflict()));
  		}
  
  		/*
*************** ProcessInterrupts(void)
*** 2968,2982 ****
  		 * request --- sending an extra error message won't accomplish
  		 * anything.  Otherwise, go ahead and throw the error.
  		 */
! 		if (!DoingCommandRead)
  		{
! 			ImmediateInterruptOK = false;		/* not idle anymore */
! 			DisableNotifyInterrupt();
! 			DisableCatchupInterrupt();
! 			ereport(ERROR,
! 					(errcode(ERRCODE_QUERY_CANCELED),
! 					 errmsg("canceling statement due to user request")));
  		}
  	}
  	/* If we get here, do nothing (probably, QueryCancelPending was reset) */
  }
--- 2982,2999 ----
  		 * request --- sending an extra error message won't accomplish
  		 * anything.  Otherwise, go ahead and throw the error.
  		 */
! 		if (DoingCommandRead)
  		{
! 			silent_error_while_idle = true;
! 			error |= LOG_NO_CLIENT;
  		}
+ 
+ 		ImmediateInterruptOK = false;		/* not idle anymore */
+ 		DisableNotifyInterrupt();
+ 		DisableCatchupInterrupt();
+ 		ereport(error,
+ 		        (errcode(ERRCODE_QUERY_CANCELED),
+ 		         errmsg("canceling statement due to user request")));
  	}
  	/* If we get here, do nothing (probably, QueryCancelPending was reset) */
  }
*************** PostgresMain(int argc, char *argv[], con
*** 3814,3824 ****
  				set_ps_display("idle", false);
  				pgstat_report_activity("<IDLE>");
  			}
! 
! 			ReadyForQuery(whereToSendOutput);
! 			send_ready_for_query = false;
  		}
  
  		/*
  		 * (2) Allow asynchronous signals to be executed immediately if they
  		 * come in while we are waiting for client input. (This must be
--- 3831,3843 ----
  				set_ps_display("idle", false);
  				pgstat_report_activity("<IDLE>");
  			}
! 			if(!silent_error_while_idle){
! 				ReadyForQuery(whereToSendOutput);
! 				send_ready_for_query = false;
! 			}
  		}
  
+ 
  		/*
  		 * (2) Allow asynchronous signals to be executed immediately if they
  		 * come in while we are waiting for client input. (This must be
*************** PostgresMain(int argc, char *argv[], con
*** 3854,3859 ****
--- 3873,3895 ----
  		if (ignore_till_sync && firstchar != EOF)
  			continue;
  
+ 		if(silent_error_while_idle && !doing_extended_query_message)
+ 		{
+ 			/*
+ 			 * When using the simple protocol:
+ 			 * At this point we have processed a ReadCommand which means
+ 			 * that we got some input from the frontend and thus can start
+ 			 * spewing errors again
+ 
+ 			 * When using the extended protocol:
+ 			 * doing_extended_query_message is false after a 'sync'
+ 			 * message - which is exactly the point after which we can
+ 			 * start sending errors again.
+ 			 */
+ 			silent_error_while_idle = false;
+ 		}
+ 
+ 
  		switch (firstchar)
  		{
  			case 'Q':			/* simple query */
-- 
1.7.3.rc1.5.g73aa2

0001-Support-transporting-flags-in-the-elevel-argument-of.patchtext/x-patch; charset=UTF-8; name=0001-Support-transporting-flags-in-the-elevel-argument-of.patchDownload
From bd0df4139bf045edd3400ee1b321009a4d2754d3 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 13 Feb 2010 22:21:15 +0100
Subject: [PATCH 1/3] Support transporting flags in the 'elevel' argument of ereport(). The
 reason is the wish to support avoiding the error to the client which
 was only possible using the COMERROR level - which is not a error but
 just a log message.
 The only supported flag till now is LOG_NO_CLIENT which does what
 COMERROR did for all error levels

---
 src/backend/utils/error/elog.c |    8 +++++---
 src/include/utils/elog.h       |   29 ++++++++++++++++++++---------
 2 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e321b99..6e174d8 100644
*** a/src/backend/utils/error/elog.c
--- b/src/backend/utils/error/elog.c
*************** errstart(int elevel, const char *filenam
*** 214,220 ****
  	bool		output_to_server;
  	bool		output_to_client = false;
  	int			i;
! 
  	/*
  	 * Check some cases in which we want to promote an error into a more
  	 * severe error.  None of this logic applies for non-error messages.
--- 214,221 ----
  	bool		output_to_server;
  	bool		output_to_client = false;
  	int			i;
! 	int			eflags = elevel & LOG_FLAG_MASK;
! 	elevel = elevel & ~LOG_FLAG_MASK;
  	/*
  	 * Check some cases in which we want to promote an error into a more
  	 * severe error.  None of this logic applies for non-error messages.
*************** errstart(int elevel, const char *filenam
*** 274,280 ****
  		output_to_server = (elevel >= log_min_messages);
  
  	/* Determine whether message is enabled for client output */
! 	if (whereToSendOutput == DestRemote && elevel != COMMERROR)
  	{
  		/*
  		 * client_min_messages is honored only after we complete the
--- 275,281 ----
  		output_to_server = (elevel >= log_min_messages);
  
  	/* Determine whether message is enabled for client output */
! 	if (whereToSendOutput == DestRemote)
  	{
  		/*
  		 * client_min_messages is honored only after we complete the
*************** errstart(int elevel, const char *filenam
*** 333,338 ****
--- 334,340 ----
  	edata = &errordata[errordata_stack_depth];
  	MemSet(edata, 0, sizeof(ErrorData));
  	edata->elevel = elevel;
+ 	edata->eflags = eflags;
  	edata->output_to_server = output_to_server;
  	edata->output_to_client = output_to_client;
  	edata->filename = filename;
*************** EmitErrorReport(void)
*** 1168,1174 ****
  		send_message_to_server_log(edata);
  
  	/* Send to client, if enabled */
! 	if (edata->output_to_client)
  		send_message_to_frontend(edata);
  
  	MemoryContextSwitchTo(oldcontext);
--- 1170,1176 ----
  		send_message_to_server_log(edata);
  
  	/* Send to client, if enabled */
! 	if (edata->output_to_client && !(edata->eflags & LOG_NO_CLIENT))
  		send_message_to_frontend(edata);
  
  	MemoryContextSwitchTo(oldcontext);
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 92641ba..1853e1d 100644
*** a/src/include/utils/elog.h
--- b/src/include/utils/elog.h
***************
*** 16,21 ****
--- 16,28 ----
  
  #include <setjmp.h>
  
+ #define LOG_FLAG_MASK ((~0)<<20)
+ 
+ /* logging flags */
+ #define LOG_NO_CLIENT (2<<30)   /* Don't send to the client. Helpfull
+                                  * if it would confuse the client at
+                                  * that moment */
+ 
  /* Error level codes */
  #define DEBUG5		10			/* Debugging messages, in categories of
  								 * decreasing detail. */
***************
*** 25,54 ****
  #define DEBUG1		14			/* used by GUC debug_* variables */
  #define LOG			15			/* Server operational messages; sent only to
  								 * server log by default. */
! #define COMMERROR	16			/* Client communication problems; same as LOG
  								 * for server reporting, but never sent to
! 								 * client. */
! #define INFO		17			/* Messages specifically requested by user (eg
  								 * VACUUM VERBOSE output); always sent to
  								 * client regardless of client_min_messages,
  								 * but by default not sent to server log. */
! #define NOTICE		18			/* Helpful messages to users about query
  								 * operation; sent to client and server log by
  								 * default. */
! #define WARNING		19			/* Warnings.  NOTICE is for expected messages
  								 * like implicit sequence creation by SERIAL.
  								 * WARNING is for unexpected messages. */
! #define ERROR		20			/* user error - abort transaction; return to
  								 * known state */
  /* Save ERROR value in PGERROR so it can be restored when Win32 includes
   * modify it.  We have to use a constant rather than ERROR because macros
   * are expanded only when referenced outside macros.
   */
  #ifdef WIN32
! #define PGERROR		20
  #endif
! #define FATAL		21			/* fatal error - abort process */
! #define PANIC		22			/* take down the other backends with me */
  
   /* #define DEBUG DEBUG1 */	/* Backward compatibility with pre-7.3 */
  
--- 32,64 ----
  #define DEBUG1		14			/* used by GUC debug_* variables */
  #define LOG			15			/* Server operational messages; sent only to
  								 * server log by default. */
! #define COMMERROR	(LOG|LOG_NO_CLIENT)/* Client communication problems; same as LOG
  								 * for server reporting, but never sent to
! 								 * client. For backward compatibility this is
! 								 * is available without explicitly specifying
! 								 * LOG_NO_CLIENT.*/
! #define INFO		16			/* Messages specifically requested by user (eg
  								 * VACUUM VERBOSE output); always sent to
  								 * client regardless of client_min_messages,
  								 * but by default not sent to server log. */
! #define NOTICE		17			/* Helpful messages to users about query
  								 * operation; sent to client and server log by
  								 * default. */
! #define WARNING		18			/* Warnings.  NOTICE is for expected messages
  								 * like implicit sequence creation by SERIAL.
  								 * WARNING is for unexpected messages. */
! #define ERROR		19			/* user error - abort transaction; return to
  								 * known state */
  /* Save ERROR value in PGERROR so it can be restored when Win32 includes
   * modify it.  We have to use a constant rather than ERROR because macros
   * are expanded only when referenced outside macros.
   */
  #ifdef WIN32
! #define PGERROR		19
  #endif
! #define FATAL		20			/* fatal error - abort process */
! #define PANIC		21			/* take down the other backends with me */
! 
  
   /* #define DEBUG DEBUG1 */	/* Backward compatibility with pre-7.3 */
  
*************** extern PGDLLIMPORT sigjmp_buf *PG_except
*** 291,296 ****
--- 301,307 ----
  typedef struct ErrorData
  {
  	int			elevel;			/* error level */
+ 	int			eflags;			/* error flags */
  	bool		output_to_server;		/* will report to server log? */
  	bool		output_to_client;		/* will report to client? */
  	bool		show_funcname;	/* true to force funcname inclusion */
-- 
1.7.3.rc1.5.g73aa2

0003-Enable-rethrowing-errors-which-occured-while-IDLE-IN.patchtext/x-patch; charset=UTF-8; name=0003-Enable-rethrowing-errors-which-occured-while-IDLE-IN.patchDownload
From 8d39abbeea8362048eb437289fab718ea8ef23f3 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 30 Oct 2010 10:36:41 +0200
Subject: [PATCH 3/3] Enable rethrowing errors which occured while IDLE IN TRANSACTION state when explicitly told so via a new error flag.

This makes cancelling an idle transactions more visible to the cancelled connection so it can react more sensibly than just reacting to an ominous "in failed transaction" error.
---
 src/backend/tcop/fastpath.c    |    5 +--
 src/backend/tcop/postgres.c    |   61 +++++++++++++++++----------------------
 src/backend/utils/error/elog.c |   23 +++++++++++++++
 src/include/tcop/tcopprot.h    |    4 ++-
 src/include/utils/elog.h       |    2 +
 5 files changed, 56 insertions(+), 39 deletions(-)

diff --git a/src/backend/tcop/fastpath.c b/src/backend/tcop/fastpath.c
index af58e4e..ed6f02a 100644
*** a/src/backend/tcop/fastpath.c
--- b/src/backend/tcop/fastpath.c
*************** HandleFunctionRequest(StringInfo msgBuf)
*** 298,307 ****
  	 * won't lose sync with the frontend.
  	 */
  	if (IsAbortedTransactionBlockState())
! 		ereport(ERROR,
! 				(errcode(ERRCODE_IN_FAILED_SQL_TRANSACTION),
! 				 errmsg("current transaction is aborted, "
! 						"commands ignored until end of transaction block")));
  
  	/*
  	 * Now that we know we are in a valid transaction, set snapshot in case
--- 298,304 ----
  	 * won't lose sync with the frontend.
  	 */
  	if (IsAbortedTransactionBlockState())
! 		RaiseInFailedTransactionError();
  
  	/*
  	 * Now that we know we are in a valid transaction, set snapshot in case
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 505d136..f144101 100644
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
*************** static ProcSignalReason RecoveryConflict
*** 183,188 ****
--- 183,195 ----
   */
  static bool silent_error_while_idle = false;
  
+ /*
+  * Which error occured while we were idle? We want to rethrow it in
+  * the face of the next query - except if it is a ROLLBACK/COMMIT - it
+  * will silently be swallowed there...
+  */
+ ErrorData *silent_error_while_idle_edata;
+ 
  /* ----------------------------------------------------------------
   *		decls for routines only used in this file
   * ----------------------------------------------------------------
*************** exec_simple_query(const char *query_stri
*** 952,962 ****
  		 */
  		if (IsAbortedTransactionBlockState() &&
  			!IsTransactionExitStmt(parsetree))
! 			ereport(ERROR,
! 					(errcode(ERRCODE_IN_FAILED_SQL_TRANSACTION),
! 					 errmsg("current transaction is aborted, "
! 						  "commands ignored until end of transaction block"),
! 					 errdetail_abort()));
  
  		/* Make sure we are in a transaction command */
  		start_xact_command();
--- 959,965 ----
  		 */
  		if (IsAbortedTransactionBlockState() &&
  			!IsTransactionExitStmt(parsetree))
! 			RaiseInFailedTransactionError();
  
  		/* Make sure we are in a transaction command */
  		start_xact_command();
*************** exec_parse_message(const char *query_str
*** 1262,1273 ****
  		 */
  		if (IsAbortedTransactionBlockState() &&
  			!IsTransactionExitStmt(raw_parse_tree))
! 			ereport(ERROR,
! 					(errcode(ERRCODE_IN_FAILED_SQL_TRANSACTION),
! 					 errmsg("current transaction is aborted, "
! 						  "commands ignored until end of transaction block"),
! 					 errdetail_abort()));
! 
  		/*
  		 * Set up a snapshot if parse analysis/planning will need one.
  		 */
--- 1265,1271 ----
  		 */
  		if (IsAbortedTransactionBlockState() &&
  			!IsTransactionExitStmt(raw_parse_tree))
! 			RaiseInFailedTransactionError();
  		/*
  		 * Set up a snapshot if parse analysis/planning will need one.
  		 */
*************** exec_bind_message(StringInfo input_messa
*** 1543,1554 ****
  	if (IsAbortedTransactionBlockState() &&
  		(!IsTransactionExitStmt(psrc->raw_parse_tree) ||
  		 numParams != 0))
! 		ereport(ERROR,
! 				(errcode(ERRCODE_IN_FAILED_SQL_TRANSACTION),
! 				 errmsg("current transaction is aborted, "
! 						"commands ignored until end of transaction block"),
! 				 errdetail_abort()));
! 
  	/*
  	 * Create the portal.  Allow silent replacement of an existing portal only
  	 * if the unnamed portal is specified.
--- 1541,1547 ----
  	if (IsAbortedTransactionBlockState() &&
  		(!IsTransactionExitStmt(psrc->raw_parse_tree) ||
  		 numParams != 0))
! 		RaiseInFailedTransactionError();
  	/*
  	 * Create the portal.  Allow silent replacement of an existing portal only
  	 * if the unnamed portal is specified.
*************** exec_execute_message(const char *portal_
*** 1985,1995 ****
  	 */
  	if (IsAbortedTransactionBlockState() &&
  		!IsTransactionExitStmtList(portal->stmts))
! 		ereport(ERROR,
! 				(errcode(ERRCODE_IN_FAILED_SQL_TRANSACTION),
! 				 errmsg("current transaction is aborted, "
! 						"commands ignored until end of transaction block"),
! 				 errdetail_abort()));
  
  	/* Check for cancel signal before we start execution */
  	CHECK_FOR_INTERRUPTS();
--- 1978,1984 ----
  	 */
  	if (IsAbortedTransactionBlockState() &&
  		!IsTransactionExitStmtList(portal->stmts))
! 		RaiseInFailedTransactionError();
  
  	/* Check for cancel signal before we start execution */
  	CHECK_FOR_INTERRUPTS();
*************** exec_describe_statement_message(const ch
*** 2353,2363 ****
  	 */
  	if (IsAbortedTransactionBlockState() &&
  		psrc->resultDesc)
! 		ereport(ERROR,
! 				(errcode(ERRCODE_IN_FAILED_SQL_TRANSACTION),
! 				 errmsg("current transaction is aborted, "
! 						"commands ignored until end of transaction block"),
! 				 errdetail_abort()));
  
  	if (whereToSendOutput != DestRemote)
  		return;					/* can't actually do anything... */
--- 2342,2348 ----
  	 */
  	if (IsAbortedTransactionBlockState() &&
  		psrc->resultDesc)
! 		RaiseInFailedTransactionError();
  
  	if (whereToSendOutput != DestRemote)
  		return;					/* can't actually do anything... */
*************** exec_describe_portal_message(const char
*** 2434,2444 ****
  	 */
  	if (IsAbortedTransactionBlockState() &&
  		portal->tupDesc)
! 		ereport(ERROR,
! 				(errcode(ERRCODE_IN_FAILED_SQL_TRANSACTION),
! 				 errmsg("current transaction is aborted, "
! 						"commands ignored until end of transaction block"),
! 				 errdetail_abort()));
  
  	if (whereToSendOutput != DestRemote)
  		return;					/* can't actually do anything... */
--- 2419,2425 ----
  	 */
  	if (IsAbortedTransactionBlockState() &&
  		portal->tupDesc)
! 		RaiseInFailedTransactionError();
  
  	if (whereToSendOutput != DestRemote)
  		return;					/* can't actually do anything... */
*************** IsTransactionStmtList(List *parseTrees)
*** 2571,2576 ****
--- 2552,2569 ----
  	return false;
  }
  
+ void
+ RaiseInFailedTransactionError(void){
+ 	if(silent_error_while_idle_edata){
+ 		ReThrowError(silent_error_while_idle_edata);
+ 	}
+ 	ereport(ERROR,
+ 	        (errcode(ERRCODE_IN_FAILED_SQL_TRANSACTION),
+ 	         errmsg("current transaction is aborted, "
+ 	                "commands ignored until end of transaction block"),
+ 	         errdetail_abort()));
+ }
+ 
  /* Release any existing unnamed prepared statement */
  static void
  drop_unnamed_stmt(void)
*************** ProcessInterrupts(void)
*** 2967,2973 ****
  				 * because that would be unexpected as well.
  				 */
  				silent_error_while_idle = true;
! 				error |= LOG_NO_CLIENT;
  
  			}
  
--- 2960,2966 ----
  				 * because that would be unexpected as well.
  				 */
  				silent_error_while_idle = true;
! 				error |= LOG_NO_CLIENT|LOG_RE_THROW_AFTER_SYNC;
  
  			}
  
*************** ProcessInterrupts(void)
*** 2985,2991 ****
  		if (DoingCommandRead)
  		{
  			silent_error_while_idle = true;
! 			error |= LOG_NO_CLIENT;
  		}
  
  		ImmediateInterruptOK = false;		/* not idle anymore */
--- 2978,2984 ----
  		if (DoingCommandRead)
  		{
  			silent_error_while_idle = true;
! 			error |= LOG_NO_CLIENT|LOG_RE_THROW_AFTER_SYNC;
  		}
  
  		ImmediateInterruptOK = false;		/* not idle anymore */
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 6e174d8..d8895c6 100644
*** a/src/backend/utils/error/elog.c
--- b/src/backend/utils/error/elog.c
*************** EmitErrorReport(void)
*** 1173,1178 ****
--- 1173,1201 ----
  	if (edata->output_to_client && !(edata->eflags & LOG_NO_CLIENT))
  		send_message_to_frontend(edata);
  
+ 	if(silent_error_while_idle_edata){
+ 		FreeErrorData(silent_error_while_idle_edata);
+ 		silent_error_while_idle_edata = 0;
+ 	}
+ 
+ 	if (edata->eflags & LOG_RE_THROW_AFTER_SYNC){
+ 		Assert(edata->elevel >= ERROR);
+ 
+ 		/* The old context is already saved above and reset
+ 		 * below... We cannot store the ErrorData in any other context
+ 		 * than TopMemoryContext - they all may get reset inbetween
+ 		 * two EmitErrorReport() calls.
+ 		 * As there is no convenient point to reset the variable using
+ 		 * any transaction bound variable is not possible without
+ 		 * leaving a stray pointer.
+ 		 */
+ 		MemoryContextSwitchTo(TopMemoryContext);
+ 		silent_error_while_idle_edata = CopyErrorData();
+ 
+ 		/* We dont want to save/rethrow that error again */
+ 		silent_error_while_idle_edata->eflags &= ~(LOG_RE_THROW_AFTER_SYNC|LOG_NO_CLIENT);
+ 	}
+ 
  	MemoryContextSwitchTo(oldcontext);
  	recursion_depth--;
  }
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index f4026ec..b529ce1 100644
*** a/src/include/tcop/tcopprot.h
--- b/src/include/tcop/tcopprot.h
*************** typedef enum
*** 45,50 ****
--- 45,52 ----
  
  extern int	log_statement;
  
+ extern ErrorData *silent_error_while_idle_edata;
+ 
  extern List *pg_parse_and_rewrite(const char *query_string,
  					 Oid *paramTypes, int numParams);
  extern List *pg_parse_query(const char *query_string);
*************** extern void set_debug_options(int debug_
*** 81,85 ****
  extern bool set_plan_disabling_options(const char *arg,
  						   GucContext context, GucSource source);
  extern const char *get_stats_option_name(const char *arg);
! 
  #endif   /* TCOPPROT_H */
--- 83,87 ----
  extern bool set_plan_disabling_options(const char *arg,
  						   GucContext context, GucSource source);
  extern const char *get_stats_option_name(const char *arg);
! extern void RaiseInFailedTransactionError(void);
  #endif   /* TCOPPROT_H */
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 1853e1d..529265a 100644
*** a/src/include/utils/elog.h
--- b/src/include/utils/elog.h
***************
*** 23,28 ****
--- 23,30 ----
                                   * if it would confuse the client at
                                   * that moment */
  
+ #define LOG_RE_THROW_AFTER_SYNC (2<<29)
+ 
  /* Error level codes */
  #define DEBUG5		10			/* Debugging messages, in categories of
  								 * decreasing detail. */
-- 
1.7.3.rc1.5.g73aa2

#5Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andres Freund (#4)
Re: [PATCH] V3: Idle in transaction cancellation

Andres Freund <andres@anarazel.de> wrote:

* You wont see an error if the next command after the IDLE in
transaction is a COMMIT/ROLLBACK. I don*t see any sensible way
around that.

Well, on a ROLLBACK I'm not sure it's a problem. On a COMMIT,
couldn't you call a function to check for it in CommitTransaction
and PrepareTransaction?

-Kevin

#6Andres Freund
andres@anarazel.de
In reply to: Kevin Grittner (#5)
Re: [PATCH] V3: Idle in transaction cancellation

On Tuesday 02 November 2010 18:33:15 Kevin Grittner wrote:

Andres Freund <andres@anarazel.de> wrote:

* You wont see an error if the next command after the IDLE in
transaction is a COMMIT/ROLLBACK. I don*t see any sensible way
around that.

Well, on a ROLLBACK I'm not sure it's a problem. On a COMMIT,
couldn't you call a function to check for it in CommitTransaction
and PrepareTransaction?

Sure, throwing an error somewhere wouldnt be that hard. But at the moment a
COMMIT is always successfull (and just reporting a ROLLBACK in a failed txn).
Doesn't seem to be something to changed lightly.

Does anybody have any idea why COMMIT is allowed there? Seems pretty strange
to me.

Andres

#7Alvaro Herrera
alvherre@commandprompt.com
In reply to: Andres Freund (#6)
Re: [PATCH] V3: Idle in transaction cancellation

Excerpts from Andres Freund's message of mar nov 02 18:36:19 -0300 2010:

On Tuesday 02 November 2010 18:33:15 Kevin Grittner wrote:

Andres Freund <andres@anarazel.de> wrote:

* You wont see an error if the next command after the IDLE in
transaction is a COMMIT/ROLLBACK. I don*t see any sensible way
around that.

Well, on a ROLLBACK I'm not sure it's a problem. On a COMMIT,
couldn't you call a function to check for it in CommitTransaction
and PrepareTransaction?

Sure, throwing an error somewhere wouldnt be that hard. But at the moment a
COMMIT is always successfull (and just reporting a ROLLBACK in a failed txn).
Doesn't seem to be something to changed lightly.

If the user calls COMMIT, it calls EndTransactionBlock, which ends up
calling AbortTransaction. Can't you do it at that point? (Says he who
hasn't looked at the patch at all)

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#8Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andres Freund (#6)
Re: [PATCH] V3: Idle in transaction cancellation

Andres Freund <andres@anarazel.de> wrote:

On Tuesday 02 November 2010 18:33:15 Kevin Grittner wrote:

Well, on a ROLLBACK I'm not sure it's a problem. On a COMMIT,
couldn't you call a function to check for it in CommitTransaction
and PrepareTransaction?

Sure, throwing an error somewhere wouldnt be that hard. But at the
moment a COMMIT is always successfull (and just reporting a
ROLLBACK in a failed txn).
Doesn't seem to be something to changed lightly.

Well, I'm looking at using this with the Serializable Snapshot
Isolation (SSI) patch, which can throw an error from a COMMIT, if
the commit completes the conditions necessary for an anomaly to
occur (i.e., the committing transaction is on the rw-conflict *out*
side of a pivot, and it is the first in the set to commit). If we
succeed in enhancing SSI to use lists of rw-conflicted transactions
rather than its current techniques, we might be able to (and find it
desirable to) always commit in that circumstance and roll back some
*other* transaction which is part of the problem. Of course, that
other transaction might be idle at the time, and the next thing *it*
tries to execute *might* be a COMMIT.

So if the SSI patch goes in, there will always be some chance that a
COMMIT can fail, but doing it through the mechanism your patch
provides could improve performance, because we could guarantee that
nobody ever has a serialization failure without first successfully
committing a transaction which wrote to one or more tables. If a
commit fails due to SSI, it is highly desirable that the error use
the serialization failure SQLSTATE, so that an application framework
can know that it is reasonable to retry the transaction.

Does anybody have any idea why COMMIT is allowed there? Seems
pretty strange to me.

So that the "failed transaction" state can be cleared. The
transaction as a whole has failed, but you don't want the connection
to become useless.

-Kevin

#9Andres Freund
andres@anarazel.de
In reply to: Kevin Grittner (#8)
Re: [PATCH] V3: Idle in transaction cancellation

On Tuesday 02 November 2010 22:59:15 Kevin Grittner wrote:

Does anybody have any idea why COMMIT is allowed there? Seems
pretty strange to me.

So that the "failed transaction" state can be cleared. The
transaction as a whole has failed, but you don't want the connection
to become useless.

Well, allowing ROLLBACK is enought there, isnt it?

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#9)
Re: [PATCH] V3: Idle in transaction cancellation

Andres Freund <andres@anarazel.de> writes:

On Tuesday 02 November 2010 22:59:15 Kevin Grittner wrote:

Does anybody have any idea why COMMIT is allowed there? Seems
pretty strange to me.

So that the "failed transaction" state can be cleared. The
transaction as a whole has failed, but you don't want the connection
to become useless.

Well, allowing ROLLBACK is enought there, isnt it?

The client has no reason to think the transaction has failed, so what
it's going to send is COMMIT, not ROLLBACK. From the point of view of
the client, this should look like its COMMIT failed (or in general its
next command failed); not like there was some magic action-at-a-distance
on the state of its transaction.

Now you could argue that if you send COMMIT, and that fails, you should
have to send ROLLBACK to get to an idle state ... but that's not how
this ever worked before, and I don't think it's what the SQL standard
expects either. After a COMMIT, you're out of the transaction either
way.

regards, tom lane

#11Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andres Freund (#4)
Re: [PATCH] V3: Idle in transaction cancellation

Andres Freund <andres@anarazel.de> wrote:

Ok, I implemented that capability

I applied all three patches with minor offsets, and it builds, but
several regression tests fail. I backed out the patches in reverse
order and confirmed that while the regression tests pass without
any of these patches, they fail with just the first, the first and
the second, or all three patches.

If you're not seeing the same thing there, I'll be happy to provide
the details.

-Kevin

#12Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#11)
Re: [PATCH] V3: Idle in transaction cancellation

I wrote:

I applied all three patches with minor offsets, and it builds, but
several regression tests fail.

Sorry, after sending that I realized I hadn't done a make distclean.
After that it passes. Please ignore the previous post.

-Kevin

#13Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andres Freund (#4)
Re: V3: Idle in transaction cancellation

Andres Freund <andres@anarazel.de> wrote:

On Tuesday 19 October 2010 16:18:29 Kevin Grittner wrote:

For SSI purposes, it would be highly desirable to be able to set
the SQLSTATE and message generated when the canceled transaction
terminates.

Ok, I implemented that capability, but the patch feels somewhat
wrong to me,

so its a separate patch on top the others:

The patch is in context diff format, applies with minor offsets,
compiles and passes regression tests.

I have to admit that after reading the patch, I think I previously
misunderstood the scope of it. Am I correct in reading that the
main thrust of this is to improve error handling on standbys? Is
there any provision for one backend to cause a *different* backend
which is idle in a transaction to terminate cleanly when it attempts
to process its next statement? (That is what I was hoping to find,
for use in the SSI patch.)

Anyway, if the third patch file is only there because of my request,
I think it might be best to focus on the first two as a solution for
the standby issues this was originally meant to address, and then to
look at an API for the usage I have in mind after that is settled.

-Kevin

#14Andres Freund
andres@anarazel.de
In reply to: Kevin Grittner (#13)
Re: V3: Idle in transaction cancellation

On Thursday 02 December 2010 00:48:53 Kevin Grittner wrote:

Andres Freund <andres@anarazel.de> wrote:

On Tuesday 19 October 2010 16:18:29 Kevin Grittner wrote:

For SSI purposes, it would be highly desirable to be able to set
the SQLSTATE and message generated when the canceled transaction
terminates.

Ok, I implemented that capability, but the patch feels somewhat
wrong to me,

so its a separate patch on top the others:

The patch is in context diff format, applies with minor offsets,
compiles and passes regression tests.

I have to admit that after reading the patch, I think I previously
misunderstood the scope of it. Am I correct in reading that the
main thrust of this is to improve error handling on standbys? Is
there any provision for one backend to cause a *different* backend
which is idle in a transaction to terminate cleanly when it attempts
to process its next statement? (That is what I was hoping to find,
for use in the SSI patch.)

Do you wan't to terminate it immediately or on next statement?

You might want to check out SendProcSignal() et al.

Anyway, if the third patch file is only there because of my request,
I think it might be best to focus on the first two as a solution for
the standby issues this was originally meant to address, and then to
look at an API for the usage I have in mind after that is settled.

Besides that I dont like the implementation very much, I think its generally a
good idea...

#15Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andres Freund (#14)
Re: V3: Idle in transaction cancellation

Andres Freund <andres@anarazel.de> wrote:

Do you wan't to terminate it immediately or on next statement?

I want to have one backend terminate the transaction on another
backend as soon as practicable. If a query is active, it would be
best if it was canceled. It appears that if it is "idle in
transaction" there is a need to wait for the next request. It would
be a big plus for the backend requesting the cancellation to be able
to specify the SQLSTATE, message, etc., used by the other backend on
failure.

You might want to check out SendProcSignal() et al.

Will take a look.

Besides that I dont like the implementation very much, I think its
generally a good idea...

OK. While browsing around, I'll try to think of an alternative
approach, but this is new territory for me -- I've been learning
about areas in the code at need so far....

-Kevin

#16Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andres Freund (#14)
Re: V3: Idle in transaction cancellation

Andres Freund <andres@anarazel.de> wrote:

On Thursday 02 December 2010 00:48:53 Kevin Grittner wrote:

Is there any provision for one backend to cause a *different*
backend which is idle in a transaction to terminate cleanly when
it attempts to process its next statement?

You might want to check out SendProcSignal() et al.

Yeah, that was the missing link for me. Thanks!

Anyway, if the third patch file is only there because of my
request, I think it might be best to focus on the first two as a
solution for the standby issues this was originally meant to
address, and then to look at an API for the usage I have in mind
after that is settled.

Besides that I dont like the implementation very much, I think its
generally a good idea...

Is it sane to leave the implementation of this for the specific
areas which need it (like SSI), or do you think a generalized API
for it is needed?

I'll look at it more closely tonight, but at first scan it appears
that just reserving one flag for PROCSIG_SERIALIZATION_FAILURE (or
PROCSIG_SSI_CANCELLATION?) would allow me to code up the desired
behavior in a function called from procsignal_sigusr1_handler. I
can arrange for passing any needed detail through the SSI-controlled
structures somehow. Would that allow you to skip the parts you
didn't like?

It looks as though this is something which could easily be split off
as a separate patch within the SSI effort.

-Kevin

#17Alvaro Herrera
alvherre@commandprompt.com
In reply to: Andres Freund (#4)
Re: [PATCH] V3: Idle in transaction cancellation

Excerpts from Andres Freund's message of sáb oct 30 05:49:21 -0300 2010:

Ill set this up for the next commitfest, I don't think I can do much more
without further input.

Are you reserving about 20 bits for levels, and 12 for flags? Given the
relatively scarce growth of levels, I think we could live with about 6
or 7 bits for level, rest for flags.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#18Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#17)
Re: [PATCH] V3: Idle in transaction cancellation

On Thursday 02 December 2010 22:21:37 Alvaro Herrera wrote:

Excerpts from Andres Freund's message of sáb oct 30 05:49:21 -0300 2010:

Ill set this up for the next commitfest, I don't think I can do much
more without further input.

Are you reserving about 20 bits for levels, and 12 for flags? Given the
relatively scarce growth of levels, I think we could live with about 6
or 7 bits for level, rest for flags.

The number I picked was absolutely arbitrary I admit. Neither did I think it
would be likely to see more levels, nor did I forsee many flags, so I just
chose some number I liked in that certain moment ;-)

Andres

#19Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andres Freund (#18)
Re: V3: Idle in transaction cancellation

"Kevin Grittner" wrote:

Andres Freund wrote:

On Thursday 02 December 2010 00:48:53 Kevin Grittner wrote:

Is there any provision for one backend to cause a *different*
backend which is idle in a transaction to terminate cleanly when
it attempts to process its next statement?

You might want to check out SendProcSignal() et al.

Yeah, that was the missing link for me. Thanks!

OK, I have confirmed that the patch set Andres posted compiles and
passes regression tests (both `make check` and `make
installcheck-world`), and that the API provided works for at least
one purpose -- allowing SSI conflict detection to cancel an a
transaction other than the one on which the failure is detected, to
allow a guarantee that there will be no rollbacks without first
having a successful commit of a transaction which wrote data.

I created a "skip" branch in my git repo to hold the work for this
test. A single commit there which includes all three of Andres's
patches plus the code needed to implement the SSI feature is at:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=
commitdiff;h=0e1e86830b49dab51ff80619a580527260f4e186

I now wish I'd committed Andres's patches separately first. If
anyone likes, I could try to put something together to separate
things out like that.

I did notice one picky thing while working on this -- the left brace
for some if statements wasn't placed consistently with surrounding
code. Rather than:

if (x)
{
a();
b();
}

the patch tends to do this:

if (x){
a();
b();
}

I agree with Andres that the API to use it is a little bit spread out
and non-obvious, but I don't see a nicer way to do it. Maybe someone
else can think of something.

I didn't test the patch in relation to the original purpose (for
which only the first two of the three patch files should be needed)
-- error handling with hot standby. I'm hoping that someone familiar
with the issues there can test that part of it.

-Kevin

#20Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#4)
Re: [PATCH] V3: Idle in transaction cancellation

On Sat, Oct 30, 2010 at 4:49 AM, Andres Freund <andres@anarazel.de> wrote:

Here is a proposed patch which enables cancellation of $subject.

Disclaimer: This isn't my area of expertise, so take the below with a
grain or seven of salt.

It sort of looks to me like the LOG_NO_CLIENT error flag and the
silent_error_while_idle flag are trying to cooperate to get the effect
of throwing an error without actually throwing an error. I'm
wondering if it would be at all sensible to do that more directly by
making ProcessInterrupts() call AbortCurrentTransaction() in this
case.

Assuming that works at all, it would presumably mean that the client
would thereafter get something like this:

current transaction is aborted, commands ignored until end of transaction block

...which might be thought unhelpful. But that could be fixed either
by modifying errdetail_abort() or perhaps even by abstracting the
"current transaction is aborted, commands..." message into a function
that could produce an entirely different message if on either the
first or all calls within a given transaction.

I'm not sure if this would work, or if it's better. I'm just throwing
it out there, because the current approach looks a little grotty to
me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#20)
Re: [PATCH] V3: Idle in transaction cancellation

On Wednesday 15 December 2010 02:20:31 Robert Haas wrote:

On Sat, Oct 30, 2010 at 4:49 AM, Andres Freund <andres@anarazel.de> wrote:

Here is a proposed patch which enables cancellation of $subject.

Disclaimer: This isn't my area of expertise, so take the below with a
grain or seven of salt.

I don't know whos area of expertise it is except maybe, surprise, surprise,
Toms.

It sort of looks to me like the LOG_NO_CLIENT error flag and the
silent_error_while_idle flag are trying to cooperate to get the effect
of throwing an error without actually throwing an error. I'm
wondering if it would be at all sensible to do that more directly by
making ProcessInterrupts() call AbortCurrentTransaction() in this
case.

Hm. I think you want the normal server-side error logging continuing to work.

Its not really throwing an error without throwing one - its throwing one
without confusing the heck out of the client because the protocol is not ready
for that. I don't think introducing an "half-error" state is a good idea
because one day the protocol maybe ready to actually transport an error while
idle in txn (I would like to get there).

I'm not sure if this would work, or if it's better. I'm just throwing
it out there, because the current approach looks a little grotty to
me.

I with you on the grotty aspect... On the other hand the whole code is not
exactly nice...

Andres

#22Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#21)
Re: [PATCH] V3: Idle in transaction cancellation

On Wed, Dec 15, 2010 at 7:13 AM, Andres Freund <andres@anarazel.de> wrote:

It sort of looks to me like the LOG_NO_CLIENT error flag and the
silent_error_while_idle flag are trying to cooperate to get the effect
of throwing an error without actually throwing an error.  I'm
wondering if it would be at all sensible to do that more directly by
making ProcessInterrupts() call AbortCurrentTransaction() in this
case.

Hm. I think you want the normal server-side error logging continuing to work.

I was thinking we could get around that by doing elog(LOG), but I
guess that doesn't quite work either since we don't know what
client_min_messages is. Hrm...

I'm not sure if this would work, or if it's better.  I'm just throwing
it out there, because the current approach looks a little grotty to
me.

I with you on the grotty aspect... On the other hand the whole code is not
exactly nice...

Yeah. I'll try to find some time to think about this some more. It
would sure be nice if we could find a solution that's a bit
conceptually cleaner, even if it basically works the same way as what
you've done here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#23Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#22)
Re: [PATCH] V3: Idle in transaction cancellation

On Wednesday 15 December 2010 13:33:30 Robert Haas wrote:

On Wed, Dec 15, 2010 at 7:13 AM, Andres Freund <andres@anarazel.de> wrote:

It sort of looks to me like the LOG_NO_CLIENT error flag and the
silent_error_while_idle flag are trying to cooperate to get the effect
of throwing an error without actually throwing an error. I'm
wondering if it would be at all sensible to do that more directly by
making ProcessInterrupts() call AbortCurrentTransaction() in this
case.

Hm. I think you want the normal server-side error logging continuing to
work.

I was thinking we could get around that by doing elog(LOG), but I
guess that doesn't quite work either since we don't know what
client_min_messages is. Hrm...

I thought about doing that first. Btw, LOG_NO_CLIENT is just a more abstracted
way of what COMERROR did before...

I'm not sure if this would work, or if it's better. I'm just throwing
it out there, because the current approach looks a little grotty to
me.

I with you on the grotty aspect... On the other hand the whole code is
not exactly nice...

Yeah. I'll try to find some time to think about this some more. It
would sure be nice if we could find a solution that's a bit
conceptually cleaner, even if it basically works the same way as what
you've done here.

I would like that as well. I am not sure you can achieve that in a reasonable
amount of work. At least I couldn't.

Andres

#24Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#23)
Re: [PATCH] V3: Idle in transaction cancellation

On Wed, Dec 15, 2010 at 7:47 AM, Andres Freund <andres@anarazel.de> wrote:

I thought about doing that first. Btw, LOG_NO_CLIENT is just a more abstracted
way of what COMERROR did before...

Hmm, but it must not be quite the same, because that didn't require
the silent_error_while_idle flag.

Yeah.  I'll try to find some time to think about this some more.  It
would sure be nice if we could find a solution that's a bit
conceptually cleaner, even if it basically works the same way as what
you've done here.

I would like that as well. I am not sure you can achieve that in a reasonable
amount of work. At least I couldn't.

Is there a way that errstart() and/or errfinish() can know enough
about the state of the communication with the frontend to decide
whether to suppress edata->output_to_client? In other words, instead
of explicitly passing in a flag that says whether to inform the
client, it would be better for the error-reporting machinery to
intrinsically know whether it's right to send_message_to_frontend().
Otherwise, an error thrown from an unexpected location might not have
the flag set correctly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#25Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#24)
Re: [PATCH] V3: Idle in transaction cancellation

On Wednesday 15 December 2010 15:40:20 Robert Haas wrote:

On Wed, Dec 15, 2010 at 7:47 AM, Andres Freund <andres@anarazel.de> wrote:

I thought about doing that first. Btw, LOG_NO_CLIENT is just a more
abstracted way of what COMERROR did before...

Hmm, but it must not be quite the same, because that didn't require
the silent_error_while_idle flag.

True. Thats a separate thing.

Yeah. I'll try to find some time to think about this some more. It
would sure be nice if we could find a solution that's a bit
conceptually cleaner, even if it basically works the same way as what
you've done here.

I would like that as well. I am not sure you can achieve that in a
reasonable amount of work. At least I couldn't.

Is there a way that errstart() and/or errfinish() can know enough
about the state of the communication with the frontend to decide
whether to suppress edata->output_to_client? In other words, instead
of explicitly passing in a flag that says whether to inform the
client, it would be better for the error-reporting machinery to
intrinsically know whether it's right to send_message_to_frontend().
Otherwise, an error thrown from an unexpected location might not have
the flag set correctly.

Currently there are no other locations where we errors could get thrown at
that point but I see where youre going.

You could use "DoingCommandRead" to solve that specific use-case, but the
COMERROR ones I don't see as being replaced that easily.
We could introduce something like

NoLogToClientBegin();
NoLogToClientEnd();
int NoLogToClientCntr = 0;

but that sounds like overdoing it for me.

Andres

#26Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#25)
Re: [PATCH] V3: Idle in transaction cancellation

On Wed, Dec 15, 2010 at 10:02 AM, Andres Freund <andres@anarazel.de> wrote:

Is there a way that errstart() and/or errfinish() can know enough
about the state of the communication with the frontend to decide
whether to suppress edata->output_to_client?  In other words, instead
of explicitly passing in a flag that says whether to inform the
client, it would be better for the error-reporting machinery to
intrinsically know whether it's right to send_message_to_frontend().
Otherwise, an error thrown from an unexpected location might not have
the flag set correctly.

You could use "DoingCommandRead" to solve that specific use-case, but the
COMERROR ones I don't see as being replaced that easily.

Well, again, I'm not an expert on this, but why would we need to unify
the two mechanisms? Asynchronous rollbacks (what we're trying to do
here) and protocol violations (which is what COMMERROR looks to be
used for) are really sort of different. I'm not really sure we need
to handle them in the same way. Let's think about a recovery conflict
where ProcessInterrupts() has been called. Right now, if that
situation occurs and we are not DoingCommandRead, then we just throw
an error. That's either safe, or an already-existing bug. So the
question is what to do if we ARE DoingCommandRead. Right now, we
throw a fatal error. There's no comment explaining why, but I'm
guessing that the reason is the same problem we're trying to fix here:
the protocol state gets confused - but if we throw a FATAL then the
client goes away and we don't have to worry about it any more. Our
goal here, as I understand it, is to handle that case without a FATAL.

So let's see... if we're DoingCommandRead at that point, and
whereToSendOutput == DestRemote then we set whereToSendOutput =
DestNone before throwing the error, and restore it just after we reset
DoingCommandRead? <stabs blindly at target>

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#27Greg Smith
greg@2ndquadrant.com
In reply to: Andres Freund (#18)
Re: [PATCH] V3: Idle in transaction cancellation

Andres Freund wrote:

On Thursday 02 December 2010 22:21:37 Alvaro Herrera wrote:

Excerpts from Andres Freund's message of sáb oct 30 05:49:21 -0300 2010:

Ill set this up for the next commitfest, I don't think I can do much
more without further input.

Are you reserving about 20 bits for levels, and 12 for flags? Given the
relatively scarce growth of levels, I think we could live with about 6
or 7 bits for level, rest for flags.

The number I picked was absolutely arbitrary I admit. Neither did I think it
would be likely to see more levels, nor did I forsee many flags, so I just
chose some number I liked in that certain moment ;-)

This bit of detail seems to have died down without being resolved;
bumping it to add a reminder about that.

I count four issues of various sizes left with this patch right now:

1) This levels bit
2) Can the approach used be simplified or the code made cleaner?
3) What is the interaction with Hot Standby error handling?
4) The usual code formatting nitpicking, Kevin mentioned braces being an
issue

Robert is already thinking about (2); I'll keep an eye out for someone
who can test (3) now that it's been identified as a concern; and the
other two are pretty small details once those are crossed. I don't see
this as being ready to commit just yet though, so I don't see this going
anywhere other than returned for now; will mark it as such. Hopefully
this will gather enough additional review to continue moving forward now
that the main issues are identified.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

#28Andres Freund
andres@anarazel.de
In reply to: Greg Smith (#27)
Re: [PATCH] V3: Idle in transaction cancellation

Hi Greg,

On Thursday 16 December 2010 13:32:46 Greg Smith wrote:

Andres Freund wrote:

On Thursday 02 December 2010 22:21:37 Alvaro Herrera wrote:

Excerpts from Andres Freund's message of sáb oct 30 05:49:21 -0300 2010:

Ill set this up for the next commitfest, I don't think I can do much
more without further input.

Are you reserving about 20 bits for levels, and 12 for flags? Given the
relatively scarce growth of levels, I think we could live with about 6
or 7 bits for level, rest for flags.

The number I picked was absolutely arbitrary I admit. Neither did I think
it would be likely to see more levels, nor did I forsee many flags, so I
just chose some number I liked in that certain moment ;-)

This bit of detail seems to have died down without being resolved;
bumping it to add a reminder about that.
I count four issues of various sizes left with this patch right now:

1) This levels bit

If we go with that approach I think the amount of bits reserved for both is
big enough to never be reached in reality. Also there is no backward-compat
issue in changing as were not ABI stable between major releases anyway...

2) Can the approach used be simplified or the code made cleaner?
3) What is the interaction with Hot Standby error handling?

It works for me - not to say that independent testing wouldn't be good though.

4) The usual code formatting nitpicking, Kevin mentioned braces being an
issue

Will redo if the other issues are cleared.

Andres

#29Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#26)
Re: [PATCH] V3: Idle in transaction cancellation

On Wednesday 15 December 2010 20:12:45 Robert Haas wrote:

On Wed, Dec 15, 2010 at 10:02 AM, Andres Freund <andres@anarazel.de> wrote:

Is there a way that errstart() and/or errfinish() can know enough
about the state of the communication with the frontend to decide
whether to suppress edata->output_to_client? In other words, instead
of explicitly passing in a flag that says whether to inform the
client, it would be better for the error-reporting machinery to
intrinsically know whether it's right to send_message_to_frontend().
Otherwise, an error thrown from an unexpected location might not have
the flag set correctly.

You could use "DoingCommandRead" to solve that specific use-case, but the
COMERROR ones I don't see as being replaced that easily.

Well, again, I'm not an expert on this, but why would we need to unify
the two mechanisms? Asynchronous rollbacks (what we're trying to do
here) and protocol violations (which is what COMMERROR looks to be
used for) are really sort of different.

I think its not only protocol violations. Data-Leakage during auth are also
handled with it I think.

I'm not really sure we need to handle them in the same way. Let's think
about a recovery conflict where ProcessInterrupts() has been called. Right
now, if that situation occurs and we are not DoingCommandRead, then we just
throw an error. That's either safe, or an already-existing bug.

That should be safe.

So the question is what to do if we ARE DoingCommandRead. Right now, we
throw a fatal error. There's no comment explaining why, but I'm
guessing that the reason is the same problem we're trying to fix here:
the protocol state gets confused - but if we throw a FATAL then the
client goes away and we don't have to worry about it any more. Our
goal here, as I understand it, is to handle that case without a FATAL.

Yes, thats it.

So let's see... if we're DoingCommandRead at that point, and
whereToSendOutput == DestRemote then we set whereToSendOutput =
DestNone before throwing the error, and restore it just after we reset
DoingCommandRead? <stabs blindly at target>

That won't be enough unfortunately. A transaction-aborted error will get re-
raised before the client is ready to re-accept it. Thats what the
silent_error_while_idle flag is needed for.
Also I think such a simple implementation would cause problems with single
user mode. Now that could get fixed by saving and resetting the old mode - that
might have exception handling problems in turn (COMERRORS during an ssl
connection for example) because it will leave the section without having reset
whereToSendOutput. Sprinkling PG_TRY everywhere hardly seems simpler...

Andres

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Smith (#27)
Re: [PATCH] V3: Idle in transaction cancellation

Greg Smith <greg@2ndquadrant.com> writes:

I count four issues of various sizes left with this patch right now:

1) This levels bit
2) Can the approach used be simplified or the code made cleaner?
3) What is the interaction with Hot Standby error handling?
4) The usual code formatting nitpicking, Kevin mentioned braces being an
issue

You forgot (5) it doesn't work and (6) it's impossibly ugly :-(.

The reason it doesn't work is you can *not* throw a longjmp while in
DoingCommandRead state. This isn't just a matter of not breaking the
protocol at our level; it risks leaving openssl in a confused state,
either violating the ssl protocol or simply with internal state trashed
because it got interrupted in the middle of changing it.

It's possible we could refactor things so we abort the open transaction
while inside the interrupt handler, then queue up an error to be
reported whenever we next get a command (as envisioned in part 0003),
then just return control back to the input stack. But things could get
messy if we get another error to be reported while trying to abort.

In any case I really do not care for flag bits in the elog level field.
That's horrid, and I don't think it's even well matched to the problem.
What we need is for elog.c to understand that we're in a *state* that
forbids transmission to the client; it's not a property of specific
error messages, much less a property that the code reporting the error
should be required to be aware of.

I think a more realistic approach to the problem might be to expose the
DoingCommandRead flag to elog.c, and have it take responsibility for
queuing messages that can't be sent to the client right away. Plus the
above-mentioned refactoring of where transaction abort happens.

regards, tom lane

#31Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#30)
Re: [PATCH] V3: Idle in transaction cancellation

On Thu, Dec 16, 2010 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Smith <greg@2ndquadrant.com> writes:

I count four issues of various sizes left with this patch right now:

1) This levels bit
2) Can the approach used be simplified or the code made cleaner?
3) What is the interaction with Hot Standby error handling?
4) The usual code formatting nitpicking, Kevin mentioned braces being an
issue

You forgot (5) it doesn't work and (6) it's impossibly ugly :-(.

The reason it doesn't work is you can *not* throw a longjmp while in
DoingCommandRead state.  This isn't just a matter of not breaking the
protocol at our level; it risks leaving openssl in a confused state,
either violating the ssl protocol or simply with internal state trashed
because it got interrupted in the middle of changing it.

Ah! An excellent point. Thanks for weighing in on this.

It's possible we could refactor things so we abort the open transaction
while inside the interrupt handler, then queue up an error to be
reported whenever we next get a command (as envisioned in part 0003),
then just return control back to the input stack.  But things could get
messy if we get another error to be reported while trying to abort.

In any case I really do not care for flag bits in the elog level field.
That's horrid, and I don't think it's even well matched to the problem.
What we need is for elog.c to understand that we're in a *state* that
forbids transmission to the client; it's not a property of specific
error messages, much less a property that the code reporting the error
should be required to be aware of.

I think a more realistic approach to the problem might be to expose the
DoingCommandRead flag to elog.c, and have it take responsibility for
queuing messages that can't be sent to the client right away.  Plus the
above-mentioned refactoring of where transaction abort happens.

This is along the lines of what Andres and I have already been groping
towards upthread. But the question of what to do if another error is
encountered while trying to abort the transaction is one that I also
thought about, and I don't see an easy solution. I suppose we could
upgrade such an error to FATAL, given that right now we throw an
*unconditional* FATAL here when DoingCommandRead. That's not
super-appealing, but it might be the most practical solution.

Another thing I don't quite understand is - at what point does the
protocol allow us to emit an error? Suppose that the transaction gets
cancelled due to a conflict with recovery while we're
DoingCommandRead, and then the user now sends us "SELCT 2+2". Are we
going to send them back both errors now, or just one of them? Which
one?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#31)
Re: [PATCH] V3: Idle in transaction cancellation

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Dec 16, 2010 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's possible we could refactor things so we abort the open transaction
while inside the interrupt handler, then queue up an error to be
reported whenever we next get a command (as envisioned in part 0003),
then just return control back to the input stack. �But things could get
messy if we get another error to be reported while trying to abort.

This is along the lines of what Andres and I have already been groping
towards upthread. But the question of what to do if another error is
encountered while trying to abort the transaction is one that I also
thought about, and I don't see an easy solution.

Yeah, it's a bit messy, because you really can't send multiple ERROR
messages to the client when it next sends a query: the protocol says
there'll be at most one. We could either discard all but the first
(or last?) of the queued error events, or add code to elog.c that
somehow merges them into one error event, perhaps by adding info
to the DETAIL field. I'm handwaving there --- I think probably the
first cut should just discard errors after the first, and see how
well that works in practice.

Another thing I don't quite understand is - at what point does the
protocol allow us to emit an error?

Basically, you can send an error in response to a query. In the current
FATAL cases, we're cheating a bit by sending something while idle ---
what will typically happen at the client end is that it will not notice
the input until it sends a query, and then it will see the
already-pending input, which it will think is a (remarkably fast)
response to its query. Or possibly it will bomb out on send() failure
and not ever consume the input at all. But if we want to keep the
connection going, we can't cheat like that.

Suppose that the transaction gets
cancelled due to a conflict with recovery while we're
DoingCommandRead, and then the user now sends us "SELCT 2+2". Are we
going to send them back both errors now, or just one of them? Which
one?

You can only send one, and in that situation you probably want the
cancellation to be reported.

FWIW, I'm not too worried about preserving the existing
recovery-conflict behavior, as I think the odds are at least ten to one
that that code is broken when you look closely enough. I do like the
idea that this patch would provide a better-thought-out framework for
handling the conflict case.

regards, tom lane

#33Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#32)
Re: [PATCH] V3: Idle in transaction cancellation

On Thu, Dec 16, 2010 at 12:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

 I'm handwaving there --- I think probably the
first cut should just discard errors after the first, and see how
well that works in practice.

Seems reasonable.

Another thing I don't quite understand is - at what point does the
protocol allow us to emit an error?

Basically, you can send an error in response to a query.

What about some other message that's not a query?

Suppose that the transaction gets
cancelled due to a conflict with recovery while we're
DoingCommandRead, and then the user now sends us "SELCT 2+2".  Are we
going to send them back both errors now, or just one of them?  Which
one?

You can only send one, and in that situation you probably want the
cancellation to be reported.

What about an elog or ereport with severity < ERROR? Surely there
must at least be provision for multiple non-error messages per
transaction.

FWIW, I'm not too worried about preserving the existing
recovery-conflict behavior, as I think the odds are at least ten to one
that that code is broken when you look closely enough.  I do like the
idea that this patch would provide a better-thought-out framework for
handling the conflict case.

We already have pg_terminate_backend() and pg_cancel_backend(). Are
you imagining a general mechanism like pg_rollback_backend()?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#33)
Re: [PATCH] V3: Idle in transaction cancellation

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Dec 16, 2010 at 12:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Another thing I don't quite understand is - at what point does the
protocol allow us to emit an error?

Basically, you can send an error in response to a query.

What about some other message that's not a query?

There aren't any (I'm using a loose definition of "query" here --- any
client request counts).

You can only send one, and in that situation you probably want the
cancellation to be reported.

What about an elog or ereport with severity < ERROR? Surely there
must at least be provision for multiple non-error messages per
transaction.

You can send NOTICEs freely, but downgrading an error to a notice is
probably not a great solution --- keep in mind that some clients just
discard those altogether.

FWIW, I'm not too worried about preserving the existing
recovery-conflict behavior, as I think the odds are at least ten to one
that that code is broken when you look closely enough. �I do like the
idea that this patch would provide a better-thought-out framework for
handling the conflict case.

We already have pg_terminate_backend() and pg_cancel_backend(). Are
you imagining a general mechanism like pg_rollback_backend()?

No, not really, I'm just concerned about the fact that it's trying to
send a message while in DoingCommandRead state. FE/BE protocol
considerations aside, that's likely to break if using SSL, because who
knows where we've interrupted openssl. In fairness, the various
pre-existing FATAL-interrupt cases have that problem already, but I was
willing to live with it for things that don't happen during normal
operation.

regards, tom lane

#35Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#34)
Re: [PATCH] V3: Idle in transaction cancellation

On Thu, Dec 16, 2010 at 1:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Dec 16, 2010 at 12:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Another thing I don't quite understand is - at what point does the
protocol allow us to emit an error?

Basically, you can send an error in response to a query.

What about some other message that's not a query?

There aren't any (I'm using a loose definition of "query" here --- any
client request counts).

OK.

You can only send one, and in that situation you probably want the
cancellation to be reported.

What about an elog or ereport with severity < ERROR?  Surely there
must at least be provision for multiple non-error messages per
transaction.

You can send NOTICEs freely, but downgrading an error to a notice is
probably not a great solution --- keep in mind that some clients just
discard those altogether.

Yeah, I wasn't proposing that, just trying to understand the rules.

FWIW, I'm not too worried about preserving the existing
recovery-conflict behavior, as I think the odds are at least ten to one
that that code is broken when you look closely enough.  I do like the
idea that this patch would provide a better-thought-out framework for
handling the conflict case.

We already have pg_terminate_backend() and pg_cancel_backend().  Are
you imagining a general mechanism like pg_rollback_backend()?

No, not really, I'm just concerned about the fact that it's trying to
send a message while in DoingCommandRead state.  FE/BE protocol
considerations aside, that's likely to break if using SSL, because who
knows where we've interrupted openssl.  In fairness, the various
pre-existing FATAL-interrupt cases have that problem already, but I was
willing to live with it for things that don't happen during normal
operation.

Hmm. It's seeming to me that what we want to do is something like this:

1. If an error is thrown while DoingCommandRead, it gets upgraded to
FATAL. I don't think we have much choice about this because, per your
previous comments, we can't longjmp() here without risking protocol
breakage, and we certainly can't return from an elog(ERROR) or
ereport(ERROR).

2. If a recovery conflict arrives while DoingCommandRead(), we
AbortCurrentTransaction(). If this runs into unexpected trouble,
it'll turn into a FATAL per #1. If it completes successfully, then
we'll set a flag indicating that upon emerging from DoingCommandRead
state, we need to signal the recovery conflict to the client.

3. When we clear DoingCommandRead, we'll check whether the flag is set
and if so ereport(ERROR).

Step #2 seems like the dangerous part, but I'm not immediately sure
what hazards may be lurking there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#35)
Re: [PATCH] V3: Idle in transaction cancellation

Robert Haas <robertmhaas@gmail.com> writes:

Hmm. It's seeming to me that what we want to do is something like this:

1. If an error is thrown while DoingCommandRead, it gets upgraded to
FATAL. I don't think we have much choice about this because, per your
previous comments, we can't longjmp() here without risking protocol
breakage, and we certainly can't return from an elog(ERROR) or
ereport(ERROR).

Um, if that's the ground rules then we have no advance over the current
situation.

I guess you misunderstood what I said. What I meant was that we cannot
longjmp *out to the outer level*, ie we cannot take control away from
the input stack. We could however have a TRY block inside the interrupt
handler that catches and handles (queues) any errors occurring during
transaction abort. As long as we eventually return control to openssl
I think it should work. (Hm, but I wonder whether there are any hard
timing constraints in the ssl protocol ... although hopefully xact abort
won't ever take long enough that that's a real problem.)

regards, tom lane

#37Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#36)
Re: [PATCH] V3: Idle in transaction cancellation

On Thu, Dec 16, 2010 at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Hmm.  It's seeming to me that what we want to do is something like this:

1. If an error is thrown while DoingCommandRead, it gets upgraded to
FATAL.  I don't think we have much choice about this because, per your
previous comments, we can't longjmp() here without risking protocol
breakage, and we certainly can't return from an elog(ERROR) or
ereport(ERROR).

Um, if that's the ground rules then we have no advance over the current
situation.

I guess you misunderstood what I said.  What I meant was that we cannot
longjmp *out to the outer level*, ie we cannot take control away from
the input stack.  We could however have a TRY block inside the interrupt
handler that catches and handles (queues) any errors occurring during
transaction abort.  As long as we eventually return control to openssl
I think it should work.

Is there any real advantage to that? How often do we hit an error
trying to abort a transaction? And how will we report the error
anyway? I thought the next thing we'd report would be the recovery
conflict, not any bizarre can't-abort-the-transaction scenario.

(Hm, but I wonder whether there are any hard
timing constraints in the ssl protocol ... although hopefully xact abort
won't ever take long enough that that's a real problem.)

That would be incredibly broken.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#37)
Re: [PATCH] V3: Idle in transaction cancellation

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Dec 16, 2010 at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I guess you misunderstood what I said. �What I meant was that we cannot
longjmp *out to the outer level*, ie we cannot take control away from
the input stack. �We could however have a TRY block inside the interrupt
handler that catches and handles (queues) any errors occurring during
transaction abort. �As long as we eventually return control to openssl
I think it should work.

Is there any real advantage to that?

Not crashing when something funny happens seems like a real advantage to
me. (And an unexpected elog(FATAL) will look like a crash to most
users, even if you want to try to define it as not a crash.)

How often do we hit an error
trying to abort a transaction? And how will we report the error
anyway?

Queue it up and report it at the next opportunity, as per upthread.

I thought the next thing we'd report would be the recovery
conflict, not any bizarre can't-abort-the-transaction scenario.

Well, if we discard it because we're too lazy to implement error message
merging, that's OK. Presumably it'll still get into the postmaster log.

(Hm, but I wonder whether there are any hard
timing constraints in the ssl protocol ... although hopefully xact abort
won't ever take long enough that that's a real problem.)

That would be incredibly broken.

Think "authentication timeout". I wouldn't be a bit surprised if the
remote end would drop the connection if certain events didn't come back
reasonably promptly. There might even be security reasons for that,
ie, somebody could brute-force a key if you give them long enough.
(But this is all speculation; I don't actually know SSL innards.)

regards, tom lane

#39Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#38)
Re: [PATCH] V3: Idle in transaction cancellation

On Thu, Dec 16, 2010 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I thought the next thing we'd report would be the recovery
conflict, not any bizarre can't-abort-the-transaction scenario.

Well, if we discard it because we're too lazy to implement error message
merging, that's OK.  Presumably it'll still get into the postmaster log.

OK, that's reasonable.

(Hm, but I wonder whether there are any hard
timing constraints in the ssl protocol ... although hopefully xact abort
won't ever take long enough that that's a real problem.)

That would be incredibly broken.

Think "authentication timeout".  I wouldn't be a bit surprised if the
remote end would drop the connection if certain events didn't come back
reasonably promptly.  There might even be security reasons for that,
ie, somebody could brute-force a key if you give them long enough.
(But this is all speculation; I don't actually know SSL innards.)

I would be really surprised if aborting a transaction takes long
enough to mess up SSL. I mean, there could be a network delay at any
time, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#39)
Re: [PATCH] V3: Idle in transaction cancellation

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Dec 16, 2010 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(But this is all speculation; I don't actually know SSL innards.)

I would be really surprised if aborting a transaction takes long
enough to mess up SSL. I mean, there could be a network delay at any
time, too.

Yeah, that's what I think too. I was just wondering if there could be
any situation where xact abort takes minutes.

regards, tom lane

#41Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#40)
Re: [PATCH] V3: Idle in transaction cancellation

Excerpts from Tom Lane's message of jue dic 16 17:54:51 -0300 2010:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Dec 16, 2010 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(But this is all speculation; I don't actually know SSL innards.)

I would be really surprised if aborting a transaction takes long
enough to mess up SSL. I mean, there could be a network delay at any
time, too.

Yeah, that's what I think too. I was just wondering if there could be
any situation where xact abort takes minutes.

Maybe if it's transaction commit there could be a long queue of pending
deferred triggers, but on abort, those things are discarded.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#42Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#38)
Re: [PATCH] V3: Idle in transaction cancellation

On Thursday 16 December 2010 21:41:10 Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Dec 16, 2010 at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I guess you misunderstood what I said. �What I meant was that we cannot
longjmp *out to the outer level*, ie we cannot take control away from
the input stack. �We could however have a TRY block inside the interrupt
handler that catches and handles (queues) any errors occurring during
transaction abort. �As long as we eventually return control to openssl
I think it should work.

Is there any real advantage to that?

Not crashing when something funny happens seems like a real advantage to
me. (And an unexpected elog(FATAL) will look like a crash to most
users, even if you want to try to define it as not a crash.)

How often do we hit an error
trying to abort a transaction? And how will we report the error
anyway?

Queue it up and report it at the next opportunity, as per upthread.

I thought the next thing we'd report would be the recovery
conflict, not any bizarre can't-abort-the-transaction scenario.

Well, if we discard it because we're too lazy to implement error message
merging, that's OK. Presumably it'll still get into the postmaster log.

(Hm, but I wonder whether there are any hard
timing constraints in the ssl protocol ... although hopefully xact abort
won't ever take long enough that that's a real problem.)

That would be incredibly broken.

Think "authentication timeout". I wouldn't be a bit surprised if the
remote end would drop the connection if certain events didn't come back
reasonably promptly. There might even be security reasons for that,
ie, somebody could brute-force a key if you give them long enough.
(But this is all speculation; I don't actually know SSL innards.)

I will try to read the thread and make a proposal for a more carefull
implementation - just not today... I think the results would be interesting...

Thanks for the input,

Andres

#43Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andres Freund (#42)
Re: [PATCH] V3: Idle in transaction cancellation

Andres Freund <andres@anarazel.de> wrote:

I will try to read the thread and make a proposal for a more
carefull implementation - just not today... I think the results
would be interesting...

FWIW, the SSI patch that Dan and I are working on can't have a
guarantee that it is immediately safe to retry a transaction which
rolls back with a serialization failure (without potentially failing
again on exactly the same set of transactions) unless there is a
capability such as this "Idle in transaction cancellation" patch
would provide. Safe immediate retry would be a nice guarantee for
SSI to provide.

That being the case, we may not be able to generate the final form
of the SSI patch until a patch for this issue is applied. Obviously
I know that nobody is in a position to make any promises on this,
but I just thought I'd let people know that this issue could
possibly be on the critical path to a timely release -- at least if
that release will include SSI with the safe retry guarantee. (At
least when I'm planning for a release, I like to know such
things....)

-Kevin

#44Alvaro Herrera
alvherre@commandprompt.com
In reply to: Robert Haas (#35)
Re: [PATCH] V3: Idle in transaction cancellation

Is anybody working on this patch?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#45Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Alvaro Herrera (#44)
Re: [PATCH] V3: Idle in transaction cancellation

Alvaro Herrera <alvherre@commandprompt.com> wrote:

Is anybody working on this patch?

I'm not, but I sure hope someone is -- we could *really* use this in
the SSI patch. With something providing the equivalent
functionality to Andres's previous patch, and about one day's work
in the SSI patch, SSI could guarantee that an immediate retry of a
transaction rolled back with a serialization failure would not fail
again on conflicts with the same transaction(s). This would be a
very nice guarantee to be able to make.

-Kevin

#46Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#44)
Re: [PATCH] V3: Idle in transaction cancellation

On Monday, January 03, 2011 03:38:56 PM Alvaro Herrera wrote:

Is anybody working on this patch?

I am. Wont be finished in the next two days though (breakin last night...)

Andres

PS: Alvarro: commandprompt.com doesn't resolv anymore, so I can't send you the
email directly...

#47Alvaro Herrera
alvherre@commandprompt.com
In reply to: Andres Freund (#46)
Re: [PATCH] V3: Idle in transaction cancellation

Excerpts from Andres Freund's message of lun ene 03 12:03:58 -0300 2011:

On Monday, January 03, 2011 03:38:56 PM Alvaro Herrera wrote:

Is anybody working on this patch?

I am. Wont be finished in the next two days though (breakin last night...)

Okay ... I will be moving to a new house this week anyway :-P

PS: Alvarro: commandprompt.com doesn't resolv anymore, so I can't send you the
email directly...

You gotta be kidding --- hmm, oops! Let me find the flamethrower ...

(I guess it's a good thing that my suscription to the list uses a
different email address)

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support