Hot Standy introduced problem with query cancel behavior

Started by Kris Jurkaover 16 years ago49 messageshackers
Jump to latest
#1Kris Jurka
books@ejurka.com

The JDBC driver's regression test suite has revealed a change in behavior
introduced by the hot standy patch. Previously when a client sent a
cancel request on an idle connection, nothing happened. Now it sends an
error message "ERROR: canceling statement due to user request". This
confuses the driver's protocol handling and it thinks that the error
message is for the next statement issued.

Attached is a test case.

Kris Jurka

Attachments:

Cancel.javatext/plain; charset=US-ASCII; name=Cancel.javaDownload
#2Simon Riggs
simon@2ndQuadrant.com
In reply to: Kris Jurka (#1)
Re: Hot Standy introduced problem with query cancel behavior

On Sat, 2009-12-26 at 20:15 -0500, Kris Jurka wrote:

The JDBC driver's regression test suite has revealed a change in behavior
introduced by the hot standy patch. Previously when a client sent a
cancel request on an idle connection, nothing happened. Now it sends an
error message "ERROR: canceling statement due to user request". This
confuses the driver's protocol handling and it thinks that the error
message is for the next statement issued.

Attached is a test case.

Thanks for the report. I'll see about a fix.

--
Simon Riggs www.2ndQuadrant.com

#3Joachim Wieland
joe@mcknight.de
In reply to: Simon Riggs (#2)
Re: Hot Standy introduced problem with query cancel behavior

On Sun, Dec 27, 2009 at 10:42 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Thanks for the report. I'll see about a fix.

In the end we are about to use SIGINT for two use cases:

- cancel an idle transaction
- cancel a running query

Previously a backend that was DoingCommandRead == true didn't do
anything upon reception of SIGINT, now it aborts either the running
query or the idle transaction, which is why Kris's example behaves
differently now.

If we use the same signal for both cases, the receiving backend cannot
tell what the intention of the sending backend was. That's why I
proposed to make SIGINT similar to SIGUSR1 where we write a reason to
a shared memory structure first and then send the signal (see
http://archives.postgresql.org/pgsql-hackers/2009-12/msg02067.php from
a few days ago).

There was also some dicussion about how to communicate the
cancellation back to the client when its idle transaction got aborted.
I implemented what I thought was the conclusion of the discussion but
haven't received a reply on it yet.

Joachim

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joachim Wieland (#3)
Re: Hot Standy introduced problem with query cancel behavior

Joachim Wieland <joe@mcknight.de> writes:

If we use the same signal for both cases, the receiving backend cannot
tell what the intention of the sending backend was. That's why I
proposed to make SIGINT similar to SIGUSR1 where we write a reason to
a shared memory structure first and then send the signal (see
http://archives.postgresql.org/pgsql-hackers/2009-12/msg02067.php from
a few days ago).

This seems like a fairly bad idea. One of the intended use-cases is to
be able to manually "kill -INT" a misbehaving backend. Assuming that
there will be valid info about the signal in shared memory will break
that.

regards, tom lane

#5Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#4)
Re: Hot Standy introduced problem with query cancel behavior

On Tuesday 29 December 2009 16:22:54 Tom Lane wrote:

Joachim Wieland <joe@mcknight.de> writes:

If we use the same signal for both cases, the receiving backend cannot
tell what the intention of the sending backend was. That's why I
proposed to make SIGINT similar to SIGUSR1 where we write a reason to
a shared memory structure first and then send the signal (see
http://archives.postgresql.org/pgsql-hackers/2009-12/msg02067.php from
a few days ago).

This seems like a fairly bad idea. One of the intended use-cases is to
be able to manually "kill -INT" a misbehaving backend. Assuming that
there will be valid info about the signal in shared memory will break
that.

Well. That already is the case now. MyProc->recoveryConflictMode is checked to
recognize what kind of conflict is being resolved...

Andres

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#5)
Re: Hot Standy introduced problem with query cancel behavior

Andres Freund <andres@anarazel.de> writes:

On Tuesday 29 December 2009 16:22:54 Tom Lane wrote:

This seems like a fairly bad idea. One of the intended use-cases is to
be able to manually "kill -INT" a misbehaving backend. Assuming that
there will be valid info about the signal in shared memory will break
that.

Well. That already is the case now. MyProc->recoveryConflictMode is checked to
recognize what kind of conflict is being resolved...

In that case, HS has already broken it, and we need to fix it not make
it worse.

My humble opinion is that SIGINT should not be overloaded with multiple
meanings. We already have a multiplexed signal mechanism, which is what
should be used for any additional signal reasons HS may need to
introduce.

regards, tom lane

#7Simon Riggs
simon@2ndQuadrant.com
In reply to: Joachim Wieland (#3)
Re: Hot Standy introduced problem with query cancel behavior

On Tue, 2009-12-29 at 14:14 +0100, Joachim Wieland wrote:

haven't received a reply on it yet.

Apologies for that. I replied today, just forgot to hit send until now.

Thanks again for the patch.

--
Simon Riggs www.2ndQuadrant.com

#8Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#6)
Re: Hot Standy introduced problem with query cancel behavior

On Tue, 2009-12-29 at 11:13 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On Tuesday 29 December 2009 16:22:54 Tom Lane wrote:

This seems like a fairly bad idea. One of the intended use-cases is to
be able to manually "kill -INT" a misbehaving backend. Assuming that
there will be valid info about the signal in shared memory will break
that.

Well. That already is the case now. MyProc->recoveryConflictMode is checked to
recognize what kind of conflict is being resolved...

In that case, HS has already broken it, and we need to fix it not make
it worse.

My humble opinion is that SIGINT should not be overloaded with multiple
meanings. We already have a multiplexed signal mechanism, which is what
should be used for any additional signal reasons HS may need to
introduce.

It's a revelation to me, but yes, I see it now and agree.

I'm looking at Fujii-san's multiplexing patch from Jul 31 to rewrite
this code using that mechanism. It sounds like it's a neat fit and it
should get around the bug report from Kris also if it all works.

--
Simon Riggs www.2ndQuadrant.com

#9Joachim Wieland
joe@mcknight.de
In reply to: Tom Lane (#4)
Re: Hot Standy introduced problem with query cancel behavior

On Tue, Dec 29, 2009 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

This seems like a fairly bad idea.  One of the intended use-cases is to
be able to manually "kill -INT" a misbehaving backend.  Assuming that
there will be valid info about the signal in shared memory will break
that.

I never intended to change the current behavior.

Actually I wanted to even enhance it by allowing to also cancel an idle
transaction via SIGINT. We are free to define what should happen if there is no
internal reason available because the signal has been sent manually.

We can also use SIGUSR1 of course but then you cannot cancel an idle
transaction just from the commandline. Not sure if this is necessary
though but I would have liked it in the past already (I once used a version of
slony that left transactions idle from time to time...)

Joachim

#10Simon Riggs
simon@2ndQuadrant.com
In reply to: Joachim Wieland (#9)
Re: Hot Standy introduced problem with query cancel behavior

On Wed, 2009-12-30 at 12:05 +0100, Joachim Wieland wrote:

On Tue, Dec 29, 2009 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

This seems like a fairly bad idea. One of the intended use-cases is to
be able to manually "kill -INT" a misbehaving backend. Assuming that
there will be valid info about the signal in shared memory will break
that.

I never intended to change the current behavior.

Actually I wanted to even enhance it by allowing to also cancel an idle
transaction via SIGINT. We are free to define what should happen if there is no
internal reason available because the signal has been sent manually.

We can also use SIGUSR1 of course but then you cannot cancel an idle
transaction just from the commandline. Not sure if this is necessary
though but I would have liked it in the past already (I once used a version of
slony that left transactions idle from time to time...)

Andres mentioned this in relation to Startup process sending signals to
backends. Startup needs to in-some cases issue a FATAL error also, which
is a separate issue from the discusion around SIGINT.

I will rework the FATAL case and continue to support you in finding a
solution to the cancel-while-idle case.

--
Simon Riggs www.2ndQuadrant.com

#11Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#8)
Re: Hot Standy introduced problem with query cancel behavior

On Wednesday 30 December 2009 01:13:01 Simon Riggs wrote:

On Tue, 2009-12-29 at 11:13 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On Tuesday 29 December 2009 16:22:54 Tom Lane wrote:

This seems like a fairly bad idea. One of the intended use-cases is
to be able to manually "kill -INT" a misbehaving backend. Assuming
that there will be valid info about the signal in shared memory will
break that.

Well. That already is the case now. MyProc->recoveryConflictMode is
checked to recognize what kind of conflict is being resolved...

In that case, HS has already broken it, and we need to fix it not make
it worse.

My humble opinion is that SIGINT should not be overloaded with multiple
meanings. We already have a multiplexed signal mechanism, which is what
should be used for any additional signal reasons HS may need to
introduce.

It's a revelation to me, but yes, I see it now and agree.

I'm looking at Fujii-san's multiplexing patch from Jul 31 to rewrite
this code using that mechanism. It sounds like it's a neat fit and it
should get around the bug report from Kris also if it all works.

Hm. I just read a bit of that multiplexing facility (out of a different reason)
and I have some doubt about it being used unmodified for canceling backends:

procsignal.c:
/*
* Note: Since there's no locking, it's possible that the target
* process detaches from shared memory and exits right after this
* test, before we set the flag and send signal. And the signal slot
* might even be recycled by a new process, so it's remotely possible
* that we set a flag for a wrong process. That's OK, all the signals
* are such that no harm is done if they're mistakenly fired.
*/
procsignal.h:
...
* Also, because of race conditions, it's important that all the signals be
* defined so that no harm is done if a process mistakenly receives one.
*/

When cancelling a backend that behaviour could be a bit annoying ;-)

I guess locking procarray during sending the signal should be enough?

Andres

#12Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#11)
Re: Hot Standy introduced problem with query cancel behavior

On Wed, Dec 30, 2009 at 2:06 PM, Andres Freund <andres@anarazel.de> wrote:

On Wednesday 30 December 2009 01:13:01 Simon Riggs wrote:

On Tue, 2009-12-29 at 11:13 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On Tuesday 29 December 2009 16:22:54 Tom Lane wrote:

This seems like a fairly bad idea.  One of the intended use-cases is
to be able to manually "kill -INT" a misbehaving backend.  Assuming
that there will be valid info about the signal in shared memory will
break that.

Well. That already is the case now. MyProc->recoveryConflictMode is
checked to recognize what kind of conflict is being resolved...

In that case, HS has already broken it, and we need to fix it not make
it worse.

My humble opinion is that SIGINT should not be overloaded with multiple
meanings.  We already have a multiplexed signal mechanism, which is what
should be used for any additional signal reasons HS may need to
introduce.

It's a revelation to me, but yes, I see it now and agree.

I'm looking at Fujii-san's multiplexing patch from Jul 31 to rewrite
this code using that mechanism. It sounds like it's a neat fit and it
should get around the bug report from Kris also if it all works.

Hm. I just read a bit of that multiplexing facility (out of a different reason)
and I have some doubt about it being used unmodified for canceling backends:

procsignal.c:
/*
 * Note: Since there's no locking, it's possible that the target
 * process detaches from shared memory and exits right after this
 * test, before we set the flag and send signal. And the signal slot
 * might even be recycled by a new process, so it's remotely possible
 * that we set a flag for a wrong process. That's OK, all the signals
 * are such that no harm is done if they're mistakenly fired.
 */
procsignal.h:
...
 * Also, because of race conditions, it's important that all the signals be
 * defined so that no harm is done if a process mistakenly receives one.
 */

When cancelling a backend that behaviour could be a bit annoying ;-)

I guess locking procarray during sending the signal should be enough?

I think the idea is that you define the behavior of the signal to be
"look at this other piece of state to see whether you should cancel
yourself" rather than just "cancel yourself". Then if a signal is
delivered by mistake, it's no big deal - you just look at the other
piece of state and decide that you don't need to do anything.

...Robert

#13Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#12)
Re: Hot Standy introduced problem with query cancel behavior

----- Ursprüngliche Mitteilung -----

On Wed, Dec 30, 2009 at 2:06 PM, Andres Freund <andres@anarazel.de> wrote:

On Wednesday 30 December 2009 01:13:01 Simon Riggs wrote:

On Tue, 2009-12-29 at 11:13 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On Tuesday 29 December 2009 16:22:54 Tom Lane wrote:

This seems like a fairly bad idea.  One of the intended use-cases is
to be able to manually "kill -INT" a misbehaving backend.  Assuming
that there will be valid info about the signal in shared memory will
break that.

Well. That already is the case now. MyProc->recoveryConflictMode is
checked to recognize what kind of conflict is being resolved...

In that case, HS has already broken it, and we need to fix it not make
it worse.

My humble opinion is that SIGINT should not be overloaded with multiple
meanings.  We already have a multiplexed signal mechanism, which is what
should be used for any additional signal reasons HS may need to
introduce.

It's a revelation to me, but yes, I see it now and agree.

I'm looking at Fujii-san's multiplexing patch from Jul 31 to rewrite
this code using that mechanism. It sounds like it's a neat fit and it
should get around the bug report from Kris also if it all works.

Hm. I just read a bit of that multiplexing facility (out of a different reason)
and I have some doubt about it being used unmodified for canceling backends:

procsignal.c:
/*
 * Note: Since there's no locking, it's possible that the target
 * process detaches from shared memory and exits right after this
 * test, before we set the flag and send signal. And the signal slot
 * might even be recycled by a new process, so it's remotely possible
 * that we set a flag for a wrong process. That's OK, all the signals
 * are such that no harm is done if they're mistakenly fired.
 */
procsignal.h:
...
 * Also, because of race conditions, it's important that all the signals be
 * defined so that no harm is done if a process mistakenly receives one.
 */

When cancelling a backend that behaviour could be a bit annoying ;-)

I guess locking procarray during sending the signal should be enough?

I think the idea is that you define the behavior of the signal to be
"look at this other piece of state to see whether you should cancel
yourself" rather than just "cancel yourself".  Then if a signal is
delivered by mistake, it's no big deal - you just look at the other
piece of state and decide that you don't need to do anything.

I dont see an easy way to pass enough information right now. You cant regenerate enough of it in the to be killed backend as most of the relevant information is only available in the startup process.
Inventing yet another segment in shm just for this seems overcomplicated to me.
Thats why I suggested locking the procarray for this - without having looked at the code that should prevent a backend slot from beimg reused.

Andres

#14Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#13)
Re: Hot Standy introduced problem with query cancel behavior

On Wed, Dec 30, 2009 at 6:43 PM, Andres Freund <andres@anarazel.de> wrote:

----- Ursprüngliche Mitteilung -----

On Wed, Dec 30, 2009 at 2:06 PM, Andres Freund <andres@anarazel.de> wrote:

On Wednesday 30 December 2009 01:13:01 Simon Riggs wrote:

On Tue, 2009-12-29 at 11:13 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On Tuesday 29 December 2009 16:22:54 Tom Lane wrote:

This seems like a fairly bad idea.  One of the intended use-cases is
to be able to manually "kill -INT" a misbehaving backend.  Assuming
that there will be valid info about the signal in shared memory will
break that.

Well. That already is the case now. MyProc->recoveryConflictMode is
checked to recognize what kind of conflict is being resolved...

In that case, HS has already broken it, and we need to fix it not make
it worse.

My humble opinion is that SIGINT should not be overloaded with multiple
meanings.  We already have a multiplexed signal mechanism, which is what
should be used for any additional signal reasons HS may need to
introduce.

It's a revelation to me, but yes, I see it now and agree.

I'm looking at Fujii-san's multiplexing patch from Jul 31 to rewrite
this code using that mechanism. It sounds like it's a neat fit and it
should get around the bug report from Kris also if it all works.

Hm. I just read a bit of that multiplexing facility (out of a different reason)
and I have some doubt about it being used unmodified for canceling backends:

procsignal.c:
/*
 * Note: Since there's no locking, it's possible that the target
 * process detaches from shared memory and exits right after this
 * test, before we set the flag and send signal. And the signal slot
 * might even be recycled by a new process, so it's remotely possible
 * that we set a flag for a wrong process. That's OK, all the signals
 * are such that no harm is done if they're mistakenly fired.
 */
procsignal.h:
...
 * Also, because of race conditions, it's important that all the signals be
 * defined so that no harm is done if a process mistakenly receives one.
 */

When cancelling a backend that behaviour could be a bit annoying ;-)

I guess locking procarray during sending the signal should be enough?

I think the idea is that you define the behavior of the signal to be
"look at this other piece of state to see whether you should cancel
yourself" rather than just "cancel yourself".  Then if a signal is
delivered by mistake, it's no big deal - you just look at the other
piece of state and decide that you don't need to do anything.

I dont see an easy way to pass enough information right now. You cant regenerate enough of it in the to be killed backend as most of the relevant information is only available in the startup process.
Inventing yet another segment in shm just for this seems overcomplicated to me.
Thats why I suggested locking the procarray for this - without having looked at the code that should prevent a backend slot from beimg reused.

Yeah, I understand, but I have a feeling that the code doesn't do it
that way right now for a reason. Someone who understands this better
than I should comment, but I'm thinking you would likely need to lock
the ProcArray in CheckProcSignal as well, and I'm thinking that can't
be safely done from within a signal handler.

...Robert

#15Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#14)
Re: Hot Standy introduced problem with query cancel behavior

On Thursday 31 December 2009 01:09:57 Robert Haas wrote:

On Wed, Dec 30, 2009 at 6:43 PM, Andres Freund <andres@anarazel.de> wrote:

On Wed, Dec 30, 2009 at 2:06 PM, Andres Freund <andres@anarazel.de>

wrote:

On Wednesday 30 December 2009 01:13:01 Simon Riggs wrote:

On Tue, 2009-12-29 at 11:13 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On Tuesday 29 December 2009 16:22:54 Tom Lane wrote:

This seems like a fairly bad idea. One of the intended
use-cases is to be able to manually "kill -INT" a misbehaving
backend. Assuming that there will be valid info about the
signal in shared memory will break that.

Well. That already is the case now. MyProc->recoveryConflictMode
is checked to recognize what kind of conflict is being
resolved...

In that case, HS has already broken it, and we need to fix it not
make it worse.

My humble opinion is that SIGINT should not be overloaded with
multiple meanings. We already have a multiplexed signal
mechanism, which is what should be used for any additional signal
reasons HS may need to introduce.

It's a revelation to me, but yes, I see it now and agree.

I'm looking at Fujii-san's multiplexing patch from Jul 31 to rewrite
this code using that mechanism. It sounds like it's a neat fit and
it should get around the bug report from Kris also if it all works.

Hm. I just read a bit of that multiplexing facility (out of a
different reason) and I have some doubt about it being used unmodified
for canceling backends:

procsignal.c:
/*
* Note: Since there's no locking, it's possible that the target
* process detaches from shared memory and exits right after this
* test, before we set the flag and send signal. And the signal slot
* might even be recycled by a new process, so it's remotely possible
* that we set a flag for a wrong process. That's OK, all the signals
* are such that no harm is done if they're mistakenly fired.
*/
procsignal.h:
...
* Also, because of race conditions, it's important that all the
signals be * defined so that no harm is done if a process mistakenly
receives one. */

When cancelling a backend that behaviour could be a bit annoying ;-)

I guess locking procarray during sending the signal should be enough?

I think the idea is that you define the behavior of the signal to be
"look at this other piece of state to see whether you should cancel
yourself" rather than just "cancel yourself". Then if a signal is
delivered by mistake, it's no big deal - you just look at the other
piece of state and decide that you don't need to do anything.

I dont see an easy way to pass enough information right now. You cant
regenerate enough of it in the to be killed backend as most of the
relevant information is only available in the startup process. Inventing
yet another segment in shm just for this seems overcomplicated to me.
Thats why I suggested locking the procarray for this - without having
looked at the code that should prevent a backend slot from beimg reused.

Yeah, I understand, but I have a feeling that the code doesn't do it
that way right now for a reason. Someone who understands this better
than I should comment, but I'm thinking you would likely need to lock
the ProcArray in CheckProcSignal as well, and I'm thinking that can't
be safely done from within a signal handler.

I don't think we would need to lock in the signal handler.
The situation is that two different flags (at this point at least) are needed.
FATAL which aborts the session and ERROR which aborts the transaction.

Consider the following scenario:
- both flag are set while holding a lock on the procarray
- starting a new backend requires a lock on the procarray
- backend startup cleans up both flags in its ProcSignalSlot (only that specific
one as its the only one manipulated under a lock, mind you)
- transaction startup clears the ERROR flag under locking
- after the new backend started no signal handler targeted for the old backend
can get triggered (new pid, if we consider direct reuse of the same pid we
have much bigger problems anyway), thus no flag targeted for the old backend
can get set

That would require a nice comment explaining that but it should work.

Another possibility would be to make the whole signal delivery mechanism safe
- that should be possible if we had a atomically settable backend id...

Unfortunately that would limit the max lifetime for a backend a bit - as
sig_atomic_t is 4byte on most platforms. So no backend would be allowed to
live after creation of the 2**32-1th backend after it.
I don't immediately see a way circumvent that 32bit barrier without using
assembler or locks.

Andres

PS: Hm. For my former message beeing written on a mobile phone it looked
surprisingly clean...

#16Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#11)
Re: Hot Standy introduced problem with query cancel behavior

On Wed, 2009-12-30 at 20:06 +0100, Andres Freund wrote:

Hm. I just read a bit of that multiplexing facility (out of a different reason)
and I have some doubt about it being used unmodified for canceling backends:

procsignal.c:
/*
* Note: Since there's no locking, it's possible that the target
* process detaches from shared memory and exits right after this
* test, before we set the flag and send signal. And the signal slot
* might even be recycled by a new process, so it's remotely possible
* that we set a flag for a wrong process. That's OK, all the signals
* are such that no harm is done if they're mistakenly fired.
*/
procsignal.h:
...
* Also, because of race conditions, it's important that all the signals be
* defined so that no harm is done if a process mistakenly receives one.
*/

When cancelling a backend that behaviour could be a bit annoying ;-)

Reading comments alone doesn't show the full situation here.

Before we send signal we check pid also, so the chances of this
happening are fairly small. i.e. we would need to have a backend slot
reused by a new backend with exactly same pid as the last slot holder.

I'm proposing to use this for killing transactions and connections, so I
don't think there's too much harm done there. If the backend is leaving
anyway, that's what we wanted. If its a new guy that is wearing the same
boots then a little collateral damage doesn't leave the server in a bad
place. HS cancellations aren't yet so precise that this matters.

--
Simon Riggs www.2ndQuadrant.com

#17Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#16)
Re: Hot Standy introduced problem with query cancel behavior

On Thursday 31 December 2009 12:25:19 Simon Riggs wrote:

On Wed, 2009-12-30 at 20:06 +0100, Andres Freund wrote:

Hm. I just read a bit of that multiplexing facility (out of a different
reason) and I have some doubt about it being used unmodified for
canceling backends:

procsignal.c:
/*
* Note: Since there's no locking, it's possible that the target
* process detaches from shared memory and exits right after this
* test, before we set the flag and send signal. And the signal slot
* might even be recycled by a new process, so it's remotely possible
* that we set a flag for a wrong process. That's OK, all the signals
* are such that no harm is done if they're mistakenly fired.
*/
procsignal.h:
...
* Also, because of race conditions, it's important that all the signals
be * defined so that no harm is done if a process mistakenly receives
one. */

When cancelling a backend that behaviour could be a bit annoying ;-)

Reading comments alone doesn't show the full situation here.

Before we send signal we check pid also, so the chances of this
happening are fairly small. i.e. we would need to have a backend slot
reused by a new backend with exactly same pid as the last slot holder.

Well. The problem does not occur for critical errors (i.e. session death)
only. As signal delivery is asynchronous it can very well happen for
transaction cancellation as well.

I'm proposing to use this for killing transactions and connections, so I
don't think there's too much harm done there. If the backend is leaving
anyway, that's what we wanted. If its a new guy that is wearing the same
boots then a little collateral damage doesn't leave the server in a bad
place. HS cancellations aren't yet so precise that this matters.

Building racy infrastructure when it can be avoided with a little care still
seems not to be the best path to me.

Andres

#18Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#17)
Re: Hot Standy introduced problem with query cancel behavior

On Thu, 2009-12-31 at 13:14 +0100, Andres Freund wrote:

When cancelling a backend that behaviour could be a bit

annoying ;-)

Reading comments alone doesn't show the full situation here.

Before we send signal we check pid also, so the chances of this
happening are fairly small. i.e. we would need to have a backend

slot

reused by a new backend with exactly same pid as the last slot

holder.
Well. The problem does not occur for critical errors (i.e. session
death)
only. As signal delivery is asynchronous it can very well happen for
transaction cancellation as well.

I'm proposing to use this for killing transactions and connections,

so I

don't think there's too much harm done there. If the backend is

leaving

anyway, that's what we wanted. If its a new guy that is wearing the

same

boots then a little collateral damage doesn't leave the server in a

bad

place. HS cancellations aren't yet so precise that this matters.

Building racy infrastructure when it can be avoided with a little care
still seems not to be the best path to me.

Doing that will add more complexity in an area that is hard to test
effectively. I think the risk of introducing further bugs while trying
to fix this rare condition is high. Right now the conflict processing
needs more work and is often much less precise than this, so improving
this aspect of it would not be a priority. I've added it to the TODO
though. Thank you for your research.

I enclose the patch I am currently testing, as a patch-on-patch on top
of Joachim's changes, recently published on this thread. POC only as
yet.

Patch implements recovery conflict signalling using SIGUSR1
multiplexing, then uses a SessionCancelPending mode similar to Joachim's
TransactionCancelPending.

--
Simon Riggs www.2ndQuadrant.com

Attachments:

conflict_using_sigusr1.v1.patchtext/x-patch; charset=UTF-8; name=conflict_using_sigusr1.v1.patchDownload+127-77
#19Joachim Wieland
joe@mcknight.de
In reply to: Simon Riggs (#18)
Re: Hot Standy introduced problem with query cancel behavior

On Thu, Dec 31, 2009 at 6:40 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Building racy infrastructure when it can be avoided with a little care
still seems not to be the best path to me.

Doing that will add more complexity in an area that is hard to test
effectively. I think the risk of introducing further bugs while trying
to fix this rare condition is high. Right now the conflict processing
needs more work and is often much less precise than this, so improving
this aspect of it would not be a priority. I've added it to the TODO
though. Thank you for your research.

Patch implements recovery conflict signalling using SIGUSR1
multiplexing, then uses a SessionCancelPending mode similar to Joachim's
TransactionCancelPending.

I have reworked Simon's patch a bit and attach the result.

Quick facts:

- Hot Standby only uses SIGUSR1
- SIGINT behaves as it did before: it only cancels running statements
- pg_cancel_backend() continues to use SIGINT
- I added pg_cancel_idle_transaction() to cancel an idle transaction via
SIGUSR1
- One central function HandleCancelAction() sets the flags before calling
ProcessInterrupts(), it is called from the different signal handlers and
receives parameters about what it should do
- If a SIGUSR1 reason is used that will cancel something, ProcArrayLock is
acquired until the signal has been sent to make sure that we won't signal the
wrong backend. Does this sufficiently cover the concerns of Andres Freund
discussed upthread?

As there were so many boolean SomethingCancelPending variables I changed them
to be bitmasks and merged all of them into a single variable. However I am not
sure if we can change the name and type of especially InterruptPending that
easily as it was marked with PGDLLIMPORT...

@Simon: Is there a reason why you have not yet removed recoveryConflictMode
from PGPROC?

Joachim

Attachments:

hs_signaling.1.difftext/x-diff; charset=US-ASCII; name=hs_signaling.1.diffDownload+473-311
#20Simon Riggs
simon@2ndQuadrant.com
In reply to: Joachim Wieland (#19)
Re: Hot Standy introduced problem with query cancel behavior

On Thu, 2010-01-07 at 14:45 +0100, Joachim Wieland wrote:

On Thu, Dec 31, 2009 at 6:40 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Building racy infrastructure when it can be avoided with a little care
still seems not to be the best path to me.

Doing that will add more complexity in an area that is hard to test
effectively. I think the risk of introducing further bugs while trying
to fix this rare condition is high. Right now the conflict processing
needs more work and is often much less precise than this, so improving
this aspect of it would not be a priority. I've added it to the TODO
though. Thank you for your research.

Patch implements recovery conflict signalling using SIGUSR1
multiplexing, then uses a SessionCancelPending mode similar to Joachim's
TransactionCancelPending.

I have reworked Simon's patch a bit and attach the result.

Oh dear, this is exactly what I've been working on...

Quick facts:

- Hot Standby only uses SIGUSR1
- SIGINT behaves as it did before: it only cancels running statements
- pg_cancel_backend() continues to use SIGINT
- I added pg_cancel_idle_transaction() to cancel an idle transaction via
SIGUSR1
- One central function HandleCancelAction() sets the flags before calling
ProcessInterrupts(), it is called from the different signal handlers and
receives parameters about what it should do
- If a SIGUSR1 reason is used that will cancel something, ProcArrayLock is
acquired until the signal has been sent to make sure that we won't signal the
wrong backend. Does this sufficiently cover the concerns of Andres Freund
discussed upthread?

As there were so many boolean SomethingCancelPending variables I changed them
to be bitmasks and merged all of them into a single variable. However I am not
sure if we can change the name and type of especially InterruptPending that
easily as it was marked with PGDLLIMPORT...

@Simon: Is there a reason why you have not yet removed recoveryConflictMode
from PGPROC?

--
Simon Riggs www.2ndQuadrant.com

#21Andres Freund
andres@anarazel.de
In reply to: Joachim Wieland (#19)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joachim Wieland (#19)
#23Joachim Wieland
joe@mcknight.de
In reply to: Tom Lane (#22)
#24Joachim Wieland
joe@mcknight.de
In reply to: Simon Riggs (#20)
#25Simon Riggs
simon@2ndQuadrant.com
In reply to: Joachim Wieland (#24)
#26Simon Riggs
simon@2ndQuadrant.com
In reply to: Joachim Wieland (#19)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#26)
#28Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#27)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#28)
#30Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#29)
#31Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#29)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#30)
#33Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#30)
#34Joachim Wieland
joe@mcknight.de
In reply to: Tom Lane (#29)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joachim Wieland (#34)
#36Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#32)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#36)
#38Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#37)
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#38)
#40Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#39)
#41Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#40)
#42Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#40)
#43Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#41)
#44Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#43)
#45Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#43)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#45)
#47Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#46)
#48Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#45)
#49Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#48)