pg_sleep() doesn't work well with recovery conflict interrupts.

Started by Andres Freund over 11 years ago, 5 messages
#1 Andres Freund
andres@2ndquadrant.com

Hi,

Since a64ca63e59c11d8fe6db24eee3d82b61db7c2c83 pg_sleep() uses
WaitLatch() to wait. That's fine in itself. But
procsignal_sigusr1_handler, which is used e.g. when resolving recovery
conflicts, doesn't unconditionally do a SetLatch().
That means that we'll currently not be able to cancel conflicting
backends during recovery for 10min. Now, I don't think that'll happen
too often in practice, but it's still annoying.
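
A rough illustration of where the 10min figure would come from, assuming pg_sleep() waits in chunks capped at 600 seconds and only looks for pending interrupts between chunks. This is a standalone sketch, not the backend code; toy_next_wait_ms() and TOY_MAX_WAIT_MS are made-up names:

```c
/* Hypothetical cap on a single latch wait, assumed to be 600 s. */
#define TOY_MAX_WAIT_MS 600000L

/*
 * Return how long the next latch wait should be, in milliseconds, given
 * the sleep time still remaining in seconds. Returns 0 once the sleep
 * is over. Fractions of a millisecond are rounded up.
 */
static long
toy_next_wait_ms(double remaining_sec)
{
    double ms_exact;
    long   ms;

    if (remaining_sec >= 600.0)
        return TOY_MAX_WAIT_MS;     /* long sleep: wait one full chunk */
    if (remaining_sec <= 0.0)
        return 0;                   /* nothing left to sleep */

    ms_exact = remaining_sec * 1000.0;
    ms = (long) ms_exact;
    if ((double) ms < ms_exact)
        ms++;                       /* round up without needing libm */
    return ms;
}
```

If nothing sets the latch, a backend in the middle of such a chunk would not look at a pending recovery conflict until the chunk times out, i.e. up to 600 seconds later under the assumption above.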

As an alternative to doing the PG_TRY/save set_latch_on_sigusr1/set
set_latch_on_sigusr1/PG_CATCH/reset set_latch_on_sigusr1/ dance in
pg_sleep() we could also have RecoveryConflictInterrupt() do an
unconditional SetLatch()?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#1)
Re: pg_sleep() doesn't work well with recovery conflict interrupts.

On Wed, May 28, 2014 at 8:53 PM, Andres Freund <andres@2ndquadrant.com>
wrote:

> Hi,
>
> Since a64ca63e59c11d8fe6db24eee3d82b61db7c2c83 pg_sleep() uses
> WaitLatch() to wait. That's fine in itself. But
> procsignal_sigusr1_handler, which is used e.g. when resolving recovery
> conflicts, doesn't unconditionally do a SetLatch().
> That means that we'll currently not be able to cancel conflicting
> backends during recovery for 10min. Now, I don't think that'll happen
> too often in practice, but it's still annoying.

How would such a situation occur? Aren't we using pg_usleep() in the
recovery conflict functions
(e.g. in ResolveRecoveryConflictWithVirtualXIDs)?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#3 Andres Freund
andres@2ndquadrant.com
In reply to: Amit Kapila (#2)
Re: pg_sleep() doesn't work well with recovery conflict interrupts.

On 2014-05-30 10:30:42 +0530, Amit Kapila wrote:

> On Wed, May 28, 2014 at 8:53 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
>
> > Hi,
> >
> > Since a64ca63e59c11d8fe6db24eee3d82b61db7c2c83 pg_sleep() uses
> > WaitLatch() to wait. That's fine in itself. But
> > procsignal_sigusr1_handler, which is used e.g. when resolving recovery
> > conflicts, doesn't unconditionally do a SetLatch().
> > That means that we'll currently not be able to cancel conflicting
> > backends during recovery for 10min. Now, I don't think that'll happen
> > too often in practice, but it's still annoying.
>
> How would such a situation occur? Aren't we using pg_usleep() in the
> recovery conflict functions
> (e.g. in ResolveRecoveryConflictWithVirtualXIDs)?

I am not sure what you mean. pg_sleep() is the SQL-callable function, a
different thing from pg_usleep(). The latter isn't interruptible on all
platforms, but the sleep times should be short enough for that not to
matter.

I am pretty sure by now that the sane fix for this is to add a
SetLatch() call to RecoveryConflictInterrupt(). All the signal handlers
that deal with query cancelation et al. do so, so it seems right that
RecoveryConflictInterrupt() does so as well.
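
To make the difference concrete, here is a toy model of the two handler styles. It is only a sketch: a self-pipe stands in for the process latch, and every toy_* name is invented for illustration. None of this is the actual latch or procsignal code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

static int toy_pipe[2] = {-1, -1};                  /* stand-in for the latch */
static volatile sig_atomic_t toy_conflict_pending = 0;

/* Create the pipe and make both ends non-blocking. Returns 0 on success. */
static int
toy_latch_init(void)
{
    if (pipe(toy_pipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++)
    {
        int flags = fcntl(toy_pipe[i], F_GETFL);

        if (flags == -1 || fcntl(toy_pipe[i], F_SETFL, flags | O_NONBLOCK) == -1)
            return -1;
    }
    return 0;
}

/* Wake up a waiter. Only touches errno and write(), so signal-safe. */
static void
toy_set_latch(void)
{
    int  save_errno = errno;
    char b = 0;

    if (write(toy_pipe[1], &b, 1) < 0)
    {
        /* pipe full: a wakeup is already pending, nothing more to do */
    }
    errno = save_errno;
}

static long
toy_now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Returns 1 if the latch was set, 0 on timeout, -1 on error. */
static int
toy_wait_latch(int timeout_ms)
{
    long deadline = toy_now_ms() + timeout_ms;

    for (;;)
    {
        long remaining = deadline - toy_now_ms();
        struct pollfd pfd = {.fd = toy_pipe[0], .events = POLLIN};
        int  rc;

        if (remaining < 0)
            remaining = 0;
        rc = poll(&pfd, 1, (int) remaining);
        if (rc > 0)
        {
            char buf[64];

            while (read(toy_pipe[0], buf, sizeof(buf)) > 0)
                ;                   /* drain pending wakeups */
            return 1;
        }
        if (rc == 0)
            return 0;
        if (errno != EINTR)
            return -1;
        /* EINTR: a signal arrived but did not set the latch; keep waiting */
    }
}

/* Handler style without SetLatch(): records the conflict, wakes nobody. */
static void
toy_handler_flag_only(int signo)
{
    (void) signo;
    toy_conflict_pending = 1;
}

/* Handler style with SetLatch(): records the conflict and wakes the waiter. */
static void
toy_handler_flag_and_latch(int signo)
{
    (void) signo;
    toy_conflict_pending = 1;
    toy_set_latch();
}
```

With the first handler, toy_wait_latch() runs to its full timeout even though toy_conflict_pending is already set; with the second it returns right away, and the caller can then act on the flag.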

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#3)
Re: pg_sleep() doesn't work well with recovery conflict interrupts.

Andres Freund <andres@2ndquadrant.com> writes:

> I am pretty sure by now that the sane fix for this is to add a
> SetLatch() call to RecoveryConflictInterrupt(). All the signal handlers
> that deal with query cancelation et al. do so, so it seems right that
> RecoveryConflictInterrupt() does so as well.

+1

regards, tom lane

#5 Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#3)
Re: pg_sleep() doesn't work well with recovery conflict interrupts.

On Sun, Jun 1, 2014 at 1:05 PM, Andres Freund <andres@2ndquadrant.com>
wrote:

> On 2014-05-30 10:30:42 +0530, Amit Kapila wrote:
>
> > On Wed, May 28, 2014 at 8:53 PM, Andres Freund <andres@2ndquadrant.com>
> >
> > > Since a64ca63e59c11d8fe6db24eee3d82b61db7c2c83 pg_sleep() uses
> > > WaitLatch() to wait. That's fine in itself. But
> > > procsignal_sigusr1_handler, which is used e.g. when resolving recovery
> > > conflicts, doesn't unconditionally do a SetLatch().
> > > That means that we'll currently not be able to cancel conflicting
> > > backends during recovery for 10min. Now, I don't think that'll happen
> > > too often in practice, but it's still annoying.
> >
> > How would such a situation occur? Aren't we using pg_usleep() in the
> > recovery conflict functions
> > (e.g. in ResolveRecoveryConflictWithVirtualXIDs)?
>
> I am not sure what you mean. pg_sleep() is the SQL callable function, a
> different thing to pg_usleep().

It was not clear to me how such a situation could occur, but looking at
it a bit more carefully, I think I now understand: any backend calling
pg_sleep() during recovery conflict resolution can hit this situation.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com