Spurious standby query cancellations

Started by Jeff Janes over 10 years ago. 12 messages.
#1 Jeff Janes
jeff.janes@gmail.com
1 attachment(s)

In ResolveRecoveryConflictWithLock, there is the comment:

/*
 * If blowing away everybody with conflicting locks doesn't work, after
 * the first two attempts then we just start blowing everybody away until
 * it does work.

But what it does is something different than that.

At least under some conditions, it will wait patiently for all initially
conflicting locks to go away. If that doesn't work twice (because new
lockers joined while we were waiting for the old ones to go away), then it
will wait patiently for all transactions throughout the system to go away
even if they don't conflict, perhaps not even realizing that the lock has
become free in the meantime.

Then when its patience runs out, it kills everything on the system. But it
never tried to blow away just the conflicting locks, instead it just tried
to wait them out.

The fact that trying to wait them out didn't work (twice) doesn't mean that
killing them wouldn't have worked.

I think that it was intended that num_attempts would get incremented only
once WaitExceedsMaxStandbyDelay becomes true, but that is not what happens
with the current code.

Currently WaitExceedsMaxStandbyDelay only has one caller. I think we
should take the sleep code out of that function and move it into the
existing call site, and then have ResolveRecoveryConflictWithLock check
with WaitExceedsMaxStandbyDelay before incrementing num_attempts.
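For what it's worth, the difference can be modelled with a toy loop (illustrative C only; none of these names are the real standby.c symbols): the current code escalates on its third pass no matter what, while the proposed change escalates only after the standby delay has been exceeded twice.

```c
/* Toy model of the escalation logic, NOT the real standby.c code:
 * count how many wait/retry iterations happen before the loop switches
 * from cancelling only conflicting lock holders to cancelling everyone. */

/* Current behaviour: ++num_attempts on every iteration, so escalation
 * happens on the third pass regardless of max_standby_delay. */
static int iterations_before_escalation_current(void)
{
	int num_attempts = 0;
	int iterations = 0;

	for (;;)
	{
		iterations++;
		if (++num_attempts < 3)
			continue;			/* target only conflicting lock holders */
		return iterations;		/* escalate: target every transaction */
	}
}

/* Proposed behaviour: bump num_attempts only once the (simulated)
 * standby delay has been exceeded, modelled here as a fixed number of
 * polling iterations standing in for WaitExceedsMaxStandbyDelay(). */
static int iterations_before_escalation_proposed(int delay_iterations)
{
	int num_attempts = 0;
	int iterations = 0;

	for (;;)
	{
		iterations++;
		if (num_attempts >= 2)
			return iterations;	/* escalate */
		if (iterations >= delay_iterations)
			num_attempts++;		/* delay exceeded on this pass */
	}
}
```

In this model the current code always escalates on iteration 3, while the proposed code keeps retrying only the conflicting holders until the delay has expired.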

Cheers,

Jeff

Attachments:

standby_lock_cancel_v1.patch (application/octet-stream)
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
new file mode 100644
index 292bed5..8719da9
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
*************** WaitExceedsMaxStandbyDelay(void)
*** 165,184 ****
  	ltime = GetStandbyLimitTime();
  	if (ltime && GetCurrentTimestamp() >= ltime)
  		return true;
- 
- 	/*
- 	 * Sleep a bit (this is essential to avoid busy-waiting).
- 	 */
- 	pg_usleep(standbyWait_us);
- 
- 	/*
- 	 * Progressively increase the sleep times, but not to more than 1s, since
- 	 * pg_usleep isn't interruptable on some platforms.
- 	 */
- 	standbyWait_us *= 2;
- 	if (standbyWait_us > 1000000)
- 		standbyWait_us = 1000000;
- 
  	return false;
  }
  
--- 165,170 ----
*************** ResolveRecoveryConflictWithVirtualXIDs(V
*** 247,252 ****
--- 233,253 ----
  				if (pid != 0)
  					pg_usleep(5000L);
  			}
+ 			else 
+ 			{
+ 				/*
+ 				 * Sleep a bit (this is essential to avoid busy-waiting).
+ 				 */
+ 				pg_usleep(standbyWait_us);
+ 
+ 				/*
+ 				 * Progressively increase the sleep times, but not to more than 1s, since
+ 				 * pg_usleep isn't interruptable on some platforms.
+ 				 */
+ 				standbyWait_us *= 2;
+ 				if (standbyWait_us > 1000000)
+ 					standbyWait_us = 1000000;
+ 			}
  		}
  
  		/* The virtual transaction is gone now, wait for the next one */
*************** ResolveRecoveryConflictWithLock(Oid dbOi
*** 357,363 ****
  	 */
  	while (!lock_acquired)
  	{
! 		if (++num_attempts < 3)
  			backends = GetLockConflicts(&locktag, AccessExclusiveLock);
  		else
  			backends = GetConflictingVirtualXIDs(InvalidTransactionId,
--- 358,364 ----
  	 */
  	while (!lock_acquired)
  	{
! 		if (num_attempts < 2)
  			backends = GetLockConflicts(&locktag, AccessExclusiveLock);
  		else
  			backends = GetConflictingVirtualXIDs(InvalidTransactionId,
*************** ResolveRecoveryConflictWithLock(Oid dbOi
*** 369,374 ****
--- 370,378 ----
  		if (LockAcquireExtended(&locktag, AccessExclusiveLock, true, true, false)
  			!= LOCKACQUIRE_NOT_AVAIL)
  			lock_acquired = true;
+ 
+ 		if (WaitExceedsMaxStandbyDelay()) 
+ 			num_attempts++;
  	}
  }
  
#2 Simon Riggs
simon@2ndQuadrant.com
In reply to: Jeff Janes (#1)
Re: Spurious standby query cancellations

On 27 August 2015 at 22:55, Jeff Janes <jeff.janes@gmail.com> wrote:

In ResolveRecoveryConflictWithLock, there is the comment:

/*
 * If blowing away everybody with conflicting locks doesn't work, after
 * the first two attempts then we just start blowing everybody away until
 * it does work.

But what it does is something different than that.

At least under some conditions, it will wait patiently for all initially
conflicting locks to go away. If that doesn't work twice (because new
lockers joined while we were waiting for the old ones to go away), then it
will wait patiently for all transactions throughout the system to go away
even if they don't conflict, perhaps not even realizing that the lock has
become free in the meantime.

Then when its patience runs out, it kills everything on the system. But
it never tried to blow away just the conflicting locks, instead it just
tried to wait them out.

The fact that trying to wait them out didn't work (twice) doesn't mean
that killing them wouldn't have worked.

I think that it was intended that num_attempts would get incremented only
once WaitExceedsMaxStandbyDelay becomes true, but that is not what happens
with the current code.

Currently WaitExceedsMaxStandbyDelay only has one caller. I think we
should take the sleep code out of that function and move it into the
existing call site, and then have ResolveRecoveryConflictWithLock check
with WaitExceedsMaxStandbyDelay before incrementing num_attempts.

I agree with the problem in the current code, thank you for spotting it.

I also agree that the proposed solution would work-ish, if we are
suggesting backpatching this.

It's now possible to fix this by putting a lock wait on the actual lock
request, which wasn't available when I first wrote that, hence the crappy
wait loop. Using the timeout handler would now be the preferred way to
solve this. We can backpatch that to 9.3 if needed, where they were
introduced.

There's an example of how to use lock waits further down
on ResolveRecoveryConflictWithBufferPin(). Could you have a look at doing
it that way?

We probably also need to resurrect my earlier patch to avoid logging
AccessExclusiveLocks on temp tables.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3 Jeff Janes
jeff.janes@gmail.com
In reply to: Simon Riggs (#2)
Re: Spurious standby query cancellations

On Fri, Sep 4, 2015 at 3:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 27 August 2015 at 22:55, Jeff Janes <jeff.janes@gmail.com> wrote:

In ResolveRecoveryConflictWithLock, there is the comment:

/*
 * If blowing away everybody with conflicting locks doesn't work, after
 * the first two attempts then we just start blowing everybody away until
 * it does work.

But what it does is something different than that.

At least under some conditions, it will wait patiently for all initially
conflicting locks to go away. If that doesn't work twice (because new
lockers joined while we were waiting for the old ones to go away), then it
will wait patiently for all transactions throughout the system to go away
even if they don't conflict, perhaps not even realizing that the lock has
become free in the meantime.

Then when its patience runs out, it kills everything on the system. But
it never tried to blow away just the conflicting locks, instead it just
tried to wait them out.

The fact that trying to wait them out didn't work (twice) doesn't mean
that killing them wouldn't have worked.

I think that it was intended that num_attempts would get incremented only
once WaitExceedsMaxStandbyDelay becomes true, but that is not what happens
with the current code.

Currently WaitExceedsMaxStandbyDelay only has one caller. I think we
should take the sleep code out of that function and move it into the
existing call site, and then have ResolveRecoveryConflictWithLock check
with WaitExceedsMaxStandbyDelay before incrementing num_attempts.

I agree with the problem in the current code, thank you for spotting it.

I also agree that the proposed solution would work-ish, if we are
suggesting backpatching this.

I wasn't specifically thinking of backpatching it, but if it doesn't
operate the way you intended it to, it might make sense to do that.

I stumbled on this while experimentally looking into a different issue (the
ProcArrayLock contention issue in HeapTupleSatisfiesMVCC that was recently
ameliorated in 8a7d0701814, but the variant of that issue reported
in "[PERFORM] Strange query stalls on replica in 9.3.9"). I haven't heard
of any complaints from the field about too-frequent query cancellations,
but I also don't have my "ear to the ground" in that area.

It's now possible to fix this by putting a lock wait on the actual lock
request, which wasn't available when I first wrote that, hence the crappy
wait loop. Using the timeout handler would now be the preferred way to
solve this. We can backpatch that to 9.3 if needed, where they were
introduced.

There's an example of how to use lock waits further down
on ResolveRecoveryConflictWithBufferPin(). Could you have a look at doing
it that way?

It looks like this will take some major surgery. The heavy weight lock
manager doesn't play well with others when it comes to timeouts other than
its own. LockBufferForCleanup is a simple retry loop, but the lock manager
is much more complicated than that.

Here are some approaches I can think of:

Approach one, replace LockAcquireExtended's dontWait boolean with a
"waitUntil" timeout value. Queue the lock like ordinary
(non-recovery-backend) lock requests are queued so that new requestors are
not granted new locks which conflict with what we want. If the timeout
expires without the lock being acquired, dequeue ourselves and clean up,
and then return LOCKACQUIRE_NOT_AVAIL. I think something would have to be
done about deadlocks, as well, as I don't think the regular deadlock
detector works in the standby startup process.

Approach two, same as one but when the timeout expires we invoke a callback
to terminate the blocking processes without dequeueing ourselves. That way
no one can sneak in and grab the lock during the time we were doing the
terminations. This way we are guaranteed to succeed after one round of
terminations, and there is no need to terminate innocent bystanders. I
think this is my preferred solution to have. (But I haven't had much luck
implementing it beyond the hand-wavy stages.)

Approach three, ask for the lock with dontWait as we do now, but if we
don't get the lock then somehow arrange for us to be signalled when the
lock becomes free, without being queued for it. Then we would have to
retry, with no guarantee the retry would be successful. Functionally this
would be no different than the simple patch I gave, except the polling loop
is much more efficient. You still have to terminate processes if a stream
of new requestors takes the lock on a constantly overlapping basis.
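The key property of approach two can be sketched with a toy lock model (hypothetical names throughout; this is not the lmgr API): because the startup process stays queued, no new conflicting lock can be granted behind its back, so a single round of cancellations is guaranteed to leave the lock free.

```c
#include <stdbool.h>

/* Toy model of approach two -- all names are illustrative, not
 * PostgreSQL internals. */

#define MAX_HOLDERS 8

static int  holders[MAX_HOLDERS];	/* pids currently holding the lock */
static int  nholders;
static bool startup_is_queued;		/* exclusive waiter in the queue? */

/* A new shared request is granted only if no exclusive waiter is queued
 * ahead of it -- queuing the startup process blocks "sneak-ins". */
static bool try_acquire_shared(int pid)
{
	if (startup_is_queued || nholders >= MAX_HOLDERS)
		return false;
	holders[nholders++] = pid;
	return true;
}

/* Step 1: the startup process joins the wait queue. */
static void startup_enqueue(void)
{
	startup_is_queued = true;
}

/* Step 2: patience ran out -- cancel the current conflicting holders.
 * Because no new holder could be granted since step 1, one round of
 * cancellations suffices and the lock can be taken immediately. */
static bool startup_cancel_and_acquire(void)
{
	nholders = 0;				/* "terminate" the conflicting backends */
	startup_is_queued = false;	/* lock granted to the startup process */
	return true;
}
```

In this model a late-arriving shared requestor is refused while the startup process is queued, which is exactly what makes the single cancellation round sufficient.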

We probably also need to resurrect my earlier patch to avoid logging
AccessExclusiveLocks on temp tables.

I couldn't find that, can you provide a link?

Another source of frequent AccessExclusiveLocks is vacuum truncation
attempts. If a table has some occupied pages right at the end which are
marked as all-visible, then the forward scan doesn't visit them. Since it
didn't visit them, it assumes they might be empty and truncatable and so
takes the AccessExclusiveLock, only to immediately see that the last page
is not empty. Perhaps the forward scan could be special-cased to never
skip the final block of a table. That way if it is not empty, the
truncation is abandoned before taking the AccessExclusiveLock.

Cheers,

Jeff

#4 Simon Riggs
simon@2ndQuadrant.com
In reply to: Jeff Janes (#3)
Re: Spurious standby query cancellations

On 14 September 2015 at 12:00, Jeff Janes <jeff.janes@gmail.com> wrote:

It's now possible to fix this by putting a lock wait on the actual lock

request, which wasn't available when I first wrote that, hence the crappy
wait loop. Using the timeout handler would now be the preferred way to
solve this. We can backpatch that to 9.3 if needed, where they were
introduced.

There's an example of how to use lock waits further down
on ResolveRecoveryConflictWithBufferPin(). Could you have a look at doing
it that way?

It looks like this will take some major surgery. The heavy weight lock
manager doesn't play well with others when it comes to timeouts other than
its own. LockBufferForCleanup is a simple retry loop, but the lock manager
is much more complicated than that.

Not sure I understand this objection. I can't see a reason that my proposal
wouldn't work.

We probably also need to resurrect my earlier patch to avoid logging
AccessExclusiveLocks on temp tables.

I couldn't find that, can you provide a link?

/messages/by-id/CA+U5nMJYGYS+kccJ+=aQxuC_G09hF6zsD-xhu5W9oqGNkbE6nQ@mail.gmail.com

Another source of frequent AccessExclusiveLocks is vacuum truncation
attempts. If a table has some occupied pages right at the end which are
marked as all-visible, then the forward scan doesn't visit them. Since it
didn't visit them, it assumes they might be empty and truncatable and so
takes the AccessExclusiveLock, only to immediately see that the last page
is not empty. Perhaps the forward scan could be special-cased to never
skip the final block of a table. That way if it is not empty, the
truncation is abandoned before taking the AccessExclusiveLock.

We can probably get rid of that lock, but it will require lots of smoke and
mirrors to persuade scans that they don't actually need to scan as far as
they thought they did, because a VACUUM has checked and it knows those
blocks do not contain tuples visible to the scan. Which sounds a little
hairy.

I think Andres is working on putting an end of data watermark in shared
memory rather than using the end of physical file, which sounds like a
better plan.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5 Jeff Janes
jeff.janes@gmail.com
In reply to: Simon Riggs (#4)
1 attachment(s)
Re: Spurious standby query cancellations

On Wed, Sep 16, 2015 at 2:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 14 September 2015 at 12:00, Jeff Janes <jeff.janes@gmail.com> wrote:

It's now possible to fix this by putting a lock wait on the actual lock

request, which wasn't available when I first wrote that, hence the crappy
wait loop. Using the timeout handler would now be the preferred way to
solve this. We can backpatch that to 9.3 if needed, where they were
introduced.

There's an example of how to use lock waits further down
on ResolveRecoveryConflictWithBufferPin(). Could you have a look at doing
it that way?

It looks like this will take some major surgery. The heavy weight lock
manager doesn't play well with others when it comes to timeouts other than
its own. LockBufferForCleanup is a simple retry loop, but the lock manager
is much more complicated than that.

Not sure I understand this objection. I can't see a reason that my
proposal wouldn't work.

On further thought, neither do I. The attached patch inverts
ResolveRecoveryConflictWithLock so that it is called back from the lmgr
code, like the ResolveRecoveryConflictWithBufferPin code. It does not try
to cancel the conflicting lock holders from the signal handler; rather, it
just loops an extra time and cancels the transactions on the next call.

It looks like the deadlock detection is adequately handled within normal
lmgr code within the back-ends of the other parties to the deadlock, so I
didn't do a timeout for deadlock detection purposes.

Cheers,

Jeff

Attachments:

standby_lock_cancel_v2.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
new file mode 100644
index 581837a..b993ca8
*** a/src/backend/postmaster/startup.c
--- b/src/backend/postmaster/startup.c
*************** StartupProcessMain(void)
*** 203,208 ****
--- 203,209 ----
  	 */
  	RegisterTimeout(STANDBY_DEADLOCK_TIMEOUT, StandbyDeadLockHandler);
  	RegisterTimeout(STANDBY_TIMEOUT, StandbyTimeoutHandler);
+ 	RegisterTimeout(STANDBY_LOCK_TIMEOUT, StandbyLockTimeoutHandler);
  
  	/*
  	 * Unblock signals (they were blocked when the postmaster forked us)
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
new file mode 100644
index 292bed5..4b428de
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
*************** static List *RecoveryLockList;
*** 41,47 ****
  
  static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
  									   ProcSignalReason reason);
- static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
  static void SendRecoveryConflictWithBufferPin(ProcSignalReason reason);
  static XLogRecPtr LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
  static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
--- 41,46 ----
*************** ResolveRecoveryConflictWithDatabase(Oid
*** 337,375 ****
  	}
  }
  
! static void
! ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid)
  {
! 	VirtualTransactionId *backends;
! 	bool		lock_acquired = false;
! 	int			num_attempts = 0;
! 	LOCKTAG		locktag;
  
! 	SET_LOCKTAG_RELATION(locktag, dbOid, relOid);
  
! 	/*
! 	 * If blowing away everybody with conflicting locks doesn't work, after
! 	 * the first two attempts then we just start blowing everybody away until
! 	 * it does work. We do this because its likely that we either have too
! 	 * many locks and we just can't get one at all, or that there are many
! 	 * people crowding for the same table. Recovery must win; the end
! 	 * justifies the means.
! 	 */
! 	while (!lock_acquired)
! 	{
! 		if (++num_attempts < 3)
! 			backends = GetLockConflicts(&locktag, AccessExclusiveLock);
! 		else
! 			backends = GetConflictingVirtualXIDs(InvalidTransactionId,
! 												 InvalidOid);
  
  		ResolveRecoveryConflictWithVirtualXIDs(backends,
  											 PROCSIG_RECOVERY_CONFLICT_LOCK);
  
! 		if (LockAcquireExtended(&locktag, AccessExclusiveLock, true, true, false)
! 			!= LOCKACQUIRE_NOT_AVAIL)
! 			lock_acquired = true;
  	}
  }
  
  /*
--- 336,402 ----
  	}
  }
  
! /*
!  * ResolveRecoveryConflictWithLock is called from ProcSleep()
!  * to resolve conflicts with other backends holding relation locks.
!  *
!  * The WaitLatch sleep normally done in ProcSleep()
!  * (when not InHotStandby) is performed here, for code clarity.
!  *
!  * We either resolve conflicts immediately or set a timeout to wake us at
!  * the limit of our patience.
!  *
!  * Resolve conflicts by cancelling to all backends holding a conflicting 
!  * lock.  As we are already queued to be granted the lock, no new lock
!  * requests conflicting with ours will be granted in the meantime.
!  *
!  * Deadlocks involving the Startup process and an ordinary backend process 
!  * will be detected by the deadlock detector within the ordinary backend.
!  */
! void
! ResolveRecoveryConflictWithLock(LOCKTAG locktag)
  {
! 	TimestampTz ltime;
  
! 	Assert(InHotStandby);
  
! 	ltime = GetStandbyLimitTime();
  
+ 	if (GetCurrentTimestamp() >= ltime)
+ 	{
+ 		/*
+ 		 * We're already behind, so clear a path as quickly as possible.
+ 		 */
+ 		VirtualTransactionId *backends;
+ 		backends = GetLockConflicts(&locktag, AccessExclusiveLock);
  		ResolveRecoveryConflictWithVirtualXIDs(backends,
  											 PROCSIG_RECOVERY_CONFLICT_LOCK);
+ 	}
+ 	else
+ 	{
+ 		/*
+ 		 * Wake up at ltime
+ 		 */
+ 		EnableTimeoutParams timeouts[1];
  
! 		timeouts[0].id = STANDBY_LOCK_TIMEOUT;
! 		timeouts[0].type = TMPARAM_AT;
! 		timeouts[0].fin_time = ltime;
! 		enable_timeouts(timeouts, 1);
  	}
+ 
+ 	/* Wait to be signaled by the release of the Relation Lock */
+ 	ProcWaitForSignal();
+ 
+ 	/*
+ 	 * Clear any timeout requests established above.  We assume here that the
+ 	 * Startup process doesn't have any other outstanding timeouts than what 
+ 	 * this function
+ 	 * uses.  If that stops being true, we could cancel the timeouts
+ 	 * individually, but that'd be slower.
+ 	 */
+ 	disable_all_timeouts(false);
+ 
  }
  
  /*
*************** StandbyTimeoutHandler(void)
*** 532,537 ****
--- 559,574 ----
  	SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
  }
  
+ /*
+  * StandbyLockTimeoutHandler() will be called if STANDBY_LOCK_TIMEOUT is exceeded.
+  * This doesn't need to do anything, simply waking up is enough.
+  */
+ void
+ StandbyLockTimeoutHandler(void)
+ {
+ 	/* forget any pending STANDBY_DEADLOCK_TIMEOUT request */
+ 	disable_timeout(STANDBY_DEADLOCK_TIMEOUT, false);
+ }
  
  /*
   * -----------------------------------------------------
*************** StandbyTimeoutHandler(void)
*** 545,551 ****
   * process is the proxy by which the original locks are implemented.
   *
   * We only keep track of AccessExclusiveLocks, which are only ever held by
!  * one transaction on one relation, and don't worry about lock queuing.
   *
   * We keep a single dynamically expandible list of locks in local memory,
   * RelationLockList, so we can keep track of the various entries made by
--- 582,588 ----
   * process is the proxy by which the original locks are implemented.
   *
   * We only keep track of AccessExclusiveLocks, which are only ever held by
!  * one transaction on one relation.
   *
   * We keep a single dynamically expandible list of locks in local memory,
   * RelationLockList, so we can keep track of the various entries made by
*************** StandbyAcquireAccessExclusiveLock(Transa
*** 587,600 ****
  	newlock->relOid = relOid;
  	RecoveryLockList = lappend(RecoveryLockList, newlock);
  
- 	/*
- 	 * Attempt to acquire the lock as requested, if not resolve conflict
- 	 */
  	SET_LOCKTAG_RELATION(locktag, newlock->dbOid, newlock->relOid);
  
! 	if (LockAcquireExtended(&locktag, AccessExclusiveLock, true, true, false)
! 		== LOCKACQUIRE_NOT_AVAIL)
! 		ResolveRecoveryConflictWithLock(newlock->dbOid, newlock->relOid);
  }
  
  static void
--- 624,632 ----
  	newlock->relOid = relOid;
  	RecoveryLockList = lappend(RecoveryLockList, newlock);
  
  	SET_LOCKTAG_RELATION(locktag, newlock->dbOid, newlock->relOid);
  
! 	LockAcquireExtended(&locktag, AccessExclusiveLock, true, false, false);
  }
  
  static void
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
new file mode 100644
index 2c2535b..716477c
*** a/src/backend/storage/lmgr/proc.c
--- b/src/backend/storage/lmgr/proc.c
***************
*** 42,47 ****
--- 42,48 ----
  #include "postmaster/autovacuum.h"
  #include "replication/slot.h"
  #include "replication/syncrep.h"
+ #include "storage/standby.h"
  #include "storage/ipc.h"
  #include "storage/lmgr.h"
  #include "storage/pmsignal.h"
*************** ProcSleep(LOCALLOCK *locallock, LockMeth
*** 1090,1109 ****
  	 * If LockTimeout is set, also enable the timeout for that.  We can save a
  	 * few cycles by enabling both timeout sources in one call.
  	 */
! 	if (LockTimeout > 0)
  	{
! 		EnableTimeoutParams timeouts[2];
! 
! 		timeouts[0].id = DEADLOCK_TIMEOUT;
! 		timeouts[0].type = TMPARAM_AFTER;
! 		timeouts[0].delay_ms = DeadlockTimeout;
! 		timeouts[1].id = LOCK_TIMEOUT;
! 		timeouts[1].type = TMPARAM_AFTER;
! 		timeouts[1].delay_ms = LockTimeout;
! 		enable_timeouts(timeouts, 2);
  	}
- 	else
- 		enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
  
  	/*
  	 * If somebody wakes us between LWLockRelease and WaitLatch, the latch
--- 1091,1113 ----
  	 * If LockTimeout is set, also enable the timeout for that.  We can save a
  	 * few cycles by enabling both timeout sources in one call.
  	 */
! 	if (!InHotStandby)
  	{
! 		if (LockTimeout > 0)
! 		{
! 			EnableTimeoutParams timeouts[2];
! 	
! 			timeouts[0].id = DEADLOCK_TIMEOUT;
! 			timeouts[0].type = TMPARAM_AFTER;
! 			timeouts[0].delay_ms = DeadlockTimeout;
! 			timeouts[1].id = LOCK_TIMEOUT;
! 			timeouts[1].type = TMPARAM_AFTER;
! 			timeouts[1].delay_ms = LockTimeout;
! 			enable_timeouts(timeouts, 2);
! 		}
! 		else
! 			enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
  	}
  
  	/*
  	 * If somebody wakes us between LWLockRelease and WaitLatch, the latch
*************** ProcSleep(LOCALLOCK *locallock, LockMeth
*** 1121,1135 ****
  	 */
  	do
  	{
! 		WaitLatch(MyLatch, WL_LATCH_SET, 0);
! 		ResetLatch(MyLatch);
! 		/* check for deadlocks first, as that's probably log-worthy */
! 		if (got_deadlock_timeout)
  		{
! 			CheckDeadLock();
! 			got_deadlock_timeout = false;
  		}
- 		CHECK_FOR_INTERRUPTS();
  
  		/*
  		 * waitStatus could change from STATUS_WAITING to something else
--- 1125,1147 ----
  	 */
  	do
  	{
! 		if (InHotStandby)
  		{
! 			/* Set a timer and wait for that or for the Lock to be granted */
! 			ResolveRecoveryConflictWithLock(locallock->tag.lock);
! 		}
! 		else
! 		{
! 			WaitLatch(MyLatch, WL_LATCH_SET, 0);
! 			ResetLatch(MyLatch);
! 			/* check for deadlocks first, as that's probably log-worthy */
! 			if (got_deadlock_timeout)
! 			{
! 				CheckDeadLock();
! 				got_deadlock_timeout = false;
! 			}
! 			CHECK_FOR_INTERRUPTS();
  		}
  
  		/*
  		 * waitStatus could change from STATUS_WAITING to something else
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
new file mode 100644
index 40b329b..b82406e
*** a/src/include/storage/standby.h
--- b/src/include/storage/standby.h
*************** extern void ResolveRecoveryConflictWithS
*** 33,42 ****
--- 33,44 ----
  extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
  extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
  
+ extern void ResolveRecoveryConflictWithLock(LOCKTAG locktag);
  extern void ResolveRecoveryConflictWithBufferPin(void);
  extern void CheckRecoveryConflictDeadlock(void);
  extern void StandbyDeadLockHandler(void);
  extern void StandbyTimeoutHandler(void);
+ extern void StandbyLockTimeoutHandler(void);
  
  /*
   * Standby Rmgr (RM_STANDBY_ID)
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
new file mode 100644
index 30581a1..33b5016
*** a/src/include/utils/timeout.h
--- b/src/include/utils/timeout.h
*************** typedef enum TimeoutId
*** 29,34 ****
--- 29,35 ----
  	STATEMENT_TIMEOUT,
  	STANDBY_DEADLOCK_TIMEOUT,
  	STANDBY_TIMEOUT,
+ 	STANDBY_LOCK_TIMEOUT,
  	/* First user-definable timeout reason */
  	USER_TIMEOUT,
  	/* Maximum number of timeout reasons */
#6 Michael Paquier
michael.paquier@gmail.com
In reply to: Jeff Janes (#5)
Re: Spurious standby query cancellations

On Thu, Sep 24, 2015 at 3:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Sep 16, 2015 at 2:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 14 September 2015 at 12:00, Jeff Janes <jeff.janes@gmail.com> wrote:

It's now possible to fix this by putting a lock wait on the actual lock
request, which wasn't available when I first wrote that, hence the crappy
wait loop. Using the timeout handler would now be the preferred way to solve
this. We can backpatch that to 9.3 if needed, where they were introduced.

There's an example of how to use lock waits further down on
ResolveRecoveryConflictWithBufferPin(). Could you have a look at doing it
that way?

It looks like this will take some major surgery. The heavy weight lock
manager doesn't play well with others when it comes to timeouts other than
its own. LockBufferForCleanup is a simple retry loop, but the lock manager
is much more complicated than that.

Not sure I understand this objection. I can't see a reason that my
proposal wouldn't work.

On further thought, neither do I. The attached patch inverts
ResolveRecoveryConflictWithLock to be called back from the lmgr code so that
it is like the ResolveRecoveryConflictWithBufferPin code. It does not try to
cancel the conflicting lock holders from the signal handler, rather it just
loops an extra time and cancels the transactions on the next call.

It looks like the deadlock detection is adequately handled within normal
lmgr code within the back-ends of the other parties to the deadlock, so I
didn't do a timeout for deadlock detection purposes.

Patch moved to next CF because of a lack of reviews. Simon is
registered as reviewer, hence I guess that the ball is on his side of
the field.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7 Jeff Janes
jeff.janes@gmail.com
In reply to: Jeff Janes (#5)
Re: Spurious standby query cancellations

On Wed, Sep 23, 2015 at 11:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On further thought, neither do I. The attached patch inverts
ResolveRecoveryConflictWithLock to be called back from the lmgr code so that
it is like the ResolveRecoveryConflictWithBufferPin code. It does not try to
cancel the conflicting lock holders from the signal handler, rather it just
loops an extra time and cancels the transactions on the next call.

It looks like the deadlock detection is adequately handled within normal
lmgr code within the back-ends of the other parties to the deadlock, so I
didn't do a timeout for deadlock detection purposes.

I was testing that this still applies to git HEAD. And it doesn't due
to the refactoring of lock.h into lockdefs.h. (And apparently it
never actually did, because that refactoring happened long before I
wrote this patch. I guess I must have done this work against
9_5_STABLE.)

standby.h cannot include lock.h because standby.h is included
indirectly by pg_xlogdump. But it has to get LOCKTAG which is only in
lock.h.

Does this mean that standby.h also needs to get parts spun off into a
new standbydef.h that can be included from front-end code?

standby.h doesn't need to know the internals of LOCKTAG. It just
needs to declare a function that receives it as an opaque pointer. I
don't know if that info helps resolve the situation, though.
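A minimal illustration of that idea (toy names, not the actual PostgreSQL declarations): a slimmed-down header can forward-declare the struct and a function taking a pointer to it, and only code that includes the full definition can look inside.

```c
/* What a standbydefs.h-style header could expose: an opaque forward
 * declaration, plus a function that takes a pointer to it.  The layout
 * of the struct is not visible at this point. */
struct ToyLockTag;						/* opaque: no layout here */
extern unsigned int toy_conflict_rel(const struct ToyLockTag *tag);

/* Only a translation unit that includes the full "lock.h"-equivalent
 * sees the definition and can dereference the pointer. */
struct ToyLockTag
{
	unsigned int dbOid;
	unsigned int relOid;
};

unsigned int
toy_conflict_rel(const struct ToyLockTag *tag)
{
	return tag->relOid;
}
```

Front-end code that only needs to pass the pointer around can then compile against the declaration alone, which is the situation pg_xlogdump is in.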

Cheers,

Jeff


#8 Jeff Janes
jeff.janes@gmail.com
In reply to: Jeff Janes (#7)
2 attachment(s)
Re: Spurious standby query cancellations

On Wed, Dec 23, 2015 at 9:40 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Sep 23, 2015 at 11:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On further thought, neither do I. The attached patch inverts
ResolveRecoveryConflictWithLock to be called back from the lmgr code so that
it is like the ResolveRecoveryConflictWithBufferPin code. It does not try to
cancel the conflicting lock holders from the signal handler, rather it just
loops an extra time and cancels the transactions on the next call.

It looks like the deadlock detection is adequately handled within normal
lmgr code within the back-ends of the other parties to the deadlock, so I
didn't do a timeout for deadlock detection purposes.

I was testing that this still applies to git HEAD. And it doesn't due
to the refactoring of lock.h into lockdefs.h. (And apparently it
never actually did, because that refactoring happened long before I
wrote this patch. I guess I must have done this work against
9_5_STABLE.)

standby.h cannot include lock.h because standby.h is included
indirectly by pg_xlogdump. But it has to get LOCKTAG which is only in
lock.h.

Does this mean that standby.h also needs to get parts spun off into a
new standbydef.h that can be included from front-end code?

That is how I've done it.

The lock cancel patch applies over the header split patch.

Cheers,

Jeff

Attachments:

standby_header_split_v1.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/access/rmgrdesc/standbydesc.c b/src/backend/access/rmgrdesc/standbydesc.c
new file mode 100644
index 3d35f38..acd590b
*** a/src/backend/access/rmgrdesc/standbydesc.c
--- b/src/backend/access/rmgrdesc/standbydesc.c
***************
*** 14,20 ****
   */
  #include "postgres.h"
  
! #include "storage/standby.h"
  
  static void
  standby_desc_running_xacts(StringInfo buf, xl_running_xacts *xlrec)
--- 14,20 ----
   */
  #include "postgres.h"
  
! #include "storage/standbydefs.h"
  
  static void
  standby_desc_running_xacts(StringInfo buf, xl_running_xacts *xlrec)
diff --git a/src/bin/pg_xlogdump/rmgrdesc.c b/src/bin/pg_xlogdump/rmgrdesc.c
new file mode 100644
index 5b88a8d..f9cd395
*** a/src/bin/pg_xlogdump/rmgrdesc.c
--- b/src/bin/pg_xlogdump/rmgrdesc.c
***************
*** 27,33 ****
  #include "commands/tablespace.h"
  #include "replication/origin.h"
  #include "rmgrdesc.h"
! #include "storage/standby.h"
  #include "utils/relmapper.h"
  
  #define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup) \
--- 27,33 ----
  #include "commands/tablespace.h"
  #include "replication/origin.h"
  #include "rmgrdesc.h"
! #include "storage/standbydefs.h"
  #include "utils/relmapper.h"
  
  #define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup) \
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
new file mode 100644
index 40b329b..f27a7ba
*** a/src/include/storage/standby.h
--- b/src/include/storage/standby.h
***************
*** 14,22 ****
  #ifndef STANDBY_H
  #define STANDBY_H
  
! #include "access/xlogreader.h"
! #include "lib/stringinfo.h"
! #include "storage/lockdefs.h"
  #include "storage/procsignal.h"
  #include "storage/relfilenode.h"
  
--- 14,20 ----
  #ifndef STANDBY_H
  #define STANDBY_H
  
! #include "storage/standbydefs.h"
  #include "storage/procsignal.h"
  #include "storage/relfilenode.h"
  
*************** extern void StandbyReleaseLockTree(Trans
*** 51,91 ****
  extern void StandbyReleaseAllLocks(void);
  extern void StandbyReleaseOldLocks(int nxids, TransactionId *xids);
  
- /*
-  * XLOG message types
-  */
- #define XLOG_STANDBY_LOCK			0x00
- #define XLOG_RUNNING_XACTS			0x10
- 
- typedef struct xl_standby_locks
- {
- 	int			nlocks;			/* number of entries in locks array */
- 	xl_standby_lock locks[FLEXIBLE_ARRAY_MEMBER];
- } xl_standby_locks;
- 
- /*
-  * When we write running xact data to WAL, we use this structure.
-  */
- typedef struct xl_running_xacts
- {
- 	int			xcnt;			/* # of xact ids in xids[] */
- 	int			subxcnt;		/* # of subxact ids in xids[] */
- 	bool		subxid_overflow;	/* snapshot overflowed, subxids missing */
- 	TransactionId nextXid;		/* copy of ShmemVariableCache->nextXid */
- 	TransactionId oldestRunningXid;		/* *not* oldestXmin */
- 	TransactionId latestCompletedXid;	/* so we can set xmax */
- 
- 	TransactionId xids[FLEXIBLE_ARRAY_MEMBER];
- } xl_running_xacts;
- 
  #define MinSizeOfXactRunningXacts offsetof(xl_running_xacts, xids)
  
  
- /* Recovery handlers for the Standby Rmgr (RM_STANDBY_ID) */
- extern void standby_redo(XLogReaderState *record);
- extern void standby_desc(StringInfo buf, XLogReaderState *record);
- extern const char *standby_identify(uint8 info);
- 
  /*
   * Declarations for GetRunningTransactionData(). Similar to Snapshots, but
   * not quite. This has nothing at all to do with visibility on this server,
--- 49,57 ----
diff --git a/src/include/storage/standbydefs.h b/src/include/storage/standbydefs.h
new file mode 100644
index ...a2c9db8
*** a/src/include/storage/standbydefs.h
--- b/src/include/storage/standbydefs.h
***************
*** 0 ****
--- 1,53 ----
+ /*-------------------------------------------------------------------------
+  *
+  * standbydef.h
+  *	   Frontend exposed definitions for hot standby mode.
+  *
+  *
+  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  * src/include/storage/standbydefs.h
+  *
+  *-------------------------------------------------------------------------
+  */
+ #ifndef STANDBYDEFS_H
+ #define STANDBYDEFS_H
+ 
+ #include "access/xlogreader.h"
+ #include "lib/stringinfo.h"
+ #include "storage/lockdefs.h"
+ 
+ /* Recovery handlers for the Standby Rmgr (RM_STANDBY_ID) */
+ extern void standby_redo(XLogReaderState *record);
+ extern void standby_desc(StringInfo buf, XLogReaderState *record);
+ extern const char *standby_identify(uint8 info);
+ 
+ /*
+  * XLOG message types
+  */
+ #define XLOG_STANDBY_LOCK			0x00
+ #define XLOG_RUNNING_XACTS			0x10
+ 
+ typedef struct xl_standby_locks
+ {
+ 	int			nlocks;			/* number of entries in locks array */
+ 	xl_standby_lock locks[FLEXIBLE_ARRAY_MEMBER];
+ } xl_standby_locks;
+ 
+ /*
+  * When we write running xact data to WAL, we use this structure.
+  */
+ typedef struct xl_running_xacts
+ {
+ 	int			xcnt;			/* # of xact ids in xids[] */
+ 	int			subxcnt;		/* # of subxact ids in xids[] */
+ 	bool		subxid_overflow;	/* snapshot overflowed, subxids missing */
+ 	TransactionId nextXid;		/* copy of ShmemVariableCache->nextXid */
+ 	TransactionId oldestRunningXid;		/* *not* oldestXmin */
+ 	TransactionId latestCompletedXid;	/* so we can set xmax */
+ 
+ 	TransactionId xids[FLEXIBLE_ARRAY_MEMBER];
+ } xl_running_xacts;
+ 
+ #endif   /* STANDBYDEFS_H */
standby_lock_cancel_v3.patchtext/x-patch; charset=US-ASCII; name=standby_lock_cancel_v3.patchDownload
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
new file mode 100644
index 581837a..b993ca8
*** a/src/backend/postmaster/startup.c
--- b/src/backend/postmaster/startup.c
*************** StartupProcessMain(void)
*** 203,208 ****
--- 203,209 ----
  	 */
  	RegisterTimeout(STANDBY_DEADLOCK_TIMEOUT, StandbyDeadLockHandler);
  	RegisterTimeout(STANDBY_TIMEOUT, StandbyTimeoutHandler);
+ 	RegisterTimeout(STANDBY_LOCK_TIMEOUT, StandbyLockTimeoutHandler);
  
  	/*
  	 * Unblock signals (they were blocked when the postmaster forked us)
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
new file mode 100644
index 292bed5..4b428de
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
*************** static List *RecoveryLockList;
*** 41,47 ****
  
  static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
  									   ProcSignalReason reason);
- static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
  static void SendRecoveryConflictWithBufferPin(ProcSignalReason reason);
  static XLogRecPtr LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
  static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
--- 41,46 ----
*************** ResolveRecoveryConflictWithDatabase(Oid
*** 337,375 ****
  	}
  }
  
! static void
! ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid)
  {
! 	VirtualTransactionId *backends;
! 	bool		lock_acquired = false;
! 	int			num_attempts = 0;
! 	LOCKTAG		locktag;
  
! 	SET_LOCKTAG_RELATION(locktag, dbOid, relOid);
  
! 	/*
! 	 * If blowing away everybody with conflicting locks doesn't work, after
! 	 * the first two attempts then we just start blowing everybody away until
! 	 * it does work. We do this because its likely that we either have too
! 	 * many locks and we just can't get one at all, or that there are many
! 	 * people crowding for the same table. Recovery must win; the end
! 	 * justifies the means.
! 	 */
! 	while (!lock_acquired)
! 	{
! 		if (++num_attempts < 3)
! 			backends = GetLockConflicts(&locktag, AccessExclusiveLock);
! 		else
! 			backends = GetConflictingVirtualXIDs(InvalidTransactionId,
! 												 InvalidOid);
  
  		ResolveRecoveryConflictWithVirtualXIDs(backends,
  											 PROCSIG_RECOVERY_CONFLICT_LOCK);
  
! 		if (LockAcquireExtended(&locktag, AccessExclusiveLock, true, true, false)
! 			!= LOCKACQUIRE_NOT_AVAIL)
! 			lock_acquired = true;
  	}
  }
  
  /*
--- 336,402 ----
  	}
  }
  
! /*
!  * ResolveRecoveryConflictWithLock is called from ProcSleep()
!  * to resolve conflicts with other backends holding relation locks.
!  *
!  * The WaitLatch sleep normally done in ProcSleep()
!  * (when not InHotStandby) is performed here, for code clarity.
!  *
!  * We either resolve conflicts immediately or set a timeout to wake us at
!  * the limit of our patience.
!  *
!  * Resolve conflicts by cancelling to all backends holding a conflicting 
!  * lock.  As we are already queued to be granted the lock, no new lock
!  * requests conflicting with ours will be granted in the meantime.
!  *
!  * Deadlocks involving the Startup process and an ordinary backend process 
!  * will be detected by the deadlock detector within the ordinary backend.
!  */
! void
! ResolveRecoveryConflictWithLock(LOCKTAG locktag)
  {
! 	TimestampTz ltime;
  
! 	Assert(InHotStandby);
  
! 	ltime = GetStandbyLimitTime();
  
+ 	if (GetCurrentTimestamp() >= ltime)
+ 	{
+ 		/*
+ 		 * We're already behind, so clear a path as quickly as possible.
+ 		 */
+ 		VirtualTransactionId *backends;
+ 		backends = GetLockConflicts(&locktag, AccessExclusiveLock);
  		ResolveRecoveryConflictWithVirtualXIDs(backends,
  											 PROCSIG_RECOVERY_CONFLICT_LOCK);
+ 	}
+ 	else
+ 	{
+ 		/*
+ 		 * Wake up at ltime
+ 		 */
+ 		EnableTimeoutParams timeouts[1];
  
! 		timeouts[0].id = STANDBY_LOCK_TIMEOUT;
! 		timeouts[0].type = TMPARAM_AT;
! 		timeouts[0].fin_time = ltime;
! 		enable_timeouts(timeouts, 1);
  	}
+ 
+ 	/* Wait to be signaled by the release of the Relation Lock */
+ 	ProcWaitForSignal();
+ 
+ 	/*
+ 	 * Clear any timeout requests established above.  We assume here that the
+ 	 * Startup process doesn't have any other outstanding timeouts than what 
+ 	 * this function
+ 	 * uses.  If that stops being true, we could cancel the timeouts
+ 	 * individually, but that'd be slower.
+ 	 */
+ 	disable_all_timeouts(false);
+ 
  }
  
  /*
*************** StandbyTimeoutHandler(void)
*** 532,537 ****
--- 559,574 ----
  	SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
  }
  
+ /*
+  * StandbyLockTimeoutHandler() will be called if STANDBY_LOCK_TIMEOUT is exceeded.
+  * This doesn't need to do anything, simply waking up is enough.
+  */
+ void
+ StandbyLockTimeoutHandler(void)
+ {
+ 	/* forget any pending STANDBY_DEADLOCK_TIMEOUT request */
+ 	disable_timeout(STANDBY_DEADLOCK_TIMEOUT, false);
+ }
  
  /*
   * -----------------------------------------------------
*************** StandbyTimeoutHandler(void)
*** 545,551 ****
   * process is the proxy by which the original locks are implemented.
   *
   * We only keep track of AccessExclusiveLocks, which are only ever held by
!  * one transaction on one relation, and don't worry about lock queuing.
   *
   * We keep a single dynamically expandible list of locks in local memory,
   * RelationLockList, so we can keep track of the various entries made by
--- 582,588 ----
   * process is the proxy by which the original locks are implemented.
   *
   * We only keep track of AccessExclusiveLocks, which are only ever held by
!  * one transaction on one relation.
   *
   * We keep a single dynamically expandible list of locks in local memory,
   * RelationLockList, so we can keep track of the various entries made by
*************** StandbyAcquireAccessExclusiveLock(Transa
*** 587,600 ****
  	newlock->relOid = relOid;
  	RecoveryLockList = lappend(RecoveryLockList, newlock);
  
- 	/*
- 	 * Attempt to acquire the lock as requested, if not resolve conflict
- 	 */
  	SET_LOCKTAG_RELATION(locktag, newlock->dbOid, newlock->relOid);
  
! 	if (LockAcquireExtended(&locktag, AccessExclusiveLock, true, true, false)
! 		== LOCKACQUIRE_NOT_AVAIL)
! 		ResolveRecoveryConflictWithLock(newlock->dbOid, newlock->relOid);
  }
  
  static void
--- 624,632 ----
  	newlock->relOid = relOid;
  	RecoveryLockList = lappend(RecoveryLockList, newlock);
  
  	SET_LOCKTAG_RELATION(locktag, newlock->dbOid, newlock->relOid);
  
! 	LockAcquireExtended(&locktag, AccessExclusiveLock, true, false, false);
  }
  
  static void
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
new file mode 100644
index bb10c1b..4f13f9d
*** a/src/backend/storage/lmgr/proc.c
--- b/src/backend/storage/lmgr/proc.c
***************
*** 42,47 ****
--- 42,48 ----
  #include "postmaster/autovacuum.h"
  #include "replication/slot.h"
  #include "replication/syncrep.h"
+ #include "storage/standby.h"
  #include "storage/ipc.h"
  #include "storage/lmgr.h"
  #include "storage/pmsignal.h"
*************** ProcSleep(LOCALLOCK *locallock, LockMeth
*** 1080,1099 ****
  	 * If LockTimeout is set, also enable the timeout for that.  We can save a
  	 * few cycles by enabling both timeout sources in one call.
  	 */
! 	if (LockTimeout > 0)
  	{
! 		EnableTimeoutParams timeouts[2];
! 
! 		timeouts[0].id = DEADLOCK_TIMEOUT;
! 		timeouts[0].type = TMPARAM_AFTER;
! 		timeouts[0].delay_ms = DeadlockTimeout;
! 		timeouts[1].id = LOCK_TIMEOUT;
! 		timeouts[1].type = TMPARAM_AFTER;
! 		timeouts[1].delay_ms = LockTimeout;
! 		enable_timeouts(timeouts, 2);
  	}
- 	else
- 		enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
  
  	/*
  	 * If somebody wakes us between LWLockRelease and WaitLatch, the latch
--- 1081,1103 ----
  	 * If LockTimeout is set, also enable the timeout for that.  We can save a
  	 * few cycles by enabling both timeout sources in one call.
  	 */
! 	if (!InHotStandby)
  	{
! 		if (LockTimeout > 0)
! 		{
! 			EnableTimeoutParams timeouts[2];
! 	
! 			timeouts[0].id = DEADLOCK_TIMEOUT;
! 			timeouts[0].type = TMPARAM_AFTER;
! 			timeouts[0].delay_ms = DeadlockTimeout;
! 			timeouts[1].id = LOCK_TIMEOUT;
! 			timeouts[1].type = TMPARAM_AFTER;
! 			timeouts[1].delay_ms = LockTimeout;
! 			enable_timeouts(timeouts, 2);
! 		}
! 		else
! 			enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
  	}
  
  	/*
  	 * If somebody wakes us between LWLockRelease and WaitLatch, the latch
*************** ProcSleep(LOCALLOCK *locallock, LockMeth
*** 1111,1125 ****
  	 */
  	do
  	{
! 		WaitLatch(MyLatch, WL_LATCH_SET, 0);
! 		ResetLatch(MyLatch);
! 		/* check for deadlocks first, as that's probably log-worthy */
! 		if (got_deadlock_timeout)
  		{
! 			CheckDeadLock();
! 			got_deadlock_timeout = false;
  		}
- 		CHECK_FOR_INTERRUPTS();
  
  		/*
  		 * waitStatus could change from STATUS_WAITING to something else
--- 1115,1137 ----
  	 */
  	do
  	{
! 		if (InHotStandby)
  		{
! 			/* Set a timer and wait for that or for the Lock to be granted */
! 			ResolveRecoveryConflictWithLock(locallock->tag.lock);
! 		}
! 		else
! 		{
! 			WaitLatch(MyLatch, WL_LATCH_SET, 0);
! 			ResetLatch(MyLatch);
! 			/* check for deadlocks first, as that's probably log-worthy */
! 			if (got_deadlock_timeout)
! 			{
! 				CheckDeadLock();
! 				got_deadlock_timeout = false;
! 			}
! 			CHECK_FOR_INTERRUPTS();
  		}
  
  		/*
  		 * waitStatus could change from STATUS_WAITING to something else
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
new file mode 100644
index f27a7ba..957ba19
*** a/src/include/storage/standby.h
--- b/src/include/storage/standby.h
***************
*** 15,20 ****
--- 15,21 ----
  #define STANDBY_H
  
  #include "storage/standbydefs.h"
+ #include "storage/lock.h"
  #include "storage/procsignal.h"
  #include "storage/relfilenode.h"
  
*************** extern void ResolveRecoveryConflictWithS
*** 31,40 ****
--- 32,43 ----
  extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
  extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
  
+ extern void ResolveRecoveryConflictWithLock(LOCKTAG locktag);
  extern void ResolveRecoveryConflictWithBufferPin(void);
  extern void CheckRecoveryConflictDeadlock(void);
  extern void StandbyDeadLockHandler(void);
  extern void StandbyTimeoutHandler(void);
+ extern void StandbyLockTimeoutHandler(void);
  
  /*
   * Standby Rmgr (RM_STANDBY_ID)
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
new file mode 100644
index 30581a1..33b5016
*** a/src/include/utils/timeout.h
--- b/src/include/utils/timeout.h
*************** typedef enum TimeoutId
*** 29,34 ****
--- 29,35 ----
  	STATEMENT_TIMEOUT,
  	STANDBY_DEADLOCK_TIMEOUT,
  	STANDBY_TIMEOUT,
+ 	STANDBY_LOCK_TIMEOUT,
  	/* First user-definable timeout reason */
  	USER_TIMEOUT,
  	/* Maximum number of timeout reasons */
#9Simon Riggs
simon@2ndQuadrant.com
In reply to: Jeff Janes (#8)
Re: Spurious standby query cancellations

On 24 December 2015 at 20:15, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Dec 23, 2015 at 9:40 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Sep 23, 2015 at 11:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On further thought, neither do I. The attached patch inverts
ResolveRecoveryConflictWithLock to be called back from the lmgr code so that
it is like the ResolveRecoveryConflictWithBufferPin code. It does not try to
cancel the conflicting lock holders from the signal handler, rather it just
loops an extra time and cancels the transactions on the next call.

It looks like the deadlock detection is adequately handled within normal
lmgr code within the back-ends of the other parties to the deadlock, so I
didn't do a timeout for deadlock detection purposes.

I was testing that this still applies to git HEAD. And it doesn't due
to the re-factoring of lock.h into lockdef.h. (And apparently it
never actually did, because that refactoring happened long before I
wrote this patch. I guess I must have done this work against
9_5_STABLE.)

standby.h cannot include lock.h because standby.h is included
indirectly by pg_xlogdump. But it has to get LOCKTAG which is only in
lock.h.

Does this mean that standby.h also needs to get parts spun off into a
new standbydef.h that can be included from front-end code?

That is how I've done it.

The lock cancel patch applies over the header split patch.

This looks good to me, apart from some WhitespaceCrime.

Header split applied, will test and apply the main patch this week.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Simon Riggs (#9)
Re: Spurious standby query cancellations

Simon Riggs wrote:

This looks good to me, apart from some WhitespaceCrime.

Header split applied, will test and apply the main patch this week.

Since the patch already appears to have a committer's attention, it's
okay to move it to the next commitfest; if Simon happens to commit ahead
of time, that would be a great outcome too.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Jeff Janes (#8)
Re: Spurious standby query cancellations

On Fri, Dec 25, 2015 at 1:45 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Dec 23, 2015 at 9:40 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Sep 23, 2015 at 11:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On further thought, neither do I. The attached patch inverts
ResolveRecoveryConflictWithLock to be called back from the lmgr code so that
it is like the ResolveRecoveryConflictWithBufferPin code. It does not try to
cancel the conflicting lock holders from the signal handler, rather it just
loops an extra time and cancels the transactions on the next call.

It looks like the deadlock detection is adequately handled within normal
lmgr code within the back-ends of the other parties to the deadlock, so I
didn't do a timeout for deadlock detection purposes.

I was testing that this still applies to git HEAD. And it doesn't due
to the re-factoring of lock.h into lockdef.h. (And apparently it
never actually did, because that refactoring happened long before I
wrote this patch. I guess I must have done this work against
9_5_STABLE.)

standby.h cannot include lock.h because standby.h is included
indirectly by pg_xlogdump. But it has to get LOCKTAG which is only in
lock.h.

Does this mean that standby.h also needs to get parts spun off into a
new standbydef.h that can be included from front-end code?

That is how I've done it.

The lock cancel patch applies over the header split patch.

Review Comments -

1.
	 * If LockTimeout is set, also enable the timeout for that.  We can save a
	 * few cycles by enabling both timeout sources in one call.
	 */
!	if (!InHotStandby)
 	{
!		if (LockTimeout > 0)
!		{
!			EnableTimeoutParams timeouts[2];
!
!			timeouts[0].id = DEADLOCK_TIMEOUT;
!			timeouts[0].type = TMPARAM_AFTER;
!			timeouts[0].delay_ms = DeadlockTimeout;
!			timeouts[1].id = LOCK_TIMEOUT;
!			timeouts[1].type = TMPARAM_AFTER;
!			timeouts[1].delay_ms = LockTimeout;
!			enable_timeouts(timeouts, 2);
!		}
!		else
!			enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
 	}

If you are enabling the timeout only in non-hot-standby mode, then
later in the code (in the same function) it should be disabled under the
same condition; otherwise it will lead to an assertion failure.

2.
+ /*
+  * Clear any timeout requests established above.  We assume here that the
+  * Startup process doesn't have any other outstanding timeouts than what
+  * this function uses.  If that stops being true, we could cancel the
+  * timeouts individually, but that'd be slower.
+  */

Comment seems to be misaligned.

3.
+ void
+ StandbyLockTimeoutHandler(void)
+ {
+ 	/* forget any pending STANDBY_DEADLOCK_TIMEOUT request */
+ 	disable_timeout(STANDBY_DEADLOCK_TIMEOUT, false);
+ }

Is there any reason to disable the deadlock timeout when it has not been
enabled by ResolveRecoveryConflictWithLock()?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#12Simon Riggs
simon@2ndQuadrant.com
In reply to: Jeff Janes (#8)
Re: Spurious standby query cancellations

On 24 December 2015 at 20:15, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Dec 23, 2015 at 9:40 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Sep 23, 2015 at 11:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On further thought, neither do I. The attached patch inverts
ResolveRecoveryConflictWithLock to be called back from the lmgr code so that
it is like the ResolveRecoveryConflictWithBufferPin code. It does not try to
cancel the conflicting lock holders from the signal handler, rather it just
loops an extra time and cancels the transactions on the next call.

It looks like the deadlock detection is adequately handled within normal
lmgr code within the back-ends of the other parties to the deadlock, so I
didn't do a timeout for deadlock detection purposes.

That is how I've done it.

It's taken me a while to figure this out.

My testing showed a bug in disable_timeout(), which turned out to be a
double-disable; I've fixed that. I'll submit a different patch to put in
some diagnostics in case such cases show up again, which could happen now
that we have user-defined timeouts.

What surprises me is that I can't see how this patch ever worked as
submitted when run on an assert-enabled build.

If you want this backpatched, please submit versions that apply cleanly and
test them. I'm less inclined to do that myself; I just regard this as an
improvement.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services