Hot Standby: Relation-specific deferred conflict resolution

Started by Simon Riggsalmost 16 years ago35 messages

simon@2ndQuadrant.com

almost 16 years ago

1 attachment(s)

Conflict resolution improvements are important to include in this
release, as discussed many times. Proposal given here
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php
presents a viable design to improve this.

Following patch is a complete working implementation of that design.
I'm still testing it, but its worth publishing as early as possible to
allow discussion. Not for commit, just yet, but soon.

standby.c changes are to decide whether to defer recovery based upon
relfilenode of WAL record. If resolution deferred, re-check for conflict
during LockAcquire() and fail with snapshot error, just as if resolution
had never been deferred. Also, an optimisation of conflict processing to
avoid continual re-evaluation of conflicts since some are now deferred.

API changes in heapam and nbtxlog to pass thru RelFileNode
API changes in indexcmds, no behaviour changes
procarray changes to implement LatestRemovedXid cache

--
Simon Riggs www.2ndQuadrant.com

Attachments:

relation_specific_conflict_resolution.patchtext/x-patch; charset=UTF-8; name=relation_specific_conflict_resolution.patchDownload

*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***************
*** 4139,4145 **** heap_xlog_cleanup_info(XLogRecPtr lsn, XLogRecord *record)
  	xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
  
  	if (InHotStandby)
! 		ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid);
  
  	/*
  	 * Actual operation is a no-op. Record type exists to provide a means
--- 4139,4145 ----
  	xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
  
  	if (InHotStandby)
! 		ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
  
  	/*
  	 * Actual operation is a no-op. Record type exists to provide a means
***************
*** 4171,4177 **** heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record, bool clean_move)
  	 * no queries running for which the removed tuples are still visible.
  	 */
  	if (InHotStandby)
! 		ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid);
  
  	RestoreBkpBlocks(lsn, record, true);
  
--- 4171,4177 ----
  	 * no queries running for which the removed tuples are still visible.
  	 */
  	if (InHotStandby)
! 		ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
  
  	RestoreBkpBlocks(lsn, record, true);
  
***************
*** 4241,4247 **** heap_xlog_freeze(XLogRecPtr lsn, XLogRecord *record)
  	 * consider the frozen xids as running.
  	 */
  	if (InHotStandby)
! 		ResolveRecoveryConflictWithSnapshot(cutoff_xid);
  
  	RestoreBkpBlocks(lsn, record, false);
  
--- 4241,4247 ----
  	 * consider the frozen xids as running.
  	 */
  	if (InHotStandby)
! 		ResolveRecoveryConflictWithSnapshot(cutoff_xid, xlrec->node);
  
  	RestoreBkpBlocks(lsn, record, false);
  
*** a/src/backend/access/nbtree/nbtxlog.c
--- b/src/backend/access/nbtree/nbtxlog.c
***************
*** 833,839 **** btree_redo(XLogRecPtr lsn, XLogRecord *record)
  		 * here is worth some thought and possibly some effort to
  		 * improve.
  		 */
! 		ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid);
  	}
  
  	/*
--- 833,839 ----
  		 * here is worth some thought and possibly some effort to
  		 * improve.
  		 */
! 		ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
  	}
  
  	/*
*** a/src/backend/commands/indexcmds.c
--- b/src/backend/commands/indexcmds.c
***************
*** 540,546 **** DefineIndex(RangeVar *heapRelation,
  	 * check for that.	Also, prepared xacts are not reported, which is fine
  	 * since they certainly aren't going to do anything more.
  	 */
! 	old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
  
  	while (VirtualTransactionIdIsValid(*old_lockholders))
  	{
--- 540,546 ----
  	 * check for that.	Also, prepared xacts are not reported, which is fine
  	 * since they certainly aren't going to do anything more.
  	 */
! 	old_lockholders = GetLockConflicts(&heaplocktag, ShareLock, InvalidTransactionId);
  
  	while (VirtualTransactionIdIsValid(*old_lockholders))
  	{
***************
*** 627,633 **** DefineIndex(RangeVar *heapRelation,
  	 * We once again wait until no transaction can have the table open with
  	 * the index marked as read-only for updates.
  	 */
! 	old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
  
  	while (VirtualTransactionIdIsValid(*old_lockholders))
  	{
--- 627,633 ----
  	 * We once again wait until no transaction can have the table open with
  	 * the index marked as read-only for updates.
  	 */
! 	old_lockholders = GetLockConflicts(&heaplocktag, ShareLock, InvalidTransactionId);
  
  	while (VirtualTransactionIdIsValid(*old_lockholders))
  	{
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***************
*** 66,71 **** typedef struct ProcArrayStruct
--- 66,72 ----
  
  	int			numKnownAssignedXids;	/* current number of known assigned xids */
  	int			maxKnownAssignedXids;	/* allocated size of known assigned xids */
+ 
  	/*
  	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
  	 * overflowing cached subxids in PGPROC entries.
***************
*** 73,78 **** typedef struct ProcArrayStruct
--- 74,86 ----
  	TransactionId	lastOverflowedXid;
  
  	/*
+ 	 * During Hot Standby we need to store a visibility limit for to defer recovery
+ 	 * conflict processing. The cached value is accessed during lock processing,
+ 	 * but stored on the procarray header. See lock.c for further details.
+ 	 */
+ 	TransactionId	latestRemovedXidCache[NUM_LOCK_PARTITIONS];
+ 
+ 	/*
  	 * We declare procs[] as 1 entry because C wants a fixed-size array, but
  	 * actually it is maxProcs entries long.
  	 */
***************
*** 220,225 **** CreateSharedProcArray(void)
--- 228,236 ----
  											  HASH_ELEM | HASH_FUNCTION);
  		if (!KnownAssignedXidsHash)
  			elog(FATAL, "could not initialize known assigned xids hash table");
+ 
+ 		/* Initialize the latestRemovedXidCache */
+ 		MemSet(procArray->latestRemovedXidCache, 0, NUM_LOCK_PARTITIONS * sizeof(TransactionId));
  	}
  }
  
***************
*** 1216,1221 **** GetSnapshotData(Snapshot snapshot)
--- 1227,1235 ----
  		RecentGlobalXmin = FirstNormalTransactionId;
  	RecentXmin = xmin;
  
+ 	/* Reset recovery conflict processing limit after deriving new snapshot */
+ 	MyProc->limitXmin = InvalidTransactionId;
+ 
  	snapshot->xmin = xmin;
  	snapshot->xmax = xmax;
  	snapshot->xcnt = count;
***************
*** 1739,1747 **** GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
  			{
  				VirtualTransactionId vxid;
  
! 				GET_VXID_FROM_PGPROC(vxid, *proc);
! 				if (VirtualTransactionIdIsValid(vxid))
! 					vxids[count++] = vxid;
  			}
  		}
  	}
--- 1753,1774 ----
  			{
  				VirtualTransactionId vxid;
  
! 				/*
! 				 * Avoid repeated re-listing of a process for conflict processing
! 				 * by excluding it once resolved up to a particular latestRemovedXid.
! 				 * We reset proc->limitXmin each time we take a new snapshot because
! 				 * that may need to have conflict processing performed on it, as
! 				 * discussed in header comments for this function.
! 				 */
! 				if (!TransactionIdIsValid(proc->limitXmin) ||
! 					TransactionIdFollows(limitXmin, proc->limitXmin))
! 				{
! 					proc->limitXmin = limitXmin;
! 
! 					GET_VXID_FROM_PGPROC(vxid, *proc);
! 					if (VirtualTransactionIdIsValid(vxid))
! 						vxids[count++] = vxid;
! 				}
  			}
  		}
  	}
***************
*** 2559,2561 **** KnownAssignedXidsDisplay(int trace_level)
--- 2586,2610 ----
  
  	pfree(buf.data);
  }
+ 
+ /*
+  * LatestRemovedXidCache provides a way of deferring recovery conflicts up to
+  * the point where a query requests a lock on a relation. These functions provide
+  * the cache required by recovery conflict processing. (see storage/lmgr/lock.c)
+  * Function prototypes are in standby.h, not procarray.h for convenience.
+  */
+ void
+ ProcArraySetLatestRemovedXidCache(int partition, TransactionId latestRemovedXid)
+ {
+ 	ProcArrayStruct *arrayP = procArray;
+ 
+ 	arrayP->latestRemovedXidCache[partition] = latestRemovedXid;
+ }
+ 
+ TransactionId
+ ProcArrayGetLatestRemovedXidCache(int partition)
+ {
+ 	ProcArrayStruct *arrayP = procArray;
+ 
+ 	return arrayP->latestRemovedXidCache[partition];
+ }
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
***************
*** 27,32 ****
--- 27,33 ----
  #include "storage/procarray.h"
  #include "storage/sinvaladt.h"
  #include "storage/standby.h"
+ #include "tcop/tcopprot.h"
  #include "utils/ps_status.h"
  
  int		vacuum_defer_cleanup_age;
***************
*** 34,40 **** int		vacuum_defer_cleanup_age;
  static List *RecoveryLockList;
  
  static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
! 									   ProcSignalReason reason);
  static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
  static void LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
  static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
--- 35,43 ----
  static List *RecoveryLockList;
  
  static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
! 									   ProcSignalReason reason,
! 									   TransactionId latestRemovedXid,
! 									   Oid dbOid, Oid relOid);
  static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
  static void LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
  static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
***************
*** 157,165 **** WaitExceedsMaxStandbyDelay(void)
   */
  static void
  ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
! 									   ProcSignalReason reason)
  {
  	char		waitactivitymsg[100];
  
  	while (VirtualTransactionIdIsValid(*waitlist))
  	{
--- 160,171 ----
   */
  static void
  ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
! 									   ProcSignalReason reason,
! 									   TransactionId latestRemovedXid,
! 									   Oid dbOid, Oid relOid)
  {
  	char		waitactivitymsg[100];
+ 	VirtualTransactionId *lockconflicts = NULL;
  
  	while (VirtualTransactionIdIsValid(*waitlist))
  	{
***************
*** 203,221 **** ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
  			if (WaitExceedsMaxStandbyDelay())
  			{
  				pid_t pid;
  
  				/*
! 				 * Now find out who to throw out of the balloon.
  				 */
! 				Assert(VirtualTransactionIdIsValid(*waitlist));
! 				pid = CancelVirtualTransaction(*waitlist, reason);
! 
! 				/*
! 				 * Wait awhile for it to die so that we avoid flooding an
! 				 * unresponsive backend when system is heavily loaded.
! 				 */
! 				if (pid != 0)
! 					pg_usleep(5000);
  			}
  		}
  
--- 209,279 ----
  			if (WaitExceedsMaxStandbyDelay())
  			{
  				pid_t pid;
+ 				bool cancel = true;
  
  				/*
! 				 * If it is a snapshot conflict decide whether we should resolve
! 				 * immediately or defer the resolution until later. This must be
! 				 * designed carefully so that it is a deferral in all cases, in
! 				 * particular we must not set the latestRemovedXid until when the
! 				 * stand_delay has reached max, which is now!
! 				 *
! 				 * If we had a snapshot conflict then we also check for lock conflicts.
! 				 * Lock conflicts disregard whether backends are holding or waiting for
! 				 * the lock, so we get everybody that has declared an intention on the
! 				 * lock. We are careful to acquire the lock partition lock in exclusive
! 				 * mode, so that no concurrent lock requestors slip by while we do this.
! 				 * While holding the lock partition lock we also record the latestRemovedXid
! 				 * so that anybody arriving after we release the lock will cancel
! 				 * themselves using the same errors they would have got here.
! 				 *
! 				 * If conflicts do exist yet don't conflict with the snapshot then we
! 				 * then we do not need to cancel. So we need to merge the two lists
! 				 * to determine who can be spared, for now.
! 				 *
! 				 * XXX O(N^2) at present, but neither list is long in the common case.
  				 */
! 				if (reason == PROCSIG_RECOVERY_CONFLICT_SNAPSHOT)
! 				{
! 					LOCKTAG			locktag;
! 					VirtualTransactionId *conflicts;
! 
! 					SET_LOCKTAG_RELATION(locktag, dbOid, relOid);
! 
! 					if (!lockconflicts)
! 						lockconflicts = GetLockConflicts(&locktag, AccessExclusiveLock,
! 															latestRemovedXid);
! 					conflicts = lockconflicts;
! 					cancel = false;
! 					while (VirtualTransactionIdIsValid(*conflicts))
! 					{
! 						if (waitlist->backendId == conflicts->backendId &&
! 							waitlist->localTransactionId == conflicts->localTransactionId)
! 						{
! 							cancel = true;
! 							break;
! 						}
! 						conflicts++;
! 					}
! 				}
! 
! 				if (cancel)
! 				{
! 					/*
! 					 * Now find out who to throw out of the balloon.
! 					 */
! 					Assert(VirtualTransactionIdIsValid(*waitlist));
! 					pid = CancelVirtualTransaction(*waitlist, reason);
! 
! 					/*
! 					 * Wait awhile for it to die so that we avoid flooding an
! 					 * unresponsive backend when system is heavily loaded.
! 					 */
! 					if (pid != 0)
! 						pg_usleep(5000);
! 				}
! 				else
! 					break;
  			}
  		}
  
***************
*** 232,238 **** ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
  }
  
  void
! ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid)
  {
  	VirtualTransactionId *backends;
  
--- 290,296 ----
  }
  
  void
! ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
  {
  	VirtualTransactionId *backends;
  
***************
*** 240,246 **** ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid)
  										 InvalidOid);
  
  	ResolveRecoveryConflictWithVirtualXIDs(backends,
! 										   PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
  }
  
  void
--- 298,318 ----
  										 InvalidOid);
  
  	ResolveRecoveryConflictWithVirtualXIDs(backends,
! 										   PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
! 										   latestRemovedXid,
! 										   node.dbNode, node.relNode);
! }
! 
! /*
!  * ResolveRecoveryConflictWithRemovedTransactionId is called to resolve
!  * a deferred conflict.
!  *
!  * Treat as a self-interrupt, but don't bother actually sending the signal.
!  */
! void
! ResolveRecoveryConflictWithRemovedTransactionId(void)
! {
! 	RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
  }
  
  void
***************
*** 270,276 **** ResolveRecoveryConflictWithTablespace(Oid tsid)
  	temp_file_users = GetConflictingVirtualXIDs(InvalidTransactionId,
  												InvalidOid);
  	ResolveRecoveryConflictWithVirtualXIDs(temp_file_users,
! 										   PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
  }
  
  void
--- 342,350 ----
  	temp_file_users = GetConflictingVirtualXIDs(InvalidTransactionId,
  												InvalidOid);
  	ResolveRecoveryConflictWithVirtualXIDs(temp_file_users,
! 										   PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
! 										   InvalidTransactionId,
! 										   0, 0);
  }
  
  void
***************
*** 321,327 **** ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid)
  	while (!lock_acquired)
  	{
  		if (++num_attempts < 3)
! 			backends = GetLockConflicts(&locktag, AccessExclusiveLock);
  		else
  		{
  			backends = GetConflictingVirtualXIDs(InvalidTransactionId,
--- 395,403 ----
  	while (!lock_acquired)
  	{
  		if (++num_attempts < 3)
! 			backends = GetLockConflicts(&locktag,
! 										AccessExclusiveLock,
! 										InvalidTransactionId);
  		else
  		{
  			backends = GetConflictingVirtualXIDs(InvalidTransactionId,
***************
*** 330,336 **** ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid)
  		}
  
  		ResolveRecoveryConflictWithVirtualXIDs(backends,
! 											   PROCSIG_RECOVERY_CONFLICT_LOCK);
  
  		if (LockAcquireExtended(&locktag, AccessExclusiveLock, true, true, false)
  											!= LOCKACQUIRE_NOT_AVAIL)
--- 406,414 ----
  		}
  
  		ResolveRecoveryConflictWithVirtualXIDs(backends,
! 											   PROCSIG_RECOVERY_CONFLICT_LOCK,
! 											   InvalidTransactionId,
! 											   0, 0);
  
  		if (LockAcquireExtended(&locktag, AccessExclusiveLock, true, true, false)
  											!= LOCKACQUIRE_NOT_AVAIL)
*** a/src/backend/storage/lmgr/lock.c
--- b/src/backend/storage/lmgr/lock.c
***************
*** 35,40 ****
--- 35,41 ----
  #include "access/transam.h"
  #include "access/twophase.h"
  #include "access/twophase_rmgr.h"
+ #include "access/xact.h"
  #include "miscadmin.h"
  #include "pg_trace.h"
  #include "pgstat.h"
***************
*** 439,444 **** ProcLockHashCode(const PROCLOCKTAG *proclocktag, uint32 hashcode)
--- 440,486 ----
  	return lockhash;
  }
  
+ /*
+  * SetRecoveryConflictLatestRemovedXid functions manage the latestRemovedXid cache.
+  * During Hot Standby we generate a list of conflicts and then resolve only those
+  * conflicts that have a clear and present lock conflict. We defer the conflict
+  * for other relations to some later time, hopefully never. The deferral mechanism
+  * is that we store the latestRemovedXid that applies to a relation behind the
+  * lock partition that applies for that relation. These routines exist to provide
+  * a level of modularity to allow us to alter the cacheing mechanism as needed.
+  *
+  * It is possible to defer the recovery conflict right up until the point that
+  * we access a particular data block. Doing that means we would have to cache
+  * latestRemovedXid for each data block, even those that have exited and re-entered
+  * cache. It would also mean we would need to re-check latestRemovedXid each time
+  * we access a data block. Both of those factors make that level of granularity
+  * more trouble than it would likely be worth in practice. So we check the
+  * latestRemovedXid once each time we lock a relation: smaller cache, fewer checks.
+  *
+  * We store the latestRemovedXid on the lock entry, giving easy access to the value
+  * that applies for that relation. If a lock is heavily used or held for a long
+  * duration then it will effectively have its own private cache. When the lock entry
+  * is removed, we lose the cached latestRemovedXid value.
+  *
+  * When a lock entry is created we set the value from the cache. The current cache
+  * is just a single value for each partition, though that could be extended to have
+  * multiple entries for frequently used relations. The current cache is stored on
+  * the header of the ProcArrayStruct as a convenient place to store shmem data
+  * and could be relocated elsewhere if required.
+  */
+ static void
+ LockSetRecoveryConflictLatestRemovedXidForPartition(int partition, TransactionId latestRemovedXid)
+ {
+ 	Assert(InHotStandby);
+ 
+ 	ProcArraySetLatestRemovedXidCache(partition, latestRemovedXid);
+ }
+ 
+ static TransactionId
+ LockGetRecoveryConflictLatestRemovedXidForPartition(int partition)
+ {
+ 	return ProcArrayGetLatestRemovedXidCache(partition);
+ }
  
  /*
   * LockAcquire -- Check for lock conflicts, sleep if conflict found,
***************
*** 632,637 **** LockAcquireExtended(const LOCKTAG *locktag,
--- 674,685 ----
  		MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
  		MemSet(lock->granted, 0, sizeof(int) * MAX_LOCKMODES);
  		LOCK_PRINT("LockAcquire: new", lock, lockmode);
+ 
+ 		/*
+ 		 * Start the latestRemovedXid from the partition's cached value
+ 		 */
+ 		if (TransactionStartedDuringRecovery())
+ 			lock->latestRemovedXid = LockGetRecoveryConflictLatestRemovedXidForPartition(partition);
  	}
  	else
  	{
***************
*** 642,647 **** LockAcquireExtended(const LOCKTAG *locktag,
--- 690,726 ----
  	}
  
  	/*
+ 	 * If latestRemovedXid is set, resolve any recovery conflict. We do this
+ 	 * here whether we wait or not. If our snapshot conflicts then we will
+ 	 * be cancelled, so there is no need for us to re-check this after we
+ 	 * wake from a wait for the lock.
+ 	 */
+ 	if (TransactionIdIsValid(lock->latestRemovedXid) &&
+ 		TransactionIdIsValid(MyProc->xmin) &&
+ 		TransactionIdFollowsOrEquals(lock->latestRemovedXid, MyProc->xmin))
+ 	{
+ 		if (lock->nRequested == 0)
+ 		{
+ 			/*
+ 			 * There are no other requestors of this lock, so garbage-collect
+ 			 * the lock object.  We *must* do this to avoid a permanent leak
+ 			 * of shared memory, because there won't be anything to cause
+ 			 * anyone to release the lock object later.
+ 			 */
+ 			Assert(SHMQueueEmpty(&(lock->procLocks)));
+ 			if (!hash_search_with_hash_value(LockMethodLockHash,
+ 											 (void *) &(lock->tag),
+ 											 hashcode,
+ 											 HASH_REMOVE,
+ 											 NULL))
+ 				elog(PANIC, "lock table corrupted");
+ 		}
+ 		LWLockRelease(partitionLock);
+ 		ResolveRecoveryConflictWithRemovedTransactionId();
+ 		/* Not reached */
+ 	}
+ 
+ 	/*
  	 * Create the hash key for the proclock table.
  	 */
  	proclocktag.myLock = lock;
***************
*** 1788,1796 **** LockReassignCurrentOwner(void)
   * uses of the result.
   */
  VirtualTransactionId *
! GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
  {
! 	VirtualTransactionId *vxids;
  	LOCKMETHODID lockmethodid = locktag->locktag_lockmethodid;
  	LockMethod	lockMethodTable;
  	LOCK	   *lock;
--- 1867,1875 ----
   * uses of the result.
   */
  VirtualTransactionId *
! GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, TransactionId latestRemovedXid)
  {
! 	static VirtualTransactionId *vxids;
  	LOCKMETHODID lockmethodid = locktag->locktag_lockmethodid;
  	LockMethod	lockMethodTable;
  	LOCK	   *lock;
***************
*** 1798,1803 **** GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
--- 1877,1883 ----
  	SHM_QUEUE  *procLocks;
  	PROCLOCK   *proclock;
  	uint32		hashcode;
+ 	int			partition;
  	LWLockId	partitionLock;
  	int			count = 0;
  
***************
*** 1812,1827 **** GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
  	 * need enough space for MaxBackends + a terminator, since prepared xacts
  	 * don't count.
  	 */
! 	vxids = (VirtualTransactionId *)
! 		palloc0(sizeof(VirtualTransactionId) * (MaxBackends + 1));
  
  	/*
  	 * Look up the lock object matching the tag.
  	 */
  	hashcode = LockTagHashCode(locktag);
  	partitionLock = LockHashPartitionLock(hashcode);
  
! 	LWLockAcquire(partitionLock, LW_SHARED);
  
  	lock = (LOCK *) hash_search_with_hash_value(LockMethodLockHash,
  												(void *) locktag,
--- 1892,1929 ----
  	 * need enough space for MaxBackends + a terminator, since prepared xacts
  	 * don't count.
  	 */
! 	if (!InHotStandby)
! 		vxids = (VirtualTransactionId *)
! 			palloc0(sizeof(VirtualTransactionId) * (MaxBackends + 1));
! 	else
! 	{
! 		if (vxids == NULL)
! 		{
! 			vxids = (VirtualTransactionId *)
! 				malloc(sizeof(VirtualTransactionId) * (MaxBackends + 1));
! 			if (vxids == NULL)
! 				ereport(ERROR,
! 					(errcode(ERRCODE_OUT_OF_MEMORY),
! 					 errmsg("out of memory")));
! 
! 		}
! 	}
  
  	/*
  	 * Look up the lock object matching the tag.
  	 */
  	hashcode = LockTagHashCode(locktag);
+ 	partition = LockHashPartition(hashcode);
  	partitionLock = LockHashPartitionLock(hashcode);
  
! 	/*
! 	 * If we are setting latestRemovedXid we need to get exclusive lock
! 	 * to avoid race conditions.
! 	 */
! 	if (TransactionIdIsValid(latestRemovedXid))
! 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
! 	else
! 		LWLockAcquire(partitionLock, LW_SHARED);
  
  	lock = (LOCK *) hash_search_with_hash_value(LockMethodLockHash,
  												(void *) locktag,
***************
*** 1831,1836 **** GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
--- 1933,1944 ----
  	if (!lock)
  	{
  		/*
+ 		 * Set the latestRemovedXid for the partition.
+ 		 */
+ 		if (TransactionIdIsValid(latestRemovedXid))
+ 			LockSetRecoveryConflictLatestRemovedXidForPartition(partition, latestRemovedXid);
+ 
+ 		/*
  		 * If the lock object doesn't exist, there is nothing holding a lock
  		 * on this lockable object.
  		 */
***************
*** 1839,1844 **** GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
--- 1947,1958 ----
  	}
  
  	/*
+ 	 * Set the latestRemovedXid for the lock
+ 	 */
+ 	if (TransactionIdIsValid(latestRemovedXid))
+ 		lock->latestRemovedXid = latestRemovedXid;
+ 
+ 	/*
  	 * Examine each existing holder (or awaiter) of the lock.
  	 */
  	conflictMask = lockMethodTable->conflictTab[lockmode];
*** a/src/include/storage/lock.h
--- b/src/include/storage/lock.h
***************
*** 312,317 **** typedef struct LOCK
--- 312,323 ----
  	int			nRequested;		/* total of requested[] array */
  	int			granted[MAX_LOCKMODES]; /* counts of granted locks */
  	int			nGranted;		/* total of granted[] array */
+ 
+ 	/*
+ 	 * For Hot Standby, the xid limit for access to this table. Allows deferred
+ 	 * and selective conflict processing based upon the relation id.
+ 	 */
+ 	TransactionId latestRemovedXid;
  } LOCK;
  
  #define LOCK_LOCKMETHOD(lock) ((LOCKMETHODID) (lock).tag.locktag_lockmethodid)
***************
*** 488,494 **** extern void LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks);
  extern void LockReleaseCurrentOwner(void);
  extern void LockReassignCurrentOwner(void);
  extern VirtualTransactionId *GetLockConflicts(const LOCKTAG *locktag,
! 				 LOCKMODE lockmode);
  extern void AtPrepare_Locks(void);
  extern void PostPrepare_Locks(TransactionId xid);
  extern int LockCheckConflicts(LockMethod lockMethodTable,
--- 494,500 ----
  extern void LockReleaseCurrentOwner(void);
  extern void LockReassignCurrentOwner(void);
  extern VirtualTransactionId *GetLockConflicts(const LOCKTAG *locktag,
! 				 LOCKMODE lockmode, TransactionId latestRemovedXid);
  extern void AtPrepare_Locks(void);
  extern void PostPrepare_Locks(TransactionId xid);
  extern int LockCheckConflicts(LockMethod lockMethodTable,
*** a/src/include/storage/proc.h
--- b/src/include/storage/proc.h
***************
*** 102,107 **** struct PGPROC
--- 102,112 ----
  	 */
  	bool		recoveryConflictPending;
  
+ 	TransactionId limitXmin;	/* The latest xid for which conflict processing
+ 								 * has been already performed for this proc.
+ 								 * Updated by snapshot conflicts, reset when
+ 								 * we take a new snapshot. */
+ 
  	/* Info about LWLock the process is currently waiting for, if any. */
  	bool		lwWaiting;		/* true if waiting for an LW lock */
  	bool		lwExclusive;	/* true if waiting for exclusive access */
*** a/src/include/storage/standby.h
--- b/src/include/storage/standby.h
***************
*** 16,34 ****
  
  #include "access/xlog.h"
  #include "storage/lock.h"
  
  extern int	vacuum_defer_cleanup_age;
  
  extern void InitRecoveryTransactionEnvironment(void);
  extern void ShutdownRecoveryTransactionEnvironment(void);
  
! extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid);
  extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
  extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
  
  extern void ResolveRecoveryConflictWithBufferPin(void);
  extern void SendRecoveryConflictWithBufferPin(void);
  
  /*
   * Standby Rmgr (RM_STANDBY_ID)
   *
--- 16,41 ----
  
  #include "access/xlog.h"
  #include "storage/lock.h"
+ #include "storage/relfilenode.h"
  
  extern int	vacuum_defer_cleanup_age;
  
  extern void InitRecoveryTransactionEnvironment(void);
  extern void ShutdownRecoveryTransactionEnvironment(void);
  
! extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
! 												RelFileNode node);
! extern void ResolveRecoveryConflictWithRemovedTransactionId(void);
  extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
  extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
  
  extern void ResolveRecoveryConflictWithBufferPin(void);
  extern void SendRecoveryConflictWithBufferPin(void);
  
+ /* Prototypes here rather than in procarray.h */
+ extern void ProcArraySetLatestRemovedXidCache(int partition, TransactionId latestRemovedXid);
+ extern TransactionId ProcArrayGetLatestRemovedXidCache(int partition);
+ 
  /*
   * Standby Rmgr (RM_STANDBY_ID)
   *

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 16 years ago

In reply to: Simon Riggs (#1)

Re: Hot Standby: Relation-specific deferred conflict resolution

Simon Riggs wrote:

Conflict resolution improvements are important to include in this
release, as discussed many times. Proposal given here
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php
presents a viable design to improve this.

Following patch is a complete working implementation of that design.
I'm still testing it, but its worth publishing as early as possible to
allow discussion. Not for commit, just yet, but soon.

Um, you're not considering this for 9.0, are you? I think it's time to
concentrate on the must-fix issues and fix the rough edges in what we have.

For example, the "can't start hot standby mode from a shutdown
checkpoint" issue is a must-fix issue in my opinion, about 10x as
important as this. When that was last discussed, many others agreed. I
run into that all the time when testing streaming replication, and every
time I go "Huh, why isn't the standby opening up for connections?", and
then, "Ahh, it's this stupid shutdown checkpoint issue again".

And the VACUUM FULL issue is still hanging too. And maybe you could help
with some other things on the open items or commitfest list.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Heikki Linnakangas (#2)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 08:26 +0200, Heikki Linnakangas wrote:

Simon Riggs wrote:

Conflict resolution improvements are important to include in this
release, as discussed many times. Proposal given here
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php
presents a viable design to improve this.

Following patch is a complete working implementation of that design.
I'm still testing it, but its worth publishing as early as possible to
allow discussion. Not for commit, just yet, but soon.

Um, you're not considering this for 9.0, are you? I think it's time to
concentrate on the must-fix issues and fix the rough edges in what we have.

Yes, it is important.

For example, the "can't start hot standby mode from a shutdown
checkpoint" issue is a must-fix issue in my opinion, about 10x as
important as this. When that was last discussed, many others agreed. I
run into that all the time when testing streaming replication, and every
time I go "Huh, why isn't the standby opening up for connections?", and
then, "Ahh, it's this stupid shutdown checkpoint issue again".

That was not the feedback I have received. Nobody has commented on that
to me, though many have commented on the need for the current patch. As
mentioned, I went to the trouble of running a meeting to gain additional
feedback and the result was very clear.

Of course, if we ignore any feature, then someone will say "its that
stupid issue again", but that won't imply we got our priority wrong.

And the VACUUM FULL issue is still hanging too.

Yes, that is a must fix.

--
Simon Riggs www.2ndQuadrant.com

Guillaume Smet

guillaume.smet@gmail.com

almost 16 years ago

In reply to: Simon Riggs (#3)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, Jan 29, 2010 at 9:03 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

That was not the feedback I have received. Nobody has commented on that
to me, though many have commented on the need for the current patch. As
mentioned, I went to the trouble of running a meeting to gain additional
feedback and the result was very clear.

I don't have a technical opinion about this problem yet as I haven't
tested HS+SR yet but I'm not sure it's a good idea to base technical
decisions and priorities on user polls (I'm pretty sure most of them
don't use HS+SR as much as Heikki these days).
If you ask people what they want in their future cars, they won't
answer they want wheels or an engine: it's something obvious for them.
AFAICS (but I might be wrong), you asked this question to people who
are interested in HS+SR but don't have any idea of what it's like to
use HS+SR daily with or without this limitation.

There are perhaps better arguments for not doing it but this one seems
a bit weird to me.

--
Guillaume

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Guillaume Smet (#4)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 09:20 +0100, Guillaume Smet wrote:

On Fri, Jan 29, 2010 at 9:03 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

That was not the feedback I have received. Nobody has commented on that
to me, though many have commented on the need for the current patch. As
mentioned, I went to the trouble of running a meeting to gain additional
feedback and the result was very clear.

I don't have a technical opinion about this problem yet as I haven't
tested HS+SR yet but I'm not sure it's a good idea to base technical
decisions and priorities on user polls (I'm pretty sure most of them
don't use HS+SR as much as Heikki these days).
If you ask people what they want in their future cars, they won't
answer they want wheels or an engine: it's something obvious for them.
AFAICS (but I might be wrong), you asked this question to people who
are interested in HS+SR but don't have any idea of what it's like to
use HS+SR daily with or without this limitation.

Well, you are correct that a larger group of users *could* have avoided
an obvious and important issue. Though if you deploy that argument it
can be applied both ways: Heikki may also be missing an obvious and
important issue. Where does that leave us?

I am not against putting both into this release. If I am forced to
choose just one, I've at least given reasons why that should be so.

--
Simon Riggs www.2ndQuadrant.com

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 16 years ago

In reply to: Simon Riggs (#3)

Re: Hot Standby: Relation-specific deferred conflict resolution

Simon Riggs wrote:

On Fri, 2010-01-29 at 08:26 +0200, Heikki Linnakangas wrote:

Simon Riggs wrote:

Conflict resolution improvements are important to include in this
release, as discussed many times. Proposal given here
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php
presents a viable design to improve this.

Following patch is a complete working implementation of that design.
I'm still testing it, but its worth publishing as early as possible to
allow discussion. Not for commit, just yet, but soon.

Um, you're not considering this for 9.0, are you? I think it's time to
concentrate on the must-fix issues and fix the rough edges in what we have.

Yes, it is important.

For example, the "can't start hot standby mode from a shutdown
checkpoint" issue is a must-fix issue in my opinion, about 10x as
important as this. When that was last discussed, many others agreed. I
run into that all the time when testing streaming replication, and every
time I go "Huh, why isn't the standby opening up for connections?", and
then, "Ahh, it's this stupid shutdown checkpoint issue again".

That was not the feedback I have received. Nobody has commented on that
to me,

Yes they have. I have on several occasions, as have other people on this
mailing list:

http://archives.postgresql.org/message-id/603c8f070912201611h4951087craa080ff6b48a97cd@mail.gmail.com
http://archives.postgresql.org/message-id/4B30AE53.6020202@gmail.com
http://archives.postgresql.org/message-id/407d949e0912220738je1e0141m87d7b688dd4ba27f@mail.gmail.com

I even *fixed* that already, but you decided to take it out before
committing. I then added it to the list of must-fix items in the TODO
list, but you took that out too. I have no objection to doing things in
smaller steps, but this *is* a must-fix item before release. I still
don't understand why you took it out, nor why you're objecting to that.

though many have commented on the need for the current patch.

Who?

As
mentioned, I went to the trouble of running a meeting to gain additional
feedback and the result was very clear.

So what was the clear result?

If you're looking for things to do, I agree with Greg Stark that the
removal of max_standby_delay=-1 option is not good. That should be fixed
too.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Heikki Linnakangas (#6)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote:

So what was the clear result?

I have spoken clearly enough. You were welcome to attend the Hot Standby
User Group. The fact that you did not expresses your own priorities
quite well, ISTM. Your protestations to know more about the wishes of
users than they do themselves isn't hugely impressive.

There are many features we should add. I will add them in priority order
until forced to stop.

If you or anyone else believes features are essential, then either add
them yourselves or have the courage to stand up and say the release
should be delayed so that I can. To do otherwise is to admit you do not
actually consider them essential. It cannot be both ways.

--
Simon Riggs www.2ndQuadrant.com

Stefan Kaltenbrunner

stefan@kaltenbrunner.cc

almost 16 years ago

In reply to: Heikki Linnakangas (#2)

Re: Hot Standby: Relation-specific deferred conflict resolution

Heikki Linnakangas wrote:

Simon Riggs wrote:

Conflict resolution improvements are important to include in this
release, as discussed many times. Proposal given here
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php
presents a viable design to improve this.

Following patch is a complete working implementation of that design.
I'm still testing it, but its worth publishing as early as possible to
allow discussion. Not for commit, just yet, but soon.

Um, you're not considering this for 9.0, are you? I think it's time to
concentrate on the must-fix issues and fix the rough edges in what we have.

I agree - this looks like a completely new feature development to me
that tries to move the goalpost quite far for 9.0.

For example, the "can't start hot standby mode from a shutdown
checkpoint" issue is a must-fix issue in my opinion, about 10x as
important as this. When that was last discussed, many others agreed. I
run into that all the time when testing streaming replication, and every
time I go "Huh, why isn't the standby opening up for connections?", and
then, "Ahh, it's this stupid shutdown checkpoint issue again".

Yeah I ran into that one during testing as well - and I consider it a
serious issue

And the VACUUM FULL issue is still hanging too. And maybe you could help
with some other things on the open items or commitfest list.

yeah and we keep finding major bugs nearly daily so I don't think we
should add features that way now.
First is stability and reliability, optimization and new features are
imho clearly 9.1+ material. Just calling the release 9.0 and saying "we
do that because it is a radical change and we expect some instability"
should NOT mean we are free to put in every feature at the last minute
with "yeah thats fine because people expect instability anyways".

Stefan

Stefan Kaltenbrunner

stefan@kaltenbrunner.cc

almost 16 years ago

In reply to: Simon Riggs (#7)

Re: Hot Standby: Relation-specific deferred conflict resolution

Simon Riggs wrote:

On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote:

So what was the clear result?

I have spoken clearly enough. You were welcome to attend the Hot Standby
User Group. The fact that you did not expresses your own priorities
quite well, ISTM. Your protestations to know more about the wishes of
users than they do themselves isn't hugely impressive.

huh? traditionally discussions of that kind had to happen on -hackers
and not in some online place some unnamed people attended.

There are many features we should add. I will add them in priority order
until forced to stop.

we are past the point of adding new features for 9.0 imho

If you or anyone else believes features are essential, then either add
them yourselves or have the courage to stand up and say the release
should be delayed so that I can. To do otherwise is to admit you do not
actually consider them essential. It cannot be both ways.

bugfix and stabilization mode is what we are in now (except for the
stuff that already made it into the commitfest).

Stefan

#10

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Stefan Kaltenbrunner (#8)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 11:10 +0100, Stefan Kaltenbrunner wrote:

yeah and we keep finding major bugs nearly daily

Facts, please?

--
Simon Riggs www.2ndQuadrant.com

#11

Magnus Hagander

magnus@hagander.net

almost 16 years ago

In reply to: Stefan Kaltenbrunner (#9)

Re: Hot Standby: Relation-specific deferred conflict resolution

2010/1/29 Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>:

Simon Riggs wrote:

On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote:

So what was the clear result?

I have spoken clearly enough. You were welcome to attend the Hot Standby
User Group. The fact that you did not expresses your own priorities
quite well, ISTM. Your protestations to know more about the wishes of
users than they do themselves isn't hugely impressive.

huh? traditionally discussions of that kind had to happen on -hackers and not in some online place some unnamed people attended.

+1.

Haven't we had enough communications failures with off-hackers groups
trying to come up with something only to have it not be agreed upon by
hackers later on?

(win32 would be the biggest thing so far, but it's not like we haven't
done it before in more cases)

There are many features we should add. I will add them in priority order
until forced to stop.

we are past the point of adding new features for 9.0 imho

If you or anyone else believes features are essential, then either add
them yourselves or have the courage to stand up and say the release
should be delayed so that I can. To do otherwise is to admit you do not
actually consider them essential. It cannot be both ways.

bugfix and stabilization mode is what we are in now (except for the stuff that already made it into the commitfest).

Well, per some recent discussions, it seems small features are still
ok. But I doubt this qualifies as such.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#12

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Stefan Kaltenbrunner (#9)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 11:12 +0100, Stefan Kaltenbrunner wrote:

There are many features we should add. I will add them in priority order
until forced to stop.

we are past the point of adding new features for 9.0 imho

So presumably we cannot add the new feature to start hot standby at
shutdown checkpoints then either?

If you or anyone else believes features are essential, then either add
them yourselves or have the courage to stand up and say the release
should be delayed so that I can. To do otherwise is to admit you do not
actually consider them essential. It cannot be both ways.

bugfix and stabilization mode is what we are in now (except for the
stuff that already made it into the commitfest).

That's where we'd like to be, but these new features have not been in
the tree long enough for what you say to be the actual position. We can
pretend it is, but that doesn't make it so.

--
Simon Riggs www.2ndQuadrant.com

#13

Stefan Kaltenbrunner

stefan@kaltenbrunner.cc

almost 16 years ago

In reply to: Simon Riggs (#10)

Re: Hot Standby: Relation-specific deferred conflict resolution

Simon Riggs wrote:

On Fri, 2010-01-29 at 11:10 +0100, Stefan Kaltenbrunner wrote:

yeah and we keep finding major bugs nearly daily

Facts, please?

5 seconds of time spent on archives.postgresql.org show at least the
following SR/HS related bugs in the last 7 days or so:

http://archives.postgresql.org/pgsql-committers/2010-01/msg00400.php
http://archives.postgresql.org/pgsql-committers/2010-01/msg00410.php
http://archives.postgresql.org/pgsql-committers/2010-01/msg00396.php
http://archives.postgresql.org/pgsql-committers/2010-01/msg00323.php

some of those you might call "minor" but they are bugs and given the
current rate we are seeing them is imho a clear sign of "code by far not
stable enough to consider new features that late in the cycle"

Stefan

#14

Stefan Kaltenbrunner

stefan@kaltenbrunner.cc

almost 16 years ago

In reply to: Simon Riggs (#12)

Re: Hot Standby: Relation-specific deferred conflict resolution

Simon Riggs wrote:

On Fri, 2010-01-29 at 11:12 +0100, Stefan Kaltenbrunner wrote:

There are many features we should add. I will add them in priority order
until forced to stop.

we are past the point of adding new features for 9.0 imho

So presumably we cannot add the new feature to start hot standby at
shutdown checkpoints then either?

well as far as I recall that ones has been proposed much earlier than
mid of december(like the patch in discussion here) and I agree with
heikki that I'm not clear on what your actual objections to that patch
were - care to provide a link to the archives please?

If you or anyone else believes features are essential, then either add
them yourselves or have the courage to stand up and say the release
should be delayed so that I can. To do otherwise is to admit you do not
actually consider them essential. It cannot be both ways.

bugfix and stabilization mode is what we are in now (except for the
stuff that already made it into the commitfest).

That's where we'd like to be, but these new features have not been in
the tree long enough for what you say to be the actual position. We can
pretend it is, but that doesn't make it so.

Not sure I follow (maybe because I'm not a native speaker) but are you
trying to say that we can simply add new features late in the release
cycle to stuff commited because it is not long in the tree instead of
focusing stabilizing what we have?
If you are so sure that we NEED those features to be releaseworthy -
maybe it was premature to commit HS and SR for this cycle?

Stefan

#15

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Stefan Kaltenbrunner (#13)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 11:20 +0100, Stefan Kaltenbrunner wrote:

Simon Riggs wrote:

On Fri, 2010-01-29 at 11:10 +0100, Stefan Kaltenbrunner wrote:

yeah and we keep finding major bugs nearly daily

Facts, please?

5 seconds of time spent on archives.postgresql.org show at least the
following SR/HS related bugs in the last 7 days or so:

http://archives.postgresql.org/pgsql-committers/2010-01/msg00400.php
http://archives.postgresql.org/pgsql-committers/2010-01/msg00410.php
http://archives.postgresql.org/pgsql-committers/2010-01/msg00396.php
http://archives.postgresql.org/pgsql-committers/2010-01/msg00323.php

some of those you might call "minor" but they are bugs and given the
current rate we are seeing them is imho a clear sign of "code by far not
stable enough to consider new features that late in the cycle"

I don't think two very minor bugs in Hot Standby, reported and fixed 7
days apart is any indication of instability. It isn't the "daily bugs
reported" you suggested. Personally, I think it indicates quite the
opposite - if those are the only bugs I can find now, I'm ecstatic.

I think your argument does apply to Streaming Rep, at this point. We
should consider releasing Alpha4 and then later going to Beta.

My point of view expressed here is not built in a few seconds, it is
built on discussion and feedback over 18 months. The conflict issue was
discussed by me with hackers at the May 2008 dev meeting. It should be
improved upon in this release and it has been the main issue concerning
the full range of people I have discussed HS with.

--
Simon Riggs www.2ndQuadrant.com

#16

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: Simon Riggs (#7)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, Jan 29, 2010 at 5:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote:

So what was the clear result?

I have spoken clearly enough. You were welcome to attend the Hot Standby
User Group. The fact that you did not expresses your own priorities
quite well, ISTM. Your protestations to know more about the wishes of
users than they do themselves isn't hugely impressive.

This doesn't make any sense at all. It is not the case that people
who attended the Hot Standby user group are the only ones who are
entitled to provide any feedback, and that people who read
pgsql-hackers are not (unless they also happen to have attended the
Hot Standby user group). If anything, that's 100% backwards, but at
any rate Heikki has probably spent more time over the last year on
this feature than anyone in the world with the exception of yourself;
I think that counts for a lot more than one two-hour meeting.

...Robert

#17

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Robert Haas (#16)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 07:01 -0500, Robert Haas wrote:

On Fri, Jan 29, 2010 at 5:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote:

So what was the clear result?

I have spoken clearly enough. You were welcome to attend the Hot Standby
User Group. The fact that you did not expresses your own priorities
quite well, ISTM. Your protestations to know more about the wishes of
users than they do themselves isn't hugely impressive.

This doesn't make any sense at all.

I am busy working on the right features for the most number of people,
as expressed to me. I accept there are people that disagree and I am
sorry for that. Others are welcome to add code to do the things I will
not be doing through lack of time, if they are not satisfied with my
priority.

I think we should extend the time available to make sure we have a
sensible set of features for 9.0. The heat of this discussion tells me
that we are going to be lacking features that are must-have to someone,
whether or not they are in the majority.

--
Simon Riggs www.2ndQuadrant.com

#18

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Heikki Linnakangas (#6)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote:

I even *fixed* that already, but you decided to take it out before
committing. I then added it to the list of must-fix items in the TODO
list, but you took that out too. I have no objection to doing things
in smaller steps, but this *is* a must-fix item before release. I
still don't understand why you took it out, nor why you're objecting
to that.

This is a misrepresentation of what has happened. The item you mention
was added to the TODO by me in response to your comments and has never
been removed at any point; it is still there now. You added a duplicate
and the duplicate was removed. I removed code that you mentioned was
buggy because I don't have time to fix it and it is not high enough up
the priority list. We have discussed all of these things before yet you
raise them again as if those things have never been said.

I am working on TODO in a priority order. The priority list has changed
over time in response to comments received from both you and other
people. I understand you don't like my current sequence of actions and
I'm sorry for that. I am trying to be rational and balance what has been
said to me for the benefit of the community from all stakeholders and
have consulted widely to gather thoughts.

I am more than happy for other people to work on items on the list, as
Joachim and Andres have done.

--
Simon Riggs www.2ndQuadrant.com

#19

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 16 years ago

In reply to: Simon Riggs (#18)

Re: Hot Standby: Relation-specific deferred conflict resolution

Simon Riggs wrote:

I removed code that you mentioned was
buggy because I don't have time to fix it and it is not high enough up
the priority list. We have discussed all of these things before yet you
raise them again as if those things have never been said.

*sigh*. Yeah, we've been through this. As I've said before, I never said
the code was buggy, that was just a misunderstanding at your end.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#20

Greg Stark

gsstark@mit.edu

almost 16 years ago

In reply to: Simon Riggs (#18)

Re: Hot Standby: Relation-specific deferred conflict resolution

The fundamental disagreement here is over what qualifies as a
"wishlist" item, aka a feature or added functionality. And what
qualifies as a must-fix bug.

Priorities are context sensitive. If this were early in the cycle then
working on bigger impact features like conflict resolution code might
be more important because it's important to get them into the code
base where other people can see if it solves their problems and the
behaviour can be tweaked. Fixing rare but outright broken things might
not be high priority because while they have to be done by release
nobody is blocking on on the solution before then.

On the other hand near release we stop trying to incorporate new
features and focus on basic bug features. The new features don't
become any less important to the users, it's just that they'll make it
into the next release. There will always be more features that can
help users and we'll always think of cool new things to knock off the
rough edges off what we have now and get it out so we can go back to
the playground for the next release.

You said "I think we should extend the time available to make sure we
have a sensible set of features for 9.0." Perhaps part of the problem
is that I couldn't understand what your patch did from the description
you posted and can't evaluate whether it's fixing a problem that makes
the current feature set incoherent. Can you explain what it does in
more detail so we can understand why it's necessary for a sensible set
of features?

#21

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Heikki Linnakangas (#19)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 16:44 +0200, Heikki Linnakangas wrote:

Simon Riggs wrote:

I removed code that you mentioned was
buggy because I don't have time to fix it and it is not high enough up
the priority list. We have discussed all of these things before yet you
raise them again as if those things have never been said.

*sigh*. Yeah, we've been through this. As I've said before, I never said
the code was buggy, that was just a misunderstanding at your end.

OK, if you say I misunderstood, then I accept that.

The deed is done and the code removed. Putting it back takes time and
given enough of that rare cloth, it will eventually be put back.

--
Simon Riggs www.2ndQuadrant.com

#22

Joshua D. Drake

jd@commandprompt.com

almost 16 years ago

In reply to: Simon Riggs (#17)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 12:56 +0000, Simon Riggs wrote:

I think we should extend the time available to make sure we have a
sensible set of features for 9.0. The heat of this discussion tells me
that we are going to be lacking features that are must-have to someone,
whether or not they are in the majority.

-1

Missing release dates because of some patch that isn't done is something
the community has been trying to get away from, aggressively. The way
this is supposed to work is:

We have a release date
Features that aren't going to make that date, don't.
We release

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.

#23

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Greg Stark (#20)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 14:52 +0000, Greg Stark wrote:

You said "I think we should extend the time available to make sure we
have a sensible set of features for 9.0." Perhaps part of the problem
is that I couldn't understand what your patch did from the description
you posted and can't evaluate whether it's fixing a problem that makes
the current feature set incoherent. Can you explain what it does in
more detail so we can understand why it's necessary for a sensible set
of features?

I'll break down the patch into two pieces to make it easier to review,
and add more description, as you suggest.

--
Simon Riggs www.2ndQuadrant.com

#24

Stefan Kaltenbrunner

stefan@kaltenbrunner.cc

almost 16 years ago

In reply to: Simon Riggs (#15)

Re: Hot Standby: Relation-specific deferred conflict resolution

Simon Riggs wrote:

On Fri, 2010-01-29 at 11:20 +0100, Stefan Kaltenbrunner wrote:

Simon Riggs wrote:

On Fri, 2010-01-29 at 11:10 +0100, Stefan Kaltenbrunner wrote:

yeah and we keep finding major bugs nearly daily

Facts, please?

5 seconds of time spent on archives.postgresql.org show at least the
following SR/HS related bugs in the last 7 days or so:

http://archives.postgresql.org/pgsql-committers/2010-01/msg00400.php
http://archives.postgresql.org/pgsql-committers/2010-01/msg00410.php
http://archives.postgresql.org/pgsql-committers/2010-01/msg00396.php
http://archives.postgresql.org/pgsql-committers/2010-01/msg00323.php

some of those you might call "minor" but they are bugs and given the
current rate we are seeing them is imho a clear sign of "code by far not
stable enough to consider new features that late in the cycle"

I don't think two very minor bugs in Hot Standby, reported and fixed 7
days apart is any indication of instability. It isn't the "daily bugs
reported" you suggested. Personally, I think it indicates quite the
opposite - if those are the only bugs I can find now, I'm ecstatic.

well we have not even made a realistic release (not even an alpha!) with
the current HS/SR code, we still have "must fix" issues on the table(for
both SR and HS) AND we find/fix more than a bug every two days in those
two features that are the cause for moving to 9.0.
If we want to release in anya realistic timeframe (and I recall you
advocating for doing that in the past) we really need to wrap up what we
have now, make it robust and see what we have left for all the further
releases.

I think your argument does apply to Streaming Rep, at this point. We
should consider releasing Alpha4 and then later going to Beta.

so you basically say that the current codebase(as a whole) is in need of
stabilisation and we need to make the cut off and release alpha4 now?
Not sure how that fits into proposing new features for other parts of
the system...

My point of view expressed here is not built in a few seconds, it is
built on discussion and feedback over 18 months. The conflict issue was
discussed by me with hackers at the May 2008 dev meeting. It should be
improved upon in this release and it has been the main issue concerning
the full range of people I have discussed HS with.

"in this release" refers to the current patch I guess - because there
was no HS in older versions of pg :)

Stefan

#25

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: Joshua D. Drake (#22)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, Jan 29, 2010 at 11:32 AM, Joshua D. Drake <jd@commandprompt.com> wrote:

On Fri, 2010-01-29 at 12:56 +0000, Simon Riggs wrote:

I think we should extend the time available to make sure we have a
sensible set of features for 9.0. The heat of this discussion tells me
that we are going to be lacking features that are must-have to someone,
whether or not they are in the majority.

-1

Missing release dates because of some patch that isn't done is something
the community has been trying to get away from, aggressively. The way
this is supposed to work is:

We have a release date
Features that aren't going to make that date, don't.
We release

Exactly. It would be nice to see 9.0 come out in 2010, and we're not
going to get there unless we start fixing the issues that are actually
release-blockers, rather than adding new features. Hot Standby was
committed with at least one known release blocker (VACUUM FULL) on the
assumption that that release blocker would be fixed by the committer
who introduced it (isn't that the rule?). Two months on, there is
zero sign of any activity on that front, and instead we're now being
bombarded with a series of other patches that fix issues that are not
release-blockers under the theory that the feature isn't good enough
to be used without them. If that's really true, it wasn't ready for
commit in the first place.

If this were any other patch, I'd propose reverting it.

...Robert

#26

Joshua D. Drake

jd@commandprompt.com

almost 16 years ago

In reply to: Robert Haas (#25)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote:

Exactly. It would be nice to see 9.0 come out in 2010, and we're not
going to get there unless we start fixing the issues that are actually
release-blockers, rather than adding new features. Hot Standby was
committed with at least one known release blocker (VACUUM FULL) on the
assumption that that release blocker would be fixed by the committer
who introduced it (isn't that the rule?). Two months on, there is
zero sign of any activity on that front, and instead we're now being
bombarded with a series of other patches that fix issues that are not
release-blockers under the theory that the feature isn't good enough
to be used without them. If that's really true, it wasn't ready for
commit in the first place.

If this were any other patch, I'd propose reverting it.

I would suggest that if we don't see activity on the release blockers,
pretty much stat... we revert it.

Joshua D. Drake

...Robert

#27

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Robert Haas (#25)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote:

Two months on, there is
zero sign of any activity on that front

I'm surprised that you call 14 commits in 28 days following a publicly
available priority list: "zero sign of activity".

Further discussion seems pointless.

--
Simon Riggs www.2ndQuadrant.com

#28

Joshua D. Drake

jd@commandprompt.com

almost 16 years ago

In reply to: Simon Riggs (#27)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 18:08 +0000, Simon Riggs wrote:

On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote:

Two months on, there is
zero sign of any activity on that front

I'm surprised that you call 14 commits in 28 days following a publicly
available priority list: "zero sign of activity".

Further discussion seems pointless.

Let's be clear. Robert is discussing release blockers. He is not
suggesting that no work has been done. I believe the community considers
release-blocks "the priority".

Joshua D. Drake

#29

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: Simon Riggs (#27)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, Jan 29, 2010 at 1:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote:

Two months on, there is
zero sign of any activity on that front

I'm surprised that you call 14 commits in 28 days following a publicly
available priority list: "zero sign of activity".

Further discussion seems pointless.

Wow, that was an awesome way to quote what I said out of context and
make it sound like I said something ridiculous. The problem I and
others have is not with the quantity of your commits but with the
issues you are choosing (not) to address.

...Robert

#30

Josh Berkus

josh@agliodbs.com

almost 16 years ago

In reply to: Simon Riggs (#18)

Re: Hot Standby: Relation-specific deferred conflict resolution

All,

Is there a working list of HS must-fix items somewhere which people
agree on? Or are we still lacking consensus?

--Josh Berkus

#31

Joshua D. Drake

jd@commandprompt.com

almost 16 years ago

In reply to: Josh Berkus (#30)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 11:41 -0800, Josh Berkus wrote:

All,

Is there a working list of HS must-fix items somewhere which people
agree on? Or are we still lacking consensus?

VACUUM FULL, I believe is one.

Joshua D. Drake

--Josh Berkus

#32

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Greg Stark (#20)

1 attachment(s)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 14:52 +0000, Greg Stark wrote:

Can you explain what it does in
more detail so we can understand why it's necessary for a sensible set
of features?

I've slimmed down the patch to make it clearer what it does, having
committed some refactoring.

Problem: Currently when we perform conflict resolution we do not use the
relid from the WAL record, we target all users regardless of which
relations they have accessed or intend to access. So changes to table X
can cause cancelation of someone accessing table Y because **they might
later in the transaction access table X**. That is too heavy handed and
is most often overkill. This is the same problem you and I have
discussed many times, over the last 14 months, though the problem itself
has been discussed on hackers many times over last 20 months and many
potential solutions offered by me.

An example of current behaviour, using tables A, B and C
T0: An AccessExclusiveLock is applied to B
T1: Q1 takes snapshot, takes lock on A and begins query
T2: Q2 takes snapshot, queues for lock on B behind AccessExclusiveLock
T3: Cleanup on table C is handled that will conflict with both snapshots
T4: Q3 takes snapshot, takes lock on C and begins query (if possible)
T5: Cleanup on table C is handled that will conflict with Q3

Current: At T3, current conflict resolution will wait for
max_standby_delay and then cancel Q1 and Q2. Q3 can begin processing
immediately because the snapshot it takes will always be same or later
than the xmin that generated the cleanup at T3. At T5, Q3 will be
quickly cancelled because all the standby delay was used up at T3 and
there is none left to spend on delaying for Q3.

Proposed Resolution:
as presented to hackers in 12/2009
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php

Let's look at the effect first, then return to the detail.

In this proposal, the above sequence of actions will look like this:
Conflict resolution will wait at T3 until we hit max_standby_delay, at
which point we learn that Q1 and Q2 do not conflict and we let them
continue on their way. At T5, Q3 will be cancelled without much delay,
because we have now used up most of max_standby_delay.

So in both approaches, Q3 that accessed table C will be canceled fairly
quickly. The key to this is that in the new proposal, Q1 and Q2 will not
be canceled: they will continue to completion.

How it works: When we process a snapshot conflict we check which queries
have snapshots that conflict. We then wait for max_standby_delay and the
check lock conflicts. (We do it this way because of a timing issue
described on the link above, pointed out by Greg). When we check for
lock conflicts we also set a latestRemovedXid on "that relation", so
that we capture all current lockers *and* allow all future lockers to
check the latestRemovedXid against their snapshot. In either case, if a
lock conflict occurs then we will cancel the query.

I mention "that relation" because *where* we record the xid limit for
each relation is an important aspect of the design. In the current patch
we take a simple approach, others are possible. If there is already a
lock in the shared lock table, then we add the latestRemovedXid to that.
If not, we keep track of the latestRemovedXid for the whole lock
partition. So we aren't tracking each relation separately in most cases,
except for when a table is being frequently accessed, or access for a
long period.

There is also an optimization added here. When we defer cancelation of
queries the same query keeps re-appearing in the conflict list for later
WAL records. As a result there is a mechanism to avoid constant
re-listing of a conflict.

The attached patch is for review and discussion only at this stage.

I'm working on other areas now while discussion takes place, or not.

--
Simon Riggs www.2ndQuadrant.com

Attachments:

relation_specific_conflict_res.v3.patchtext/x-patch; charset=UTF-8; name=relation_specific_conflict_res.v3.patchDownload

*** a/src/backend/commands/indexcmds.c
--- b/src/backend/commands/indexcmds.c
***************
*** 540,546 **** DefineIndex(RangeVar *heapRelation,
  	 * check for that.	Also, prepared xacts are not reported, which is fine
  	 * since they certainly aren't going to do anything more.
  	 */
! 	old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
  
  	while (VirtualTransactionIdIsValid(*old_lockholders))
  	{
--- 540,546 ----
  	 * check for that.	Also, prepared xacts are not reported, which is fine
  	 * since they certainly aren't going to do anything more.
  	 */
! 	old_lockholders = GetLockConflicts(&heaplocktag, ShareLock, InvalidTransactionId);
  
  	while (VirtualTransactionIdIsValid(*old_lockholders))
  	{
***************
*** 627,633 **** DefineIndex(RangeVar *heapRelation,
  	 * We once again wait until no transaction can have the table open with
  	 * the index marked as read-only for updates.
  	 */
! 	old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
  
  	while (VirtualTransactionIdIsValid(*old_lockholders))
  	{
--- 627,633 ----
  	 * We once again wait until no transaction can have the table open with
  	 * the index marked as read-only for updates.
  	 */
! 	old_lockholders = GetLockConflicts(&heaplocktag, ShareLock, InvalidTransactionId);
  
  	while (VirtualTransactionIdIsValid(*old_lockholders))
  	{
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***************
*** 66,71 **** typedef struct ProcArrayStruct
--- 66,72 ----
  
  	int			numKnownAssignedXids;	/* current number of known assigned xids */
  	int			maxKnownAssignedXids;	/* allocated size of known assigned xids */
+ 
  	/*
  	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
  	 * overflowing cached subxids in PGPROC entries.
***************
*** 73,78 **** typedef struct ProcArrayStruct
--- 74,86 ----
  	TransactionId	lastOverflowedXid;
  
  	/*
+ 	 * During Hot Standby we need to store a visibility limit for to defer recovery
+ 	 * conflict processing. The cached value is accessed during lock processing,
+ 	 * but stored on the procarray header. See lock.c for further details.
+ 	 */
+ 	TransactionId	latestRemovedXidCache[NUM_LOCK_PARTITIONS];
+ 
+ 	/*
  	 * We declare procs[] as 1 entry because C wants a fixed-size array, but
  	 * actually it is maxProcs entries long.
  	 */
***************
*** 220,225 **** CreateSharedProcArray(void)
--- 228,236 ----
  											  HASH_ELEM | HASH_FUNCTION);
  		if (!KnownAssignedXidsHash)
  			elog(FATAL, "could not initialize known assigned xids hash table");
+ 
+ 		/* Initialize the latestRemovedXidCache */
+ 		MemSet(procArray->latestRemovedXidCache, 0, NUM_LOCK_PARTITIONS * sizeof(TransactionId));
  	}
  }
  
***************
*** 1216,1221 **** GetSnapshotData(Snapshot snapshot)
--- 1227,1235 ----
  		RecentGlobalXmin = FirstNormalTransactionId;
  	RecentXmin = xmin;
  
+ 	/* Reset recovery conflict processing limit after deriving new snapshot */
+ 	MyProc->limitXmin = InvalidTransactionId;
+ 
  	snapshot->xmin = xmin;
  	snapshot->xmax = xmax;
  	snapshot->xcnt = count;
***************
*** 1739,1747 **** GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
  			{
  				VirtualTransactionId vxid;
  
! 				GET_VXID_FROM_PGPROC(vxid, *proc);
! 				if (VirtualTransactionIdIsValid(vxid))
! 					vxids[count++] = vxid;
  			}
  		}
  	}
--- 1753,1774 ----
  			{
  				VirtualTransactionId vxid;
  
! 				/*
! 				 * Avoid repeated re-listing of a process for conflict processing
! 				 * by excluding it once resolved up to a particular latestRemovedXid.
! 				 * We reset proc->limitXmin each time we take a new snapshot because
! 				 * that may need to have conflict processing performed on it, as
! 				 * discussed in header comments for this function.
! 				 */
! 				if (!TransactionIdIsValid(proc->limitXmin) ||
! 					TransactionIdFollows(limitXmin, proc->limitXmin))
! 				{
! 					proc->limitXmin = limitXmin;
! 
! 					GET_VXID_FROM_PGPROC(vxid, *proc);
! 					if (VirtualTransactionIdIsValid(vxid))
! 						vxids[count++] = vxid;
! 				}
  			}
  		}
  	}
***************
*** 2559,2561 **** KnownAssignedXidsDisplay(int trace_level)
--- 2586,2610 ----
  
  	pfree(buf.data);
  }
+ 
+ /*
+  * LatestRemovedXidCache provides a way of deferring recovery conflicts up to
+  * the point where a query requests a lock on a relation. These functions provide
+  * the cache required by recovery conflict processing. (see storage/lmgr/lock.c)
+  * Function prototypes are in standby.h, not procarray.h for convenience.
+  */
+ void
+ ProcArraySetLatestRemovedXidCache(int partition, TransactionId latestRemovedXid)
+ {
+ 	ProcArrayStruct *arrayP = procArray;
+ 
+ 	arrayP->latestRemovedXidCache[partition] = latestRemovedXid;
+ }
+ 
+ TransactionId
+ ProcArrayGetLatestRemovedXidCache(int partition)
+ {
+ 	ProcArrayStruct *arrayP = procArray;
+ 
+ 	return arrayP->latestRemovedXidCache[partition];
+ }
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
***************
*** 27,32 ****
--- 27,33 ----
  #include "storage/procarray.h"
  #include "storage/sinvaladt.h"
  #include "storage/standby.h"
+ #include "tcop/tcopprot.h"
  #include "utils/ps_status.h"
  
  int		vacuum_defer_cleanup_age;
***************
*** 34,40 **** int		vacuum_defer_cleanup_age;
  static List *RecoveryLockList;
  
  static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
! 									   ProcSignalReason reason);
  static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
  static void LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
  static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
--- 35,43 ----
  static List *RecoveryLockList;
  
  static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
! 									   ProcSignalReason reason,
! 									   TransactionId latestRemovedXid,
! 									   Oid dbOid, Oid relOid);
  static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
  static void LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
  static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
***************
*** 157,165 **** WaitExceedsMaxStandbyDelay(void)
   */
  static void
  ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
! 									   ProcSignalReason reason)
  {
  	char		waitactivitymsg[100];
  
  	while (VirtualTransactionIdIsValid(*waitlist))
  	{
--- 160,171 ----
   */
  static void
  ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
! 									   ProcSignalReason reason,
! 									   TransactionId latestRemovedXid,
! 									   Oid dbOid, Oid relOid)
  {
  	char		waitactivitymsg[100];
+ 	VirtualTransactionId *lockconflicts = NULL;
  
  	while (VirtualTransactionIdIsValid(*waitlist))
  	{
***************
*** 203,221 **** ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
  			if (WaitExceedsMaxStandbyDelay())
  			{
  				pid_t pid;
  
  				/*
! 				 * Now find out who to throw out of the balloon.
  				 */
! 				Assert(VirtualTransactionIdIsValid(*waitlist));
! 				pid = CancelVirtualTransaction(*waitlist, reason);
! 
! 				/*
! 				 * Wait awhile for it to die so that we avoid flooding an
! 				 * unresponsive backend when system is heavily loaded.
! 				 */
! 				if (pid != 0)
! 					pg_usleep(5000);
  			}
  		}
  
--- 209,279 ----
  			if (WaitExceedsMaxStandbyDelay())
  			{
  				pid_t pid;
+ 				bool cancel = true;
  
  				/*
! 				 * If it is a snapshot conflict decide whether we should resolve
! 				 * immediately or defer the resolution until later. This must be
! 				 * designed carefully so that it is a deferral in all cases, in
! 				 * particular we must not set the latestRemovedXid until when the
! 				 * stand_delay has reached max, which is now!
! 				 *
! 				 * If we had a snapshot conflict then we also check for lock conflicts.
! 				 * Lock conflicts disregard whether backends are holding or waiting for
! 				 * the lock, so we get everybody that has declared an intention on the
! 				 * lock. We are careful to acquire the lock partition lock in exclusive
! 				 * mode, so that no concurrent lock requestors slip by while we do this.
! 				 * While holding the lock partition lock we also record the latestRemovedXid
! 				 * so that anybody arriving after we release the lock will cancel
! 				 * themselves using the same errors they would have got here.
! 				 *
! 				 * If conflicts do exist yet don't conflict with the snapshot then we
! 				 * then we do not need to cancel. So we need to merge the two lists
! 				 * to determine who can be spared, for now.
! 				 *
! 				 * XXX O(N^2) at present, but neither list is long in the common case.
  				 */
! 				if (reason == PROCSIG_RECOVERY_CONFLICT_SNAPSHOT)
! 				{
! 					LOCKTAG			locktag;
! 					VirtualTransactionId *conflicts;
! 
! 					SET_LOCKTAG_RELATION(locktag, dbOid, relOid);
! 
! 					if (!lockconflicts)
! 						lockconflicts = GetLockConflicts(&locktag, AccessExclusiveLock,
! 															latestRemovedXid);
! 					conflicts = lockconflicts;
! 					cancel = false;
! 					while (VirtualTransactionIdIsValid(*conflicts))
! 					{
! 						if (waitlist->backendId == conflicts->backendId &&
! 							waitlist->localTransactionId == conflicts->localTransactionId)
! 						{
! 							cancel = true;
! 							break;
! 						}
! 						conflicts++;
! 					}
! 				}
! 
! 				if (cancel)
! 				{
! 					/*
! 					 * Now find out who to throw out of the balloon.
! 					 */
! 					Assert(VirtualTransactionIdIsValid(*waitlist));
! 					pid = CancelVirtualTransaction(*waitlist, reason);
! 
! 					/*
! 					 * Wait awhile for it to die so that we avoid flooding an
! 					 * unresponsive backend when system is heavily loaded.
! 					 */
! 					if (pid != 0)
! 						pg_usleep(5000);
! 				}
! 				else
! 					break;
  			}
  		}
  
***************
*** 232,238 **** ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
  }
  
  void
! ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid)
  {
  	VirtualTransactionId *backends;
  
--- 290,296 ----
  }
  
  void
! ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
  {
  	VirtualTransactionId *backends;
  
***************
*** 240,246 **** ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid)
  										 InvalidOid);
  
  	ResolveRecoveryConflictWithVirtualXIDs(backends,
! 										   PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
  }
  
  void
--- 298,318 ----
  										 InvalidOid);
  
  	ResolveRecoveryConflictWithVirtualXIDs(backends,
! 										   PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
! 										   latestRemovedXid,
! 										   node.dbNode, node.relNode);
! }
! 
! /*
!  * ResolveRecoveryConflictWithRemovedTransactionId is called to resolve
!  * a deferred conflict.
!  *
!  * Treat as a self-interrupt, but don't bother actually sending the signal.
!  */
! void
! ResolveRecoveryConflictWithRemovedTransactionId(void)
! {
! 	RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
  }
  
  void
***************
*** 270,276 **** ResolveRecoveryConflictWithTablespace(Oid tsid)
  	temp_file_users = GetConflictingVirtualXIDs(InvalidTransactionId,
  												InvalidOid);
  	ResolveRecoveryConflictWithVirtualXIDs(temp_file_users,
! 										   PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
  }
  
  void
--- 342,350 ----
  	temp_file_users = GetConflictingVirtualXIDs(InvalidTransactionId,
  												InvalidOid);
  	ResolveRecoveryConflictWithVirtualXIDs(temp_file_users,
! 										   PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
! 										   InvalidTransactionId,
! 										   0, 0);
  }
  
  void
***************
*** 321,327 **** ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid)
  	while (!lock_acquired)
  	{
  		if (++num_attempts < 3)
! 			backends = GetLockConflicts(&locktag, AccessExclusiveLock);
  		else
  		{
  			backends = GetConflictingVirtualXIDs(InvalidTransactionId,
--- 395,403 ----
  	while (!lock_acquired)
  	{
  		if (++num_attempts < 3)
! 			backends = GetLockConflicts(&locktag,
! 										AccessExclusiveLock,
! 										InvalidTransactionId);
  		else
  		{
  			backends = GetConflictingVirtualXIDs(InvalidTransactionId,
***************
*** 330,336 **** ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid)
  		}
  
  		ResolveRecoveryConflictWithVirtualXIDs(backends,
! 											   PROCSIG_RECOVERY_CONFLICT_LOCK);
  
  		if (LockAcquireExtended(&locktag, AccessExclusiveLock, true, true, false)
  											!= LOCKACQUIRE_NOT_AVAIL)
--- 406,414 ----
  		}
  
  		ResolveRecoveryConflictWithVirtualXIDs(backends,
! 											   PROCSIG_RECOVERY_CONFLICT_LOCK,
! 											   InvalidTransactionId,
! 											   0, 0);
  
  		if (LockAcquireExtended(&locktag, AccessExclusiveLock, true, true, false)
  											!= LOCKACQUIRE_NOT_AVAIL)
*** a/src/backend/storage/lmgr/lock.c
--- b/src/backend/storage/lmgr/lock.c
***************
*** 35,40 ****
--- 35,41 ----
  #include "access/transam.h"
  #include "access/twophase.h"
  #include "access/twophase_rmgr.h"
+ #include "access/xact.h"
  #include "miscadmin.h"
  #include "pg_trace.h"
  #include "pgstat.h"
***************
*** 439,444 **** ProcLockHashCode(const PROCLOCKTAG *proclocktag, uint32 hashcode)
--- 440,486 ----
  	return lockhash;
  }
  
+ /*
+  * SetRecoveryConflictLatestRemovedXid functions manage the latestRemovedXid cache.
+  * During Hot Standby we generate a list of conflicts and then resolve only those
+  * conflicts that have a clear and present lock conflict. We defer the conflict
+  * for other relations to some later time, hopefully never. The deferral mechanism
+  * is that we store the latestRemovedXid that applies to a relation behind the
+  * lock partition that applies for that relation. These routines exist to provide
+  * a level of modularity to allow us to alter the cacheing mechanism as needed.
+  *
+  * It is possible to defer the recovery conflict right up until the point that
+  * we access a particular data block. Doing that means we would have to cache
+  * latestRemovedXid for each data block, even those that have exited and re-entered
+  * cache. It would also mean we would need to re-check latestRemovedXid each time
+  * we access a data block. Both of those factors make that level of granularity
+  * more trouble than it would likely be worth in practice. So we check the
+  * latestRemovedXid once each time we lock a relation: smaller cache, fewer checks.
+  *
+  * We store the latestRemovedXid on the lock entry, giving easy access to the value
+  * that applies for that relation. If a lock is heavily used or held for a long
+  * duration then it will effectively have its own private cache. When the lock entry
+  * is removed, we lose the cached latestRemovedXid value.
+  *
+  * When a lock entry is created we set the value from the cache. The current cache
+  * is just a single value for each partition, though that could be extended to have
+  * multiple entries for frequently used relations. The current cache is stored on
+  * the header of the ProcArrayStruct as a convenient place to store shmem data
+  * and could be relocated elsewhere if required.
+  */
+ static void
+ LockSetRecoveryConflictLatestRemovedXidForPartition(int partition, TransactionId latestRemovedXid)
+ {
+ 	Assert(InHotStandby);
+ 
+ 	ProcArraySetLatestRemovedXidCache(partition, latestRemovedXid);
+ }
+ 
+ static TransactionId
+ LockGetRecoveryConflictLatestRemovedXidForPartition(int partition)
+ {
+ 	return ProcArrayGetLatestRemovedXidCache(partition);
+ }
  
  /*
   * LockAcquire -- Check for lock conflicts, sleep if conflict found,
***************
*** 632,637 **** LockAcquireExtended(const LOCKTAG *locktag,
--- 674,685 ----
  		MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
  		MemSet(lock->granted, 0, sizeof(int) * MAX_LOCKMODES);
  		LOCK_PRINT("LockAcquire: new", lock, lockmode);
+ 
+ 		/*
+ 		 * Start the latestRemovedXid from the partition's cached value
+ 		 */
+ 		if (TransactionStartedDuringRecovery())
+ 			lock->latestRemovedXid = LockGetRecoveryConflictLatestRemovedXidForPartition(partition);
  	}
  	else
  	{
***************
*** 642,647 **** LockAcquireExtended(const LOCKTAG *locktag,
--- 690,726 ----
  	}
  
  	/*
+ 	 * If latestRemovedXid is set, resolve any recovery conflict. We do this
+ 	 * here whether we wait or not. If our snapshot conflicts then we will
+ 	 * be cancelled, so there is no need for us to re-check this after we
+ 	 * wake from a wait for the lock.
+ 	 */
+ 	if (TransactionIdIsValid(lock->latestRemovedXid) &&
+ 		TransactionIdIsValid(MyProc->xmin) &&
+ 		TransactionIdFollowsOrEquals(lock->latestRemovedXid, MyProc->xmin))
+ 	{
+ 		if (lock->nRequested == 0)
+ 		{
+ 			/*
+ 			 * There are no other requestors of this lock, so garbage-collect
+ 			 * the lock object.  We *must* do this to avoid a permanent leak
+ 			 * of shared memory, because there won't be anything to cause
+ 			 * anyone to release the lock object later.
+ 			 */
+ 			Assert(SHMQueueEmpty(&(lock->procLocks)));
+ 			if (!hash_search_with_hash_value(LockMethodLockHash,
+ 											 (void *) &(lock->tag),
+ 											 hashcode,
+ 											 HASH_REMOVE,
+ 											 NULL))
+ 				elog(PANIC, "lock table corrupted");
+ 		}
+ 		LWLockRelease(partitionLock);
+ 		ResolveRecoveryConflictWithRemovedTransactionId();
+ 		/* Not reached */
+ 	}
+ 
+ 	/*
  	 * Create the hash key for the proclock table.
  	 */
  	proclocktag.myLock = lock;
***************
*** 1788,1796 **** LockReassignCurrentOwner(void)
   * uses of the result.
   */
  VirtualTransactionId *
! GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
  {
! 	VirtualTransactionId *vxids;
  	LOCKMETHODID lockmethodid = locktag->locktag_lockmethodid;
  	LockMethod	lockMethodTable;
  	LOCK	   *lock;
--- 1867,1875 ----
   * uses of the result.
   */
  VirtualTransactionId *
! GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, TransactionId latestRemovedXid)
  {
! 	static VirtualTransactionId *vxids;
  	LOCKMETHODID lockmethodid = locktag->locktag_lockmethodid;
  	LockMethod	lockMethodTable;
  	LOCK	   *lock;
***************
*** 1798,1803 **** GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
--- 1877,1883 ----
  	SHM_QUEUE  *procLocks;
  	PROCLOCK   *proclock;
  	uint32		hashcode;
+ 	int			partition;
  	LWLockId	partitionLock;
  	int			count = 0;
  
***************
*** 1812,1827 **** GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
  	 * need enough space for MaxBackends + a terminator, since prepared xacts
  	 * don't count.
  	 */
! 	vxids = (VirtualTransactionId *)
! 		palloc0(sizeof(VirtualTransactionId) * (MaxBackends + 1));
  
  	/*
  	 * Look up the lock object matching the tag.
  	 */
  	hashcode = LockTagHashCode(locktag);
  	partitionLock = LockHashPartitionLock(hashcode);
  
! 	LWLockAcquire(partitionLock, LW_SHARED);
  
  	lock = (LOCK *) hash_search_with_hash_value(LockMethodLockHash,
  												(void *) locktag,
--- 1892,1929 ----
  	 * need enough space for MaxBackends + a terminator, since prepared xacts
  	 * don't count.
  	 */
! 	if (!InHotStandby)
! 		vxids = (VirtualTransactionId *)
! 			palloc0(sizeof(VirtualTransactionId) * (MaxBackends + 1));
! 	else
! 	{
! 		if (vxids == NULL)
! 		{
! 			vxids = (VirtualTransactionId *)
! 				malloc(sizeof(VirtualTransactionId) * (MaxBackends + 1));
! 			if (vxids == NULL)
! 				ereport(ERROR,
! 					(errcode(ERRCODE_OUT_OF_MEMORY),
! 					 errmsg("out of memory")));
! 
! 		}
! 	}
  
  	/*
  	 * Look up the lock object matching the tag.
  	 */
  	hashcode = LockTagHashCode(locktag);
+ 	partition = LockHashPartition(hashcode);
  	partitionLock = LockHashPartitionLock(hashcode);
  
! 	/*
! 	 * If we are setting latestRemovedXid we need to get exclusive lock
! 	 * to avoid race conditions.
! 	 */
! 	if (TransactionIdIsValid(latestRemovedXid))
! 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
! 	else
! 		LWLockAcquire(partitionLock, LW_SHARED);
  
  	lock = (LOCK *) hash_search_with_hash_value(LockMethodLockHash,
  												(void *) locktag,
***************
*** 1831,1836 **** GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
--- 1933,1944 ----
  	if (!lock)
  	{
  		/*
+ 		 * Set the latestRemovedXid for the partition.
+ 		 */
+ 		if (TransactionIdIsValid(latestRemovedXid))
+ 			LockSetRecoveryConflictLatestRemovedXidForPartition(partition, latestRemovedXid);
+ 
+ 		/*
  		 * If the lock object doesn't exist, there is nothing holding a lock
  		 * on this lockable object.
  		 */
***************
*** 1839,1844 **** GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
--- 1947,1958 ----
  	}
  
  	/*
+ 	 * Set the latestRemovedXid for the lock
+ 	 */
+ 	if (TransactionIdIsValid(latestRemovedXid))
+ 		lock->latestRemovedXid = latestRemovedXid;
+ 
+ 	/*
  	 * Examine each existing holder (or awaiter) of the lock.
  	 */
  	conflictMask = lockMethodTable->conflictTab[lockmode];
*** a/src/include/storage/lock.h
--- b/src/include/storage/lock.h
***************
*** 312,317 **** typedef struct LOCK
--- 312,323 ----
  	int			nRequested;		/* total of requested[] array */
  	int			granted[MAX_LOCKMODES]; /* counts of granted locks */
  	int			nGranted;		/* total of granted[] array */
+ 
+ 	/*
+ 	 * For Hot Standby, the xid limit for access to this table. Allows deferred
+ 	 * and selective conflict processing based upon the relation id.
+ 	 */
+ 	TransactionId latestRemovedXid;
  } LOCK;
  
  #define LOCK_LOCKMETHOD(lock) ((LOCKMETHODID) (lock).tag.locktag_lockmethodid)
***************
*** 488,494 **** extern void LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks);
  extern void LockReleaseCurrentOwner(void);
  extern void LockReassignCurrentOwner(void);
  extern VirtualTransactionId *GetLockConflicts(const LOCKTAG *locktag,
! 				 LOCKMODE lockmode);
  extern void AtPrepare_Locks(void);
  extern void PostPrepare_Locks(TransactionId xid);
  extern int LockCheckConflicts(LockMethod lockMethodTable,
--- 494,500 ----
  extern void LockReleaseCurrentOwner(void);
  extern void LockReassignCurrentOwner(void);
  extern VirtualTransactionId *GetLockConflicts(const LOCKTAG *locktag,
! 				 LOCKMODE lockmode, TransactionId latestRemovedXid);
  extern void AtPrepare_Locks(void);
  extern void PostPrepare_Locks(TransactionId xid);
  extern int LockCheckConflicts(LockMethod lockMethodTable,
*** a/src/include/storage/proc.h
--- b/src/include/storage/proc.h
***************
*** 102,107 **** struct PGPROC
--- 102,112 ----
  	 */
  	bool		recoveryConflictPending;
  
+ 	TransactionId limitXmin;	/* The latest xid for which conflict processing
+ 								 * has been already performed for this proc.
+ 								 * Updated by snapshot conflicts, reset when
+ 								 * we take a new snapshot. */
+ 
  	/* Info about LWLock the process is currently waiting for, if any. */
  	bool		lwWaiting;		/* true if waiting for an LW lock */
  	bool		lwExclusive;	/* true if waiting for exclusive access */
*** a/src/include/storage/standby.h
--- b/src/include/storage/standby.h
***************
*** 29,34 **** extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
--- 29,38 ----
  extern void ResolveRecoveryConflictWithBufferPin(void);
  extern void SendRecoveryConflictWithBufferPin(void);
  
+ /* Prototypes here rather than in procarray.h */
+ extern void ProcArraySetLatestRemovedXidCache(int partition, TransactionId latestRemovedXid);
+ extern TransactionId ProcArrayGetLatestRemovedXidCache(int partition);
+ 
  /*
   * Standby Rmgr (RM_STANDBY_ID)
   *

#33

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Simon Riggs (#21)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Fri, 2010-01-29 at 15:01 +0000, Simon Riggs wrote:

Putting it back takes time and
given enough of that rare cloth, it will eventually be put back.

Looks like I'll have time to add the starts-at-shutdown-checkpoint item
back in after all.

I'd appreciate it if you could review the relation-specific conflict
patch, 'cos it's still important.

--
Simon Riggs www.2ndQuadrant.com

#34

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 16 years ago

In reply to: Simon Riggs (#33)

Re: Hot Standby: Relation-specific deferred conflict resolution

Simon Riggs wrote:

On Fri, 2010-01-29 at 15:01 +0000, Simon Riggs wrote:

Putting it back takes time and
given enough of that rare cloth, it will eventually be put back.

Looks like I'll have time to add the starts-at-shutdown-checkpoint item
back in after all.

Great! Thank you, much appreciated.

I'd appreciate it if you could review the relation-specific conflict
patch, 'cos it's still important.

One fundamental gripe I have about that approach is that it's hard to
predict when you will be saved by the cache and when your query will be
canceled. For example, the patch stores only one "latestRemovedXid"
value per lock partition. So if you have two tables that hash to
different lock partitions, and are never both accessed in a single
transaction, the cache will save your query every time. So far so good,
but then you do a dump/restore, and the tables happen to be assigned to
the same lock partition. Oops, a system that used to work fine starts to
get "snapshot too old" errors.

It's often better to be consistent and predictable, even if it means
cancelling more queries. I think wë́'d need to have a much more
fine-grained system before it's worthwhile to do deferred resolution.
There's just too much "false sharing" otherwise.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#35

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: Heikki Linnakangas (#34)

Re: Hot Standby: Relation-specific deferred conflict resolution

On Tue, 2010-02-02 at 20:27 +0200, Heikki Linnakangas wrote:

I'd appreciate it if you could review the relation-specific conflict
patch, 'cos it's still important.

One fundamental gripe I have about that approach is that it's hard to
predict when you will be saved by the cache and when your query will be
canceled. For example, the patch stores only one "latestRemovedXid"
value per lock partition. So if you have two tables that hash to
different lock partitions, and are never both accessed in a single
transaction, the cache will save your query every time. So far so good,
but then you do a dump/restore, and the tables happen to be assigned to
the same lock partition. Oops, a system that used to work fine starts to
get "snapshot too old" errors.

It's often better to be consistent and predictable, even if it means
cancelling more queries. I think wë́'d need to have a much more
fine-grained system before it's worthwhile to do deferred resolution.
There's just too much "false sharing" otherwise.

ISTM that this is exactly backwards. There is already way too many false
positives and this patch would reduce them very significantly. Plus the
cancelation is hardly predictable since it relies on whether or not a
btree delete takes place during execution and the arrival time and rate
of those is sporadic. There is no safe, predicatable behaviour in the
current code.

The gripe about the cache cannot be a fundamental one, since we can
easily change the size and mechanism by which the cache operates without
changing the patch very much at all.

I am being told this area is a must-fix issue for this release. Tom's
reaction to this issue (on other thread) illustrates that beautifully:

On Sun, 2010-01-31 at 15:41 -0500, Tom Lane wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

(snip)

2. no matter if they haven't accessed the index being cleaned (they
might later, is the thinking...)

That seems seriously horrid. What is the rationale for #2 in
particular? I would hope that at worst this would affect sessions
that are actively competing for the index being cleaned.

--
Simon Riggs www.2ndQuadrant.com