How much do the hint bits help?

Started by Merlin Moncure about 15 years ago, 44 messages
#1 Merlin Moncure
mmoncure@gmail.com
2 attachment(s)

I've been playing around with PostgreSQL hint bits in order to teach
myself more about the internals of the MVCC system.  I noticed that
the hint bit system has been around forever (Vadim era) and predates
several backend improvements that might affect its usefulness.  So I
started trying to quantify the benefit hint bits provide, with an eye
toward optimizing clog lookups if that turned out to be necessary,
say by mmap-ing a big transaction status file just to see if that
helped.
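
For readers less familiar with the mechanism, here is a minimal sketch of
what a hint bit buys: it caches the result of a transaction status lookup
in the tuple header so later visibility checks can skip the clog.  Every
name below (ToyTuple, toy_clog, toy_xmin_committed, the TOY_* flags) is
hypothetical and only mirrors the idea, not the actual backend code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in PostgreSQL. */
enum { XACT_IN_PROGRESS = 0, XACT_COMMITTED = 1, XACT_ABORTED = 2 };

#define TOY_XMIN_COMMITTED 0x0100	/* hint: inserter known committed */
#define TOY_XMIN_INVALID   0x0200	/* hint: inserter known aborted */
#define TOY_MAX_XID        1024

typedef struct ToyTuple
{
	uint32_t	xmin;		/* inserting transaction id */
	uint16_t	infomask;	/* status flags, including hints */
} ToyTuple;

uint8_t		toy_clog[TOY_MAX_XID];	/* stand-in for the commit log */
unsigned long toy_clog_lookups;		/* counts "expensive" lookups */

/* Stand-in for a full transaction status lookup. */
int
toy_clog_status(uint32_t xid)
{
	assert(xid < TOY_MAX_XID);
	toy_clog_lookups++;
	return toy_clog[xid];
}

/*
 * Did the inserting transaction commit?  With use_hints, consult the
 * cached bits first and record the answer after a lookup; without it,
 * always go to the (toy) clog.
 */
bool
toy_xmin_committed(ToyTuple *tup, bool use_hints)
{
	int			status;

	if (use_hints)
	{
		if (tup->infomask & TOY_XMIN_COMMITTED)
			return true;
		if (tup->infomask & TOY_XMIN_INVALID)
			return false;
	}

	status = toy_clog_status(tup->xmin);

	if (use_hints)
	{
		if (status == XACT_COMMITTED)
			tup->infomask |= TOY_XMIN_COMMITTED;
		else if (status == XACT_ABORTED)
			tup->infomask |= TOY_XMIN_INVALID;
	}
	return status == XACT_COMMITTED;
}
```

In this model, repeated checks of the same tuple cost one lookup with
hints and one lookup per check without them; the open question is how
expensive that per-check lookup really is on current hardware.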

Attached is an incomplete patch that disables hint bits based on a
compile switch.  It's not complete: for example, it doesn't reconcile
some assumptions in heapam.c that hint bits have been set in various
routines.  However, it mostly passes the regression tests, and I
deemed it good enough to run some preliminary benchmarks and fool
around.  Obviously, hint bits are an annoying impediment to a couple
of other cool pending features, and it certainly would be nice to
operate without them.  Also, for particular workloads, the extra I/O
that hint bits generate can cause a fair amount of pain.

So far, at least in pgbench runs and another test designed to
exercise clog lookups, the performance loss from always doing a full
lookup hasn't materialized.  Note that in these cases the clog LRU
cache is pretty effective, and it's quite possible I've blown it in
some other way, so take the results with a grain of salt.  Still,
here are some questions/points:

*) relative to when the hint bits were implemented, the number of
transactions to map has shrunk, while hardware has improved by a
couple of orders of magnitude.  Also, the postgres architecture has
changed considerably.  Are they still necessary?

*) what's a good way to stress the clog severely? I'd like to pick a
degenerate case to get a better idea of the way things stand without
them.

*) is there community interest in a full patch that fills in the
missing details not implemented here?
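
To make the "degenerate case" question concrete, here is a rough sketch of
why pgbench-style access is kind to a small LRU page cache and what kind of
pattern would not be.  It assumes 2 status bits per transaction, 8 kB
pages, and an 8-page cache; the SKETCH_* constants and sketch_* functions
are hypothetical and do not reproduce the real SLRU code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 8 kB pages, 4 transactions per byte (2 bits each). */
#define SKETCH_BLCKSZ          8192
#define SKETCH_XACTS_PER_BYTE  4
#define SKETCH_XACTS_PER_PAGE  (SKETCH_BLCKSZ * SKETCH_XACTS_PER_BYTE)
#define SKETCH_NUM_BUFFERS     8	/* assumed cache size, in pages */

typedef struct SketchCache
{
	uint32_t	page[SKETCH_NUM_BUFFERS];
	uint64_t	last_used[SKETCH_NUM_BUFFERS];	/* 0 means empty slot */
	uint64_t	clock;
	uint64_t	misses;
} SketchCache;

/* Which status page holds this xid? */
uint32_t
sketch_xid_to_page(uint32_t xid)
{
	return xid / SKETCH_XACTS_PER_PAGE;
}

/* Look up one xid, evicting the least recently used page on a miss. */
void
sketch_cache_touch(SketchCache *c, uint32_t xid)
{
	uint32_t	page = sketch_xid_to_page(xid);
	int			victim = 0;

	c->clock++;
	for (int i = 0; i < SKETCH_NUM_BUFFERS; i++)
	{
		if (c->last_used[i] != 0 && c->page[i] == page)
		{
			c->last_used[i] = c->clock;	/* hit */
			return;
		}
		if (c->last_used[i] < c->last_used[victim])
			victim = i;
	}
	c->misses++;
	c->page[victim] = page;
	c->last_used[victim] = c->clock;
}

/*
 * Run n lookups that cycle over `distinct` xids spaced `stride` apart
 * and return the number of cache misses.
 */
uint64_t
sketch_count_misses(uint32_t stride, uint32_t distinct, uint32_t n)
{
	SketchCache c;

	memset(&c, 0, sizeof(c));
	for (uint32_t i = 0; i < n; i++)
		sketch_cache_touch(&c, (i % distinct) * stride);
	return c.misses;
}
```

Under these assumptions, 10000 consecutive xids all land on one page, so
sketch_count_misses(1, 10000, 10000) misses once.  Cycling over nine xids
that each sit on a different page, as in
sketch_count_misses(SKETCH_XACTS_PER_PAGE, 9, 900), misses on every
lookup, which is the sort of pattern a stress test would want to produce.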

merlin

Attachments:

disble_hints.diff (text/x-patch; charset=US-ASCII; name=disble_hints.diff)
diff -r -C6 ./src/backend/access/heap/heapam.c ../postgresql-9.0.1_hb2/src/backend/access/heap/heapam.c
*** ./src/backend/access/heap/heapam.c	2010-10-01 10:25:44.000000000 -0400
--- ../postgresql-9.0.1_hb2/src/backend/access/heap/heapam.c	2010-12-16 12:29:20.000000000 -0500
***************
*** 1731,1744 ****
--- 1731,1751 ----
  		if (valid)
  			*tid = ctid;
  
  		/*
  		 * If there's a valid t_ctid link, follow it, else we're done.
  		 */
+ #ifndef DISABLE_HINT_BITS 
  		if ((tp.t_data->t_infomask & (HEAP_XMAX_INVALID | HEAP_IS_LOCKED)) ||
  			ItemPointerEquals(&tp.t_self, &tp.t_data->t_ctid))
+ #else
+ /* this is probalby wrong -- merlin */
+     if ((tp.t_data->t_infomask & (HEAP_IS_LOCKED)) ||
+       ItemPointerEquals(&tp.t_self, &tp.t_data->t_ctid))
+ 
+ #endif 
  		{
  			UnlockReleaseBuffer(buffer);
  			break;
  		}
  
  		ctid = tp.t_data->t_ctid;
***************
*** 1755,1766 ****
--- 1762,1774 ----
   * If the transaction aborted, we guarantee the XMAX_INVALID hint bit will
   * be set on exit.	If the transaction committed, we set the XMAX_COMMITTED
   * hint bit if possible --- but beware that that may not yet be possible,
   * if the transaction committed asynchronously.  Hence callers should look
   * only at XMAX_INVALID.
   */
+ #ifndef DISABLE_HINT_BITS
  static void
  UpdateXmaxHintBits(HeapTupleHeader tuple, Buffer buffer, TransactionId xid)
  {
  	Assert(TransactionIdEquals(HeapTupleHeaderGetXmax(tuple), xid));
  
  	if (!(tuple->t_infomask & (HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID)))
***************
*** 1770,1781 ****
--- 1778,1792 ----
  								 xid);
  		else
  			HeapTupleSetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
  								 InvalidTransactionId);
  	}
  }
+ #else
+ #define UpdateXmaxHintBits(a,b,c) do{;}while(0)  
+ #endif
  
  
  /*
   * GetBulkInsertState - prepare status object for a bulk insert
   */
  BulkInsertState
***************
*** 1861,1873 ****
--- 1872,1886 ----
  		/* check there is not space for an OID */
  		Assert(!(tup->t_data->t_infomask & HEAP_HASOID));
  	}
  
  	tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
  	tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
+ #ifndef DISABLE_HINT_BITS
  	tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
+ #endif
  	HeapTupleHeaderSetXmin(tup->t_data, xid);
  	HeapTupleHeaderSetCmin(tup->t_data, cid);
  	HeapTupleHeaderSetXmax(tup->t_data, 0);		/* for cleanliness */
  	tup->t_tableOid = RelationGetRelid(relation);
  
  	/*
***************
*** 2161,2174 ****
--- 2174,2192 ----
  		}
  
  		/*
  		 * We may overwrite if previous xmax aborted, or if it committed but
  		 * only locked the tuple without updating it.
  		 */
+ #ifndef DISABLE_HINT_BITS
  		if (tp.t_data->t_infomask & (HEAP_XMAX_INVALID |
  									 HEAP_IS_LOCKED))
+ #else
+     if (tp.t_data->t_infomask & HEAP_IS_LOCKED )
+ 
+ #endif
  			result = HeapTupleMayBeUpdated;
  		else
  			result = HeapTupleUpdated;
  	}
  
  	if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
***************
*** 2210,2226 ****
--- 2228,2250 ----
  	{
  		all_visible_cleared = true;
  		PageClearAllVisible(page);
  	}
  
  	/* store transaction information of xact deleting the tuple */
+ #ifndef DISABLE_HINT_BITS
  	tp.t_data->t_infomask &= ~(HEAP_XMAX_COMMITTED |
  							   HEAP_XMAX_INVALID |
  							   HEAP_XMAX_IS_MULTI |
  							   HEAP_IS_LOCKED |
  							   HEAP_MOVED);
+ #else
+   tp.t_data->t_infomask &= ~(HEAP_XMAX_IS_MULTI |
+                  HEAP_IS_LOCKED |
+                  HEAP_MOVED);
+ #endif
  	HeapTupleHeaderClearHotUpdated(tp.t_data);
  	HeapTupleHeaderSetXmax(tp.t_data, xid);
  	HeapTupleHeaderSetCmax(tp.t_data, cid, iscombo);
  	/* Make sure there is no forward chain link in t_ctid */
  	tp.t_data->t_ctid = tp.t_self;
  
***************
*** 2513,2526 ****
--- 2537,2554 ----
  		}
  
  		/*
  		 * We may overwrite if previous xmax aborted, or if it committed but
  		 * only locked the tuple without updating it.
  		 */
+ #ifndef DISABLE_HINT_BITS
  		if (oldtup.t_data->t_infomask & (HEAP_XMAX_INVALID |
  										 HEAP_IS_LOCKED))
+ #else
+     if (oldtup.t_data->t_infomask & (HEAP_IS_LOCKED))
+ #endif
  			result = HeapTupleMayBeUpdated;
  		else
  			result = HeapTupleUpdated;
  	}
  
  	if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
***************
*** 2559,2571 ****
--- 2587,2603 ----
  		/* check there is not space for an OID */
  		Assert(!(newtup->t_data->t_infomask & HEAP_HASOID));
  	}
  
  	newtup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
  	newtup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
+ #ifndef DISABLE_HINT_BITS
  	newtup->t_data->t_infomask |= (HEAP_XMAX_INVALID | HEAP_UPDATED);
+ #else
+   newtup->t_data->t_infomask |= (HEAP_UPDATED);
+ #endif
  	HeapTupleHeaderSetXmin(newtup->t_data, xid);
  	HeapTupleHeaderSetCmin(newtup->t_data, cid);
  	HeapTupleHeaderSetXmax(newtup->t_data, 0);	/* for cleanliness */
  	newtup->t_tableOid = RelationGetRelid(relation);
  
  	/*
***************
*** 2601,2614 ****
--- 2633,2650 ----
  
  	newtupsize = MAXALIGN(newtup->t_len);
  
  	if (need_toast || newtupsize > pagefree)
  	{
  		/* Clear obsolete visibility flags ... */
+ #ifndef DISABLE_HINT_BITS
  		oldtup.t_data->t_infomask &= ~(HEAP_XMAX_COMMITTED |
  									   HEAP_XMAX_INVALID |
+ #else
+     oldtup.t_data->t_infomask &= ~(
+ #endif
  									   HEAP_XMAX_IS_MULTI |
  									   HEAP_IS_LOCKED |
  									   HEAP_MOVED);
  		HeapTupleClearHotUpdated(&oldtup);
  		/* ... and store info about transaction updating this tuple */
  		HeapTupleHeaderSetXmax(oldtup.t_data, xid);
***************
*** 2747,2760 ****
--- 2783,2800 ----
  
  	RelationPutHeapTuple(relation, newbuf, heaptup);	/* insert new tuple */
  
  	if (!already_marked)
  	{
  		/* Clear obsolete visibility flags ... */
+ #ifndef DISABLE_HINT_BITS
  		oldtup.t_data->t_infomask &= ~(HEAP_XMAX_COMMITTED |
  									   HEAP_XMAX_INVALID |
+ #else
+     oldtup.t_data->t_infomask &= ~(
+ #endif
  									   HEAP_XMAX_IS_MULTI |
  									   HEAP_IS_LOCKED |
  									   HEAP_MOVED);
  		/* ... and store info about transaction updating this tuple */
  		HeapTupleHeaderSetXmax(oldtup.t_data, xid);
  		HeapTupleHeaderSetCmax(oldtup.t_data, cid, iscombo);
***************
*** 3237,3250 ****
--- 3277,3294 ----
  		/*
  		 * We may lock if previous xmax aborted, or if it committed but only
  		 * locked the tuple without updating it.  The case where we didn't
  		 * wait because we are joining an existing shared lock is correctly
  		 * handled, too.
  		 */
+ #ifndef DISABLE_HINT_BITS
  		if (tuple->t_data->t_infomask & (HEAP_XMAX_INVALID |
  										 HEAP_IS_LOCKED))
+ #else
+     if (tuple->t_data->t_infomask & (HEAP_IS_LOCKED))
+ #endif
  			result = HeapTupleMayBeUpdated;
  		else
  			result = HeapTupleUpdated;
  	}
  
  	if (result != HeapTupleMayBeUpdated)
***************
*** 3269,3283 ****
  	 * Note in particular that this covers the case where we already hold
  	 * exclusive lock on the tuple and the caller only wants shared lock. It
  	 * would certainly not do to give up the exclusive lock.
  	 */
  	xmax = HeapTupleHeaderGetXmax(tuple->t_data);
  	old_infomask = tuple->t_data->t_infomask;
! 
  	if (!(old_infomask & (HEAP_XMAX_INVALID |
  						  HEAP_XMAX_COMMITTED |
  						  HEAP_XMAX_IS_MULTI)) &&
  		(mode == LockTupleShared ?
  		 (old_infomask & HEAP_IS_LOCKED) :
  		 (old_infomask & HEAP_XMAX_EXCL_LOCK)) &&
  		TransactionIdIsCurrentTransactionId(xmax))
  	{
--- 3313,3330 ----
  	 * Note in particular that this covers the case where we already hold
  	 * exclusive lock on the tuple and the caller only wants shared lock. It
  	 * would certainly not do to give up the exclusive lock.
  	 */
  	xmax = HeapTupleHeaderGetXmax(tuple->t_data);
  	old_infomask = tuple->t_data->t_infomask;
! #ifndef DISABLE_HINT_BITS
  	if (!(old_infomask & (HEAP_XMAX_INVALID |
  						  HEAP_XMAX_COMMITTED |
+ #else
+   if (!(old_infomask & (
+ #endif
  						  HEAP_XMAX_IS_MULTI)) &&
  		(mode == LockTupleShared ?
  		 (old_infomask & HEAP_IS_LOCKED) :
  		 (old_infomask & HEAP_XMAX_EXCL_LOCK)) &&
  		TransactionIdIsCurrentTransactionId(xmax))
  	{
***************
*** 3291,3305 ****
  	/*
  	 * Compute the new xmax and infomask to store into the tuple.  Note we do
  	 * not modify the tuple just yet, because that would leave it in the wrong
  	 * state if multixact.c elogs.
  	 */
  	xid = GetCurrentTransactionId();
! 
  	new_infomask = old_infomask & ~(HEAP_XMAX_COMMITTED |
  									HEAP_XMAX_INVALID |
  									HEAP_XMAX_IS_MULTI |
  									HEAP_IS_LOCKED |
  									HEAP_MOVED);
  
  	if (mode == LockTupleShared)
  	{
--- 3338,3355 ----
  	/*
  	 * Compute the new xmax and infomask to store into the tuple.  Note we do
  	 * not modify the tuple just yet, because that would leave it in the wrong
  	 * state if multixact.c elogs.
  	 */
  	xid = GetCurrentTransactionId();
! #ifndef DISABLE_HINT_BITS
  	new_infomask = old_infomask & ~(HEAP_XMAX_COMMITTED |
  									HEAP_XMAX_INVALID |
+ #else
+   new_infomask = old_infomask & ~(
+ #endif
  									HEAP_XMAX_IS_MULTI |
  									HEAP_IS_LOCKED |
  									HEAP_MOVED);
  
  	if (mode == LockTupleShared)
  	{
***************
*** 3327,3340 ****
--- 3377,3392 ----
  		 *
  		 * There is a similar race condition possible when the old xmax was a
  		 * regular TransactionId.  We test TransactionIdIsInProgress again
  		 * just to narrow the window, but it's still possible to end up
  		 * creating an unnecessary MultiXactId.  Fortunately this is harmless.
  		 */
+ #ifndef DISABLE_HINT_BITS
  		if (!(old_infomask & (HEAP_XMAX_INVALID | HEAP_XMAX_COMMITTED)))
  		{
+ #endif
  			if (old_infomask & HEAP_XMAX_IS_MULTI)
  			{
  				/*
  				 * If the XMAX is already a MultiXactId, then we need to
  				 * expand it to include our own TransactionId.
  				 */
***************
*** 3357,3376 ****
--- 3409,3430 ----
  				 * Can get here iff HeapTupleSatisfiesUpdate saw the old xmax
  				 * as running, but it finished before
  				 * TransactionIdIsInProgress() got to run.	Treat it like
  				 * there's no locker in the tuple.
  				 */
  			}
+ #ifndef DISABLE_HINT_BITS
  		}
  		else
  		{
  			/*
  			 * There was no previous locker, so just insert our own
  			 * TransactionId.
  			 */
  		}
+ #endif
  	}
  	else
  	{
  		/* We want an exclusive lock on the tuple */
  		new_infomask |= HEAP_XMAX_EXCL_LOCK;
  	}
***************
*** 3598,3611 ****
--- 3652,3667 ----
  		HeapTupleHeaderSetXmin(tuple, FrozenTransactionId);
  
  		/*
  		 * Might as well fix the hint bits too; usually XMIN_COMMITTED will
  		 * already be set here, but there's a small chance not.
  		 */
+ #ifndef DISABLE_HINT_BITS
  		Assert(!(tuple->t_infomask & HEAP_XMIN_INVALID));
  		tuple->t_infomask |= HEAP_XMIN_COMMITTED;
+ #endif
  		changed = true;
  	}
  
  	/*
  	 * When we release shared lock, it's possible for someone else to change
  	 * xmax before we get the lock back, so repeat the check after acquiring
***************
*** 3632,3645 ****
--- 3688,3703 ----
  
  			/*
  			 * The tuple might be marked either XMAX_INVALID or XMAX_COMMITTED
  			 * + LOCKED.  Normalize to INVALID just to be sure no one gets
  			 * confused.
  			 */
+ #ifndef DISABLE_HINT_BITS
  			tuple->t_infomask &= ~HEAP_XMAX_COMMITTED;
  			tuple->t_infomask |= HEAP_XMAX_INVALID;
+ #endif
  			HeapTupleHeaderClearHotUpdated(tuple);
  			changed = true;
  		}
  	}
  	else
  	{
***************
*** 3697,3710 ****
--- 3755,3770 ----
  				HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
  
  			/*
  			 * Might as well fix the hint bits too; usually XMIN_COMMITTED
  			 * will already be set here, but there's a small chance not.
  			 */
+ #ifndef DISABLE_HINT_BITS
  			Assert(!(tuple->t_infomask & HEAP_XMIN_INVALID));
  			tuple->t_infomask |= HEAP_XMIN_COMMITTED;
+ #endif
  			changed = true;
  		}
  	}
  
  	return changed;
  }
***************
*** 4332,4345 ****
--- 4392,4409 ----
  
  	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
  		elog(PANIC, "heap_delete_redo: invalid lp");
  
  	htup = (HeapTupleHeader) PageGetItem(page, lp);
  
+ #ifndef DISABLE_HINT_BITS
  	htup->t_infomask &= ~(HEAP_XMAX_COMMITTED |
  						  HEAP_XMAX_INVALID |
+ #else
+   htup->t_infomask &= ~(
+ #endif
  						  HEAP_XMAX_IS_MULTI |
  						  HEAP_IS_LOCKED |
  						  HEAP_MOVED);
  	HeapTupleHeaderClearHotUpdated(htup);
  	HeapTupleHeaderSetXmax(htup, record->xl_xid);
  	HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
***************
*** 4533,4546 ****
--- 4597,4614 ----
  
  	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
  		elog(PANIC, "heap_update_redo: invalid lp");
  
  	htup = (HeapTupleHeader) PageGetItem(page, lp);
  
+ #ifndef DISABLE_HINT_BITS
  	htup->t_infomask &= ~(HEAP_XMAX_COMMITTED |
  						  HEAP_XMAX_INVALID |
+ #else
+   htup->t_infomask &= ~(
+ #endif
  						  HEAP_XMAX_IS_MULTI |
  						  HEAP_IS_LOCKED |
  						  HEAP_MOVED);
  	if (hot_update)
  		HeapTupleHeaderSetHotUpdated(htup);
  	else
***************
*** 4707,4720 ****
--- 4775,4792 ----
  
  	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
  		elog(PANIC, "heap_lock_redo: invalid lp");
  
  	htup = (HeapTupleHeader) PageGetItem(page, lp);
  
+ #ifndef DISABLE_HINT_BITS
  	htup->t_infomask &= ~(HEAP_XMAX_COMMITTED |
  						  HEAP_XMAX_INVALID |
+ #else
+   htup->t_infomask &= ~(
+ #endif
  						  HEAP_XMAX_IS_MULTI |
  						  HEAP_IS_LOCKED |
  						  HEAP_MOVED);
  	if (xlrec->xid_is_mxact)
  		htup->t_infomask |= HEAP_XMAX_IS_MULTI;
  	if (xlrec->shared_lock)
diff -r -C6 ./src/backend/access/heap/rewriteheap.c ../postgresql-9.0.1_hb2/src/backend/access/heap/rewriteheap.c
*** ./src/backend/access/heap/rewriteheap.c	2010-10-01 10:25:44.000000000 -0400
--- ../postgresql-9.0.1_hb2/src/backend/access/heap/rewriteheap.c	2010-12-16 12:30:06.000000000 -0500
***************
*** 351,363 ****
--- 351,367 ----
  	 */
  	ItemPointerSetInvalid(&new_tuple->t_data->t_ctid);
  
  	/*
  	 * If the tuple has been updated, check the old-to-new mapping hash table.
  	 */
+ #ifndef DISABLE_HINT_BITS
  	if (!(old_tuple->t_data->t_infomask & (HEAP_XMAX_INVALID |
+ #else
+ if (!(old_tuple->t_data->t_infomask & (
+ #endif
  										   HEAP_IS_LOCKED)) &&
  		!(ItemPointerEquals(&(old_tuple->t_self),
  							&(old_tuple->t_data->t_ctid))))
  	{
  		OldToNewMapping mapping;
  
diff -r -C6 ./src/backend/commands/sequence.c ../postgresql-9.0.1_hb2/src/backend/commands/sequence.c
*** ./src/backend/commands/sequence.c	2010-10-01 10:25:44.000000000 -0400
--- ../postgresql-9.0.1_hb2/src/backend/commands/sequence.c	2010-12-16 12:31:09.000000000 -0500
***************
*** 260,275 ****
  		Item		item;
  
  		itemId = PageGetItemId((Page) page, FirstOffsetNumber);
  		item = PageGetItem((Page) page, itemId);
  
  		HeapTupleHeaderSetXmin((HeapTupleHeader) item, FrozenTransactionId);
  		((HeapTupleHeader) item)->t_infomask |= HEAP_XMIN_COMMITTED;
! 
  		HeapTupleHeaderSetXmin(tuple->t_data, FrozenTransactionId);
  		tuple->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
  	}
  
  	MarkBufferDirty(buf);
  
  	/* XLOG stuff */
  	if (!rel->rd_istemp)
--- 260,278 ----
  		Item		item;
  
  		itemId = PageGetItemId((Page) page, FirstOffsetNumber);
  		item = PageGetItem((Page) page, itemId);
  
  		HeapTupleHeaderSetXmin((HeapTupleHeader) item, FrozenTransactionId);
+ #ifndef DISABLE_HINT_BITS
  		((HeapTupleHeader) item)->t_infomask |= HEAP_XMIN_COMMITTED;
! #endif
  		HeapTupleHeaderSetXmin(tuple->t_data, FrozenTransactionId);
+ #ifndef DISABLE_HINT_BITS
  		tuple->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
+ #endif
  	}
  
  	MarkBufferDirty(buf);
  
  	/* XLOG stuff */
  	if (!rel->rd_istemp)
diff -r -C6 ./src/backend/commands/vacuumlazy.c ../postgresql-9.0.1_hb2/src/backend/commands/vacuumlazy.c
*** ./src/backend/commands/vacuumlazy.c	2010-10-01 10:25:44.000000000 -0400
--- ../postgresql-9.0.1_hb2/src/backend/commands/vacuumlazy.c	2010-12-16 15:24:59.000000000 -0500
***************
*** 591,613 ****
  					 * that.
  					 */
  					if (all_visible)
  					{
  						TransactionId xmin;
  
  						if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
  						{
  							all_visible = false;
  							break;
  						}
  
! 						/*
! 						 * The inserter definitely committed. But is it old
! 						 * enough that everyone sees it as committed?
! 						 */
  						xmin = HeapTupleHeaderGetXmin(tuple.t_data);
  						if (!TransactionIdPrecedes(xmin, OldestXmin))
  						{
  							all_visible = false;
  							break;
  						}
  					}
--- 591,622 ----
  					 * that.
  					 */
  					if (all_visible)
  					{
  						TransactionId xmin;
  
+ #ifndef DISABLE_HINT_BITS
  						if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
  						{
  							all_visible = false;
  							break;
  						}
  
!             /*
!              * The inserter definitely committed. But is it old
!              * enough that everyone sees it as committed?
!              */
!             xmin = HeapTupleHeaderGetXmin(tuple.t_data);
! #else
  						xmin = HeapTupleHeaderGetXmin(tuple.t_data);
+ 						if (!TransactionIdDidCommit(xmin))
+ 						{
+               all_visible = false;
+               break;
+ 						}
+ #endif
  						if (!TransactionIdPrecedes(xmin, OldestXmin))
  						{
  							all_visible = false;
  							break;
  						}
  					}
diff -r -C6 ./src/backend/utils/time/combocid.c ../postgresql-9.0.1_hb2/src/backend/utils/time/combocid.c
*** ./src/backend/utils/time/combocid.c	2010-10-01 10:25:44.000000000 -0400
--- ../postgresql-9.0.1_hb2/src/backend/utils/time/combocid.c	2010-12-16 12:35:59.000000000 -0500
***************
*** 149,161 ****
--- 149,165 ----
  	/*
  	 * If we're marking a tuple deleted that was inserted by (any
  	 * subtransaction of) our transaction, we need to use a combo command id.
  	 * Test for HEAP_XMIN_COMMITTED first, because it's cheaper than a
  	 * TransactionIdIsCurrentTransactionId call.
  	 */
+ #ifndef DISABLE_HINT_BITS
  	if (!(tup->t_infomask & HEAP_XMIN_COMMITTED) &&
+ #else
+   if (!(0) &&
+ #endif
  		TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tup)))
  	{
  		CommandId	cmin = HeapTupleHeaderGetCmin(tup);
  
  		*cmax = GetComboCommandId(cmin, *cmax);
  		*iscombo = true;
diff -r -C6 ./src/backend/utils/time/tqual.c ../postgresql-9.0.1_hb2/src/backend/utils/time/tqual.c
*** ./src/backend/utils/time/tqual.c	2010-10-01 10:25:44.000000000 -0400
--- ../postgresql-9.0.1_hb2/src/backend/utils/time/tqual.c	2010-12-16 12:38:24.000000000 -0500
***************
*** 101,112 ****
--- 101,113 ----
   * Normal commits may be asynchronous, so for those we need to get the LSN
   * of the transaction and then check whether this is flushed.
   *
   * The caller should pass xid as the XID of the transaction to check, or
   * InvalidTransactionId if no check is needed.
   */
+ #ifndef DISABLE_HINT_BITS
  static inline void
  SetHintBits(HeapTupleHeader tuple, Buffer buffer,
  			uint16 infomask, TransactionId xid)
  {
  	if (TransactionIdIsValid(xid))
  	{
***************
*** 130,142 ****
--- 131,147 ----
  void
  HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
  					 uint16 infomask, TransactionId xid)
  {
  	SetHintBits(tuple, buffer, infomask, xid);
  }
+ #else
  
+ #define SetHintBits(a,b,c,d) do{;}while(0)
+ 
+ #endif
  
  /*
   * HeapTupleSatisfiesSelf
   *		True iff heap tuple is valid "for itself".
   *
   *	Here, we consider the effects of:
***************
*** 159,175 ****
   *			(Xmax != my-transaction &&			the row was deleted by another transaction
   *			 Xmax is not committed)))			that has not been committed
   */
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
! 
  		/* Used by pre-9.0 binary upgrades */
  		if (tuple->t_infomask & HEAP_MOVED_OFF)
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
--- 164,181 ----
   *			(Xmax != my-transaction &&			the row was deleted by another transaction
   *			 Xmax is not committed)))			that has not been committed
   */
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
+ #ifndef DISABLE_HINT_BITS
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
! #endif
  		/* Used by pre-9.0 binary upgrades */
  		if (tuple->t_infomask & HEAP_MOVED_OFF)
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
***************
*** 205,219 ****
  					return false;
  				}
  			}
  		}
  		else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
  		{
  			if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
  				return true;
! 
  			if (tuple->t_infomask & HEAP_IS_LOCKED)		/* not deleter */
  				return true;
  
  			Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
  
  			if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
--- 211,226 ----
  					return false;
  				}
  			}
  		}
  		else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
  		{
+ #ifndef DISABLE_HINT_BITS
  			if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
  				return true;
! #endif
  			if (tuple->t_infomask & HEAP_IS_LOCKED)		/* not deleter */
  				return true;
  
  			Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
  
  			if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
***************
*** 235,259 ****
  		{
  			/* it must have aborted or crashed */
  			SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
  						InvalidTransactionId);
  			return false;
  		}
  	}
  
  	/* by here, the inserting transaction has committed */
- 
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid or aborted */
  		return true;
  
  	if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
  	{
  		if (tuple->t_infomask & HEAP_IS_LOCKED)
  			return true;
  		return false;			/* updated by other */
  	}
  
  	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  	{
  		/* MultiXacts are currently only allowed to lock tuples */
  		Assert(tuple->t_infomask & HEAP_IS_LOCKED);
  		return true;
--- 242,267 ----
  		{
  			/* it must have aborted or crashed */
  			SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
  						InvalidTransactionId);
  			return false;
  		}
+ #ifndef DISABLE_HINT_BITS
  	}
  
  	/* by here, the inserting transaction has committed */
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid or aborted */
  		return true;
  
  	if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
  	{
  		if (tuple->t_infomask & HEAP_IS_LOCKED)
  			return true;
  		return false;			/* updated by other */
  	}
+ #endif
  
  	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  	{
  		/* MultiXacts are currently only allowed to lock tuples */
  		Assert(tuple->t_infomask & HEAP_IS_LOCKED);
  		return true;
***************
*** 282,294 ****
  	if (tuple->t_infomask & HEAP_IS_LOCKED)
  	{
  		SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
  					InvalidTransactionId);
  		return true;
  	}
- 
  	SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
  				HeapTupleHeaderGetXmax(tuple));
  	return false;
  }
  
  /*
--- 290,301 ----
***************
*** 332,348 ****
   *		the serializability guarantees we provide don't extend to xacts
   *		that do catalog accesses.  this is unfortunate, but not critical.
   */
  bool
  HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
! 
  		/* Used by pre-9.0 binary upgrades */
  		if (tuple->t_infomask & HEAP_MOVED_OFF)
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
--- 339,356 ----
   *		the serializability guarantees we provide don't extend to xacts
   *		that do catalog accesses.  this is unfortunate, but not critical.
   */
  bool
  HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
+ #ifndef DISABLE_HINT_BITS
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
! #endif
  		/* Used by pre-9.0 binary upgrades */
  		if (tuple->t_infomask & HEAP_MOVED_OFF)
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
***************
*** 380,395 ****
  			}
  		}
  		else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
  		{
  			if (HeapTupleHeaderGetCmin(tuple) >= GetCurrentCommandId(false))
  				return false;	/* inserted after scan started */
! 
  			if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
  				return true;
! 
  			if (tuple->t_infomask & HEAP_IS_LOCKED)		/* not deleter */
  				return true;
  
  			Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
  
  			if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
--- 388,403 ----
  			}
  		}
  		else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
  		{
  			if (HeapTupleHeaderGetCmin(tuple) >= GetCurrentCommandId(false))
  				return false;	/* inserted after scan started */
! #ifndef DISABLE_HINT_BITS
  			if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
  				return true;
! #endif
  			if (tuple->t_infomask & HEAP_IS_LOCKED)		/* not deleter */
  				return true;
  
  			Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
  
  			if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
***************
*** 414,425 ****
--- 422,434 ----
  		{
  			/* it must have aborted or crashed */
  			SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
  						InvalidTransactionId);
  			return false;
  		}
+ #ifndef DISABLE_HINT_BITS
  	}
  
  	/* by here, the inserting transaction has committed */
  
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid or aborted */
  		return true;
***************
*** 427,438 ****
--- 436,448 ----
  	if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
  	{
  		if (tuple->t_infomask & HEAP_IS_LOCKED)
  			return true;
  		return false;
  	}
+ #endif
  
  	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  	{
  		/* MultiXacts are currently only allowed to lock tuples */
  		Assert(tuple->t_infomask & HEAP_IS_LOCKED);
  		return true;
***************
*** 457,476 ****
  		SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
  					InvalidTransactionId);
  		return true;
  	}
  
  	/* xmax transaction committed */
! 
  	if (tuple->t_infomask & HEAP_IS_LOCKED)
  	{
  		SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
  					InvalidTransactionId);
  		return true;
  	}
! 
  	SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
  				HeapTupleHeaderGetXmax(tuple));
  	return false;
  }
  
  /*
--- 467,486 ----
  		SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
  					InvalidTransactionId);
  		return true;
  	}
  
  	/* xmax transaction committed */
! #ifndef DISABLE_HINT_BITS
  	if (tuple->t_infomask & HEAP_IS_LOCKED)
  	{
  		SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
  					InvalidTransactionId);
  		return true;
  	}
! #endif
  	SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
  				HeapTupleHeaderGetXmax(tuple));
  	return false;
  }
  
  /*
***************
*** 498,514 ****
   * table.
   */
  bool
  HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
  						Buffer buffer)
  {
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
! 
  		/* Used by pre-9.0 binary upgrades */
  		if (tuple->t_infomask & HEAP_MOVED_OFF)
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
--- 508,525 ----
   * table.
   */
  bool
  HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
  						Buffer buffer)
  {
+ #ifndef DISABLE_HINT_BITS
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
! #endif
  		/* Used by pre-9.0 binary upgrades */
  		if (tuple->t_infomask & HEAP_MOVED_OFF)
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
***************
*** 542,554 ****
--- 553,567 ----
  					SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
  								InvalidTransactionId);
  					return false;
  				}
  			}
  		}
+ #ifndef DISABLE_HINT_BITS
  	}
+ #endif
  
  	/* otherwise assume the tuple is valid for TOAST. */
  	return true;
  }
  
  /*
***************
*** 579,595 ****
   *	distinguish that case must test for it themselves.)
   */
  HTSU_Result
  HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
  						 Buffer buffer)
  {
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return HeapTupleInvisible;
! 
  		/* Used by pre-9.0 binary upgrades */
  		if (tuple->t_infomask & HEAP_MOVED_OFF)
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
--- 592,609 ----
   *	distinguish that case must test for it themselves.)
   */
  HTSU_Result
  HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
  						 Buffer buffer)
  {
+ #ifndef DISABLE_HINT_BITS
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return HeapTupleInvisible;
! #endif
  		/* Used by pre-9.0 binary upgrades */
  		if (tuple->t_infomask & HEAP_MOVED_OFF)
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
***************
*** 627,642 ****
  			}
  		}
  		else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
  		{
  			if (HeapTupleHeaderGetCmin(tuple) >= curcid)
  				return HeapTupleInvisible;		/* inserted after scan started */
! 
  			if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
  				return HeapTupleMayBeUpdated;
! 
  			if (tuple->t_infomask & HEAP_IS_LOCKED)		/* not deleter */
  				return HeapTupleMayBeUpdated;
  
  			Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
  
  			if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
--- 641,656 ----
  			}
  		}
  		else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
  		{
  			if (HeapTupleHeaderGetCmin(tuple) >= curcid)
  				return HeapTupleInvisible;		/* inserted after scan started */
! #ifndef DISABLE_HINT_BITS
  			if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
  				return HeapTupleMayBeUpdated;
! #endif
  			if (tuple->t_infomask & HEAP_IS_LOCKED)		/* not deleter */
  				return HeapTupleMayBeUpdated;
  
  			Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
  
  			if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
***************
*** 661,672 ****
--- 675,687 ----
  		{
  			/* it must have aborted or crashed */
  			SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
  						InvalidTransactionId);
  			return HeapTupleInvisible;
  		}
+ #ifndef DISABLE_HINT_BITS
  	}
  
  	/* by here, the inserting transaction has committed */
  
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid or aborted */
  		return HeapTupleMayBeUpdated;
***************
*** 674,686 ****
  	if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
  	{
  		if (tuple->t_infomask & HEAP_IS_LOCKED)
  			return HeapTupleMayBeUpdated;
  		return HeapTupleUpdated;	/* updated by other */
  	}
! 
  	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  	{
  		/* MultiXacts are currently only allowed to lock tuples */
  		Assert(tuple->t_infomask & HEAP_IS_LOCKED);
  
  		if (MultiXactIdIsRunning(HeapTupleHeaderGetXmax(tuple)))
--- 689,701 ----
  	if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
  	{
  		if (tuple->t_infomask & HEAP_IS_LOCKED)
  			return HeapTupleMayBeUpdated;
  		return HeapTupleUpdated;	/* updated by other */
  	}
! #endif
  	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  	{
  		/* MultiXacts are currently only allowed to lock tuples */
  		Assert(tuple->t_infomask & HEAP_IS_LOCKED);
  
  		if (MultiXactIdIsRunning(HeapTupleHeaderGetXmax(tuple)))
***************
*** 747,764 ****
   */
  bool
  HeapTupleSatisfiesDirty(HeapTupleHeader tuple, Snapshot snapshot,
  						Buffer buffer)
  {
  	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
! 
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
! 
  		/* Used by pre-9.0 binary upgrades */
  		if (tuple->t_infomask & HEAP_MOVED_OFF)
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
--- 762,779 ----
   */
  bool
  HeapTupleSatisfiesDirty(HeapTupleHeader tuple, Snapshot snapshot,
  						Buffer buffer)
  {
  	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
! #ifndef DISABLE_HINT_BITS
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
! #endif
  		/* Used by pre-9.0 binary upgrades */
  		if (tuple->t_infomask & HEAP_MOVED_OFF)
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
***************
*** 794,811 ****
  					return false;
  				}
  			}
  		}
  		else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
  		{
  			if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
  				return true;
  
  			if (tuple->t_infomask & HEAP_IS_LOCKED)		/* not deleter */
  				return true;
! 
  			Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
  
  			if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
  			{
  				/* deleting subtransaction must have aborted */
  				SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
--- 809,827 ----
  					return false;
  				}
  			}
  		}
  		else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
  		{
+ #ifndef DISABLE_HINT_BITS
  			if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
  				return true;
  
  			if (tuple->t_infomask & HEAP_IS_LOCKED)		/* not deleter */
  				return true;
! #endif
  			Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
  
  			if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
  			{
  				/* deleting subtransaction must have aborted */
  				SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
***************
*** 828,839 ****
--- 844,856 ----
  		{
  			/* it must have aborted or crashed */
  			SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
  						InvalidTransactionId);
  			return false;
  		}
+ #ifndef DISABLE_HINT_BITS
  	}
  
  	/* by here, the inserting transaction has committed */
  
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid or aborted */
  		return true;
***************
*** 841,853 ****
  	if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
  	{
  		if (tuple->t_infomask & HEAP_IS_LOCKED)
  			return true;
  		return false;			/* updated by other */
  	}
! 
  	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  	{
  		/* MultiXacts are currently only allowed to lock tuples */
  		Assert(tuple->t_infomask & HEAP_IS_LOCKED);
  		return true;
  	}
--- 858,870 ----
  	if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
  	{
  		if (tuple->t_infomask & HEAP_IS_LOCKED)
  			return true;
  		return false;			/* updated by other */
  	}
! #endif
  	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  	{
  		/* MultiXacts are currently only allowed to lock tuples */
  		Assert(tuple->t_infomask & HEAP_IS_LOCKED);
  		return true;
  	}
***************
*** 909,925 ****
   * can't see it.)
   */
  bool
  HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
  					   Buffer buffer)
  {
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
! 
  		/* Used by pre-9.0 binary upgrades */
  		if (tuple->t_infomask & HEAP_MOVED_OFF)
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
--- 926,943 ----
   * can't see it.)
   */
  bool
  HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
  					   Buffer buffer)
  {
+ #ifndef DISABLE_HINT_BITS
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
! #endif
  		/* Used by pre-9.0 binary upgrades */
  		if (tuple->t_infomask & HEAP_MOVED_OFF)
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
***************
*** 957,972 ****
  			}
  		}
  		else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
  		{
  			if (HeapTupleHeaderGetCmin(tuple) >= snapshot->curcid)
  				return false;	/* inserted after scan started */
! 
  			if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
  				return true;
! 
  			if (tuple->t_infomask & HEAP_IS_LOCKED)		/* not deleter */
  				return true;
  
  			Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
  
  			if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
--- 975,990 ----
  			}
  		}
  		else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
  		{
  			if (HeapTupleHeaderGetCmin(tuple) >= snapshot->curcid)
  				return false;	/* inserted after scan started */
! #ifndef DISABLE_HINT_BITS
  			if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
  				return true;
! #endif
  			if (tuple->t_infomask & HEAP_IS_LOCKED)		/* not deleter */
  				return true;
  
  			Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
  
  			if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
***************
*** 991,1026 ****
  		{
  			/* it must have aborted or crashed */
  			SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
  						InvalidTransactionId);
  			return false;
  		}
  	}
  
  	/*
  	 * By here, the inserting transaction has committed - have to check
  	 * when...
  	 */
  	if (XidInMVCCSnapshot(HeapTupleHeaderGetXmin(tuple), snapshot))
  		return false;			/* treat as still in progress */
! 
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid or aborted */
  		return true;
! 
  	if (tuple->t_infomask & HEAP_IS_LOCKED)
  		return true;
  
  	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  	{
  		/* MultiXacts are currently only allowed to lock tuples */
  		Assert(tuple->t_infomask & HEAP_IS_LOCKED);
  		return true;
  	}
! 
  	if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
  	{
  		if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
  		{
  			if (HeapTupleHeaderGetCmax(tuple) >= snapshot->curcid)
  				return true;	/* deleted after scan started */
  			else
  				return false;	/* deleted before scan started */
--- 1009,1047 ----
  		{
  			/* it must have aborted or crashed */
  			SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
  						InvalidTransactionId);
  			return false;
  		}
+ #ifndef DISABLE_HINT_BITS
  	}
+ #endif
  
  	/*
  	 * By here, the inserting transaction has committed - have to check
  	 * when...
  	 */
  	if (XidInMVCCSnapshot(HeapTupleHeaderGetXmin(tuple), snapshot))
  		return false;			/* treat as still in progress */
! #ifndef DISABLE_HINT_BITS
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid or aborted */
  		return true;
! #endif
  	if (tuple->t_infomask & HEAP_IS_LOCKED)
  		return true;
  
  	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  	{
  		/* MultiXacts are currently only allowed to lock tuples */
  		Assert(tuple->t_infomask & HEAP_IS_LOCKED);
  		return true;
  	}
! #ifndef DISABLE_HINT_BITS
  	if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
  	{
+ #endif
  		if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
  		{
  			if (HeapTupleHeaderGetCmax(tuple) >= snapshot->curcid)
  				return true;	/* deleted after scan started */
  			else
  				return false;	/* deleted before scan started */
***************
*** 1037,1049 ****
--- 1058,1072 ----
  			return true;
  		}
  
  		/* xmax transaction committed */
  		SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
  					HeapTupleHeaderGetXmax(tuple));
+ #ifndef DISABLE_HINT_BITS
  	}
+ #endif
  
  	/*
  	 * OK, the deleting transaction committed too ... but when?
  	 */
  	if (XidInMVCCSnapshot(HeapTupleHeaderGetXmax(tuple), snapshot))
  		return true;			/* treat as still in progress */
***************
*** 1071,1088 ****
  	/*
  	 * Has inserting transaction committed?
  	 *
  	 * If the inserting transaction aborted, then the tuple was never visible
  	 * to any other transaction, so we can delete it immediately.
  	 */
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return HEAPTUPLE_DEAD;
! 		/* Used by pre-9.0 binary upgrades */
! 		else if (tuple->t_infomask & HEAP_MOVED_OFF)
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
  				return HEAPTUPLE_DELETE_IN_PROGRESS;
  			if (TransactionIdIsInProgress(xvac))
--- 1094,1115 ----
  	/*
  	 * Has inserting transaction committed?
  	 *
  	 * If the inserting transaction aborted, then the tuple was never visible
  	 * to any other transaction, so we can delete it immediately.
  	 */
+ #ifndef DISABLE_HINT_BITS
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return HEAPTUPLE_DEAD;
!     /* Used by pre-9.0 binary upgrades */
!     else if (tuple->t_infomask & HEAP_MOVED_OFF)
! #else
! 		if (tuple->t_infomask & HEAP_MOVED_OFF)
! #endif
  		{
  			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
  				return HEAPTUPLE_DELETE_IN_PROGRESS;
  			if (TransactionIdIsInProgress(xvac))
***************
*** 1114,1127 ****
--- 1141,1156 ----
  							InvalidTransactionId);
  				return HEAPTUPLE_DEAD;
  			}
  		}
  		else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
  		{
+ #ifndef DISABLE_HINT_BITS
  			if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
  				return HEAPTUPLE_INSERT_IN_PROGRESS;
+ #endif
  			if (tuple->t_infomask & HEAP_IS_LOCKED)
  				return HEAPTUPLE_INSERT_IN_PROGRESS;
  			/* inserted and then deleted by same xact */
  			return HEAPTUPLE_DELETE_IN_PROGRESS;
  		}
  		else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
***************
*** 1139,1171 ****
  
  		/*
  		 * At this point the xmin is known committed, but we might not have
  		 * been able to set the hint bit yet; so we can no longer Assert that
  		 * it's set.
  		 */
  	}
  
  	/*
  	 * Okay, the inserter committed, so it was good at some point.	Now what
  	 * about the deleting transaction?
  	 */
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)
  		return HEAPTUPLE_LIVE;
! 
  	if (tuple->t_infomask & HEAP_IS_LOCKED)
  	{
  		/*
  		 * "Deleting" xact really only locked it, so the tuple is live in any
  		 * case.  However, we should make sure that either XMAX_COMMITTED or
  		 * XMAX_INVALID gets set once the xact is gone, to reduce the costs of
  		 * examining the tuple for future xacts.  Also, marking dead
  		 * MultiXacts as invalid here provides defense against MultiXactId
  		 * wraparound (see also comments in heap_freeze_tuple()).
  		 */
  		if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
  		{
  			if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  			{
  				if (MultiXactIdIsRunning(HeapTupleHeaderGetXmax(tuple)))
  					return HEAPTUPLE_LIVE;
  			}
  			else
--- 1168,1203 ----
  
  		/*
  		 * At this point the xmin is known committed, but we might not have
  		 * been able to set the hint bit yet; so we can no longer Assert that
  		 * it's set.
  		 */
+ #ifndef DISABLE_HINT_BITS
  	}
  
  	/*
  	 * Okay, the inserter committed, so it was good at some point.	Now what
  	 * about the deleting transaction?
  	 */
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)
  		return HEAPTUPLE_LIVE;
! #endif
  	if (tuple->t_infomask & HEAP_IS_LOCKED)
  	{
  		/*
  		 * "Deleting" xact really only locked it, so the tuple is live in any
  		 * case.  However, we should make sure that either XMAX_COMMITTED or
  		 * XMAX_INVALID gets set once the xact is gone, to reduce the costs of
  		 * examining the tuple for future xacts.  Also, marking dead
  		 * MultiXacts as invalid here provides defense against MultiXactId
  		 * wraparound (see also comments in heap_freeze_tuple()).
  		 */
+ #ifndef DISABLE_HINT_BITS
  		if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
  		{
+ #endif
  			if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  			{
  				if (MultiXactIdIsRunning(HeapTupleHeaderGetXmax(tuple)))
  					return HEAPTUPLE_LIVE;
  			}
  			else
***************
*** 1178,1202 ****
  			 * We don't really care whether xmax did commit, abort or crash.
  			 * We know that xmax did lock the tuple, but it did not and will
  			 * never actually update it.
  			 */
  			SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
  						InvalidTransactionId);
  		}
  		return HEAPTUPLE_LIVE;
  	}
  
  	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  	{
  		/* MultiXacts are currently only allowed to lock tuples */
  		Assert(tuple->t_infomask & HEAP_IS_LOCKED);
  		return HEAPTUPLE_LIVE;
  	}
! 
  	if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
  	{
  		if (TransactionIdIsInProgress(HeapTupleHeaderGetXmax(tuple)))
  			return HEAPTUPLE_DELETE_IN_PROGRESS;
  		else if (TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
  			SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
  						HeapTupleHeaderGetXmax(tuple));
  		else
--- 1210,1237 ----
  			 * We don't really care whether xmax did commit, abort or crash.
  			 * We know that xmax did lock the tuple, but it did not and will
  			 * never actually update it.
  			 */
  			SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
  						InvalidTransactionId);
+ #ifndef DISABLE_HINT_BITS
  		}
+ #endif
  		return HEAPTUPLE_LIVE;
  	}
  
  	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  	{
  		/* MultiXacts are currently only allowed to lock tuples */
  		Assert(tuple->t_infomask & HEAP_IS_LOCKED);
  		return HEAPTUPLE_LIVE;
  	}
! #ifndef DISABLE_HINT_BITS
  	if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
  	{
+ #endif
  		if (TransactionIdIsInProgress(HeapTupleHeaderGetXmax(tuple)))
  			return HEAPTUPLE_DELETE_IN_PROGRESS;
  		else if (TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
  			SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
  						HeapTupleHeaderGetXmax(tuple));
  		else
***************
*** 1211,1224 ****
  
  		/*
  		 * At this point the xmax is known committed, but we might not have
  		 * been able to set the hint bit yet; so we can no longer Assert that
  		 * it's set.
  		 */
  	}
! 
  	/*
  	 * Deleter committed, but check special cases.
  	 */
  
  	if (TransactionIdEquals(HeapTupleHeaderGetXmin(tuple),
  							HeapTupleHeaderGetXmax(tuple)))
--- 1246,1260 ----
  
  		/*
  		 * At this point the xmax is known committed, but we might not have
  		 * been able to set the hint bit yet; so we can no longer Assert that
  		 * it's set.
  		 */
+ #ifndef DISABLE_HINT_BITS
  	}
! #endif
  	/*
  	 * Deleter committed, but check special cases.
  	 */
  
  	if (TransactionIdEquals(HeapTupleHeaderGetXmin(tuple),
  							HeapTupleHeaderGetXmax(tuple)))
diff -r -C6 ./src/include/access/htup.h ../postgresql-9.0.1_hb2/src/include/access/htup.h
*** ./src/include/access/htup.h	2010-10-01 10:25:44.000000000 -0400
--- ../postgresql-9.0.1_hb2/src/include/access/htup.h	2010-12-16 12:17:56.000000000 -0500
***************
*** 166,181 ****
--- 166,185 ----
  /* bit 0x0010 is available */
  #define HEAP_COMBOCID			0x0020	/* t_cid is a combo cid */
  #define HEAP_XMAX_EXCL_LOCK		0x0040	/* xmax is exclusive locker */
  #define HEAP_XMAX_SHARED_LOCK	0x0080	/* xmax is shared locker */
  /* if either LOCK bit is set, xmax hasn't deleted the tuple, only locked it */
  #define HEAP_IS_LOCKED	(HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_SHARED_LOCK)
+ 
+ #ifndef DISABLE_HINT_BITS
  #define HEAP_XMIN_COMMITTED		0x0100	/* t_xmin committed */
  #define HEAP_XMIN_INVALID		0x0200	/* t_xmin invalid/aborted */
  #define HEAP_XMAX_COMMITTED		0x0400	/* t_xmax committed */
  #define HEAP_XMAX_INVALID		0x0800	/* t_xmax invalid/aborted */
+ #endif 
+ 
  #define HEAP_XMAX_IS_MULTI		0x1000	/* t_xmax is a MultiXactId */
  #define HEAP_UPDATED			0x2000	/* this is UPDATEd version of row */
  #define HEAP_MOVED_OFF			0x4000	/* moved to another place by pre-9.0
  										 * VACUUM FULL; kept for binary
  										 * upgrade support */
  #define HEAP_MOVED_IN			0x8000	/* moved from another place by pre-9.0
***************
*** 309,325 ****
--- 313,337 ----
  /*
   * Note that we stop considering a tuple HOT-updated as soon as it is known
   * aborted or the would-be updating transaction is known aborted.  For best
   * efficiency, check tuple visibility before using this macro, so that the
   * INVALID bits will be as up to date as possible.
   */
+ #ifndef DISABLE_HINT_BITS
  #define HeapTupleHeaderIsHotUpdated(tup) \
  ( \
  	((tup)->t_infomask2 & HEAP_HOT_UPDATED) != 0 && \
  	((tup)->t_infomask & (HEAP_XMIN_INVALID | HEAP_XMAX_INVALID)) == 0 \
  )
+ #else
+ 
+ #define HeapTupleHeaderIsHotUpdated(tup) \
+ ( \
+   ((tup)->t_infomask2 & HEAP_HOT_UPDATED) != 0 \
+ )
+ #endif
  
  #define HeapTupleHeaderSetHotUpdated(tup) \
  ( \
  	(tup)->t_infomask2 |= HEAP_HOT_UPDATED \
  )
  
diff -r -C6 ./src/include/utils/tqual.h ../postgresql-9.0.1_hb2/src/include/utils/tqual.h
*** ./src/include/utils/tqual.h	2010-10-01 10:25:44.000000000 -0400
--- ../postgresql-9.0.1_hb2/src/include/utils/tqual.h	2010-12-16 12:08:09.000000000 -0500
***************
*** 81,90 ****
--- 81,94 ----
  /* Special "satisfies" routines with different APIs */
  extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTupleHeader tuple,
  						 CommandId curcid, Buffer buffer);
  extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTupleHeader tuple,
  						 TransactionId OldestXmin, Buffer buffer);
  
+ #ifndef DISABLE_HINT_BITS
  extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
  					 uint16 infomask, TransactionId xid);
+ #else
+ #define HeapTupleSetHintBits(a,b,c,d) do{;}while(0)
+ #endif
  
  #endif   /* TQUAL_H */
Attachment: clog_stress.sql (text/x-sql; charset=US-ASCII)
#2Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Merlin Moncure (#1)
Re: How much do the hint bits help?

Merlin Moncure <mmoncure@gmail.com> wrote:

*) what's a good way to stress the clog severely? I'd like to pick
a degenerate case to get a better idea of the way things stand
without them.

The worst I can think of is a large database with a 90/10 mix of
reads to writes -- all short transactions. Maybe someone else can
do better. In particular, I'm not sure how savepoints might play
into a degenerate case.

Since we're always talking about how to do better with hint bits
during an unlogged bulk load, it would be interesting to benchmark
one of those followed by a `select count(*) from newtable;` with and
without the patch, on a data set too big to fit in RAM.

*) is there community interest in a full patch that fills in the
missing details not implemented here?

I'm certainly curious to see real numbers.

-Kevin

#3Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Merlin Moncure (#1)
Re: How much do the hint bits help?

On 22/12/10 11:42, Merlin Moncure wrote:

Attached is an incomplete patch disabling hint bits based on compile
switch. It's not complete, for example it's not reconciling some
assumptions in heapam.c that hint bits have been set in various
routines. However, it mostly passes regression and I deemed it good
enough to run some preliminary benchmarks and fool around. Obviously,
hint bits are an annoying impediment to a couple of other cool pending
features, and it certainly would be nice to operate without them.
Also, for particular workloads, the extra i/o hint bits can cause a
fair amount of pain.

Looks like a great idea to test, however I don't seem to be able to
compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of
src/include/pg_config_manual.h)

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -I../../../../src/include -D_GNU_SOURCE -c -o heapam.o heapam.c
heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in
this function)
heapam.c:3867: error: (Each undeclared identifier is reported only once
heapam.c:3867: error: for each function it appears in.)
heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in this
function)
make[4]: *** [heapam.o] Error 1

#4Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#3)
Re: How much do the hint bits help?

On 22/12/10 13:05, Mark Kirkwood wrote:

On 22/12/10 11:42, Merlin Moncure wrote:

Attached is an incomplete patch disabling hint bits based on compile
switch. It's not complete, for example it's not reconciling some
assumptions in heapam.c that hint bits have been set in various
routines. However, it mostly passes regression and I deemed it good
enough to run some preliminary benchmarks and fool around. Obviously,
hint bits are an annoying impediment to a couple of other cool pending
features, and it certainly would be nice to operate without them.
Also, for particular workloads, the extra i/o hint bits can cause a
fair amount of pain.

Looks like a great idea to test, however I don't seem to be able to
compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of
src/include/pg_config_manual.h)

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -I../../../../src/include -D_GNU_SOURCE -c -o heapam.o
heapam.c
heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in
this function)
heapam.c:3867: error: (Each undeclared identifier is reported only once
heapam.c:3867: error: for each function it appears in.)
heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in
this function)
make[4]: *** [heapam.o] Error 1

Arrg, sorry - against git head on Ubuntu 10.03 (gcc 4.4.3)

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Mark Kirkwood (#4)
Re: How much do the hint bits help?

On Tue, Dec 21, 2010 at 7:06 PM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:

On 22/12/10 13:05, Mark Kirkwood wrote:

On 22/12/10 11:42, Merlin Moncure wrote:

Attached is an incomplete patch disabling hint bits based on compile
switch.  It's not complete, for example it's not reconciling some
assumptions in heapam.c that hint bits have been set in various
routines.  However, it mostly passes regression and I deemed it good
enough to run some preliminary benchmarks and fool around.  Obviously,
hint bits are an annoying impediment to a couple of other cool pending
features, and it certainly would be nice to operate without them.
Also, for particular workloads, the extra i/o hint bits can cause a
fair amount of pain.

Looks like a great idea to test, however I don't seem to be able to
compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of
src/include/pg_config_manual.h)

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
-I../../../../src/include -D_GNU_SOURCE -c -o heapam.o heapam.c
heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in this
function)
heapam.c:3867: error: (Each undeclared identifier is reported only once
heapam.c:3867: error: for each function it appears in.)
heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in this
function)
make[4]: *** [heapam.o] Error 1

Arrg, sorry - against git head on Ubuntu 10.03 (gcc 4.4.3)

did you check to see if the patch applied clean? btw I was working
against postgresql-9.0.1...

it looks like you are missing at least some of the changes to htup.h:

../postgresql-9.0.1_hb2/src/include/access/htup.h

#ifndef DISABLE_HINT_BITS
#define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
#define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */
#define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */
#define HEAP_XMAX_INVALID 0x0800 /* t_xmax invalid/aborted */
#endif

merlin

#6Merlin Moncure
mmoncure@gmail.com
In reply to: Merlin Moncure (#5)
Re: How much do the hint bits help?

On Tue, Dec 21, 2010 at 7:20 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Tue, Dec 21, 2010 at 7:06 PM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:

On 22/12/10 13:05, Mark Kirkwood wrote:

On 22/12/10 11:42, Merlin Moncure wrote:

Attached is an incomplete patch disabling hint bits based on compile
switch.  It's not complete, for example it's not reconciling some
assumptions in heapam.c that hint bits have been set in various
routines.  However, it mostly passes regression and I deemed it good
enough to run some preliminary benchmarks and fool around.  Obviously,
hint bits are an annoying impediment to a couple of other cool pending
features, and it certainly would be nice to operate without them.
Also, for particular workloads, the extra i/o hint bits can cause a
fair amount of pain.

Looks like a great idea to test, however I don't seem to be able to
compile with it applied: (set#define DISABLE_HINT_BITS 1 at the end of
src/include/pg_config_manual.h)

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
-I../../../../src/include -D_GNU_SOURCE -c -o heapam.o heapam.c
heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in this
function)
heapam.c:3867: error: (Each undeclared identifier is reported only once
heapam.c:3867: error: for each function it appears in.)
heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in this
function)
make[4]: *** [heapam.o] Error 1

Arrg, sorry - against git head on Ubuntu 10.03 (gcc 4.4.3)

did you check to see if the patch applied clean? btw I was working
against postgresql-9.0.1...

ah, this is the problem (9.0.1 vs head). to work vs head it prob
needs a few more tweaks. you can also try removing it yourself --
most of the changes follow a similar pattern.

merlin

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#1)
Re: How much do the hint bits help?

Merlin Moncure <mmoncure@gmail.com> writes:

Attached is an incomplete patch disabling hint bits based on compile
switch. ...
So far, at least doing pgbench runs and another test designed to
exercise clog lookups, the performance loss of always doing full
lookup hasn't materialized.

The standard pgbench test would be just about 100% useless for stressing
this, because its net database activity is only about one row
touched/updated per query. You need a test case that hits lots of rows
per query, else you're just measuring parse+plan+network overhead.

regards, tom lane

#8Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#7)
Re: How much do the hint bits help?

On Tue, Dec 21, 2010 at 7:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

Attached is an incomplete patch disabling hint bits based on compile
switch. ...
So far, at least doing pgbench runs and another test designed to
exercise clog lookups, the performance loss of always doing full
lookup hasn't materialized.

The standard pgbench test would be just about 100% useless for stressing
this, because its net database activity is only about one row
touched/updated per query.  You need a test case that hits lots of rows
per query, else you're just measuring parse+plan+network overhead.

right -- see the attached clog_stress.sql above. It creates a script
that inserts records in blocks of 10000, deletes half of them, and
vacuums. Neither the execution of the script nor a seq scan following
its execution showed an interesting performance difference (which I am
arbitrarily calling 5% in either direction). Like I said though, I
don't trust the patch or the results yet.

@Mark: apparently the cvs server is behind git and there are some
recent changes to heapam.c that need more attention. I need to get
git going on my box, but try changing this:

if ((tuple->t_infomask & HEAP_XMIN_COMMITTED) ||
(!(tuple->t_infomask & HEAP_XMIN_COMMITTED) &&
!(tuple->t_infomask & HEAP_XMIN_INVALID) &&
TransactionIdDidCommit(xmin)))

to this:

if (TransactionIdDidCommit(xmin))

also, isn't the extra check vs HEAP_XMIN_COMMITTED redundant, and if
you do have to look up clog, why not set the hint bit?

merlin

#9Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Merlin Moncure (#8)
Re: How much do the hint bits help?

On 22/12/10 13:56, Merlin Moncure wrote:

On Tue, Dec 21, 2010 at 7:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

@Mark: apparently the cvs server is behind git and there are some
recent changes to heapam.c that need more attention. I need to get
git going on my box, but try changing this:

if ((tuple->t_infomask & HEAP_XMIN_COMMITTED) ||
(!(tuple->t_infomask & HEAP_XMIN_COMMITTED) &&
!(tuple->t_infomask & HEAP_XMIN_INVALID) &&
TransactionIdDidCommit(xmin)))

to this:

if (TransactionIdDidCommit(xmin))

also, isn't the extra check vs HEAP_XMIN_COMMITTED redundant, and if
you do have to look up clog, why not set the hint bit?

That gets it compiling.

#10Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Merlin Moncure (#8)
Re: How much do the hint bits help?

On 22.12.2010 02:56, Merlin Moncure wrote:

On Tue, Dec 21, 2010 at 7:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

Attached is an incomplete patch disabling hint bits based on compile
switch. ...
So far, at least doing pgbench runs and another test designed to
exercise clog lookups, the performance loss of always doing full
lookup hasn't materialized.

The standard pgbench test would be just about 100% useless for stressing
this, because its net database activity is only about one row
touched/updated per query. You need a test case that hits lots of rows
per query, else you're just measuring parse+plan+network overhead.

right -- see the attached clog_stress.sql above. It creates a script
that inserts records in blocks of 10000, deletes half of them, and
vacuums. Neither the execution of the script nor a seq scan following
its execution showed an interesting performance difference (which I am
arbitrarily calling 5% in either direction). Like I said though, I
don't trust the patch or the results yet.

Make sure you have a good mix of different xids in the table,
TransactionLogFetch has a one-item cache so repeatedly checking the same
xid is much faster than the general case.

Perhaps run pgbench for a while, and then do "SELECT COUNT(*)" on the
resulting tables.
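
To illustrate why the mix of xids matters, here is a minimal sketch of a one-item cache in front of a slow status lookup. It is a toy model, not the TransactionLogFetch code: `slow_status_lookup` and the rule that even xids count as committed are assumptions made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

static int slow_lookups = 0;

/* Hypothetical slow path: pretend that even xids committed. */
static bool slow_status_lookup(TransactionId xid)
{
    slow_lookups++;
    return (xid % 2) == 0;
}

/* The one-item cache: remembers only the most recent xid and answer. */
static bool          cache_valid = false;
static TransactionId cached_xid;
static bool          cached_status;

static bool cached_status_lookup(TransactionId xid)
{
    if (cache_valid && cached_xid == xid)
        return cached_status;              /* hit: no slow lookup */

    cached_status = slow_status_lookup(xid);
    cached_xid = xid;
    cache_valid = true;
    return cached_status;
}

/* Probe a sequence of xids from a cold cache and report how many
 * probes fell through to the slow path. */
static int count_slow_lookups(const TransactionId *xids, int n)
{
    assert(n >= 0);
    slow_lookups = 0;
    cache_valid = false;
    for (int i = 0; i < n; i++)
        (void) cached_status_lookup(xids[i]);
    return slow_lookups;
}
```

Six probes of the same xid reach the slow path once, while six probes alternating between two xids reach it six times. A table filled by a handful of large transactions therefore flatters the no-hint-bit numbers.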

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#11Simon Riggs
simon@2ndQuadrant.com
In reply to: Merlin Moncure (#1)
Re: How much do the hint bits help?

On Tue, 2010-12-21 at 17:42 -0500, Merlin Moncure wrote:

*) is there community interest in a full patch that fills in the
missing details not implemented here?

Your thinking seems sound to me. We now have all-visible flags, fewer
xids, much better clog concurrency. Avoiding hint bits would also
noticeably reduce number of dirty writes, especially at checkpoint.

Hot Standby already ignores hint bits and I've not heard a single
complaint, so we are already doing this in the code.

I don't see any reason to believe that there is not an equally effective
optimisation that we can apply to bring performance back up, if it is
shown to drop in particular use cases.

I would vote to put this into 9.1 as a non-default option at restart,
opening the door to other features which hint bits are frustrating.
People can then choose between those features and the "power of hint
bits". I think many people would choose db block checksums.

If you need support, or direct help with the code, just ask. Am happy to
be your committer also.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#12Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#11)
Re: How much do the hint bits help?

On 22.12.2010 15:21, Simon Riggs wrote:

On Tue, 2010-12-21 at 17:42 -0500, Merlin Moncure wrote:

*) is there community interest in a full patch that fills in the
missing details not implemented here?

Your thinking seems sound to me. We now have all-visible flags, fewer
xids, much better clog concurrency. Avoiding hint bits would also
noticeably reduce number of dirty writes, especially at checkpoint.

Yep.

Hot Standby already ignores hint bits and I've not heard a single
complaint, so we are already doing this in the code.

No, the XMIN/XMAX committed/invalid hint bits on each heap tuple are
used during hot standby just like during normal operation. We ignore the
index tuples marked as dead during hot standby, but that's a different
issue.

I would vote to put this into 9.1 as a non-default option at restart,
opening the door to other features which hint bits are frustrating.
People can then choose between those features and the "power of hint
bits". I think many people would choose db block checksums.

Making it optional would add some ifs in the critical paths, possibly
making it slower.

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.
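
A minimal sketch of that compromise, assuming a toy buffer with a dirty flag and a toy eviction routine. All names are hypothetical and nothing here is the real buffer manager. The hint is set in memory in both variants, and only the "today" variant schedules a write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_HINT_COMMITTED 0x0100   /* illustrative flag value */
#define TOY_TUPLES_PER_BUF 4

typedef struct {
    uint16_t infomask[TOY_TUPLES_PER_BUF]; /* one flag word per tuple */
    bool     dirty;                        /* needs writing at eviction */
} ToyBuffer;

/* Current behaviour as described above: a hint dirties the page. */
static void set_hint_today(ToyBuffer *buf, int i)
{
    assert(i >= 0 && i < TOY_TUPLES_PER_BUF);
    buf->infomask[i] |= TOY_HINT_COMMITTED;
    buf->dirty = true;
}

/* Proposed compromise: keep the hint in memory, schedule no I/O. */
static void set_hint_no_dirty(ToyBuffer *buf, int i)
{
    assert(i >= 0 && i < TOY_TUPLES_PER_BUF);
    buf->infomask[i] |= TOY_HINT_COMMITTED;
}

static bool has_hint(const ToyBuffer *buf, int i)
{
    return (buf->infomask[i] & TOY_HINT_COMMITTED) != 0;
}

/* Evict a buffer: write it back only if dirty. Returns writes done. */
static int evict(const ToyBuffer *buf, ToyBuffer *disk)
{
    if (!buf->dirty)
        return 0;          /* hint-only changes are silently dropped */
    *disk = *buf;
    disk->dirty = false;
    return 1;
}
```

While the page stays cached, both variants answer from the hint. After eviction the second variant has spent no I/O, but the hint has to be rediscovered from the clog the next time the page is read in.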

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#13Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#12)
Re: How much do the hint bits help?

On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

I would vote to put this into 9.1 as a non-default option at restart,
opening the door to other features which hint bits are frustrating.
People can then choose between those features and the "power of hint
bits". I think many people would choose db block checksums.

Making it optional would add some ifs in the critical paths, possibly
making it slower.

Hardly. A server-start parameter is going to be constant during
execution and branch prediction will just snuff that away to nothing.

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.

That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness. This isn't a discussion about hint
bits, it's a discussion about opening the way for other features.

ISTM there are other ways of optimising any clog issues that may remain,
so clutching to this ancient optimisation has no further benefit for me.

Merlin's idea seems to me to be original, useful *and* reasonable.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#14Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#13)
Re: How much do the hint bits help?

On 22.12.2010 15:59, Simon Riggs wrote:

On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.

That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness.

It does? The problem with block checksums is that if you modify a page
and don't have a corresponding WAL record for it, like a hint bit
update, you can have a torn page so that the checksum doesn't match.
Refraining from dirtying the page when a hint bit is updated avoids the
problem. With that change, we only ever write pages to disk that have a
WAL record associated with it, with full-page images as necessary to
avoid torn pages.
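
The torn-page failure can be sketched with a toy 16-byte page made of two 8-byte "sectors". The layout, the checksum function (a simple multiplicative sum, not a real CRC) and every name below are assumptions for illustration. A hint bit is flipped in the second sector, the checksum in the first sector is updated, and the write stops after one sector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE   16
#define TOY_SECTOR_SIZE 8

/* bytes[0] holds the checksum, bytes[1..15] hold the page contents. */
typedef struct {
    uint8_t bytes[TOY_PAGE_SIZE];
} ToyPage;

/* Fill the page with known contents (byte i holds the value i). */
static void toy_init(ToyPage *p)
{
    for (int i = 0; i < TOY_PAGE_SIZE; i++)
        p->bytes[i] = (uint8_t) i;
}

/* Toy checksum over everything except the checksum byte itself. */
static uint8_t toy_checksum(const ToyPage *p)
{
    uint8_t sum = 0;
    for (int i = 1; i < TOY_PAGE_SIZE; i++)
        sum = (uint8_t) (sum * 31 + p->bytes[i]);
    return sum;
}

static void toy_seal(ToyPage *p)         { p->bytes[0] = toy_checksum(p); }
static bool toy_verify(const ToyPage *p) { return p->bytes[0] == toy_checksum(p); }

/* Copy only the first `sectors` sectors to disk; fewer than the full
 * page simulates a crash in the middle of the write. */
static void toy_write(ToyPage *disk, const ToyPage *mem, int sectors)
{
    assert(sectors >= 0 && sectors <= TOY_PAGE_SIZE / TOY_SECTOR_SIZE);
    memcpy(disk->bytes, mem->bytes, (size_t) sectors * TOY_SECTOR_SIZE);
}
```

With a hint-style bit set at `bytes[12]` and the page resealed, writing one sector leaves the new checksum next to the old second sector and `toy_verify` fails. Writing both sectors passes. With no WAL record there is no full-page image to repair the page from, which is the case the message above wants to avoid.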

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#15Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#14)
Re: How much do the hint bits help?

On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:

On 22.12.2010 15:59, Simon Riggs wrote:

On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.

That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness.

It does? The problem with block checksums is that if you modify a page
and don't have a corresponding WAL record for it, like a hint bit
update, you can have a torn page so that the checksum doesn't match.
Refraining from dirtying the page when a hint bit is updated avoids the
problem. With that change, we only ever write pages to disk that have a
WAL record associated with it, with full-page images as necessary to
avoid torn pages.

Which then leads to a block CRC not matching the block in memory. Sure,
we can avoid CRC checking the hint bits, but that requires a much more
expensive and complex CRC check.

So what you suggest works only if we restrict CRC checking to blocks
incoming to the buffer cache, but leaves us unable to do CRC checks on
blocks once in the buffer cache. Since many blocks stay in cache almost
constantly, we're left with the situation that the most heavily used
parts of the database seldom get CRC checked.

Postgres needs CRC checking more than it needs hint bits.

I think we should allow this as an option, and if it proves to be an
issue during beta then we can remove it before we go live, assuming we
cannot get a reasonable alternate optimisation.

I think it's important for Postgres to implement this in the same release
as sync rep. They complement each other: confirmed robustness. Exactly
the features we need to prove to the rest of the world to trust us with
their data.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#16Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#15)
Re: How much do the hint bits help?

On Wed, Dec 22, 2010 at 9:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

I think it's important for Postgres to implement this in the same release
as sync rep.

i.e. never, at the rate sync rep has been progressing for the last few months?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#15)
Re: How much do the hint bits help?

On 22.12.2010 16:52, Simon Riggs wrote:

On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:

On 22.12.2010 15:59, Simon Riggs wrote:

On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.

That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness.

It does? The problem with block checksums is that if you modify a page
and don't have a corresponding WAL record for it, like a hint bit
update, you can have a torn page so that the checksum doesn't match.
Refraining from dirtying the page when a hint bit is updated avoids the
problem. With that change, we only ever write pages to disk that have a
WAL record associated with it, with full-page images as necessary to
avoid torn pages.

Which then leads to a block CRC not matching the block in memory.

What do you mean?

Do you envision that the CRC is calculated at every update, or only when
a page is written out from the buffer cache? If the former, you could
recalculate the CRC at a hint bit update too. If the latter, the hint
bits are included in the page image that you checksum just like any
other data.

So what you suggest works only if we restrict CRC checking to blocks
incoming to the buffer cache, but leaves us unable to do CRC checks on
blocks once in the buffer cache. Since many blocks stay in cache almost
constantly, we're left with the situation that the most heavily used
parts of the database seldom get CRC checked.

There's plenty of stuff in memory that's not covered by an
application-level CRC. That's what ECC RAM is for. Updating the CRC at
every update to a page seems really expensive, but it's an orthogonal
issue to hint bits.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#18Aidan Van Dyk
aidan@highrise.ca
In reply to: Simon Riggs (#15)
Re: How much do the hint bits help?

On Wed, Dec 22, 2010 at 9:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

So what you suggest works only if we restrict CRC checking to blocks
incoming to the buffer cache, but leaves us unable to do CRC checks on
blocks once in the buffer cache. Since many blocks stay in cache almost
constantly, we're left with the situation that the most heavily used
parts of the database seldom get CRC checked.

With this statement, you just moved the goal posts on the checksumming
ideas. In fact, you didn't just move the goal posts, you picked the
ball up and teleported it to another stadium.

I believe that most of the people talking about and wanting checksums
so far have been wanting them to verify I/O, not to verify that PG has
no bugs, that RAM is staying charged correctly, and that no stray bits
have been flipped, and that nobody else happens to be scribbling over
our shared buffers.

Being able to arbitrarily (i.e. at any point in time) prove that the
shared buffers contents are exactly what they should be may be a
worthy goal, but that's many orders of magnitude more difficult than
verifying that the bytes we read from disk are the ones we wrote to
disk.

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

#19Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#17)
Re: How much do the hint bits help?

On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:

On 22.12.2010 16:52, Simon Riggs wrote:

On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:

On 22.12.2010 15:59, Simon Riggs wrote:

On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.

That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness.

It does? The problem with block checksums is that if you modify a page
and don't have a corresponding WAL record for it, like a hint bit
update, you can have a torn page so that the checksum doesn't match.
Refraining from dirtying the page when a hint bit is updated avoids the
problem. With that change, we only ever write pages to disk that have a
WAL record associated with it, with full-page images as necessary to
avoid torn pages.

Which then leads to a block CRC not matching the block in memory.

Do you envision that the CRC is calculated at every update, or only when
a page is written out from the buffer cache?

At every update, so there is a clear assertion that the CRC matches the
block.

If the former, you could
recalculate the CRC at a hint bit update too. If the latter, the hint
bits are included in the page image that you checksum just like any
other data.

If we didn't have hint bits, we wouldn't need to recalculate the CRC
each time one was updated...

So what you suggest works only if we restrict CRC checking to blocks
incoming to the buffer cache, but leaves us unable to do CRC checks on
blocks once in the buffer cache. Since many blocks stay in cache almost
constantly, we're left with the situation that the most heavily used
parts of the database seldom get CRC checked.

There's plenty of stuff in memory that's not covered by an
application-level CRC. That's what ECC RAM is for.

http://www.google.com/research/pubs/archive/35162.pdf

Google research shows that each DIMM has an 8% chance per annum of
uncorrectable memory errors, even on ECC.

If you have large RAM, like everybody now does, your incidence of this
type of error will be much higher than it was in previous years, so our
perception of what is necessary now to protect databases is out of date.

We have data under our care, and will be much more likely to receive
this kind of error because of the amount of RAM we use.

Updating the CRC at
every update to a page seems really expensive, but it's an orthogonal
issue to hint bits.

Clearly, the frequency with which we set hint bits affects the frequency
we can sensibly update CRCs. It shouldn't be up to us to decide how much
protection a user wants to give their data.

There might be two or three settings that make sense, but clearly we
need to be able to limit hint-bit setting to allow us to have a usable
CRC check. So there is a very strong connection between turning this
optimisation off and gaining CRC checking as a feature.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#20Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#19)
Re: How much do the hint bits help?

On 22.12.2010 17:31, Simon Riggs wrote:

On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:

Do you envision that the CRC is calculated at every update, or only when
a page is written out from the buffer cache?

At every update, so there is a clear assertion that the CRC matches the
block.

Umm, when do you check the CRC? Every time the page is locked? Every
time it's updated? If you don't verify the CRC, what is it good for?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#21Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#19)
Re: How much do the hint bits help?

On 22.12.2010 17:31, Simon Riggs wrote:

On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:

There's plenty of stuff in memory that's not covered by an
application-level CRC. That's what ECC RAM is for.

http://www.google.com/research/pubs/archive/35162.pdf

Google research shows that each DIMM has an 8% chance per annum of
uncorrectable memory errors, even on ECC.

You misread that paper. From summary:

About a third of machines and over 8% of DIMMs in
our fleet saw at least one *correctable* error per year.

Emphasis mine.

Our per-DIMM rates of correctable errors translate to an average
of 25,000–75,000 FIT (failures in time per billion hours of
operation) per Mbit and a median FIT range of 778–25,000 per Mbit
(median for DIMMs with errors), while previous studies report
200-5,000 FIT per Mbit. The number of correctable errors per DIMM
is highly variable, with some DIMMs experiencing a huge number of
errors, compared to others. The annual incidence of uncorrectable
errors was 1.3% per machine and 0.22% per DIMM.

So the real figure of uncorrectable errors is 0.22% per DIMM.
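
As a rough cross-check of how the per-DIMM and per-machine figures relate, here is a small sketch of the "at least one failure among n units" calculation. It assumes errors are independent across DIMMs, which is a simplification, and the DIMM count in the note below is a hypothetical example rather than a number from the paper.

```c
#include <assert.h>

/* Probability that at least one of n independent units sees an event,
 * given a per-unit probability p: 1 - (1 - p)^n. */
static double at_least_one(double p, int n)
{
    assert(p >= 0.0 && p <= 1.0 && n >= 0);

    double none = 1.0;              /* probability that no unit fails */
    for (int i = 0; i < n; i++)
        none *= 1.0 - p;
    return 1.0 - none;
}
```

For example, `at_least_one(0.0022, 6)` comes to roughly 0.013, so a hypothetical machine with six DIMMs lands near the quoted 1.3% per-machine rate under this independence assumption.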

Anyway, unreliable RAM calls for more ECC bits in DIMMs, not invasive
architectural changes to every single application in the system.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Aidan Van Dyk (#18)
Re: How much do the hint bits help?

Aidan Van Dyk <aidan@highrise.ca> writes:

With this statement, you just moved the goal posts on the checksumming
ideas. In fact, you didn't just move the goal posts, you picked the
ball up and teleported it to another stadium.

What he said. I can't imagine that anyone will be interested in any
case other than "set the CRC immediately before writing, and check it
upon first reading the page in". Maintaining it continuously while the
page is in shared memory is completely insane from a cost-versus-benefit
perspective.

regards, tom lane

#23Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#22)
Re: How much do the hint bits help?

On Wed, 2010-12-22 at 10:45 -0500, Tom Lane wrote:

Aidan Van Dyk <aidan@highrise.ca> writes:

With this statement, you just moved the goal posts on the checksumming
ideas. In fact, you didn't just move the goal posts, you picked the
ball up and teleported it to another stadium.

What he said. I can't imagine that anyone will be interested in any
case other than "set the CRC immediately before writing, and check it
upon first reading the page in". Maintaining it continuously while the
page is in shared memory is completely insane from a cost-versus-benefit
perspective.

If you insist on setting hint-bits, then that is probably true.

Many people experience almost no I/O these days, and there's a strong
correlation between people caring about their data and also being
willing to spend big $s on cache. We need to protect our users, however
much money they spent on cache; I would argue the more money they spent
on cache the harder we should be trying to protect them.

I'm sure it will take a little while for everybody to understand why a
full CRC implementation is both necessary and now possible. Paradigm
shifts of thought do seem like teleports, but they can be beneficial.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#24Aidan Van Dyk
aidan@highrise.ca
In reply to: Simon Riggs (#23)
Re: How much do the hint bits help?

On Wed, Dec 22, 2010 at 10:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

I'm sure it will take a little while for everybody to understand why a
full CRC implementation is both necessary and now possible. Paradigm
shifts of thought do seem like teleports, but they can be beneficial.

But please don't deny the rest of us airbags while you keep working on
teleportation ;-)

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#12)
Re: How much do the hint bits help?

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them.

I think it's far more likely that that could be acceptable than the
radical method of removing hint bits altogether.

I have not looked into what's wrong with Merlin's test case, but my
thinking about it goes like this: we know that contention for buffer
lookup is significant at high loads, despite the facts that the accesses
are distributed across a lot of independently-usable buffers and we've
done much work to partition the lookup locks. If we remove hint bits
and thereby force an access to clog for every tuple touch, we can expect
that the contention for clog access will be comparable to the worst case
for buffer access contention ... except that in many cases, it will be
distributed across far fewer pages and so the actual interference rate
will be far higher. This will make our past experiences with "context
swap storms" look like a day at the beach.

regards, tom lane

#26Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#21)
Re: How much do the hint bits help?

On Wed, 2010-12-22 at 17:42 +0200, Heikki Linnakangas wrote:

On 22.12.2010 17:31, Simon Riggs wrote:

On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:

There's plenty of stuff in memory that's not covered by an
application-level CRC. That's what ECC RAM is for.

http://www.google.com/research/pubs/archive/35162.pdf

Google research shows that each DIMM has an 8% chance per annum of
uncorrectable memory errors, even on ECC.

You misread that paper. From summary:

I read the paper in detail before I posted. If you think that finding an
error in my quote disproves anything, you should read the whole paper. I
see this:

Conclusion 1
"... Nonetheless, the remaining incidence of 0.22% per DIMM
per year makes a crash-tolerant application layer indispensable
for large-scale server farms."

What you are arguing for is a protection system that will reduce in
effectiveness as we add more cache.

What I am arguing in favour of is an option to allow people to protect
their data, whatever the size of their cache. I'm not forcing you or
anyone to use it, but I think it's an important option to be offering to
our users.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#27Merlin Moncure
mmoncure@gmail.com
In reply to: Aidan Van Dyk (#24)
Re: How much do the hint bits help?

On Wed, Dec 22, 2010 at 10:55 AM, Aidan Van Dyk <aidan@highrise.ca> wrote:

On Wed, Dec 22, 2010 at 10:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

I'm sure it will take a little while for everybody to understand why a
full CRC implementation is both necessary and now possible. Paradigm
shifts of thought do seem like teleports, but they can be beneficial.

But please don't deny the rest of us airbags while you keep working on
teleportation ;-)

well, simon's point that hint bits complicate checksum may or may not
be the case, but no hint bits = less i/o = less checksumming (unless
you checksum around the hint bits). This lowers the expense of doing
it, which is nice. Maybe that doesn't matter in the end, we'll see.
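
"Checksumming around the hint bits" can be sketched as masking the hint flags out of each tuple's flag word before it is fed to the checksum. The mask value, the tuple layout and the toy checksum below are all assumptions for illustration, not an existing PostgreSQL routine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the bits of the flag word that are treated as hints. */
#define TOY_HINT_MASK 0x0F00

typedef struct {
    uint32_t xmin;
    uint16_t infomask;   /* contains hint bits and non-hint flags */
    uint16_t payload;
} ToyTuple;

static uint32_t mix(uint32_t sum, uint32_t value)
{
    return sum * 31u + value;    /* toy mixing step, not a real CRC */
}

/* Checksum that covers every bit, hints included. */
static uint32_t checksum_plain(const ToyTuple *tuples, int n)
{
    assert(n >= 0);
    uint32_t sum = 0;
    for (int i = 0; i < n; i++) {
        sum = mix(sum, tuples[i].xmin);
        sum = mix(sum, tuples[i].infomask);
        sum = mix(sum, tuples[i].payload);
    }
    return sum;
}

/* Checksum that ignores the hint bits, so setting a hint never
 * invalidates it. The cost is a per-tuple walk instead of one pass
 * over the raw page bytes. */
static uint32_t checksum_masked(const ToyTuple *tuples, int n)
{
    assert(n >= 0);
    uint32_t sum = 0;
    for (int i = 0; i < n; i++) {
        uint16_t stable =
            (uint16_t) (tuples[i].infomask & (uint16_t) ~TOY_HINT_MASK);
        sum = mix(sum, tuples[i].xmin);
        sum = mix(sum, stable);
        sum = mix(sum, tuples[i].payload);
    }
    return sum;
}
```

Setting a hint bit changes `checksum_plain` but leaves `checksum_masked` untouched, while a payload change alters both. The masked version has to understand the tuple layout, which is the extra expense and complexity raised earlier in the thread.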

merlin

#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#27)
Re: How much do the hint bits help?

Merlin Moncure <mmoncure@gmail.com> writes:

well, simon's point that hint bits complicate checksum may or may not
be the case, but no hint bits = less i/o = less checksumming (unless
you checksum around the hint bits).

I think you're optimistically assuming the extra clog accesses don't
cost any I/O.

regards, tom lane

#29Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#25)
Re: How much do the hint bits help?

On Wed, Dec 22, 2010 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them.

I think it's far more likely that that could be acceptable than the
radical method of removing hint bits altogether.

I have not looked into what's wrong with Merlin's test case, but my
thinking about it goes like this: we know that contention for buffer
lookup is significant at high loads, despite the facts that the accesses
are distributed across a lot of independently-usable buffers and we've
done much work to partition the lookup locks.  If we remove hint bits
and thereby force an access to clog for every tuple touch, we can expect
that the contention for clog access will be comparable to the worst case
for buffer access contention ... except that in many cases, it will be
distributed across far fewer pages and so the actual interference rate
will be far higher.  This will make our past experiences with "context
swap storms" look like a day at the beach.

right. note I'm not suggesting that they should actually be removed,
at least not yet. I was just playing around and noticed that the cost
of not having them is not immediately obvious in highly synthetic
tests. The cost of clog access in best case scenario appears to be
near zero, which I thought was interesting enough to point out. What
I'm after here is the worst case scenario, how likely it is to happen,
and looking into possible remedies (if any).

I'm going to do lots more testing over the holidays. I'm fishing for
ideas on good ways to flesh things out more.

merlin

#30Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#28)
Re: How much do the hint bits help?

On Wed, Dec 22, 2010 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

well, simon's point that hint bits complicate checksum may or may not
be the case, but no hint bits = less i/o = less checksumming (unless
you checksum around the hint bits).

I think you're optimistically assuming the extra clog accesses don't
cost any I/O.

right, but clog is much more highly packed which is both a good and a
bad thing. my conjecture here is that jamming the clog files is
actually good, because that keeps them 'hot' and more than compensates
the extra heap i/o. the extra lock of course is scary.

here's the thing, compared to the 90's when they were put in, the
transaction space has shrunk by half and we put gigabytes, not
megabytes of memory into servers. what does this mean for the clog?
that's what i'm after.
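
Some back-of-the-envelope arithmetic on how packed the clog is, as a sketch. It assumes two status bits per transaction and 8 kB pages, and takes about 2^31 xids as the usable transaction space. The constants and function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for this sketch. */
#define TOY_BLCKSZ          8192u
#define TOY_BITS_PER_XACT   2u
#define TOY_XACTS_PER_BYTE  (8u / TOY_BITS_PER_XACT)               /* 4 */
#define TOY_XACTS_PER_PAGE  (TOY_BLCKSZ * TOY_XACTS_PER_BYTE)  /* 32768 */

/* Which status page holds this xid? */
static uint32_t xid_to_page(uint32_t xid)
{
    return xid / TOY_XACTS_PER_PAGE;
}

/* Which byte within that page? */
static uint32_t xid_to_byte(uint32_t xid)
{
    return (xid % TOY_XACTS_PER_PAGE) / TOY_XACTS_PER_BYTE;
}

/* How far to shift that byte to reach the xid's two status bits? */
static uint32_t xid_to_shift(uint32_t xid)
{
    return (xid % TOY_XACTS_PER_BYTE) * TOY_BITS_PER_XACT;
}

/* Bytes of status data needed to cover nxids transactions. */
static uint64_t status_bytes_for(uint64_t nxids)
{
    assert(nxids % TOY_XACTS_PER_BYTE == 0);
    return nxids / TOY_XACTS_PER_BYTE;
}
```

Under these assumptions one 8 kB page covers 32768 transactions and the whole 2^31 xid range needs 512 MB of status data, which fits in RAM on a server with gigabytes of memory. It also means that lookups for recent transactions land on very few pages, which is the contention concern raised earlier in the thread.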

merlin

#31Merlin Moncure
mmoncure@gmail.com
In reply to: Merlin Moncure (#30)
Re: How much do the hint bits help?

On Wed, Dec 22, 2010 at 11:12 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Wed, Dec 22, 2010 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

well, simon's point that hint bits complicate checksum may or may not
be the case, but no hint bits = less i/o = less checksumming (unless
you checksum around the hint bits).

I think you're optimistically assuming the extra clog accesses don't
cost any I/O.

right, but clog is much more highly packed which is both a good and a
bad thing.  my conjecture here is that jamming the clog files is
actually good, because that keeps them 'hot' and more than compensates
the extra heap i/o.  the extra lock of course is scary.

er, should have said, plus less heap i/o compensates the extra clog i/o.

merlin

#32Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#25)
Re: How much do the hint bits help?

On Wed, 2010-12-22 at 10:59 -0500, Tom Lane wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them.

I think it's far more likely that that could be acceptable than the
radical method of removing hint bits altogether.

I haven't argued to remove them, just have an option to not set them.

I have not looked into what's wrong with Merlin's test case, but my
thinking about it goes like this: we know that contention for buffer
lookup is significant at high loads, despite the facts that the accesses
are distributed across a lot of independently-usable buffers and we've
done much work to partition the lookup locks. If we remove hint bits
and thereby force an access to clog for every tuple touch, we can expect
that the contention for clog access will be comparable to the worst case
for buffer access contention ... except that in many cases, it will be
distributed across far fewer pages and so the actual interference rate
will be far higher. This will make our past experiences with "context
swap storms" look like a day at the beach.

I think you're right, but I also think there are other ways we could
optimise that other than hint bits.

For example, the single item cache might be changed, or we might
buffer/batch clog updates, or we might use a hash table of known aborted
transactions etc.

As Merlin points out, we don't have much evidence for their value or
lack of value, so we need a parameter to allow wide scale testing.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#29)
Re: How much do the hint bits help?

Merlin Moncure <mmoncure@gmail.com> writes:

I'm going to do lots more testing over the holidays. I'm fishing for
ideas on good ways to flesh things out more.

Based on the analogy to past bufmgr contention problems, I'd suggest
going back through the archives to look for the test cases associated
with context swap storm discussions. The cases themselves might not
be quite right for this, but they'd at least show a structure for
stressing things at the tuple-access level.

regards, tom lane

#34David Fetter
david@fetter.org
In reply to: Simon Riggs (#26)
Re: How much do the hint bits help?

On Wed, Dec 22, 2010 at 04:00:30PM +0000, Simon Riggs wrote:

On Wed, 2010-12-22 at 17:42 +0200, Heikki Linnakangas wrote:

On 22.12.2010 17:31, Simon Riggs wrote:

On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:

There's plenty of stuff in memory that's not covered by an
application-level CRC. That's what ECC RAM is for.

http://www.google.com/research/pubs/archive/35162.pdf

Google research shows that each DIMM has an 8% chance per annum of
uncorrectable memory errors, even on ECC.

You misread that paper. From summary:

I read the paper in detail before I posted. If you think that finding an
error in my quote disproves anything, you should read the whole paper. I
see this:

Conclusion 1
"... Nonetheless, the remaining incidence of 0.22% per DIMM
per year makes a crash-tolerant application layer indispens-
able for large-scale server farms."

What you are arguing for is a protection system that will reduce in
effectiveness as we add more cache.

What I am arguing in favour of is an option to allow people to protect
their data, whatever the size of their cache. I'm not forcing you or
anyone to use it, but I think it's an important option to be offering to
our users.

For what version of PostgreSQL are you proposing that we provide this
protection? Let's assume that it's before 10.0 so we can get some
idea of how this will arise :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#35Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Merlin Moncure (#30)
Re: How much do the hint bits help?

On 22.12.2010 18:12, Merlin Moncure wrote:

On Wed, Dec 22, 2010 at 11:06 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Merlin Moncure<mmoncure@gmail.com> writes:

well, simon's point that hint bits complicate checksum may or may not
be the case, but no hint bits = less i/o = less checksumming (unless
you checksum around the hint bits).

I think you're optimistically assuming the extra clog accesses don't
cost any I/O.

right, but clog is much more highly packed which is both a good and a
bad thing.

As a sidenote: note that the clog is not currently CRC'd.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#36Josh Berkus
josh@agliodbs.com
In reply to: Merlin Moncure (#8)
Re: How much do the hint bits help?

right -- see the attached clog_stress.sql above. It creates a script
that inserts records in blocks of 10000, deletes half of them, and
vacuums. Neither the execution of the script nor a seq scan following
its execution showed an interesting performance difference (which I am
arbitrarily calling 5% in either direction). Like I said though, I
don't trust the patch or the results yet.

Given that DBT2 stressed the bufmgr contention pretty well, it seems
like it'd be worth trying this for hint bits in the test servers. We
should see if Mark Wong can do this in the new year.

I might be able to test on some client workloads. We'll see; currently
I lack the harness to simulate a high level of client contention.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#37Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Merlin Moncure (#29)
Re: How much do the hint bits help?

On 23/12/10 05:06, Merlin Moncure wrote:

On Wed, Dec 22, 2010 at 10:59 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them.

I think it's far more likely that that could be acceptable than the
radical method of removing hint bits altogether.

I have not looked into what's wrong with Merlin's test case, but my
thinking about it goes like this: we know that contention for buffer
lookup is significant at high loads, despite the facts that the accesses
are distributed across a lot of independently-usable buffers and we've
done much work to partition the lookup locks. If we remove hint bits
and thereby force an access to clog for every tuple touch, we can expect
that the contention for clog access will be comparable to the worst case
for buffer access contention ... except that in many cases, it will be
distributed across far fewer pages and so the actual interference rate
will be far higher. This will make our past experiences with "context
swap storms" look like a day at the beach.

right. note I'm not suggesting that they should actually be removed,
at least not yet. I was just playing around and noticed that the cost
of not having them is not immediately obvious in highly synthetic
tests. The cost of clog access in best case scenario appears to be
near zero, which I thought was interesting enough to point out. What
I'm after here is the worst case scenario, how likely it is to happen,
and looking into possible remedies (if any).

I'm going to do lots more testing over the holidays. I'm fishing for
ideas on good ways to flesh things out more.

Certainly having a choice about configuring them would be a good
addition in itself, e.g. for data warehousing use the hint bits can be a
considerable impediment so the *ability* to not have them would be a
huge advantage.

If I have time over the early new year I'll do some testing too.

Cheers

Mark
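
The compromise Heikki describes in the quoted text (set hint bits, but do not dirty the page for them) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyPage`, `ToyBufferPool` and `run_scans` are hypothetical names, a page is dirtied by nothing except hint bits, and eviction is forced by hand. It counts clog lookups and page writes for the same access pattern under both policies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyPage:
    xids: List[int]                                       # inserting XID of each tuple
    hints: Dict[int, str] = field(default_factory=dict)   # tuple index -> cached status
    dirty: bool = False


class ToyBufferPool:
    def __init__(self, clog: Dict[int, str], dirty_on_hint: bool) -> None:
        self.clog = clog
        self.dirty_on_hint = dirty_on_hint
        self.clog_lookups = 0
        self.page_writes = 0

    def tuple_status(self, page: ToyPage, i: int) -> str:
        if i in page.hints:
            return page.hints[i]          # hint bit already set: no clog access
        self.clog_lookups += 1
        status = self.clog[page.xids[i]]
        page.hints[i] = status            # always set the hint in memory
        if self.dirty_on_hint:
            page.dirty = True             # today's behaviour: hint costs a write
        return status

    def evict_and_reload(self, page: ToyPage) -> None:
        if page.dirty:
            self.page_writes += 1         # hints reach disk and survive the reload
            page.dirty = False
        else:
            page.hints.clear()            # hints lived only in memory and are lost


def run_scans(dirty_on_hint: bool) -> Tuple[int, int]:
    pool = ToyBufferPool(clog={7: "committed"}, dirty_on_hint=dirty_on_hint)
    page = ToyPage(xids=[7] * 100)
    for _ in range(3):                    # three scans while the page is cached
        for i in range(100):
            pool.tuple_status(page, i)
    pool.evict_and_reload(page)
    for _ in range(3):                    # three more scans after the reload
        for i in range(100):
            pool.tuple_status(page, i)
    return pool.clog_lookups, pool.page_writes


print(run_scans(dirty_on_hint=True))    # (100, 1)
print(run_scans(dirty_on_hint=False))   # (200, 0)
```

In this model the compromise trades one page write for one extra round of clog lookups per eviction, which is the trade-off being argued about in the thread.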

#38Josh Berkus
josh@agliodbs.com
In reply to: Aidan Van Dyk (#18)
Re: CRC checks WAS: How much do the hint bits help?

I believe that most of the people talking about and wanting checksums
so far have been wanting them to verify I/O, not to verify that PG has
no bugs, that RAM is staying charged correctly, and that no stray bits
have been flipped, and that nobody else happens to be scribbling over
our shared buffers.

I agree that this should be our first goal. Yes, we want to protect
users against memory errors as well. However, that's a much tougher
feature to implement; I've done some hashing this out with engineers on
other DBMSes and nobody has good answers right now. The overhead of
what Simon proposes would be enormous, and few users would be interested
in paying that cost.

Doing a CRC check-on-write, as well as checking for format corruption
before write would catch a majority of real-world problems. Please
don't hold that up in pursuit of the bit-flipping problem, which
*nobody* has solved.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#39Josh Berkus
josh@agliodbs.com
In reply to: Mark Kirkwood (#37)
Re: How much do the hint bits help?

Certainly having a choice about configuring them would be a good
addition in itself, e.g. for data warehousing use the hint bits can be a
considerable impediment so the *ability* to not have them would be a
huge advantage.

Would need to be a restart option, no?

Regarding the contention which Tom expects: the extra load on the CLOG
would be 100% reads, no? If it's *all* reads, why would we have any
more contention than we have now?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#39)
Re: How much do the hint bits help?

Josh Berkus <josh@agliodbs.com> writes:

Regarding the contention which Tom expects: the extra load on the CLOG
would be 100% reads, no? If it's *all* reads, why would we have any
more contention than we have now?

Read involves sharelock which still causes contention. Those bufmgr
contention storms we saw before were completely independent of whether
the pages were accessed for read or for write.

Another thing to keep in mind is that the current clog access code is
designed on the assumption that there's considerable locality of access
to pg_clog, ie, you usually only need to consult it for recent XIDs
because older ones have been hinted. Turn off hint bits, that behavior
goes out the window.

regards, tom lane
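
The locality point can be made concrete with a rough simulation. This is a sketch, not a benchmark: it models only clog page cache misses (not the sharelock contention), it assumes 32768 transactions per 8 kB clog page and 8 clog buffers (the 64 kB figure that comes up elsewhere in this thread), and the workload shapes (`recent`, `spread`, the 100k-XID "unhinted" window) are made up for illustration.

```python
import random
from collections import OrderedDict
from typing import Iterable

XACTS_PER_CLOG_PAGE = 32768   # 2 status bits per transaction on an 8 kB page
CLOG_BUFFERS = 8              # 8 x 8 kB = 64 kB of clog buffers


def clog_page_misses(xids: Iterable[int], buffers: int = CLOG_BUFFERS) -> int:
    """Count misses in a small LRU cache of clog pages."""
    lru: "OrderedDict[int, bool]" = OrderedDict()
    misses = 0
    for xid in xids:
        page = xid // XACTS_PER_CLOG_PAGE
        if page in lru:
            lru.move_to_end(page)          # mark as most recently used
        else:
            misses += 1
            lru[page] = True
            if len(lru) > buffers:
                lru.popitem(last=False)    # evict the least recently used page
    return misses


rng = random.Random(0)
NEWEST_XID = 100_000_000
N = 50_000

# With hint bits: only recent, not-yet-hinted XIDs reach clog
# (assumed here to be the newest 100k XIDs).
recent = [NEWEST_XID - rng.randrange(100_000) for _ in range(N)]

# Without hint bits: every tuple's XID reaches clog, however old.
spread = [rng.randrange(1, NEWEST_XID) for _ in range(N)]

print("misses with locality:   ", clog_page_misses(recent))
print("misses without locality:", clog_page_misses(spread))
```

With the recent-only workload the touched XIDs span at most four clog pages, so the cache absorbs almost everything; with XIDs spread over the whole range nearly every lookup lands on a page that is not cached.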

#41Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Josh Berkus (#36)
Re: How much do the hint bits help?

Josh Berkus <josh@agliodbs.com> writes:

I might be able to test on some client workloads. We'll see; currently
I lack the harness to simulate a high level of client contention.

We're pretty successful in doing that with Tsung, even against large
clusters of plproxy nodes.

http://tsung.erlang-projects.org/
http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#42Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Tom Lane (#40)
Re: How much do the hint bits help?

On 23/12/10 10:54, Tom Lane wrote:

Josh Berkus<josh@agliodbs.com> writes:

Regarding the contention which Tom expects: the extra load on the CLOG
would be 100% reads, no? If it's *all* reads, why would we have any
more contention than we have now?

Read involves sharelock which still causes contention. Those bufmgr
contention storms we saw before were completely independent of whether
the pages were accessed for read or for write.

Another thing to keep in mind is that the current clog access code is
designed on the assumption that there's considerable locality of access
to pg_clog, ie, you usually only need to consult it for recent XIDs
because older ones have been hinted. Turn off hint bits, that behavior
goes out the window.

Would a larger (or configurable) clog cache help with this tho?

Cheers

Mark

#43Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#35)
Re: How much do the hint bits help?

On Wed, 2010-12-22 at 22:08 +0200, Heikki Linnakangas wrote:

On 22.12.2010 18:12, Merlin Moncure wrote:

On Wed, Dec 22, 2010 at 11:06 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Merlin Moncure<mmoncure@gmail.com> writes:

well, simon's point that hint bits complicate checksum may or may not
be the case, but no hint bits = less i/o = less checksumming (unless
you checksum around the hint bits).

I think you're optimistically assuming the extra clog accesses don't
cost any I/O.

right, but clog is much more highly packed which is both a good and a
bad thing.

As a sidenote: note that the clog is not currently CRC'd.

Good point, thanks for mentioning it.

With 64kB of clog buffers and potentially 8 GB of shared_buffers, there
is about 10^5 times more RAM in shared_buffers than in clog buffers. So
a protection mechanism for shared_buffers will trap about 99.999% of RAM
errors.

We might say that an error in clog could have a serious effect, and I
would agree. I don't see a way around that though, except for a CRC
check when we write to disk.

My understanding is that the context switch storms were because of the
I/O involved with thrashing the clog buffers. (Well, actually, I think
it was subtrans, but same difference). To solve that, we could just swap
them out to shared_buffers with usage = 5 rather than evict them.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
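
The arithmetic behind the 10^5 and 99.999% figures can be checked in a few lines. This assumes, as the estimate above implicitly does, that RAM errors are equally likely to hit any byte of either area; the variable names are just for illustration.

```python
KB = 1024
GB = 1024 ** 3

clog_buffers = 64 * KB
shared_buffers = 8 * GB

# How many times larger shared_buffers is than the clog buffers.
ratio = shared_buffers // clog_buffers

# Fraction of the combined memory that a shared_buffers-only check covers.
covered = shared_buffers / (shared_buffers + clog_buffers)

print(ratio)              # 131072
print(f"{covered:.4%}")   # 99.9992%
```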

#44Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#40)
Re: How much do the hint bits help?

On Wed, Dec 22, 2010 at 4:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Josh Berkus <josh@agliodbs.com> writes:

Regarding the contention which Tom expects: the extra load on the CLOG
would be 100% reads, no?  If it's *all* reads, why would we have any
more contention than we have now?

Read involves sharelock which still causes contention.  Those bufmgr
contention storms we saw before were completely independent of whether
the pages were accessed for read or for write.

Another thing to keep in mind is that the current clog access code is
designed on the assumption that there's considerable locality of access
to pg_clog, ie, you usually only need to consult it for recent XIDs
because older ones have been hinted.  Turn off hint bits, that behavior
goes out the window.

That's not always going to be the case though. In olap-ish
environments you will see cases of scans over many records that come
from a single transaction. This is also the case where hint bits can
really drill you -- you insert a bunch of records, log the bits,
delete, log the bits, and vacuum eventually. I started investigating
this on behalf of a friend who is experiencing basically the worst
case with regularity.

merlin