Make HeapTupleSatisfiesMVCC more concurrent

Started by Jeff Janesover 10 years ago21 messages
#1Jeff Janes
jeff.janes@gmail.com
1 attachment(s)

When we check a tuple for MVCC, it has to pass checks that the inserting
transaction has committed, and that it committed before our snapshot
began. And similarly that the deleting transaction hasn't committed, or
did so after our snapshot.

XidInMVCCSnapshot is (or can be) very much cheaper
than TransactionIdIsInProgress, because the former touches only local
memory while the latter takes a highly contended lock and inspects shared
memory. We do the slow one first, but we could do the fast one first and
sometimes short-circuit the slow one. If the transaction is in our
snapshot, it doesn't matter if it is still in progress or not.

This was discussed back in 2013 (
/messages/by-id/CAMkU=1yy-YEQVvqj2xJitT1EFkyuFk7uTV_hrOMGyGMxpU=N+Q@mail.gmail.com),
and I wanted to revive it. The recent lwlock atomic changes haven't made
the problem irrelevant.

This patch swaps the order of the checks under some conditions. So that
hackers can readily do testing without juggling binaries, I've added an
experimental guc which controls the behavior. JJ_SNAP=0 is the original
(git HEAD) behavior, JJ_SNAP=1 turns on the new behavior.

I've added some flag variables to record if XidInMVCCSnapshot was already
called. XidInMVCCSnapshot is cheap, but not so cheap that we want to call
it twice if we can avoid it. Those would probably stay in some form or
another when the experimental guc goes away.

We might be able to rearrange the series of "if-tests" to get rid of the
flags variables, but I didn't want to touch the HEAP_MOVED_OFF
and HEAP_MOVED_IN parts of the code, as those must get about zero
regression testing.

The situation where the performance of this really shows up is when there
are tuples that remain in an unresolved state while highly concurrent
processes keep stumbling over them.

I set that up by using the pgbench tables with scale factor of 1, and
running a custom query at high concurrency which seq_scans the accounts
table:

pgbench -f <(echo 'select sum(abalance) from pgbench_accounts') -T 30 \
-n -c32 -j32 --startup='set JJ_SNAP=1'

While the test is contrived, it reproduces complaints I've seen on several
forums.

To create the burden of unresolved tuples, I open psql and run:
begin; update pgbench_accounts set abalance =1-abalance;

...and leave it uncommitted for a while.

Representative numbers for test runs of the above custom query on a 8-CPU
machine:

tps = 542 regardless of JJ_SNAP, when no in-progress tuples
tps = 30 JJ_SNAP=0 with uncommitted bulk update
tps = 364 JJ_SNAP=1 with uncommitted bulk update

A side effect of making this change would be that a query which finds a
tuple inserted or deleted by a transaction still in the query's snapshot
never checks to see if that transaction committed, and so it doesn't set
the hint bit if it did commit or abort. Some future query with a newer
snapshot will have to do that. It is at least theoretically possible that
this could mean that many hint bits could fail to get set while the buffer
is still dirty in shared_buffers, which means it needs to get dirtied again
once set. I doubt this would be significant, but if anyone has a test case
which they think could show up a problem in this area, please try it out or
describe it.

There are other places in tqual.c which could probably use similar
re-ordering tricks, but this is the one for which I have a reproducible
test case,

Cheers

Jeff

Attachments:

SatisfiesMVCC_reorder_v001.patchapplication/octet-stream; name=SatisfiesMVCC_reorder_v001.patchDownload
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
new file mode 100644
index b3dac51..cff96f7
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 106,111 ****
--- 106,112 ----
  /* XXX these should appear in other modules' header files */
  extern bool Log_disconnections;
  extern int	CommitDelay;
+ extern int	JJ_SNAP;
  extern int	CommitSiblings;
  extern char *default_tablespace;
  extern char *temp_tablespaces;
*************** static struct config_int ConfigureNamesI
*** 2264,2269 ****
--- 2265,2280 ----
  	},
  
  	{
+ 		{"JJ_SNAP", PGC_USERSET, LOCK_MANAGEMENT,
+ 			gettext_noop("change MVCC order of tests"
+ 						 ""),
+ 			NULL
+ 		},
+ 		&JJ_SNAP,
+ 		0, 0, 100000,
+ 		NULL, NULL, NULL
+ 	},
+ 	{
  		{"commit_delay", PGC_SUSET, WAL_SETTINGS,
  			gettext_noop("Sets the delay in microseconds between transaction commit and "
  						 "flushing WAL to disk."),
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
new file mode 100644
index de7b3fc..3a49dda
*** a/src/backend/utils/time/tqual.c
--- b/src/backend/utils/time/tqual.c
***************
*** 69,74 ****
--- 69,76 ----
  #include "utils/tqual.h"
  
  
+ int JJ_SNAP=0;
+ 
  /* Static variables representing various special snapshot semantics */
  SnapshotData SnapshotSelfData = {HeapTupleSatisfiesSelf};
  SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
*************** HeapTupleSatisfiesMVCC(HeapTuple htup, S
*** 945,950 ****
--- 947,954 ----
  					   Buffer buffer)
  {
  	HeapTupleHeader tuple = htup->t_data;
+ 	bool checked_snapshot_xmin=false;
+ 	bool checked_snapshot_xmax=false;
  
  	Assert(ItemPointerIsValid(&htup->t_self));
  	Assert(htup->t_tableOid != InvalidOid);
*************** HeapTupleSatisfiesMVCC(HeapTuple htup, S
*** 1035,1040 ****
--- 1039,1047 ----
  			else
  				return false;	/* deleted before scan started */
  		}
+ 		else if (JJ_SNAP && (checked_snapshot_xmin=true) && !HeapTupleHeaderXminFrozen(tuple)
+ 				&& XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
+ 			return false;
  		else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
  			return false;
  		else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
*************** HeapTupleSatisfiesMVCC(HeapTuple htup, S
*** 1053,1059 ****
  	 * By here, the inserting transaction has committed - have to check
  	 * when...
  	 */
! 	if (!HeapTupleHeaderXminFrozen(tuple)
  		&& XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
  		return false;			/* treat as still in progress */
  
--- 1060,1066 ----
  	 * By here, the inserting transaction has committed - have to check
  	 * when...
  	 */
! 	if (!checked_snapshot_xmin && !HeapTupleHeaderXminFrozen(tuple)
  		&& XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
  		return false;			/* treat as still in progress */
  
*************** HeapTupleSatisfiesMVCC(HeapTuple htup, S
*** 1105,1110 ****
--- 1112,1120 ----
  				return false;	/* deleted before scan started */
  		}
  
+ 		if (JJ_SNAP && (checked_snapshot_xmax=true) && XidInMVCCSnapshot(HeapTupleHeaderGetRawXmax(tuple), snapshot))
+ 			return true;
+ 
  		if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuple)))
  			return true;
  
*************** HeapTupleSatisfiesMVCC(HeapTuple htup, S
*** 1124,1130 ****
  	/*
  	 * OK, the deleting transaction committed too ... but when?
  	 */
! 	if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmax(tuple), snapshot))
  		return true;			/* treat as still in progress */
  
  	return false;
--- 1134,1140 ----
  	/*
  	 * OK, the deleting transaction committed too ... but when?
  	 */
! 	if (!checked_snapshot_xmax && XidInMVCCSnapshot(HeapTupleHeaderGetRawXmax(tuple), snapshot))
  		return true;			/* treat as still in progress */
  
  	return false;
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#1)
Re: Make HeapTupleSatisfiesMVCC more concurrent

Jeff Janes <jeff.janes@gmail.com> writes:

When we check a tuple for MVCC, it has to pass checks that the inserting
transaction has committed, and that it committed before our snapshot
began. And similarly that the deleting transaction hasn't committed, or
did so after our snapshot.

XidInMVCCSnapshot is (or can be) very much cheaper
than TransactionIdIsInProgress, because the former touches only local
memory while the latter takes a highly contended lock and inspects shared
memory. We do the slow one first, but we could do the fast one first and
sometimes short-circuit the slow one. If the transaction is in our
snapshot, it doesn't matter if it is still in progress or not.

This was discussed back in 2013 (
/messages/by-id/CAMkU=1yy-YEQVvqj2xJitT1EFkyuFk7uTV_hrOMGyGMxpU=N+Q@mail.gmail.com),
and I wanted to revive it. The recent lwlock atomic changes haven't made
the problem irrelevant.

This patch swaps the order of the checks under some conditions.

Just thinking about this ... I wonder why we need to call
TransactionIdIsInProgress() at all rather than believing the answer from
the snapshot? Under what circumstances could TransactionIdIsInProgress()
return true where XidInMVCCSnapshot() had not?

I'm thinking maybe TransactionIdIsInProgress is only needed for non-MVCC
snapshot types.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
1 attachment(s)
Re: Make HeapTupleSatisfiesMVCC more concurrent

I wrote:

Just thinking about this ... I wonder why we need to call
TransactionIdIsInProgress() at all rather than believing the answer from
the snapshot? Under what circumstances could TransactionIdIsInProgress()
return true where XidInMVCCSnapshot() had not?

I experimented with the attached patch, which replaces
HeapTupleSatisfiesMVCC's calls of TransactionIdIsInProgress with
XidInMVCCSnapshot, and then as a cross-check has all the "return false"
exits from XidInMVCCSnapshot assert !TransactionIdIsInProgress().
The asserts did not fire in the standard regression tests nor in a
pgbench run, which is surely not proof of anything but it suggests
that I'm not totally nuts.

I wouldn't commit the changes in XidInMVCCSnapshot for real, but
otherwise this is a possibly committable patch.

regards, tom lane

Attachments:

believe-the-snapshot-about-whats-in-progress.patchtext/x-diff; charset=us-ascii; name=believe-the-snapshot-about-whats-in-progress.patchDownload
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index de7b3fc..c8f59f7 100644
*** a/src/backend/utils/time/tqual.c
--- b/src/backend/utils/time/tqual.c
***************
*** 10,21 ****
   * the passed-in buffer.  The caller must hold not only a pin, but at least
   * shared buffer content lock on the buffer containing the tuple.
   *
!  * NOTE: must check TransactionIdIsInProgress (which looks in PGXACT array)
   * before TransactionIdDidCommit/TransactionIdDidAbort (which look in
   * pg_clog).  Otherwise we have a race condition: we might decide that a
   * just-committed transaction crashed, because none of the tests succeed.
   * xact.c is careful to record commit/abort in pg_clog before it unsets
!  * MyPgXact->xid in PGXACT array.  That fixes that problem, but it also
   * means there is a window where TransactionIdIsInProgress and
   * TransactionIdDidCommit will both return true.  If we check only
   * TransactionIdDidCommit, we could consider a tuple committed when a
--- 10,22 ----
   * the passed-in buffer.  The caller must hold not only a pin, but at least
   * shared buffer content lock on the buffer containing the tuple.
   *
!  * NOTE: When using a non-MVCC snapshot, we must check
!  * TransactionIdIsInProgress (which looks in the PGXACT array)
   * before TransactionIdDidCommit/TransactionIdDidAbort (which look in
   * pg_clog).  Otherwise we have a race condition: we might decide that a
   * just-committed transaction crashed, because none of the tests succeed.
   * xact.c is careful to record commit/abort in pg_clog before it unsets
!  * MyPgXact->xid in the PGXACT array.  That fixes that problem, but it also
   * means there is a window where TransactionIdIsInProgress and
   * TransactionIdDidCommit will both return true.  If we check only
   * TransactionIdDidCommit, we could consider a tuple committed when a
***************
*** 26,31 ****
--- 27,37 ----
   * subtransactions of our own main transaction and so there can't be any
   * race condition.
   *
+  * When using an MVCC snapshot, we rely on XidInMVCCSnapshot rather than
+  * TransactionIdIsInProgress, but the logic is otherwise the same: do not
+  * check pg_clog until after deciding that the xact is no longer in progress.
+  *
+  *
   * Summary of visibility functions:
   *
   *	 HeapTupleSatisfiesMVCC()
*************** HeapTupleSatisfiesMVCC(HeapTuple htup, S
*** 961,967 ****
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
  				return false;
! 			if (!TransactionIdIsInProgress(xvac))
  			{
  				if (TransactionIdDidCommit(xvac))
  				{
--- 967,973 ----
  
  			if (TransactionIdIsCurrentTransactionId(xvac))
  				return false;
! 			if (!XidInMVCCSnapshot(xvac, snapshot))
  			{
  				if (TransactionIdDidCommit(xvac))
  				{
*************** HeapTupleSatisfiesMVCC(HeapTuple htup, S
*** 980,986 ****
  
  			if (!TransactionIdIsCurrentTransactionId(xvac))
  			{
! 				if (TransactionIdIsInProgress(xvac))
  					return false;
  				if (TransactionIdDidCommit(xvac))
  					SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
--- 986,992 ----
  
  			if (!TransactionIdIsCurrentTransactionId(xvac))
  			{
! 				if (XidInMVCCSnapshot(xvac, snapshot))
  					return false;
  				if (TransactionIdDidCommit(xvac))
  					SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
*************** HeapTupleSatisfiesMVCC(HeapTuple htup, S
*** 1035,1041 ****
  			else
  				return false;	/* deleted before scan started */
  		}
! 		else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
  			return false;
  		else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
  			SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
--- 1041,1047 ----
  			else
  				return false;	/* deleted before scan started */
  		}
! 		else if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
  			return false;
  		else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
  			SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
*************** HeapTupleSatisfiesMVCC(HeapTuple htup, S
*** 1048,1061 ****
  			return false;
  		}
  	}
  
! 	/*
! 	 * By here, the inserting transaction has committed - have to check
! 	 * when...
! 	 */
! 	if (!HeapTupleHeaderXminFrozen(tuple)
! 		&& XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
! 		return false;			/* treat as still in progress */
  
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid or aborted */
  		return true;
--- 1054,1068 ----
  			return false;
  		}
  	}
+ 	else
+ 	{
+ 		/* xmin is committed, but maybe not according to our snapshot */
+ 		if (!HeapTupleHeaderXminFrozen(tuple) &&
+ 			XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
+ 			return false;		/* treat as still in progress */
+ 	}
  
! 	/* by here, the inserting transaction has committed */
  
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid or aborted */
  		return true;
*************** HeapTupleSatisfiesMVCC(HeapTuple htup, S
*** 1082,1096 ****
  			else
  				return false;	/* deleted before scan started */
  		}
! 		if (TransactionIdIsInProgress(xmax))
  			return true;
  		if (TransactionIdDidCommit(xmax))
! 		{
! 			/* updating transaction committed, but when? */
! 			if (XidInMVCCSnapshot(xmax, snapshot))
! 				return true;	/* treat as still in progress */
! 			return false;
! 		}
  		/* it must have aborted or crashed */
  		return true;
  	}
--- 1089,1098 ----
  			else
  				return false;	/* deleted before scan started */
  		}
! 		if (XidInMVCCSnapshot(xmax, snapshot))
  			return true;
  		if (TransactionIdDidCommit(xmax))
! 			return false;		/* updating transaction committed */
  		/* it must have aborted or crashed */
  		return true;
  	}
*************** HeapTupleSatisfiesMVCC(HeapTuple htup, S
*** 1105,1111 ****
  				return false;	/* deleted before scan started */
  		}
  
! 		if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuple)))
  			return true;
  
  		if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuple)))
--- 1107,1113 ----
  				return false;	/* deleted before scan started */
  		}
  
! 		if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmax(tuple), snapshot))
  			return true;
  
  		if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuple)))
*************** HeapTupleSatisfiesMVCC(HeapTuple htup, S
*** 1120,1131 ****
  		SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
  					HeapTupleHeaderGetRawXmax(tuple));
  	}
  
! 	/*
! 	 * OK, the deleting transaction committed too ... but when?
! 	 */
! 	if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmax(tuple), snapshot))
! 		return true;			/* treat as still in progress */
  
  	return false;
  }
--- 1122,1135 ----
  		SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
  					HeapTupleHeaderGetRawXmax(tuple));
  	}
+ 	else
+ 	{
+ 		/* xmax is committed, but maybe not according to our snapshot */
+ 		if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmax(tuple), snapshot))
+ 			return true;		/* treat as still in progress */
+ 	}
  
! 	/* xmax transaction committed */
  
  	return false;
  }
*************** XidInMVCCSnapshot(TransactionId xid, Sna
*** 1461,1467 ****
--- 1465,1474 ----
  
  	/* Any xid < xmin is not in-progress */
  	if (TransactionIdPrecedes(xid, snapshot->xmin))
+ 	{
+ 		Assert(!TransactionIdIsInProgress(xid));
  		return false;
+ 	}
  	/* Any xid >= xmax is in-progress */
  	if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
  		return true;
*************** XidInMVCCSnapshot(TransactionId xid, Sna
*** 1503,1509 ****
--- 1510,1519 ----
  			 * xmax.
  			 */
  			if (TransactionIdPrecedes(xid, snapshot->xmin))
+ 			{
+ 				Assert(!TransactionIdIsInProgress(xid));
  				return false;
+ 			}
  		}
  
  		for (i = 0; i < snapshot->xcnt; i++)
*************** XidInMVCCSnapshot(TransactionId xid, Sna
*** 1534,1540 ****
--- 1544,1553 ----
  			 * xmax.
  			 */
  			if (TransactionIdPrecedes(xid, snapshot->xmin))
+ 			{
+ 				Assert(!TransactionIdIsInProgress(xid));
  				return false;
+ 			}
  		}
  
  		/*
*************** XidInMVCCSnapshot(TransactionId xid, Sna
*** 1549,1554 ****
--- 1562,1568 ----
  		}
  	}
  
+ 	Assert(!TransactionIdIsInProgress(xid));
  	return false;
  }
  
#4Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#3)
Re: Make HeapTupleSatisfiesMVCC more concurrent

On Wed, Aug 19, 2015 at 6:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Just thinking about this ... I wonder why we need to call
TransactionIdIsInProgress() at all rather than believing the answer from
the snapshot? Under what circumstances could

TransactionIdIsInProgress()

return true where XidInMVCCSnapshot() had not?

I experimented with the attached patch, which replaces
HeapTupleSatisfiesMVCC's calls of TransactionIdIsInProgress with
XidInMVCCSnapshot, and then as a cross-check has all the "return false"
exits from XidInMVCCSnapshot assert !TransactionIdIsInProgress().
The asserts did not fire in the standard regression tests nor in a
pgbench run, which is surely not proof of anything but it suggests
that I'm not totally nuts.

I wouldn't commit the changes in XidInMVCCSnapshot for real, but
otherwise this is a possibly committable patch.

I think one case where the patch can impact is for aborted transactions.
In TransactionIdIsInProgress(), we check for aborted transactions before
consulting pg_subtrans whereas with patch it will consult pg_subtrans
without aborted transaction check. Now it could be better to first check
pg_subtrans if we found that the corresponding top transaction is
in-progress as that will save extra pg_clog lookup, but I just mentioned
because that is one difference that I could see with this patch.

Another minor point is, I think we should modify function level comment
for XidInMVCCSnapshot() where it says that this applies to known-
committed XIDs which will no longer be true after this patch.

Apart from above, the patch looks good.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#2)
Re: Make HeapTupleSatisfiesMVCC more concurrent

On 19 August 2015 at 00:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

When we check a tuple for MVCC, it has to pass checks that the inserting
transaction has committed, and that it committed before our snapshot
began. And similarly that the deleting transaction hasn't committed, or
did so after our snapshot.

XidInMVCCSnapshot is (or can be) very much cheaper
than TransactionIdIsInProgress, because the former touches only local
memory while the latter takes a highly contended lock and inspects shared
memory. We do the slow one first, but we could do the fast one first and
sometimes short-circuit the slow one. If the transaction is in our
snapshot, it doesn't matter if it is still in progress or not.

This was discussed back in 2013 (

/messages/by-id/CAMkU=1yy-YEQVvqj2xJitT1EFkyuFk7uTV_hrOMGyGMxpU=N+Q@mail.gmail.com
),

and I wanted to revive it. The recent lwlock atomic changes haven't made
the problem irrelevant.

This patch swaps the order of the checks under some conditions.

Just thinking about this ... I wonder why we need to call
TransactionIdIsInProgress() at all rather than believing the answer from
the snapshot? Under what circumstances could TransactionIdIsInProgress()
return true where XidInMVCCSnapshot() had not?

I'm thinking maybe TransactionIdIsInProgress is only needed for non-MVCC
snapshot types.

+1

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#3)
Re: Make HeapTupleSatisfiesMVCC more concurrent

On 2015-08-18 20:36:13 -0400, Tom Lane wrote:

I wrote:

Just thinking about this ... I wonder why we need to call
TransactionIdIsInProgress() at all rather than believing the answer from
the snapshot? Under what circumstances could TransactionIdIsInProgress()
return true where XidInMVCCSnapshot() had not?

I experimented with the attached patch, which replaces
HeapTupleSatisfiesMVCC's calls of TransactionIdIsInProgress with
XidInMVCCSnapshot, and then as a cross-check has all the "return false"
exits from XidInMVCCSnapshot assert !TransactionIdIsInProgress().

I'm not sure about it, but it might be worthwhile to add a
TransactionIdIsKnownCompleted() check before the more expensive parts of
XidInMVCCSnapshot(). Neither the array search nor, much more so, the
subtrans lookups are free.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#6)
Re: Make HeapTupleSatisfiesMVCC more concurrent

On 19 August 2015 at 15:08, Andres Freund <andres@anarazel.de> wrote:

On 2015-08-18 20:36:13 -0400, Tom Lane wrote:

I wrote:

Just thinking about this ... I wonder why we need to call
TransactionIdIsInProgress() at all rather than believing the answer

from

the snapshot? Under what circumstances could

TransactionIdIsInProgress()

return true where XidInMVCCSnapshot() had not?

I experimented with the attached patch, which replaces
HeapTupleSatisfiesMVCC's calls of TransactionIdIsInProgress with
XidInMVCCSnapshot, and then as a cross-check has all the "return false"
exits from XidInMVCCSnapshot assert !TransactionIdIsInProgress().

I'm not sure about it, but it might be worthwhile to add a
TransactionIdIsKnownCompleted() check before the more expensive parts of
XidInMVCCSnapshot(). Neither the array search nor, much more so, the
subtrans lookups are free.

That's true, but they are of the same order as the ProcArray search, just
without the contention, which is a pretty important thing to avoid. We only
consult subtrans is the snapshot has overflowed and we do that in both
XidInMVCCSnapshot() and in TransactionIdIsInProgress(), so there's no much
difference there.

I thought about adding TransactionIdIsKnownCompleted() also, but the
cachedFetchXid will hardly ever be set correctly. If the xid is in the
snapshot then with this new mechanism we don't ever check transaction
completion. It's possible that we are using multiple snapshots alternately,
with an xid completed in one but not the other, but that seems like a slim
possibility.

Hmm, I notice that XidInMVCCSnapshot() doesn't set cachedFetchXid. Perhaps
we should record the last fetched xid for a snapshot, so we can use the
same technique to record the outcome of repeated lookups in
XidInMVCCSnapshot().

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: Make HeapTupleSatisfiesMVCC more concurrent

Andres Freund <andres@anarazel.de> writes:

On 2015-08-18 20:36:13 -0400, Tom Lane wrote:

I experimented with the attached patch, which replaces
HeapTupleSatisfiesMVCC's calls of TransactionIdIsInProgress with
XidInMVCCSnapshot, and then as a cross-check has all the "return false"
exits from XidInMVCCSnapshot assert !TransactionIdIsInProgress().

I'm not sure about it, but it might be worthwhile to add a
TransactionIdIsKnownCompleted() check before the more expensive parts of
XidInMVCCSnapshot(). Neither the array search nor, much more so, the
subtrans lookups are free.

Hmmm... the comment for TransactionIdIsKnownCompleted says it's to
short-circuit calls of TransactionIdIsInProgress, which we wouldn't be
doing anymore. Maybe it's useful anyway but I'm not convinced.

In any case, the big picture here is that XidInMVCCSnapshot should on
average be looking at just about the same number of xids and subtrans ids
as TransactionIdIsInProgress does --- only the latter is looking at them
in the PGPROC array, so it needs a lock, and iterating over that data
structure is more complex than scanning an array too.

My own thought about reducing the cost of XidInMVCCSnapshot, if that
proves necessary, is that maybe it would be worth the trouble to sort the
arrays so we could use binary search. That would increase the cost of
snapshot acquisition noticeably though.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#8)
Re: Make HeapTupleSatisfiesMVCC more concurrent

On 2015-08-19 10:55:00 -0400, Tom Lane wrote:

My own thought about reducing the cost of XidInMVCCSnapshot, if that
proves necessary, is that maybe it would be worth the trouble to sort
the arrays so we could use binary search. That would increase the
cost of snapshot acquisition noticeably though.

It's also considerably more expensive for search in many cases because
it's very hard to predict (branching and prefetching). I doubt it'd be
worthwhile.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#8)
Re: Make HeapTupleSatisfiesMVCC more concurrent

I wrote:

Andres Freund <andres@anarazel.de> writes:

I'm not sure about it, but it might be worthwhile to add a
TransactionIdIsKnownCompleted() check before the more expensive parts of
XidInMVCCSnapshot(). Neither the array search nor, much more so, the
subtrans lookups are free.

Hmmm... the comment for TransactionIdIsKnownCompleted says it's to
short-circuit calls of TransactionIdIsInProgress, which we wouldn't be
doing anymore. Maybe it's useful anyway but I'm not convinced.

After further thought, the right way to implement the equivalent
optimization would be to add a couple of fields to struct Snapshot that
would cache the xid last checked against that snapshot and the outcome of
that check; this would be independent of TransactionIdIsKnownCompleted.
There would be no need to use that cache for xids below xmin or above
xmax, which would improve its chance of being applicable to in-doubt xids.

Not entirely sure it's worth doing, but if someone wants to do the
legwork, this would be an independent optimization possibility.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#10)
Re: Make HeapTupleSatisfiesMVCC more concurrent

On 19 August 2015 at 16:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Andres Freund <andres@anarazel.de> writes:

I'm not sure about it, but it might be worthwhile to add a
TransactionIdIsKnownCompleted() check before the more expensive parts of
XidInMVCCSnapshot(). Neither the array search nor, much more so, the
subtrans lookups are free.

Hmmm... the comment for TransactionIdIsKnownCompleted says it's to
short-circuit calls of TransactionIdIsInProgress, which we wouldn't be
doing anymore. Maybe it's useful anyway but I'm not convinced.

After further thought, the right way to implement the equivalent
optimization would be to add a couple of fields to struct Snapshot that
would cache the xid last checked against that snapshot and the outcome of
that check; this would be independent of TransactionIdIsKnownCompleted.
There would be no need to use that cache for xids below xmin or above
xmax, which would improve its chance of being applicable to in-doubt xids.

Not entirely sure it's worth doing, but if someone wants to do the
legwork, this would be an independent optimization possibility.

This is what I suggested upthread. It's not a massive gain, but its cheap
and a straightforward patch.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#3)
Re: Make HeapTupleSatisfiesMVCC more concurrent

On Tue, Aug 18, 2015 at 5:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Just thinking about this ... I wonder why we need to call
TransactionIdIsInProgress() at all rather than believing the answer from
the snapshot? Under what circumstances could TransactionIdIsInProgress()
return true where XidInMVCCSnapshot() had not?

I experimented with the attached patch, which replaces
HeapTupleSatisfiesMVCC's calls of TransactionIdIsInProgress with
XidInMVCCSnapshot, and then as a cross-check has all the "return false"
exits from XidInMVCCSnapshot assert !TransactionIdIsInProgress().
The asserts did not fire in the standard regression tests nor in a
pgbench run, which is surely not proof of anything but it suggests
that I'm not totally nuts.

I wouldn't commit the changes in XidInMVCCSnapshot for real, but
otherwise this is a possibly committable patch.

This gives the same performance improvement as the original patch (when
compiled without cassert).

It has survived testing in a hostile environment (rapid fire stream of
contentious updates intermixed with selects, with or without frequent
crashes) without signs of missed or duplicated rows.

So, it looks good to me.

Cheers,

Jeff

#13Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#3)
Re: Make HeapTupleSatisfiesMVCC more concurrent

On Tue, Aug 18, 2015 at 8:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Just thinking about this ... I wonder why we need to call
TransactionIdIsInProgress() at all rather than believing the answer from
the snapshot? Under what circumstances could TransactionIdIsInProgress()
return true where XidInMVCCSnapshot() had not?

I experimented with the attached patch, which replaces
HeapTupleSatisfiesMVCC's calls of TransactionIdIsInProgress with
XidInMVCCSnapshot, and then as a cross-check has all the "return false"
exits from XidInMVCCSnapshot assert !TransactionIdIsInProgress().
The asserts did not fire in the standard regression tests nor in a
pgbench run, which is surely not proof of anything but it suggests
that I'm not totally nuts.

I wouldn't commit the changes in XidInMVCCSnapshot for real, but
otherwise this is a possibly committable patch.

Like Jeff's approach, this will set hint bits less aggressively.
Suppose there are 10 processes concurrently scanning a whole bunch of
dead tuples. Furthermore, suppose those dead tuples were created by a
transaction that began after those processes took their snapshot and
later aborted. Without the patch, the first one should notice that
the tuples are in fact dead and hint them HEAP_XMIN_INVALID. With the
patch, the backends will see those transactions as in-progress rather
than aborted, so they won't be hinted.

Now that probably isn't a reason not to do this. Anything that cuts
down on ProcArrayLock acquisition and cache-line bouncing in shared
memory is likely to be a win even if it burns a few more cycles
elsewhere. And I think the scenario I just mentioned probably doesn't
happen that often, and probably wouldn't cost that much if it did.
But it's worth thinking about, and at the least we should document in
the commit message or the comments that this change has this effect.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#4)
Re: Make HeapTupleSatisfiesMVCC more concurrent

On Wed, Aug 19, 2015 at 2:53 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think one case where the patch can impact is for aborted transactions.
In TransactionIdIsInProgress(), we check for aborted transactions before
consulting pg_subtrans whereas with patch it will consult pg_subtrans
without aborted transaction check. Now it could be better to first check
pg_subtrans if we found that the corresponding top transaction is
in-progress as that will save extra pg_clog lookup, but I just mentioned
because that is one difference that I could see with this patch.

Another minor point is, I think we should modify function level comment
for XidInMVCCSnapshot() where it says that this applies to known-
committed XIDs which will no longer be true after this patch.

But only if the snapshot has overflowed, right? That should affect
only a small minority of cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#13)
Re: Make HeapTupleSatisfiesMVCC more concurrent

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Aug 18, 2015 at 8:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I experimented with the attached patch, which replaces
HeapTupleSatisfiesMVCC's calls of TransactionIdIsInProgress with
XidInMVCCSnapshot, and then as a cross-check has all the "return false"
exits from XidInMVCCSnapshot assert !TransactionIdIsInProgress().

Like Jeff's approach, this will set hint bits less aggressively.

Only very marginally so, though: we will not set a hint bit in the case
where the updating transaction has committed/aborted but it is still shown
as running by our snapshot. As soon as a reader comes along with a
snapshot taken after the updater ended, that transaction will set the hint
bit.

Note that in the case where the updating transaction committed, setting
the hint bit early wouldn't save anything anyway, since we'd still need
to run XidInMVCCSnapshot(). It is true that if the updater aborted,
we could save cycles by hinting earlier; but I believe the general project
policy about such choices is not to optimize for the abort case. Also
note that the cycles wasted will now be in XidInMVCCSnapshot(), not in
ProcArray access, so they won't be as much of a scalability problem.

But it's worth thinking about, and at the least we should document in
the commit message or the comments that this change has this effect.

Agreed, there is a tradeoff being made.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#14)
Re: Make HeapTupleSatisfiesMVCC more concurrent

On Fri, Aug 21, 2015 at 8:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 19, 2015 at 2:53 AM, Amit Kapila <amit.kapila16@gmail.com>

wrote:

I think one case where the patch can impact is for aborted transactions.
In TransactionIdIsInProgress(), we check for aborted transactions before
consulting pg_subtrans whereas with patch it will consult pg_subtrans
without aborted transaction check. Now it could be better to first

check

pg_subtrans if we found that the corresponding top transaction is
in-progress as that will save extra pg_clog lookup, but I just mentioned
because that is one difference that I could see with this patch.

Another minor point is, I think we should modify function level comment
for XidInMVCCSnapshot() where it says that this applies to known-
committed XIDs which will no longer be true after this patch.

But only if the snapshot has overflowed, right?

Yes.

That should affect
only a small minority of cases.

Agreed, but the effect would be non-negligible for such cases. One thing
I am wondering that is there any harm in calling TransactionIdDidAbort()
in slow path before calling SubTransGetTopmostTransaction(), that can
also maintain consistency of checks in both the functions?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Kapila (#16)
Re: Make HeapTupleSatisfiesMVCC more concurrent

Amit Kapila <amit.kapila16@gmail.com> writes:

On Fri, Aug 21, 2015 at 8:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 19, 2015 at 2:53 AM, Amit Kapila <amit.kapila16@gmail.com>

Another minor point is, I think we should modify function level comment
for XidInMVCCSnapshot() where it says that this applies to known-
committed XIDs which will no longer be true after this patch.

Yeah, that comment would certainly be out of date, and I think it may not
be the only one. I'll check around some more.

I am wondering that is there any harm in calling TransactionIdDidAbort()
in slow path before calling SubTransGetTopmostTransaction(), that can
also maintain consistency of checks in both the functions?

I think this is probably a bad idea. It adds a pg_clog lookup that we
would otherwise not do at all, in hopes of avoiding a pg_subtrans lookup.
It's not exactly clear that that's a win even if we successfully avoid
the subtrans lookup (which we often would not). And even if it does win,
that would only happen if the other transaction has aborted, which isn't
generally the case we prefer to optimize for.

I don't mean to dismiss the potential for further optimization inside
XidInMVCCSnapshot (for instance, the one-XID-cache idea sounds promising);
but I think that's material for further research and a separate patch.

It's not clear to me if anyone wanted to do further review/testing of
this patch, or if I should go ahead and push it (after fixing whatever
comments need to be fixed).

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#17)
Re: Make HeapTupleSatisfiesMVCC more concurrent

Tom Lane wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

I am wondering that is there any harm in calling TransactionIdDidAbort()
in slow path before calling SubTransGetTopmostTransaction(), that can
also maintain consistency of checks in both the functions?

I think this is probably a bad idea. It adds a pg_clog lookup that we
would otherwise not do at all, in hopes of avoiding a pg_subtrans lookup.
It's not exactly clear that that's a win even if we successfully avoid
the subtrans lookup (which we often would not). And even if it does win,
that would only happen if the other transaction has aborted, which isn't
generally the case we prefer to optimize for.

It's probably key to this idea that TransactionIdDidAbort returns in a
single slru lookup, whereas SubTransGetTopmostTransaction needs to
iterate possibly many layers of subxacts. But the point about this
being a win only for aborted xacts makes it probably pointless, I agree.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#17)
Re: Make HeapTupleSatisfiesMVCC more concurrent

On Wed, Aug 26, 2015 at 2:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

I am wondering that is there any harm in calling TransactionIdDidAbort()
in slow path before calling SubTransGetTopmostTransaction(), that can
also maintain consistency of checks in both the functions?

I think this is probably a bad idea. It adds a pg_clog lookup that we
would otherwise not do at all, in hopes of avoiding a pg_subtrans lookup.
It's not exactly clear that that's a win even if we successfully avoid
the subtrans lookup (which we often would not). And even if it does win,
that would only happen if the other transaction has aborted, which isn't
generally the case we prefer to optimize for.

I think Alvaro has mentioned the case where it could win, however it can
still
add some penality where most (or all) transactions are committed. I agree
with
you that we might not want to optimize or spend our energy figuring out if
this
is win for not-a-common use case. OTOH, I feel having this and other point
related to optimisation (one-XID-cache) could be added as part of function
level
comments to help, if some body encounters any such case in future or is
puzzled why there are some differences in TransactionIdIsInProgress() and
XidInMVCCSnapshot().

I don't mean to dismiss the potential for further optimization inside
XidInMVCCSnapshot (for instance, the one-XID-cache idea sounds promising);
but I think that's material for further research and a separate patch.

It's not clear to me if anyone wanted to do further review/testing of
this patch, or if I should go ahead and push it (after fixing whatever
comments need to be fixed).

I think jeff's test results upthread already ensured that this patch is of
value and fair enough number of people have already looked into it and
provided there feedback, so +1 for pushing it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#20Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#17)
1 attachment(s)
Re: Make HeapTupleSatisfiesMVCC more concurrent

On 25 August 2015 at 21:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't mean to dismiss the potential for further optimization inside
XidInMVCCSnapshot (for instance, the one-XID-cache idea sounds promising);
but I think that's material for further research and a separate patch.

Patch attached. Doesn't seem worth a separate thread.

It's not clear to me if anyone wanted to do further review/testing of
this patch, or if I should go ahead and push it (after fixing whatever
comments need to be fixed).

Push, please.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

xidinmvccsnapshot_cache.v1.patchapplication/octet-stream; name=xidinmvccsnapshot_cache.v1.patchDownload
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index cced823..f54b01b 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1724,6 +1724,8 @@ GetSnapshotData(Snapshot snapshot)
 
 	snapshot->curcid = GetCurrentCommandId(false);
 
+	snapshot->cachedSearchXid = InvalidTransactionId;
+
 	/*
 	 * This is a new snapshot, so set both refcounts are zero, and mark it as
 	 * not copied in persistent memory.
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index de7b3fc..8f78732 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -1467,6 +1467,12 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 		return true;
 
 	/*
+	 * Fastpath out if this snapshot has checked this xid recently.
+	 */
+	if (TransactionIdEquals(xid, snapshot->cachedSearchXid))
+		return true;
+
+	/*
 	 * Snapshot information is stored slightly differently in snapshots taken
 	 * during recovery.
 	 */
@@ -1487,7 +1493,10 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 			for (j = 0; j < snapshot->subxcnt; j++)
 			{
 				if (TransactionIdEquals(xid, snapshot->subxip[j]))
+				{
+					snapshot->cachedSearchXid = xid;
 					return true;
+				}
 			}
 
 			/* not there, fall through to search xip[] */
@@ -1509,7 +1518,10 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 		for (i = 0; i < snapshot->xcnt; i++)
 		{
 			if (TransactionIdEquals(xid, snapshot->xip[i]))
+			{
+				snapshot->cachedSearchXid = xid;
 				return true;
+			}
 		}
 	}
 	else
@@ -1545,7 +1557,10 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 		for (j = 0; j < snapshot->subxcnt; j++)
 		{
 			if (TransactionIdEquals(xid, snapshot->subxip[j]))
+			{
+				snapshot->cachedSearchXid = xid;
 				return true;
+			}
 		}
 	}
 
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index cbf1bbd..e14128a 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -94,6 +94,13 @@ typedef struct SnapshotData
 	CommandId	curcid;			/* in my xact, CID < curcid are visible */
 
 	/*
+	 * Cache information, for use in speeding up repeated calls to check
+	 * XidInMVCCSnapshot(). All snapshots start with InvalidTransactionid.
+	 * Name and design similar to cachedFetchXid within transam.c
+	 */
+	TransactionId cachedSearchXid;
+
+	/*
 	 * An extra return value for HeapTupleSatisfiesDirty, not used in MVCC
 	 * snapshots.
 	 */
#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#20)
Re: Make HeapTupleSatisfiesMVCC more concurrent

Simon Riggs <simon@2ndQuadrant.com> writes:

On 25 August 2015 at 21:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't mean to dismiss the potential for further optimization inside
XidInMVCCSnapshot (for instance, the one-XID-cache idea sounds promising);
but I think that's material for further research and a separate patch.

Patch attached. Doesn't seem worth a separate thread.

This doesn't seem right to me: it only caches the case where the XID is
found to be present in the snapshot, whereas I think probably we should
remember the result in both cases (present or not).

Moreover, in the subxip-overflowed case where we replace an
originally-inquired-of subxact xid with its parent, what you're caching is
the parent xid, which will fail to match subsequent queries. Shouldn't we
cache the originally inquired-of xid?

The question of whether to cache "false" results perhaps requires some
performance testing.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers