recent deadlock regression test failures

Started by Andres Freund · almost 9 years ago · 26 messages
#1 Andres Freund
andres@anarazel.de

Hi,

There's two machines that recently report changes in deadlock detector
output:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2017-04-05%2018%3A58%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=friarbird&dt=2017-04-07%2004%3A20%3A01

both just failed twice in a row:
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=hyrax&br=HEAD
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=friarbird&br=HEAD
without previous errors of the same ilk.

I don't think any recent changes are supposed to affect deadlock
detector behaviour?

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Andres Freund (#1)
Re: recent deadlock regression test failures

On 04/07/2017 12:57 PM, Andres Freund wrote:

Hi,

There's two machines that recently report changes in deadlock detector
output:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2017-04-05%2018%3A58%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=friarbird&dt=2017-04-07%2004%3A20%3A01

both just failed twice in a row:
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=hyrax&br=HEAD
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=friarbird&br=HEAD
without previous errors of the same ilk.

I don't think any recent changes are supposed to affect deadlock
detector behaviour?

Both these machines have CLOBBER_CACHE_ALWAYS set. And on both machines
recent changes have made the isolation tests run much much longer.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#2)
Re: recent deadlock regression test failures

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 04/07/2017 12:57 PM, Andres Freund wrote:

I don't think any recent changes are supposed to affect deadlock
detector behaviour?

Both these machines have CLOBBER_CACHE_ALWAYS set. And on both machines
recent changes have made the isolation tests run much much longer.

Ouch. I see friarbird's run time for the isolation tests has gone from an
hour and change to over 5 hours in one fell swoop. hyrax not much better.
Oddly, non-CCA animals don't seem to have changed much.

Eyeing recent patches, it seems like the culprit must be Kevin's
addition to isolationtester's wait query:

@@ -231,6 +231,14 @@ main(int argc, char **argv)
appendPQExpBuffer(&wait_query, ",%s", backend_pids[i]);
appendPQExpBufferStr(&wait_query, "}'::integer[]");

+   /* Also detect certain wait events. */
+   appendPQExpBufferStr(&wait_query,
+                        " OR EXISTS ("
+                        "  SELECT * "
+                        "  FROM pg_catalog.pg_stat_activity "
+                        "  WHERE pid = $1 "
+                        "  AND wait_event IN ('SafeSnapshot'))");
+
    res = PQprepare(conns[0], PREP_WAITING, wait_query.data, 0, NULL);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {

This seems a tad ill-considered. We sweated a lot of blood not so long
ago to get the runtime of that query down, and this seems to have blown
it up again. And done so for every isolation test case, not only the
ones where it could possibly matter.
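For readers following along: the predicate the modified wait query evaluates on every poll can be modeled as follows. This is an illustrative Python sketch of the logic, not PostgreSQL code; the function and parameter names are invented for the example.

```python
def backend_is_waiting(test_backend_pids, blocking_pids, wait_event):
    """Model of isolationtester's wait check after Kevin's change.

    A backend counts as waiting if any of the other test sessions'
    PIDs block it via the lock manager (the pg_blocking_pids($1) &&
    '{...}'::integer[] part), OR if it is on the 'SafeSnapshot' wait
    event (the newly added EXISTS over pg_stat_activity)."""
    lock_blocked = bool(set(blocking_pids) & set(test_backend_pids))
    safe_snapshot_wait = (wait_event == 'SafeSnapshot')
    return lock_blocked or safe_snapshot_wait
```

The cost concern is that the added EXISTS clause joins in pg_stat_activity on every execution of the wait query, for every test, whether or not SafeSnapshot waits are possible.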

regards, tom lane


#4 Kevin Grittner
kgrittn@gmail.com
In reply to: Tom Lane (#3)
Re: recent deadlock regression test failures

On Fri, Apr 7, 2017 at 12:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 04/07/2017 12:57 PM, Andres Freund wrote:

I don't think any recent changes are supposed to affect deadlock
detector behaviour?

Both these machines have CLOBBER_CACHE_ALWAYS set. And on both machines
recent changes have made the isolation tests run much much longer.

Ouch. I see friarbird's run time for the isolation tests has gone from an
hour and change to over 5 hours in one fell swoop. hyrax not much better.
Oddly, non-CCA animals don't seem to have changed much.

Eyeing recent patches, it seems like the culprit must be Kevin's
addition to isolationtester's wait query:

Ouch. Without this we don't have regression test coverage for the
SERIALIZABLE READ ONLY DEFERRABLE code, but it's probably not worth
adding 4 hours to any tests, even if it only shows up with
CLOBBER_CACHE_ALWAYS. I assume the consensus is that I should revert
it?

--
Kevin Grittner


#5 Andres Freund
andres@anarazel.de
In reply to: Kevin Grittner (#4)
Re: recent deadlock regression test failures

On 2017-04-07 12:49:22 -0500, Kevin Grittner wrote:

On Fri, Apr 7, 2017 at 12:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 04/07/2017 12:57 PM, Andres Freund wrote:

I don't think any recent changes are supposed to affect deadlock
detector behaviour?

Both these machines have CLOBBER_CACHE_ALWAYS set. And on both machines
recent changes have made the isolation tests run much much longer.

Ouch. I see friarbird's run time for the isolation tests has gone from an
hour and change to over 5 hours in one fell swoop. hyrax not much better.
Oddly, non-CCA animals don't seem to have changed much.

Eyeing recent patches, it seems like the culprit must be Kevin's
addition to isolationtester's wait query:

Ouch. Without this we don't have regression test coverage for the
SERIALIZABLE READ ONLY DEFERRABLE code, but it's probably not worth
adding 4 hours to any tests, even if it only shows up with
CLOBBER_CACHE_ALWAYS. I assume the consensus is that I should revert
it?

I'd rather fix the issue than remove the tests entirely. Seems quite
possible to handle blocking on SafeSnapshot in a similar manner to pg_blocking_pids?

Greetings,

Andres Freund


#6 Kevin Grittner
kgrittn@gmail.com
In reply to: Andres Freund (#5)
Re: recent deadlock regression test failures

On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund <andres@anarazel.de> wrote:

I'd rather fix the issue than remove the tests entirely. Seems quite
possible to handle blocking on SafeSnapshot in a similar manner to pg_blocking_pids?

I'll see what I can figure out.

--
Kevin Grittner


#7 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Kevin Grittner (#6)
Re: recent deadlock regression test failures

On Sat, Apr 8, 2017 at 6:35 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund <andres@anarazel.de> wrote:

I'd rather fix the issue than remove the tests entirely. Seems quite
possible to handle blocking on SafeSnapshot in a similar manner to pg_blocking_pids?

I'll see what I can figure out.

Ouch. These are the other ways I thought of to achieve this:

/messages/by-id/CAEepm=1MR4Ug9YsLtOS4Q9KAU9aku0pZS4RhBN=0LY3pJ49Ksg@mail.gmail.com

I'd be happy to write one of those, but it may take a day as I have
some other commitments.

--
Thomas Munro
http://www.enterprisedb.com


#8 Kevin Grittner
kgrittn@gmail.com
In reply to: Thomas Munro (#7)
Re: recent deadlock regression test failures

On Fri, Apr 7, 2017 at 3:47 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Sat, Apr 8, 2017 at 6:35 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund <andres@anarazel.de> wrote:

I'd rather fix the issue than remove the tests entirely. Seems quite
possible to handle blocking on SafeSnapshot in a similar manner to pg_blocking_pids?

I'll see what I can figure out.

Ouch. These are the other ways I thought of to achieve this:

/messages/by-id/CAEepm=1MR4Ug9YsLtOS4Q9KAU9aku0pZS4RhBN=0LY3pJ49Ksg@mail.gmail.com

I'd be happy to write one of those, but it may take a day as I have
some other commitments.

Please give it a go. I'm dealing with putting out fires with
customers while trying to make sure I have tested the predicate
locking GUCs patch sufficiently. (I think it's ready to go, and has
passed all tests so far, but there are a few more I want to run.)
I'm not sure I can come up with a solution faster than you, given
that. Since it is an improvement to performance for a test-only
environment on a feature that is already committed and not causing
problems for production environments, hopefully people will tolerate
a fix a day or two out. If not, we'll have to revert and get it
into the first CF for v11.

--
Kevin Grittner


#9 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Kevin Grittner (#8)
1 attachment
Re: recent deadlock regression test failures

On Sat, Apr 8, 2017 at 9:47 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Fri, Apr 7, 2017 at 3:47 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Sat, Apr 8, 2017 at 6:35 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund <andres@anarazel.de> wrote:

I'd rather fix the issue than remove the tests entirely. Seems quite
possible to handle blocking on SafeSnapshot in a similar manner to pg_blocking_pids?

I'll see what I can figure out.

Ouch. These are the other ways I thought of to achieve this:

/messages/by-id/CAEepm=1MR4Ug9YsLtOS4Q9KAU9aku0pZS4RhBN=0LY3pJ49Ksg@mail.gmail.com

I'd be happy to write one of those, but it may take a day as I have
some other commitments.

Please give it a go. I'm dealing with putting out fires with
customers while trying to make sure I have tested the predicate
locking GUCs patch sufficiently. (I think it's ready to go, and has
passed all tests so far, but there are a few more I want to run.)
I'm not sure I can come up with a solution faster than you, given
that. Since it is an improvement to performance for a test-only
environment on a feature that is already committed and not causing
problems for production environments, hopefully people will tolerate
a fix a day or two out. If not, we'll have to revert and get it
into the first CF for v11.

Here is an attempt at option 2 from the menu I posted above. Questions:

1. Does anyone object to this extension of pg_blocking_pids()'s
remit? If so, I could make it a separate function (that was option
3).

2. Did I understand correctly that it is safe to scan the list of
SERIALIZABLEXACTs and access the possibleUnsafeConflicts list while
holding only SerializableXactHashLock, and that 'inLink' is the
correct link to be following?

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

safe-snapshot-blocker-introspection.patch (application/octet-stream)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cb0a36a170f..e1e3d32744c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15952,10 +15952,11 @@ SET search_path TO <replaceable>schema</> <optional>, <replaceable>schema</>, ..
     <function>pg_blocking_pids</function> returns an array of the process IDs
     of the sessions that are blocking the server process with the specified
     process ID, or an empty array if there is no such server process or it is
-    not blocked.  One server process blocks another if it either holds a lock
-    that conflicts with the blocked process's lock request (hard block), or is
+    not blocked.  One server process blocks another if it holds a lock that
+    conflicts with the blocked process's lock request (hard block), is
     waiting for a lock that would conflict with the blocked process's lock
-    request and is ahead of it in the wait queue (soft block).  When using
+    request and is ahead of it in the wait queue (soft block), or potentially
+    conflicts with a SERIALIZABLE DEFERRABLE transaction.  When using
     parallel queries the result always lists client-visible process IDs (that
     is, <function>pg_backend_pid</> results) even if the actual lock is held
     or awaited by a child worker process.  As a result of that, there may be
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 7aa719d6123..3283ed68c81 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -1549,6 +1549,53 @@ GetSafeSnapshot(Snapshot origSnapshot)
 }
 
 /*
+ * If the given process is currently blocked in GetSafeSnapshot, return the
+ * list of process IDs that it is waiting for.  This is intended for testing
+ * and is used by pg_blocking_pids().
+ */
+List *
+GetSafeSnapshotBlockingPids(int blocked_pid)
+{
+	SERIALIZABLEXACT *sxact;
+	List	   *blocking_pids = NIL;
+
+	LWLockAcquire(SerializableXactHashLock, LW_SHARED);
+
+	/* Find blocked_pid's SERIALIZABLEXACT by linear search. */
+	for (sxact = FirstPredXact(); sxact != NULL; sxact = NextPredXact(sxact))
+	{
+		if (sxact->pid == blocked_pid)
+			break;
+	}
+
+	/* Did we find it, and is it currently waiting in GetSafeSnapshot? */
+	if (sxact != NULL && (sxact->flags & SXACT_FLAG_DEFERRABLE_WAITING) != 0)
+	{
+		RWConflict possibleUnsafeConflict;
+
+		/* Traverse the list of possible unsafe conflicts collecting PIDs. */
+		possibleUnsafeConflict = (RWConflict)
+			SHMQueueNext(&sxact->possibleUnsafeConflicts,
+						 &sxact->possibleUnsafeConflicts,
+						 offsetof(RWConflictData, inLink));
+
+		while (possibleUnsafeConflict != NULL)
+		{
+			blocking_pids = lappend_int(blocking_pids,
+										possibleUnsafeConflict->sxactIn->pid);
+			possibleUnsafeConflict = (RWConflict)
+				SHMQueueNext(&sxact->possibleUnsafeConflicts,
+							 &possibleUnsafeConflict->inLink,
+							 offsetof(RWConflictData, inLink));
+		}
+	}
+
+	LWLockRelease(SerializableXactHashLock);
+
+	return blocking_pids;
+}
+
+/*
  * Acquire a snapshot that can be used for the current transaction.
  *
  * Make sure we have a SERIALIZABLEXACT reference in MySerializableXact.
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index 63f956e6708..6593b27a00c 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "nodes/pg_list.h"
 #include "storage/predicate_internals.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
@@ -412,7 +413,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
  * because multiple blockers have same group leader PID).  We do not bother
  * to eliminate such duplicates from the result.
  *
- * We need not consider predicate locks here, since those don't block anything.
+ * We need not consider predicate locks here, since those don't block anything,
+ * but we do consider backends blocked in GetSafeSnapshot.
  */
 Datum
 pg_blocking_pids(PG_FUNCTION_ARGS)
@@ -421,16 +423,29 @@ pg_blocking_pids(PG_FUNCTION_ARGS)
 	Datum	   *arrayelems;
 	int			narrayelems;
 	BlockedProcsData *lockData; /* state data from lmgr */
+	List	   *snapshotBlockers;
+	ListCell   *lc;
 	int			i,
 				j;
 
 	/* Collect a snapshot of lock manager state */
 	lockData = GetBlockerStatusData(blocked_pid);
 
-	/* We can't need more output entries than there are reported PROCLOCKs */
-	arrayelems = (Datum *) palloc(lockData->nlocks * sizeof(Datum));
+	/* Collect a snapshot of processes waited for by GetSafeSnapshot */
+	snapshotBlockers = GetSafeSnapshotBlockingPids(blocked_pid);
+
+	/*
+	 * We can't need more output entries than there are reported PROCLOCKs,
+	 * plus any snapshot blockers.
+	 */
+	arrayelems = (Datum *)
+		palloc((lockData->nlocks + list_length(snapshotBlockers)) * sizeof(Datum));
 	narrayelems = 0;
 
+	/* Add any blockers of GetSafeSnapshot */
+	foreach(lc, snapshotBlockers)
+		arrayelems[narrayelems++] = lfirst_int(lc);
+
 	/* For each blocked proc in the lock group ... */
 	for (i = 0; i < lockData->nprocs; i++)
 	{
@@ -508,7 +523,7 @@ pg_blocking_pids(PG_FUNCTION_ARGS)
 	}
 
 	/* Assert we didn't overrun arrayelems[] */
-	Assert(narrayelems <= lockData->nlocks);
+	Assert(narrayelems <= (lockData->nlocks + list_length(snapshotBlockers)));
 
 	/* Construct array, using hardwired knowledge about int4 type */
 	PG_RETURN_ARRAYTYPE_P(construct_array(arrayelems, narrayelems,
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 408d94cc7a5..e9527aedc0b 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -474,5 +474,6 @@ typedef struct TwoPhasePredicateRecord
  * locking internals.
  */
 extern PredicateLockData *GetPredicateLockStatusData(void);
+extern List *GetSafeSnapshotBlockingPids(int blocked_pid);
 
 #endif   /* PREDICATE_INTERNALS_H */
diff --git a/src/test/isolation/isolationtester.c b/src/test/isolation/isolationtester.c
index 4d18710bdfd..f77f4657512 100644
--- a/src/test/isolation/isolationtester.c
+++ b/src/test/isolation/isolationtester.c
@@ -231,14 +231,6 @@ main(int argc, char **argv)
 		appendPQExpBuffer(&wait_query, ",%s", backend_pids[i]);
 	appendPQExpBufferStr(&wait_query, "}'::integer[]");
 
-	/* Also detect certain wait events. */
-	appendPQExpBufferStr(&wait_query,
-						 " OR EXISTS ("
-						 "  SELECT * "
-						 "  FROM pg_catalog.pg_stat_activity "
-						 "  WHERE pid = $1 "
-						 "  AND wait_event IN ('SafeSnapshot'))");
-
 	res = PQprepare(conns[0], PREP_WAITING, wait_query.data, 0, NULL);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
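The list walk in GetSafeSnapshotBlockingPids above follows PostgreSQL's shared-memory queue convention: SHMQueueNext returns NULL once the scan arrives back at the list header. A minimal Python model of that circular-list traversal, with invented names purely for illustration (the real structures live in C shared memory):

```python
class QueueNode:
    """Model of an entry on a circular SHMQueue; each conflict record
    links to the next entry, and the header node carries no PID."""
    def __init__(self, pid=None):
        self.pid = pid
        self.next = None  # set when the circular list is assembled

def collect_blocking_pids(head):
    """Walk the circular list starting just after the header, gathering
    PIDs until we come back around to the header (the point at which
    SHMQueueNext would return NULL in the C code)."""
    pids = []
    node = head.next
    while node is not head:
        pids.append(node.pid)
        node = node.next
    return pids

# Usage: a header plus two conflict entries, linked in a circle.
head = QueueNode()
a, b = QueueNode(101), QueueNode(102)
head.next, a.next, b.next = a, b, head
collect_blocking_pids(head)  # → [101, 102]
```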
#10 Kevin Grittner
kgrittn@gmail.com
In reply to: Thomas Munro (#9)
Re: recent deadlock regression test failures

On Fri, Apr 7, 2017 at 9:24 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

2. Did I understand correctly that it is safe to scan the list of
SERIALIZABLEXACTs and access the possibleUnsafeConflicts list while
holding only SerializableXactHashLock,

Yes.

and that 'inLink' is the correct link to be following?

If you're starting from the blocked (read-only) transaction (which
you are), inLink is the one to follow.

Note: It would be better form to use the SxactIsDeferrableWaiting()
macro than repeat the bit-testing code directly in your function.

--
Kevin Grittner


#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#9)
Re: recent deadlock regression test failures

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Here is an attempt at option 2 from the menu I posted above. Questions:

1. Does anyone object to this extension of pg_blocking_pids()'s
remit? If so, I could make it a separate function (that was option
3).

It seems an entirely principle-free change in the function's definition.

I'm not actually clear on why Kevin wanted this change in
isolationtester's wait behavior anyway, so maybe some clarification
on that would be a good idea. But if we need it, I think probably
a dedicated function would be a good thing. We want the wait-checking
query to be as trivial as possible at the SQL level, so whatever
semantic oddities it needs to have should be pushed into C code.

regards, tom lane


#12 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Tom Lane (#11)
1 attachment
Re: recent deadlock regression test failures

On Sat, Apr 8, 2017 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Here is an attempt at option 2 from the menu I posted above. Questions:

1. Does anyone object to this extension of pg_blocking_pids()'s
remit? If so, I could make it a separate function (that was option
3).

It seems an entirely principle-free change in the function's definition.

Well... other backends can block a SERIALIZABLE DEFERRABLE
transaction, so it doesn't seem that unreasonable to expect that a
function named pg_blocking_pids(blocked_pid) described as "get array
of PIDs of sessions blocking specified backend PID" should be able to
tell you who they are.

You might say that pg_blocking_pid() is about locking only and not
arbitrary other kinds of waits, but safe snapshots are not completely
unrelated to locking if you tilt your head at the right angle:
GetSafeSnapshot waits for all transactions that might hold SIRead
locks that could affect this transaction's serializability to
complete.

But I can see that it's an odd case. Minimal separate function
version attached.

I'm not actually clear on why Kevin wanted this change in
isolationtester's wait behavior anyway, so maybe some clarification
on that would be a good idea.

I can't speak for Kevin but here's my argument as patch author: One
of the purposes of the isolation tester is to test our transaction
isolation. SERIALIZABLE DEFERRABLE is a special case of one of our
levels and should be tested. Statement s3r in the new spec
read-only-anomaly-3.spec runs at that level and causes the backend to
wait for another backend. Without any change to isolationtester it
would hang on that statement until timeout failure. Therefore I
proposed that isolationtester should recognise this kind of waiting
and proceed to later steps that can unblock it, like so:

step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y')
ORDER BY id; <waiting ...>
step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
step s2c: COMMIT;
step s3r: <... completed>

But if we need it, I think probably
a dedicated function would be a good thing. We want the wait-checking
query to be as trivial as possible at the SQL level, so whatever
semantic oddities it needs to have should be pushed into C code.

Based on the above, here is a version that introduces a simple boolean
function pg_waiting_for_safe_snapshot(pid) and adds that to the
query. This was my "option 1" upthread.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

boolean-function.patch (application/octet-stream)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cb0a36a170f..5d1c77e5365 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15801,6 +15801,12 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
       </row>
 
       <row>
+       <entry><literal><function>pg_waiting_for_safe_snapshot(<type>int</type>)</function></literal></entry>
+       <entry><type>boolean</type></entry>
+       <entry>Whether the specified server process ID is waiting for a safe snapshot</entry>
+      </row>
+
+      <row>
        <entry><literal><function>session_user</function></literal></entry>
        <entry><type>name</type></entry>
        <entry>session user name</entry>
@@ -16068,6 +16074,19 @@ SET search_path TO <replaceable>schema</> <optional>, <replaceable>schema</>, ..
    </para>
 
    <indexterm>
+    <primary>pg_waiting_for_safe_snapshot</primary>
+   </indexterm>
+
+   <para>
+    <function>pg_waiting_for_safe_snapshot</function> returns true only if the
+    specified process ID is currently waiting for a safe snapshot.  This
+    applies to <literal>SERIALIZABLE READ ONLY DEFERRABLE</literal>
+    transactions blocked by concurrent <literal>SERIALIZABLE</literal>
+    transactions.  See <xref linkend="xact-serializable"> for more
+    information.
+   </para>
+
+   <indexterm>
     <primary>version</primary>
    </indexterm>
 
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 10bac71e94b..89f4af1f793 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -1556,6 +1556,32 @@ GetSafeSnapshot(Snapshot origSnapshot)
 }
 
 /*
+ * Check if the given process is currently waiting in GetSafeSnapshot.
+ */
+bool
+GetSafeSnapshotIsWaiting(int blocked_pid)
+{
+	SERIALIZABLEXACT *sxact;
+	bool		result;
+
+	LWLockAcquire(SerializableXactHashLock, LW_SHARED);
+
+	/* Find blocked_pid's SERIALIZABLEXACT by linear search. */
+	for (sxact = FirstPredXact(); sxact != NULL; sxact = NextPredXact(sxact))
+	{
+		if (sxact->pid == blocked_pid)
+			break;
+	}
+
+	/* Did we find it, and is it currently waiting in GetSafeSnapshot? */
+	result = (sxact != NULL && SxactIsDeferrableWaiting(sxact));
+
+	LWLockRelease(SerializableXactHashLock);
+
+	return result;
+}
+
+/*
  * Acquire a snapshot that can be used for the current transaction.
  *
  * Make sure we have a SERIALIZABLEXACT reference in MySerializableXact.
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index 63f956e6708..035ce3f06f2 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -516,6 +516,16 @@ pg_blocking_pids(PG_FUNCTION_ARGS)
 										  sizeof(int32), true, 'i'));
 }
 
+/*
+ * pg_waiting_for_safe_snapshot - is a given pid waiting in GetSafeSnapshot?
+ */
+Datum
+pg_waiting_for_safe_snapshot(PG_FUNCTION_ARGS)
+{
+	int			blocked_pid = PG_GETARG_INT32(0);
+
+	PG_RETURN_BOOL(GetSafeSnapshotIsWaiting(blocked_pid));
+}
 
 /*
  * Functions for manipulating advisory locks
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 643838bb054..fdf34914a2c 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3140,6 +3140,8 @@ DATA(insert OID = 1371 (  pg_lock_status   PGNSP PGUID 12 1 1000 0 0 f f f f t t
 DESCR("view system lock information");
 DATA(insert OID = 2561 (  pg_blocking_pids PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 1007 "23" _null_ _null_ _null_ _null_ _null_ pg_blocking_pids _null_ _null_ _null_ ));
 DESCR("get array of PIDs of sessions blocking specified backend PID");
+DATA(insert OID = 3376 (  pg_waiting_for_safe_snapshot PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 16 "23" _null_ _null_ _null_ _null_ _null_ pg_waiting_for_safe_snapshot _null_ _null_ _null_ ));
DESCR("true if the specified backend PID is waiting for a safe snapshot");
 DATA(insert OID = 1065 (  pg_prepared_xact PGNSP PGUID 12 1 1000 0 0 f f f f t t v s 0 0 2249 "" "{28,25,1184,26,26}" "{o,o,o,o,o}" "{transaction,gid,prepared,ownerid,dbid}" _null_ _null_ pg_prepared_xact _null_ _null_ _null_ ));
 DESCR("view two-phase transactions");
 DATA(insert OID = 3819 (  pg_get_multixact_members PGNSP PGUID 12 1 1000 0 0 f f f f t t v s 1 0 2249 "28" "{28,28,25}" "{i,o,o}" "{multixid,xid,mode}" _null_ _null_ pg_get_multixact_members _null_ _null_ _null_ ));
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 408d94cc7a5..726010eafc4 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -474,5 +474,6 @@ typedef struct TwoPhasePredicateRecord
  * locking internals.
  */
 extern PredicateLockData *GetPredicateLockStatusData(void);
+extern bool GetSafeSnapshotIsWaiting(int blocked_pid);
 
 #endif   /* PREDICATE_INTERNALS_H */
diff --git a/src/test/isolation/isolationtester.c b/src/test/isolation/isolationtester.c
index 4d18710bdfd..2b18899039a 100644
--- a/src/test/isolation/isolationtester.c
+++ b/src/test/isolation/isolationtester.c
@@ -231,13 +231,12 @@ main(int argc, char **argv)
 		appendPQExpBuffer(&wait_query, ",%s", backend_pids[i]);
 	appendPQExpBufferStr(&wait_query, "}'::integer[]");
 
-	/* Also detect certain wait events. */
+	/*
+	 * Also detect waits caused by SERIALIZABLE DEFERRABLE transactions
+	 * waiting for all possible conflicting transactions to finish.
+	 */
 	appendPQExpBufferStr(&wait_query,
-						 " OR EXISTS ("
-						 "  SELECT * "
-						 "  FROM pg_catalog.pg_stat_activity "
-						 "  WHERE pid = $1 "
-						 "  AND wait_event IN ('SafeSnapshot'))");
+						 " OR pg_catalog.pg_waiting_for_safe_snapshot($1)");
 
 	res = PQprepare(conns[0], PREP_WAITING, wait_query.data, 0, NULL);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#12)
Re: recent deadlock regression test failures

Thomas Munro <thomas.munro@enterprisedb.com> writes:

On Sat, Apr 8, 2017 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It seems an entirely principle-free change in the function's definition.

You might say that pg_blocking_pid() is about locking only and not
arbitrary other kinds of waits, but safe snapshots are not completely
unrelated to locking if you tilt your head at the right angle:
GetSafeSnapshot waits for all transactions that might hold SIRead
locks that could affect this transaction's serializability to
complete.

Well, the problem I'm having is that once we say that pg_blocking_pids()
is not just about heavyweight locks, it's not clear where we stop.
There might be many other cases where, if you dig around in the right data
structures, it'd be possible to say that process X is waiting for process
Y to do something. How many of those will we expect pg_blocking_pids()
to cover? And there certainly will be cases where X is waiting for Y but
we don't have any effective way of identifying that. Is that a bug in
pg_blocking_pids()?

Admittedly, this is a somewhat academic objection; but I dislike taking a
function with a fairly clear, published spec and turning it into something
that does whatever's expedient for a single use-case.

Based on the above, here is a version that introduces a simple boolean
function pg_waiting_for_safe_snapshot(pid) and adds that to the
query. This was my "option 1" upthread.

Nah, this is not good either. Yes, it's a fairly precise conversion
of what Kevin's patch did, but I think that patch is wrong even
without considering the performance angle. If you look back at the
discussions in Feb 2016 that led to what we had, it turned out to be
important not just to be able to say that process X is blocked, but
that it is blocked by one of the other isolationtest sessions, and
not by some random third party (like autovacuum). I do not know
whether there is, right now, any way for autovacuum to be the guilty
party for a SafeSnapshot wait --- but it does not seem like a good
plan to assume that there never will be.

So I think what we need is functionality very much like what you had
in the prior patch, ie identify the specific PIDs that are causing
process X to wait for a safe snapshot. I'm just not happy with how
you packaged it.

Here's a sketch of what I think we ought to do:

1. Leave pg_blocking_pids() alone; it's a published API now.

2. Add GetSafeSnapshotBlockingPids() more or less as you had it
in the previous patch (+ Kevin's recommendations). There might be
value in providing a SQL-level way to call that, but I'm not sure,
and it would be independent of fixing isolationtester anyway.

3. Invent a SQL function that is dedicated specifically to supporting
isolationtester and need not be documented at all; this gets us out
of the problem of whether it's okay to whack its semantics around
anytime somebody thinks of something else they want to test.
I'm imagining an API like

isolation_test_is_waiting_for(int, int[]) returns bool

so that isolationtester's query would reduce to something like

SELECT pg_catalog.isolation_test_is_waiting_for($1, '{...}')

which would take even more cycles out of the parse/plan overhead for it
(which is basically what's killing the CCA animals). Internally, this
function would call pg_blocking_pids and, if necessary,
GetSafeSnapshotBlockingPids, and would check for matches in its array
argument directly; it could probably do that significantly faster than the
general-purpose array && code. So we'd have to expend a bit of backend C
code, but we'd have something that would be quite speedy and we could
customize in future without fear of breaking behavior that users are
depending on.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Kevin Grittner
kgrittn@gmail.com
In reply to: Tom Lane (#13)
Re: recent deadlock regression test failures

On Sat, Apr 8, 2017 at 12:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@enterprisedb.com> writes:

On Sat, Apr 8, 2017 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Based on the above, here is a version that introduces a simple boolean
function pg_waiting_for_safe_snapshot(pid) and adds that to the
query. This was my "option 1" upthread.

Nah, this is not good either. Yes, it's a fairly precise conversion
of what Kevin's patch did, but I think that patch is wrong even
without considering the performance angle. If you look back at the
discussions in Feb 2016 that led to what we had, it turned out to be
important not just to be able to say that process X is blocked, but
that it is blocked by one of the other isolationtest sessions, and
not by some random third party (like autovacuum). I do not know
whether there is, right now, any way for autovacuum to be the guilty
party for a SafeSnapshot wait --- but it does not seem like a good
plan to assume that there never will be.

It would make no sense to run autovacuum at the serializable
transaction isolation level, and only overlapping read-write
serializable transactions can block the attempt to acquire a safe
snapshot.

So I think what we need is functionality very much like what you had
in the prior patch, ie identify the specific PIDs that are causing
process X to wait for a safe snapshot. I'm just not happy with how
you packaged it.

Here's a sketch of what I think we ought to do:

1. Leave pg_blocking_pids() alone; it's a published API now.

Fair enough.

2. Add GetSafeSnapshotBlockingPids() more or less as you had it
in the previous patch (+ Kevin's recommendations). There might be
value in providing a SQL-level way to call that, but I'm not sure,
and it would be independent of fixing isolationtester anyway.

It seems entirely plausible that someone might want to know what is
holding up the start of a backup or large report which uses the READ
ONLY DEFERRABLE option, so I think there is a real use case for a
documented SQL function similar to pg_blocking_pids() to show the
pids of connections currently running transactions which are holding
things up. Of course, they may not initially know whether it is
being blocked by heavyweight locks or concurrent serializable
read-write transactions, but it should not be a big deal to run two
separate functions.

Given the inability to run isolation tests to cover the deferrable
code, we used a variation on DBT-2 that could cause serialization
anomalies to generate a high concurrency saturation run using
serializable transactions, and started a SERIALIZABLE READ ONLY
DEFERRABLE transaction 1200 times competing with this load, timing
how long it took to start. To quote the VLDB paper[1], "The median
latency was 1.98 seconds, with 90% of transactions able to obtain a
safe snapshot within 6 seconds, and all within 20 seconds. Given the
intended use (long-running transactions), we believe this delay is
reasonable." That said, a single long-running serializable
read-write transaction could hold up the attempt indefinitely --
there is no maximum bound. Hence the benefit of a SQL function to
find out what's happening.

3. Invent a SQL function that is dedicated specifically to supporting
isolationtester and need not be documented at all; this gets us out
of the problem of whether it's okay to whack its semantics around
anytime somebody thinks of something else they want to test.
I'm imagining an API like

isolation_test_is_waiting_for(int, int[]) returns bool

so that isolationtester's query would reduce to something like

SELECT pg_catalog.isolation_test_is_waiting_for($1, '{...}')

which would take even more cycles out of the parse/plan overhead for it
(which is basically what's killing the CCA animals). Internally, this
function would call pg_blocking_pids and, if necessary,
GetSafeSnapshotBlockingPids, and would check for matches in its array
argument directly; it could probably do that significantly faster than the
general-purpose array && code. So we'd have to expend a bit of backend C
code, but we'd have something that would be quite speedy and we could
customize in future without fear of breaking behavior that users are
depending on.

Good suggestion.

Thomas, would you like to produce a patch along these lines, or
should I?

--
Kevin Grittner

[1]: Dan R. K. Ports and Kevin Grittner. 2012.
Serializable Snapshot Isolation in PostgreSQL.
Proceedings of the VLDB Endowment, Vol. 5, No. 12.
The 38th International Conference on Very Large Data Bases,
August 27th - 31st 2012, Istanbul, Turkey.
http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#14)
Re: recent deadlock regression test failures

Kevin Grittner <kgrittn@gmail.com> writes:

On Sat, Apr 8, 2017 at 12:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm imagining an API like
isolation_test_is_waiting_for(int, int[]) returns bool

Good suggestion.

Thomas, would you like to produce a patch along these lines, or
should I?

I'll be happy to review/push if someone else does a first draft.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#15)
Re: recent deadlock regression test failures

... BTW, one other minor coding suggestion for
GetSafeSnapshotBlockingPids(): it might be better to avoid doing so much
palloc work while holding the SerializableXactHashLock. Even if it's
only held shared, I imagine that it's a contention bottleneck. You
could avoid that by returning an array rather than a list; the array
could be preallocated of size MaxBackends before ever taking the lock.
That would be a little bit space-wasteful, but since it's only short-lived
storage it doesn't seem like much of a problem.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Tom Lane (#15)
Re: recent deadlock regression test failures

On Sun, Apr 9, 2017 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kevin Grittner <kgrittn@gmail.com> writes:

On Sat, Apr 8, 2017 at 12:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm imagining an API like
isolation_test_is_waiting_for(int, int[]) returns bool

Good suggestion.

Thomas, would you like to produce a patch along these lines, or
should I?

I'll be happy to review/push if someone else does a first draft.

Ok, I'll post a new patch tomorrow.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Thomas Munro (#17)
2 attachment(s)
Re: recent deadlock regression test failures

On Sun, Apr 9, 2017 at 11:49 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Sun, Apr 9, 2017 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kevin Grittner <kgrittn@gmail.com> writes:

On Sat, Apr 8, 2017 at 12:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm imagining an API like
isolation_test_is_waiting_for(int, int[]) returns bool

Good suggestion.

Thomas, would you like to produce a patch along these lines, or
should I?

I'll be happy to review/push if someone else does a first draft.

Ok, I'll post a new patch tomorrow.

Here's a pair of draft patches for review:

1. pg-isolation-test-session-is-blocked.patch, to fix the
CLOBBER_CACHE_ALWAYS problem by doing all the work in C as
suggested.

2. pg-safe-snapshot-blocking-pids.patch, to provide an end-user
function wrapping GetSafeSnapshotBlockingPids(). Kevin expressed an
interest in that functionality, and it does seem useful: for example,
someone might use it to investigate which backends are holding up
pg_dump --serializable-deferrable. This is a separate patch because
you might consider it material for the next cycle, though at least
it's handy for verifying that GetSafeSnapshotBlockingPids() is working
correctly. To test, open some number of SERIALIZABLE transactions and
take a snapshot in each, and then open a SERIALIZABLE READ ONLY
DEFERRABLE transaction and try to take a snapshot; it will block and
pg_safe_snapshot_blocking_pids() will identify the culprit(s).

Thanks to Andrew Gierth for an off-list discussion about the pitfalls
of trying to use "arrayoverlap" from inside
pg_isolation_test_session_is_blocked, which helped me decide that I
should open-code that logic instead. Does that make sense?

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

pg-isolation-test-session-is-blocked.patch (application/octet-stream)
From b1857cf7f60484d7e8409a60f106939875cfb632 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Mon, 10 Apr 2017 14:03:12 +1200
Subject: [PATCH] Fix isolation tester performance under CLOBBER_CACHE_ALWAYS.

Commit 4deb4138 added an isolation test for SERIALIZABLE DEFERRABLE.  It
included a change to a query used internally by the isolation tester which
turned out to perform terribly under CLOBBER_CACHE_ALWAYS.  Move the new
logic into C code, and while doing that also improve the performance of the
existing logic.  Package as an undocumented non-public function
pg_isolation_test_session_is_blocked().
---
 src/backend/storage/lmgr/predicate.c      | 48 +++++++++++++++++++++
 src/backend/utils/adt/lockfuncs.c         | 72 +++++++++++++++++++++++++++++++
 src/include/catalog/pg_proc.h             |  2 +
 src/include/storage/predicate_internals.h |  2 +
 src/test/isolation/isolationtester.c      | 12 +-----
 5 files changed, 126 insertions(+), 10 deletions(-)

diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 10bac71e94b..032fae36361 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -1556,6 +1556,54 @@ GetSafeSnapshot(Snapshot origSnapshot)
 }
 
 /*
+ * If the given process is currently blocked in GetSafeSnapshot, write the
+ * process IDs of all processes that it is waiting for into caller-supplied
+ * buffer 'output'.  The list is truncated at 'output_size', and the number of
+ * PIDs written into the output buffer is returned.  Return zero if the given
+ * process is not currently blocked in GetSafeSnapshot or is unknown.
+ */
+int
+GetSafeSnapshotBlockingPids(int blocked_pid, int *output, int output_size)
+{
+	SERIALIZABLEXACT *sxact;
+	int			num_written = 0;
+
+	LWLockAcquire(SerializableXactHashLock, LW_SHARED);
+
+	/* Find blocked_pid's SERIALIZABLEXACT by linear search. */
+	for (sxact = FirstPredXact(); sxact != NULL; sxact = NextPredXact(sxact))
+	{
+		if (sxact->pid == blocked_pid)
+			break;
+	}
+
+	/* Did we find it, and is it currently waiting in GetSafeSnapshot? */
+	if (sxact != NULL && SxactIsDeferrableWaiting(sxact))
+	{
+		RWConflict possibleUnsafeConflict;
+
+		/* Traverse the list of possible unsafe conflicts collecting PIDs. */
+		possibleUnsafeConflict = (RWConflict)
+			SHMQueueNext(&sxact->possibleUnsafeConflicts,
+						 &sxact->possibleUnsafeConflicts,
+						 offsetof(RWConflictData, inLink));
+
+		while (num_written < output_size && possibleUnsafeConflict != NULL)
+		{
+			output[num_written++] = possibleUnsafeConflict->sxactOut->pid;
+			possibleUnsafeConflict = (RWConflict)
+				SHMQueueNext(&sxact->possibleUnsafeConflicts,
+							 &possibleUnsafeConflict->inLink,
+							 offsetof(RWConflictData, inLink));
+		}
+	}
+
+	LWLockRelease(SerializableXactHashLock);
+
+	return num_written;
+}
+
+/*
  * Acquire a snapshot that can be used for the current transaction.
  *
  * Make sure we have a SERIALIZABLEXACT reference in MySerializableXact.
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index 63f956e6708..726ba73d427 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -516,6 +516,78 @@ pg_blocking_pids(PG_FUNCTION_ARGS)
 										  sizeof(int32), true, 'i'));
 }
 
+/*
+ * Check if a backend is waiting for a heavyweight lock or a safe snapshot.
+ * Use a whitelist of PIDs for backends involved in an isolation test, as a
+ * way of ignoring locks held incidentally by autovacuum.
+ *
+ * This is an undocumented function intended for use by the isolation tester,
+ * and may change in future releases as required for testing purposes.
+ */
+Datum
+pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
+{
+	int			blocked_pid = PG_GETARG_INT32(0);
+	ArrayType  *interesting_pids = PG_GETARG_ARRAYTYPE_P(1);
+	ArrayType  *blocking_pids;
+	int			num_interesting_pids;
+	int			num_blocking_pids;
+	int			dummy;
+	int			i,
+				j;
+
+	/*
+	 * Get the PIDs of all sessions blocking the given session's attempt to
+	 * acquire heavyweight locks.
+	 */
+	blocking_pids =
+		DatumGetArrayTypeP(DirectFunctionCall1(pg_blocking_pids, blocked_pid));
+
+	/*
+	 * Check if any of these are in the list of interesting PIDs, being the
+	 * sessions that the isolation tester is running.  We don't use
+	 * "arrayoverlaps" here, because it would lead to cache lookups and one of
+	 * our goals is to run quickly under CLOBBER_CACHE_ALWAYS.  We expect
+	 * blocking_pids to be usually empty and otherwise a very small number in
+	 * the isolation tester cases, so make that the outer loop of a naive
+	 * search for a match.  We use knowledge about the layout of int[]
+	 * inspired by contrib/intarray rather than going through the polymorphic
+	 * array interface, after sanity checking the types.
+	 */
+
+	Assert(ARR_NDIM(blocking_pids) == 1);
+	Assert(ARR_ELEMTYPE(blocking_pids) == INT4OID);
+	num_blocking_pids = ArrayGetNItems(1, ARR_DIMS(blocking_pids));
+
+	/* Tolerate zero-dimensional arrays cast from "'{}'::int[]". */
+	Assert(ARR_NDIM(interesting_pids) == 0 || ARR_NDIM(interesting_pids) == 1);
+	Assert(ARR_ELEMTYPE(interesting_pids) == INT4OID);
+	num_interesting_pids = ArrayGetNItems(ARR_NDIM(interesting_pids),
+										  ARR_DIMS(interesting_pids));
+	if (ARR_HASNULL(interesting_pids) && array_contains_nulls(interesting_pids))
+		elog(ERROR, "array must not contain nulls");
+
+#define RAWINTS(x) ((int32 *) (ARR_DATA_PTR((x))))
+
+	for (i = 0; i < num_blocking_pids; ++i)
+		for (j = 0; j < num_interesting_pids; ++j)
+			if (RAWINTS(blocking_pids)[i] == RAWINTS(interesting_pids)[j])
+				PG_RETURN_BOOL(true);
+
+	/*
+	 * Check if blocked_pid is waiting for a safe snapshot.  We could in
+	 * theory also check the resulting array of blocker PIDs against the
+	 * interesting PIDs whitelist, but since there is no danger of autovacuum
+	 * blocking GetSafeSnapshot there seems to be no point in expending cycles
+	 * on allocating a buffer and searching for overlap there too: it's
+	 * sufficient for the isolation tester's purposes to use a single element
+	 * buffer and check if the number of safe snapshot blockers is non-zero.
+	 */
+	if (GetSafeSnapshotBlockingPids(blocked_pid, &dummy, 1) > 0)
+		PG_RETURN_BOOL(true);
+
+	PG_RETURN_BOOL(false);
+}
 
 /*
  * Functions for manipulating advisory locks
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 643838bb054..58ad255d14b 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3140,6 +3140,8 @@ DATA(insert OID = 1371 (  pg_lock_status   PGNSP PGUID 12 1 1000 0 0 f f f f t t
 DESCR("view system lock information");
 DATA(insert OID = 2561 (  pg_blocking_pids PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 1007 "23" _null_ _null_ _null_ _null_ _null_ pg_blocking_pids _null_ _null_ _null_ ));
 DESCR("get array of PIDs of sessions blocking specified backend PID");
+DATA(insert OID = 3378 (  pg_isolation_test_session_is_blocked PGNSP PGUID 12 1 0 0 0 f f f f t f v s 2 0 16 "23 1007" _null_ _null_ _null_ _null_ _null_ pg_isolation_test_session_is_blocked _null_ _null_ _null_ ));
+DESCR("is the given backend PID waiting for a safe snapshot, or for one of the given PIDs for a lock?");
 DATA(insert OID = 1065 (  pg_prepared_xact PGNSP PGUID 12 1 1000 0 0 f f f f t t v s 0 0 2249 "" "{28,25,1184,26,26}" "{o,o,o,o,o}" "{transaction,gid,prepared,ownerid,dbid}" _null_ _null_ pg_prepared_xact _null_ _null_ _null_ ));
 DESCR("view two-phase transactions");
 DATA(insert OID = 3819 (  pg_get_multixact_members PGNSP PGUID 12 1 1000 0 0 f f f f t t v s 1 0 2249 "28" "{28,28,25}" "{i,o,o}" "{multixid,xid,mode}" _null_ _null_ pg_get_multixact_members _null_ _null_ _null_ ));
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 408d94cc7a5..4b185c1eacd 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -474,5 +474,7 @@ typedef struct TwoPhasePredicateRecord
  * locking internals.
  */
 extern PredicateLockData *GetPredicateLockStatusData(void);
+extern int GetSafeSnapshotBlockingPids(int blocked_pid,
+									   int *output, int output_size);
 
 #endif   /* PREDICATE_INTERNALS_H */
diff --git a/src/test/isolation/isolationtester.c b/src/test/isolation/isolationtester.c
index 4d18710bdfd..30ebb06beae 100644
--- a/src/test/isolation/isolationtester.c
+++ b/src/test/isolation/isolationtester.c
@@ -224,20 +224,12 @@ main(int argc, char **argv)
 	 */
 	initPQExpBuffer(&wait_query);
 	appendPQExpBufferStr(&wait_query,
-						 "SELECT pg_catalog.pg_blocking_pids($1) && '{");
+						 "SELECT pg_catalog.pg_isolation_test_session_is_blocked($1, '{");
 	/* The spec syntax requires at least one session; assume that here. */
 	appendPQExpBufferStr(&wait_query, backend_pids[1]);
 	for (i = 2; i < nconns; i++)
 		appendPQExpBuffer(&wait_query, ",%s", backend_pids[i]);
-	appendPQExpBufferStr(&wait_query, "}'::integer[]");
-
-	/* Also detect certain wait events. */
-	appendPQExpBufferStr(&wait_query,
-						 " OR EXISTS ("
-						 "  SELECT * "
-						 "  FROM pg_catalog.pg_stat_activity "
-						 "  WHERE pid = $1 "
-						 "  AND wait_event IN ('SafeSnapshot'))");
+	appendPQExpBufferStr(&wait_query, "}'::integer[])");
 
 	res = PQprepare(conns[0], PREP_WAITING, wait_query.data, 0, NULL);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
-- 
2.12.2

pg-safe-snapshot-blocking-pids.patch (application/octet-stream)
From 1d37828653c8657f718abe7cb61be49e907b7f99 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Mon, 10 Apr 2017 15:34:48 +1200
Subject: [PATCH] Provide a function for introspecting safe snapshot waits.

This patch introduces a new function pg_safe_snapshot_blocking_pids(),
modelled on pg_blocking_pids().  While the latter is concerned with waits
for heavyweight locks, this new function allows users of SERIALIZABLE READ
ONLY DEFERRABLE to see the PIDs of backends running concurrent SERIALIZABLE
transactions that are causing waits.
---
 doc/src/sgml/func.sgml            | 27 ++++++++++++++++++++++++++-
 src/backend/utils/adt/lockfuncs.c | 37 +++++++++++++++++++++++++++++++++++++
 src/include/catalog/pg_proc.h     |  4 +++-
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cb0a36a170f..1e5b0a1506e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15747,7 +15747,7 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
       <row>
        <entry><literal><function>pg_blocking_pids(<type>int</type>)</function></literal></entry>
        <entry><type>int[]</type></entry>
-       <entry>Process ID(s) that are blocking specified server process ID</entry>
+       <entry>Process ID(s) that are blocking specified server process ID from acquiring a lock</entry>
       </row>
 
       <row>
@@ -15794,6 +15794,12 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
       </row>
 
       <row>
+       <entry><literal><function>pg_safe_snapshot_blocking_pids(<type>int</type>)</function></literal></entry>
+       <entry><type>int[]</type></entry>
+       <entry>Process ID(s) that are blocking specified server process ID from acquiring a safe snapshot</entry>
+      </row>
+
+      <row>
        <entry><literal><function>pg_trigger_depth()</function></literal></entry>
        <entry><type>int</type></entry>
        <entry>current nesting level of <productname>PostgreSQL</> triggers
@@ -16068,6 +16074,25 @@ SET search_path TO <replaceable>schema</> <optional>, <replaceable>schema</>, ..
    </para>
 
    <indexterm>
+    <primary>pg_safe_snapshot_blocking_pids</primary>
+   </indexterm>
+
+   <para>
+    <function>pg_safe_snapshot_blocking_pids</function> returns an array of
+    the process IDs of the sessions that are blocking the server process with
+    the specified process ID from acquiring a safe snapshot, or an empty array
+    if there is no such server process or it is not blocked.  A session
+    running a <literal>SERIALIZABLE</literal> transaction blocks
+    a <literal>SERIALIZABLE READ ONLY DEFERRABLE</literal> transaction from
+    acquiring a snapshot until the latter determines that it is safe to avoid
+    taking any predicate locks.  See <xref linkend="xact-serializable"> for
+    more information about serializable and deferrable transactions.  Frequent
+    calls to this function could have some impact on database performance,
+    because it needs exclusive access to the predicate lock manager's shared
+    state for a short time.
+   </para>
+   
+   <indexterm>
     <primary>version</primary>
    </indexterm>
 
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index 726ba73d427..193516554df 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -590,6 +590,43 @@ pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
 }
 
 /*
+ * pg_safe_snapshot_blocking_pids - produce an array of the PIDs blocking
+ * given PID in GetSafeSnapshot()
+ */
+Datum
+pg_safe_snapshot_blocking_pids(PG_FUNCTION_ARGS)
+{
+	int			blocked_pid = PG_GETARG_INT32(0);
+	int		   *blockers;
+	int			num_blockers;
+	Datum	   *blocker_datums;
+
+	/* A buffer big enough for any possible blocker list without truncation */
+	blockers = (int *) palloc(MaxBackends * sizeof(int));
+
+	/* Collect a snapshot of processes waited for by GetSafeSnapshot */
+	num_blockers =
+		GetSafeSnapshotBlockingPids(blocked_pid, blockers, MaxBackends);
+
+	/* Convert int array to Datum array */
+	if (num_blockers > 0)
+	{
+		int			i;
+
+		blocker_datums = (Datum *) palloc(num_blockers * sizeof(Datum));
+		for (i = 0; i < num_blockers; ++i)
+			blocker_datums[i] = Int32GetDatum(blockers[i]);
+	}
+	else
+		blocker_datums = NULL;
+
+	/* Convert Datum to Postgres array */
+	PG_RETURN_ARRAYTYPE_P(construct_array(blocker_datums, num_blockers,
+										  INT4OID,
+										  sizeof(int32), true, 'i'));
+}
+
+/*
  * Functions for manipulating advisory locks
  *
  * We make use of the locktag fields as follows:
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 58ad255d14b..620522816c0 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3139,7 +3139,9 @@ DESCR("show pg_hba.conf rules");
 DATA(insert OID = 1371 (  pg_lock_status   PGNSP PGUID 12 1 1000 0 0 f f f f t t v s 0 0 2249 "" "{25,26,26,23,21,25,28,26,26,21,25,23,25,16,16}" "{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}" "{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath}" _null_ _null_ pg_lock_status _null_ _null_ _null_ ));
 DESCR("view system lock information");
 DATA(insert OID = 2561 (  pg_blocking_pids PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 1007 "23" _null_ _null_ _null_ _null_ _null_ pg_blocking_pids _null_ _null_ _null_ ));
-DESCR("get array of PIDs of sessions blocking specified backend PID");
+DESCR("get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock");
+DATA(insert OID = 3376 (  pg_safe_snapshot_blocking_pids PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 1007 "23" _null_ _null_ _null_ _null_ _null_ pg_safe_snapshot_blocking_pids _null_ _null_ _null_ ));
+DESCR("get array of PIDs of sessions blocking specified backend PID from acquiring a safe snapshot");
 DATA(insert OID = 3378 (  pg_isolation_test_session_is_blocked PGNSP PGUID 12 1 0 0 0 f f f f t f v s 2 0 16 "23 1007" _null_ _null_ _null_ _null_ _null_ pg_isolation_test_session_is_blocked _null_ _null_ _null_ ));
 DESCR("is the given backend PID waiting for a safe snapshot, or for one of the given PIDs for a lock?");
 DATA(insert OID = 1065 (  pg_prepared_xact PGNSP PGUID 12 1 1000 0 0 f f f f t t v s 0 0 2249 "" "{28,25,1184,26,26}" "{o,o,o,o,o}" "{transaction,gid,prepared,ownerid,dbid}" _null_ _null_ pg_prepared_xact _null_ _null_ _null_ ));
-- 
2.12.2

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#18)
Re: recent deadlock regression test failures

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Here's a pair of draft patches for review:

I'll look at these in detail tomorrow, but:

2. pg-safe-snapshot-blocking-pids.patch, to provide an end-user
function wrapping GetSafeSnapshotBlockingPids(). Kevin expressed an
interest in that functionality, and it does seem useful: for example,
someone might use it to investigate which backends are holding up
pg_dump --serializable-deferrable. This is a separate patch because
you might consider it material for the next cycle, though at least
it's handy for verifying that GetSafeSnapshotBlockingPids() is working
correctly.

Personally I have no problem with adding this now, even though it
could be seen as a new feature. Does anyone want to lobby against?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#18)
Re: recent deadlock regression test failures

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Here's a pair of draft patches for review:

Pushed with cosmetic improvements.

I notice that the safe-snapshot code path is not paying attention to
parallel-query cases, unlike the lock code path. I'm not sure how
big a deal that is...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Kevin Grittner
kgrittn@gmail.com
In reply to: Tom Lane (#20)
Re: recent deadlock regression test failures

On Mon, Apr 10, 2017 at 9:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Here's a pair of draft patches for review:

Thanks.

Pushed with cosmetic improvements.

Thanks.

I notice that the safe-snapshot code path is not paying attention to
parallel-query cases, unlike the lock code path. I'm not sure how
big a deal that is...

Parallel workers in serializable transactions should be using the
transaction number of the "master" process to take any predicate
locks, and if parallel workers are doing any DML work against
tuples, that should be using the master transaction number for
xmin/xmax and serialization failure testing. If either of those
rules are being violated, parallel processing should probably be
disabled within a serializable transaction. I don't think
safe-snapshot processing needs to do anything special if the above
rules are followed, nor can I see anything special it *could* do.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#21)
Re: recent deadlock regression test failures

Kevin Grittner <kgrittn@gmail.com> writes:

On Mon, Apr 10, 2017 at 9:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I notice that the safe-snapshot code path is not paying attention to
parallel-query cases, unlike the lock code path. I'm not sure how
big a deal that is...

Parallel workers in serializable transactions should be using the
transaction number of the "master" process to take any predicate
locks, and if parallel workers are doing any DML work against
tuples, that should be using the master transaction number for
xmin/xmax and serialization failure testing.

Right, but do they record the master's PID rather than their own in
the SERIALIZABLEXACT data structure?

Maybe it's impossible for a parallel worker to acquire its own
snapshot at all, in which case this is moot. But I'm nervous.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Kevin Grittner
kgrittn@gmail.com
In reply to: Tom Lane (#22)
Re: recent deadlock regression test failures

On Mon, Apr 10, 2017 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kevin Grittner <kgrittn@gmail.com> writes:

On Mon, Apr 10, 2017 at 9:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I notice that the safe-snapshot code path is not paying attention to
parallel-query cases, unlike the lock code path. I'm not sure how
big a deal that is...

Parallel workers in serializable transactions should be using the
transaction number of the "master" process to take any predicate
locks, and if parallel workers are doing any DML work against
tuples, that should be using the master transaction number for
xmin/xmax and serialization failure testing.

Right, but do they record the master's PID rather than their own in
the SERIALIZABLEXACT data structure?

There had better be just a single SERIALIZABLEXACT data structure
for a serializable transaction, or it just doesn't work. I can't
see how it would have anything but the master's PID in it. If
parallel execution is possible in a serializable transaction and
things are done any other way, we have a problem.

Maybe it's impossible for a parallel worker to acquire its own
snapshot at all, in which case this is moot. But I'm nervous.

I have trouble seeing what sane semantics we could possibly have if
each worker for a parallel scan used a different snapshot -- under
any transaction isolation level. I know that under the read
committed transaction isolation level we can follow xmax pointers to
hit tuples on the target relation which are not in the snapshot used
for the scan, but I don't see how that would negate the need to have
the scan itself run on a single snapshot.
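
To make the point above concrete, here is a toy model (not PostgreSQL code; tuple visibility is heavily simplified) of why workers in a parallel scan must share one snapshot: if each worker scanning its share of the table used a snapshot taken at a different time, the combined result would match no snapshot at all.

```python
# A "snapshot" here is just the set of transaction ids considered
# committed when it was taken; tuples carry the inserting xid (xmin).

def visible(tup, snapshot):
    """Tuple is visible if its inserter is committed in the snapshot
    and no deleter in the snapshot has removed it."""
    return tup["xmin"] in snapshot and tup["xmax"] not in snapshot

table = [
    {"id": 1, "xmin": 10, "xmax": None},  # inserted by xact 10
    {"id": 2, "xmin": 20, "xmax": None},  # inserted later, by xact 20
]

snap_early = {10}        # taken before xact 20 committed
snap_late = {10, 20}     # taken after xact 20 committed

# One shared snapshot: any division of the table among workers
# yields the same, consistent view.
shared = [t["id"] for t in table if visible(t, snap_early)]
assert shared == [1]

# Per-worker snapshots: worker 1 scans the first half with the early
# snapshot, worker 2 the second half with the late one -- the combined
# result is a view that no single snapshot could ever have produced.
w1 = [t["id"] for t in table[:1] if visible(t, snap_early)]
w2 = [t["id"] for t in table[1:] if visible(t, snap_late)]
assert w1 + w2 == [1, 2]
```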

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


#24Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Tom Lane (#22)
Re: recent deadlock regression test failures

On Tue, Apr 11, 2017 at 6:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kevin Grittner <kgrittn@gmail.com> writes:

On Mon, Apr 10, 2017 at 9:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I notice that the safe-snapshot code path is not paying attention to
parallel-query cases, unlike the lock code path. I'm not sure how
big a deal that is...

Parallel workers in serializable transactions should be using the
transaction number of the "master" process to take any predicate
locks, and if parallel workers are doing any DML work against
tuples, that should be using the master transaction number for
xmin/xmax and serialization failure testing.

Right, but do they record the master's PID rather than their own in
the SERIALIZABLEXACT data structure?

Maybe it's impossible for a parallel worker to acquire its own
snapshot at all, in which case this is moot. But I'm nervous.

Parallel workers can't acquire snapshots, and SERIALIZABLE disables
all parallel query planning anyway.

I did some early stage POC hacking to lift that restriction[1], and if
we finish up doing it at all like that then all SERIALIZABLEXACT
structures would be associated with leader processes and
pg_safe_snapshot_blocking_pid() would automatically deal only in
leader PIDs as arguments and results so there would be no problem
here. The interlocking I proposed in that WIP patch may need work, to
be discussed, but I'm fairly sure that sharing the leader's
SERIALIZABLEXACT like that is the only sensible way forward.
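
The design being described can be sketched as follows -- a toy model with assumed names, not the actual patch: if every parallel worker shares its leader's SERIALIZABLEXACT, then any function reporting waits (in the style of pg_safe_snapshot_blocking_pids()) only ever needs to hand out leader PIDs, no matter which process is physically involved.

```python
# Each backend knows its leader, if any; leaders have no leader.
class Backend:
    def __init__(self, pid, leader=None):
        self.pid = pid
        self.leader = leader  # None if this backend is itself a leader

    def sxact_owner_pid(self):
        """PID associated with the single shared SERIALIZABLEXACT."""
        return self.leader.pid if self.leader else self.pid

leader = Backend(pid=1000)
workers = [Backend(pid=1001, leader=leader),
           Backend(pid=1002, leader=leader)]

# Every backend in the parallel group resolves to the same owner,
# so callers see only leader PIDs as arguments and results.
assert {b.sxact_owner_pid() for b in [leader, *workers]} == {1000}
```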

[1]: https://commitfest.postgresql.org/14/1004/

--
Thomas Munro
http://www.enterprisedb.com


#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#24)
Re: recent deadlock regression test failures

Thomas Munro <thomas.munro@enterprisedb.com> writes:

On Tue, Apr 11, 2017 at 6:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe it's impossible for a parallel worker to acquire its own
snapshot at all, in which case this is moot. But I'm nervous.

Parallel workers can't acquire snapshots, and SERIALIZABLE disables
all parallel query planning anyway.

I did some early stage POC hacking to lift that restriction[1], and if
we finish up doing it at all like that then all SERIALIZABLEXACT
structures would be associated with leader processes and
pg_safe_snapshot_blocking_pid() would automatically deal only in
leader PIDs as arguments and results so there would be no problem
here. The interlocking I proposed in that WIP patch may need work, to
be discussed, but I'm fairly sure that sharing the leader's
SERIALIZABLEXACT like that is the only sensible way forward.

OK, sounds good. I agree that it would only be sensible for a parallel
worker to be using a snapshot already acquired by the master, so far
as the parallelized query itself is concerned. What was worrying me
is snapshots acquired by, eg, volatile PL functions called by the query.
But on second thought, in SERIALIZABLE mode no such function would be
taking an actual new snapshot anyhow --- they'd just continue to use
the transaction snap.
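
A minimal model of that last observation (a sketch, not PostgreSQL's snapmgr code): under READ COMMITTED, each statement -- including one inside a volatile function -- can get a fresh snapshot, whereas under SERIALIZABLE every snapshot request simply returns the transaction snapshot again.

```python
class Transaction:
    def __init__(self, isolation):
        self.isolation = isolation
        self.txn_snapshot = None
        self.counter = 0

    def get_snapshot(self):
        self.counter += 1
        if self.isolation == "serializable":
            # First request establishes the transaction snapshot;
            # every later request reuses it.
            if self.txn_snapshot is None:
                self.txn_snapshot = f"snap-{self.counter}"
            return self.txn_snapshot
        # READ COMMITTED: a new snapshot per statement.
        return f"snap-{self.counter}"

rc = Transaction("read committed")
assert rc.get_snapshot() != rc.get_snapshot()  # fresh each time

sz = Transaction("serializable")
assert sz.get_snapshot() == sz.get_snapshot()  # always the txn snap
```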

regards, tom lane


#26Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Kevin Grittner (#21)
Re: recent deadlock regression test failures

On Tue, Apr 11, 2017 at 5:45 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Mon, Apr 10, 2017 at 9:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Here's a pair of draft patches for review:

Thanks.

Pushed with cosmetic improvements.

Thanks.

Thanks Tom and Kevin.

After this commit hyrax ran the isolation test in 00:56:14, shaving
10-15 minutes off the typical earlier times, and is green once again.

--
Thomas Munro
http://www.enterprisedb.com
