SERIALIZABLE with parallel query

Started by Thomas Munro about 9 years ago, 51 messages

thomas.munro@enterprisedb.com

about 9 years ago

1 attachment(s)

Hi hackers,

Currently we don't generate parallel plans in SERIALIZABLE. What
problems need to be solved to be able to do that? I'm probably
steamrolling over a ton of subtleties and assumptions here, but it
occurred to me that a first step might be something like this:

1. Hand the leader's SERIALIZABLEXACT to workers.
2. Make sure that workers don't release predicate locks.
3. Make sure that the leader doesn't release predicate locks until
after the workers have finished accessing the leader's
SERIALIZABLEXACT. I think this is already the case.

See attached 5 minute hack. Need to audit predicate.c for cases where
MySerializableXact might be modified without suitable locking, and
probably sprinkle assertions all over the place that workers don't
reach certain places etc. I wonder what horrible things might happen
as a result of workers running with a SERIALIZABLEXACT that contains
the leader's vxid and other such things. I'd love to figure all this
out in time for one of the later CFs in this cycle. Any thoughts?
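
To make the three steps above concrete, here is a minimal single-process sketch in C. It is not PostgreSQL code: ToySxact, ToyBackend and the toy_* functions are hypothetical stand-ins for SERIALIZABLEXACT, the per-backend MySerializableXact pointer and ReleasePredicateLocks(), and the "shared memory" is just an ordinary struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SERIALIZABLEXACT: only counts predicate locks. */
typedef struct ToySxact
{
    int         nlocks;         /* predicate locks linked to this sxact */
    bool        released;       /* set once the leader has released them */
} ToySxact;

/* Hypothetical per-process state, playing the role of MySerializableXact. */
typedef struct ToyBackend
{
    bool        is_worker;
    ToySxact   *my_sxact;
} ToyBackend;

/* Step 1: a worker adopts the leader's sxact instead of creating its own. */
void
toy_worker_attach(ToyBackend *worker, const ToyBackend *leader)
{
    worker->is_worker = true;
    worker->my_sxact = leader->my_sxact;
}

/* Leader and workers all link new predicate locks to the same sxact. */
void
toy_predicate_lock(ToyBackend *backend)
{
    assert(backend->my_sxact != NULL);
    assert(!backend->my_sxact->released);
    backend->my_sxact->nlocks++;
}

/* Steps 2 and 3: workers return early, only the leader really releases. */
void
toy_release_predicate_locks(ToyBackend *backend)
{
    if (backend->is_worker)
        return;
    if (backend->my_sxact == NULL)
        return;
    backend->my_sxact->nlocks = 0;
    backend->my_sxact->released = true;
    backend->my_sxact = NULL;
}
```

In this model a lock taken by a worker survives the worker's own release call and disappears only when the leader releases, which is the behaviour the plan is after.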

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

ssi-parallel-hack.patch (application/octet-stream)

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 59dc394..f047ee0 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -26,6 +26,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate_internals.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -76,6 +77,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SERIALIZABLEXACT *parallel_master_serializablexact;
 
 	/* Entrypoint for parallel workers. */
 	parallel_worker_main_type entrypoint;
@@ -138,14 +140,6 @@ CreateParallelContext(parallel_worker_main_type entrypoint, int nworkers)
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.
-	 */
-	if (IsolationIsSerializable())
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -298,6 +292,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->parallel_master_serializablexact = GetSerializableXact();
 	fps->entrypoint = pcxt->entrypoint;
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
@@ -1092,6 +1087,9 @@ ParallelWorkerMain(Datum main_arg)
 	/* Set ParallelMasterBackendId so we know how to address temp relations. */
 	ParallelMasterBackendId = fps->parallel_master_backend_id;
 
+	/* Use the leader's SERIALIZABLEXACT. */
+	SetSerializableXact(fps->parallel_master_serializablexact);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 644b8b6..9c494c1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -233,14 +233,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -249,8 +241,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->utilityStmt == NULL &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 24ed21b..8442bc2 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -184,6 +184,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -3201,6 +3202,10 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
+	/* Only leader processes should release predicate locks. */
+	if (IsParallelWorker())
+		return;
+
 	if (MySerializableXact == InvalidSerializableXact)
 	{
 		Assert(LocalPredicateLockHash == NULL);
@@ -4966,3 +4971,22 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Accessor to allow parallel leaders to export the current SERIALIZABLEXACT
+ * to parallel workers.
+ */
+SERIALIZABLEXACT *
+GetSerializableXact(void)
+{
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+SetSerializableXact(SERIALIZABLEXACT *sxact)
+{
+	MySerializableXact = sxact;
+}
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 3175d28..ad049a6 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -474,5 +474,7 @@ typedef struct TwoPhasePredicateRecord
  * locking internals.
  */
 extern PredicateLockData *GetPredicateLockStatusData(void);
+extern SERIALIZABLEXACT *GetSerializableXact(void);
+extern void SetSerializableXact(SERIALIZABLEXACT *sxact);
 
 #endif   /* PREDICATE_INTERNALS_H */

Peter Geoghegan

pg@heroku.com

about 9 years ago

In reply to: Thomas Munro (#1)

Re: SERIALIZABLE with parallel query

On Tue, Nov 8, 2016 at 1:51 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Currently we don't generate parallel plans in SERIALIZABLE. What
problems need to be solved to be able to do that?

FWIW, parallel CREATE INDEX works at SERIALIZABLE isolation level by
specially asking the parallel infrastructure to not care. I think that
this works fine, given the limited scope of the problem, but it would
be nice to have that confirmed.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#1)

1 attachment(s)

Re: SERIALIZABLE with parallel query

On Wed, Nov 9, 2016 at 10:51 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Need to audit predicate.c for cases where
MySerializableXact might be modified without suitable locking,

The only thing I see along those lines is that
CheckForSerializableConflictOut() and CheckForSerializableConflictIn()
access SxactIsDoomed(MySerializableXact) without any locking, but if
that's OK in the non-parallel case it should also be OK in a worker.
I guess this is an opportunistic early error path that doesn't mind
seeing data from the past without worrying about cache coherency, on
the basis that it will be checked again in
PreCommit_CheckForSerializationFailure().
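
The "cheap early check, definitive check later" pattern described above can be sketched roughly as follows. This is an illustration only: ToySxact, the toy_* functions and the spinlock built from a C11 atomic_flag are assumptions made for the example, whereas predicate.c uses a plain flags word and LWLocks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SXACT_FLAG_DOOMED 0x1u

/* Hypothetical shared transaction state. */
typedef struct ToySxact
{
    atomic_flag lock;           /* stand-in for the lock protecting flags */
    atomic_uint flags;
} ToySxact;

void
toy_sxact_init(ToySxact *sxact)
{
    assert(sxact != NULL);
    atomic_flag_clear(&sxact->lock);
    atomic_init(&sxact->flags, 0u);
}

void
toy_sxact_lock(ToySxact *sxact)
{
    while (atomic_flag_test_and_set_explicit(&sxact->lock,
                                             memory_order_acquire))
        ;                       /* spin until the holder releases */
}

void
toy_sxact_unlock(ToySxact *sxact)
{
    atomic_flag_clear_explicit(&sxact->lock, memory_order_release);
}

/* Another backend marks the transaction as doomed, under the lock. */
void
toy_doom(ToySxact *sxact)
{
    toy_sxact_lock(sxact);
    atomic_fetch_or_explicit(&sxact->flags, TOY_SXACT_FLAG_DOOMED,
                             memory_order_relaxed);
    toy_sxact_unlock(sxact);
}

/* Opportunistic check: no lock taken, so it may see a stale value. */
bool
toy_early_doomed_check(ToySxact *sxact)
{
    unsigned    flags = atomic_load_explicit(&sxact->flags,
                                             memory_order_relaxed);

    return (flags & TOY_SXACT_FLAG_DOOMED) != 0;
}

/* Definitive check at commit time: taken under the lock. */
bool
toy_precommit_ok(ToySxact *sxact)
{
    bool        ok;

    toy_sxact_lock(sxact);
    ok = (atomic_load_explicit(&sxact->flags, memory_order_relaxed)
          & TOY_SXACT_FLAG_DOOMED) == 0;
    toy_sxact_unlock(sxact);
    return ok;
}
```

A stale "not doomed" answer from the early check only delays the error; the locked check at commit time is what decides. That reasoning does not depend on whether the reader is the leader or a worker.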

I wonder what horrible things might happen
as a result of workers running with a SERIALIZABLEXACT that contains
the leader's vxid and other such things.

What is the consequence of that vxid? What other complications could
be involved here?

Here's a rebased patch that updates the documentation and adds a test
case to show a serialization failure being detected when one of the
queries runs entirely in a parallel worker.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

ssi-parallel-v2.patch (application/octet-stream)

diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index e8624fc..525177c 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -177,13 +177,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -235,16 +228,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         making it ineligible for parallel query.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 3e0ee87..ab2c3e6 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -26,6 +26,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate_internals.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -76,6 +77,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SERIALIZABLEXACT *parallel_master_serializablexact;
 
 	/* Entrypoint for parallel workers. */
 	parallel_worker_main_type entrypoint;
@@ -138,14 +140,6 @@ CreateParallelContext(parallel_worker_main_type entrypoint, int nworkers)
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.
-	 */
-	if (IsolationIsSerializable())
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -298,6 +292,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->parallel_master_serializablexact = GetSerializableXact();
 	fps->entrypoint = pcxt->entrypoint;
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
@@ -1093,6 +1088,9 @@ ParallelWorkerMain(Datum main_arg)
 	/* Set ParallelMasterBackendId so we know how to address temp relations. */
 	ParallelMasterBackendId = fps->parallel_master_backend_id;
 
+	/* Use the leader's SERIALIZABLEXACT. */
+	SetSerializableXact(fps->parallel_master_serializablexact);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index abb4f12..7b8f763 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -230,14 +230,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -245,8 +237,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 7aa719d..929a751 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -184,6 +184,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -3201,6 +3202,10 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
+	/* Only leader processes should release predicate locks. */
+	if (IsParallelWorker())
+		return;
+
 	if (MySerializableXact == InvalidSerializableXact)
 	{
 		Assert(LocalPredicateLockHash == NULL);
@@ -4966,3 +4971,22 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Accessor to allow parallel leaders to export the current SERIALIZABLEXACT
+ * to parallel workers.
+ */
+SERIALIZABLEXACT *
+GetSerializableXact(void)
+{
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+SetSerializableXact(SERIALIZABLEXACT *sxact)
+{
+	MySerializableXact = sxact;
+}
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 408d94c..db48d5a 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -474,5 +474,7 @@ typedef struct TwoPhasePredicateRecord
  * locking internals.
  */
 extern PredicateLockData *GetPredicateLockStatusData(void);
+extern SERIALIZABLEXACT *GetSerializableXact(void);
+extern void SetSerializableXact(SERIALIZABLEXACT *sxact);
 
 #endif   /* PREDICATE_INTERNALS_H */
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 0000000..f43aa6a
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 2606a27..1d69820 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -57,3 +57,4 @@ test: alter-table-3
 test: create-trigger
 test: async-notify
 test: timeouts
+test: serializable-parallel
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 0000000..0e7c2c7
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Peter Geoghegan (#2)

Re: SERIALIZABLE with parallel query

On Wed, Nov 9, 2016 at 12:34 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Tue, Nov 8, 2016 at 1:51 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Currently we don't generate parallel plans in SERIALIZABLE. What
problems need to be solved to be able to do that?

FWIW, parallel CREATE INDEX works at SERIALIZABLE isolation level by
specially asking the parallel infrastructure to not care. I think that
this works fine, given the limited scope of the problem, but it would
be nice to have that confirmed.

I don't see any problem with it, but it'd be nicer to get rid of the
restriction so your change isn't needed.

--
Thomas Munro
http://www.enterprisedb.com

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Thomas Munro (#1)

Re: SERIALIZABLE with parallel query

On Tue, Nov 8, 2016 at 4:51 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Currently we don't generate parallel plans in SERIALIZABLE. What
problems need to be solved to be able to do that? I'm probably
steamrolling over a ton of subtleties and assumptions here, but it
occurred to me that a first step might be something like this:

1. Hand the leader's SERIALIZABLEXACT to workers.
2. Make sure that workers don't release predicate locks.
3. Make sure that the leader doesn't release predicate locks until
after the workers have finished accessing the leader's
SERIALIZABLEXACT. I think this is already the case.

What happens if the workers exit at the end of the query and the
leader then goes on and executes more queries? Will the
worker-acquired predicate locks be retained or will they go away when
the leader exits?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Robert Haas (#5)

Re: SERIALIZABLE with parallel query

On Thu, Feb 16, 2017 at 2:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 8, 2016 at 4:51 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Currently we don't generate parallel plans in SERIALIZABLE. What
problems need to be solved to be able to do that? I'm probably
steamrolling over a ton of subtleties and assumptions here, but it
occurred to me that a first step might be something like this:

1. Hand the leader's SERIALIZABLEXACT to workers.
2. Make sure that workers don't release predicate locks.
3. Make sure that the leader doesn't release predicate locks until
after the workers have finished accessing the leader's
SERIALIZABLEXACT. I think this is already the case.

What happens if the workers exit at the end of the query and the
leader then goes on and executes more queries? Will the
worker-acquired predicate locks be retained or will they go away when
the leader exits?

All predicate locks last at least until ReleasePredicateLocks() runs
after ProcReleaseLocks(), and sometimes longer. Although
ReleasePredicateLocks() runs in workers too, this patch makes it
return without doing anything. I suppose someone could say that
ReleasePredicateLocks() should at least run
hash_destroy(LocalPredicateLockHash) and set LocalPredicateLockHash to
NULL in workers. This sort of thing could be important if we start
reusing worker processes. I'll do that in the next version.
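
A rough sketch of that idea, for illustration: every process throws away its private cache, but only the leader touches the shared locks. ToyProcess and toy_release_predicate_locks() are hypothetical names, and a malloc'd buffer stands in for LocalPredicateLockHash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-process view of predicate lock state. */
typedef struct ToyProcess
{
    bool        is_worker;
    int        *shared_lock_count;  /* lives in "shared memory" */
    int        *local_cache;        /* private, like LocalPredicateLockHash */
} ToyProcess;

void
toy_release_predicate_locks(ToyProcess *proc)
{
    assert(proc->shared_lock_count != NULL);

    /* Every process drops its private cache, so a reused worker starts clean. */
    free(proc->local_cache);
    proc->local_cache = NULL;

    /* Only the leader releases the shared predicate locks. */
    if (proc->is_worker)
        return;
    *proc->shared_lock_count = 0;
}
```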

The predicate locks themselves consist of state in shared memory, and
those created by workers are indistinguishable from those created by
the leader process. Having multiple workers and the leader all
creating predicate locks linked to the same SERIALIZABLEXACT is
*almost* OK, because most relevant shmem state is protected by locks
already in all paths (with the exception of the DOOMED flag already
mentioned, which seems to follow a "notice me as soon as possible"
philosophy, to avoid putting locking into the
CheckForSerializableConflict(In|Out) paths, with a definitive check at
commit time).

But... there is a problem with the management of the linked list of
predicate locks held by a transaction. The locks themselves are
covered by partition locks, but the links are not, and that previous
patch broke the assumption that they could never be changed by another
process.

Specifically, DeleteChildTargetLocks() assumes it can walk
MySerializableXact->predicateLocks and throw away locks that are
covered by a new lock (ie throw away tuple locks because a covering
page lock has been acquired) without let or hindrance until it needs
to modify the locks themselves. That assumption doesn't hold up with
that last patch and will require a new kind of mutual exclusion. I
wonder if the solution is to introduce an LWLock into each
SERIALIZABLEXACT object, so DeleteChildTargetLocks() can prevent
workers from stepping on each others' toes during lock cleanup. An
alternative would be to start taking SerializablePredicateLockListLock
in exclusive rather than shared mode, but that seems unnecessarily
coarse.
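
For readers unfamiliar with the "covering lock" idea, here is a simplified single-process sketch of what throwing away covered locks means. The ToyLockTag layout and the toy_* functions are hypothetical; the real DeleteChildTargetLocks() walks a linked list in shared memory, which is exactly where the mutual exclusion problem arises once several processes share one SERIALIZABLEXACT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum ToyLockLevel
{
    TOY_RELATION,
    TOY_PAGE,
    TOY_TUPLE
} ToyLockLevel;

/* Hypothetical predicate lock target. */
typedef struct ToyLockTag
{
    ToyLockLevel level;
    unsigned    relation;
    unsigned    page;           /* meaningful for TOY_PAGE and TOY_TUPLE */
    unsigned    offset;         /* meaningful for TOY_TUPLE */
} ToyLockTag;

/* True if 'child' is strictly finer than 'parent' and lies inside it. */
bool
toy_tag_covers(const ToyLockTag *parent, const ToyLockTag *child)
{
    if (parent->relation != child->relation)
        return false;
    if (parent->level == TOY_RELATION)
        return child->level != TOY_RELATION;
    if (parent->level == TOY_PAGE)
        return child->level == TOY_TUPLE && child->page == parent->page;
    return false;               /* a tuple lock covers nothing */
}

/*
 * Drop every held lock that 'newlock' makes redundant, compacting the array
 * in place.  Returns the number of locks still held.
 */
size_t
toy_delete_child_locks(ToyLockTag *held, size_t nheld,
                       const ToyLockTag *newlock)
{
    size_t      keep = 0;

    assert(held != NULL || nheld == 0);
    for (size_t i = 0; i < nheld; i++)
    {
        if (!toy_tag_covers(newlock, &held[i]))
            held[keep++] = held[i];
    }
    return keep;
}
```

If two processes ran the compaction loop over the same list at once without mutual exclusion, each could be reading entries the other is moving or removing.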

I have a patch that implements the above but I'm still figuring out
how to test it, and I'll need to do a bit more poking around for other
similar assumptions before I post a new version.

I tried to find any way that LocalPredicateLockHash could create
problems, but it's effectively a cache with
false-negatives-but-never-false-positives semantics. In cache-miss
scenarios we look in shmem data structures and are prepared to find
that our SERIALIZABLEXACT already has the predicate lock even though
there was a cache miss in LocalPredicateLockHash. That works because
our SERIALIZABLEXACT's address is part of the tag, and it's stable
across backends.
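
The cache semantics described above can be modelled with two small sets, one private and one shared. This is a sketch under simplifying assumptions: ToyLockSet and the toy_* functions are hypothetical, a lock target is reduced to an integer, and there is no locking at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LOCKS 16

/* Hypothetical fixed-size set of lock targets. */
typedef struct ToyLockSet
{
    unsigned    targets[TOY_MAX_LOCKS];
    size_t      n;
} ToyLockSet;

bool
toy_set_contains(const ToyLockSet *set, unsigned target)
{
    for (size_t i = 0; i < set->n; i++)
    {
        if (set->targets[i] == target)
            return true;
    }
    return false;
}

/* Add target; returns false if it was already present. */
bool
toy_set_add(ToyLockSet *set, unsigned target)
{
    if (toy_set_contains(set, target))
        return false;
    assert(set->n < TOY_MAX_LOCKS);
    set->targets[set->n++] = target;
    return true;
}

/*
 * Acquire a predicate lock on 'target'.  Returns true only if a new lock
 * had to be created in the shared table.
 */
bool
toy_acquire(ToyLockSet *local_cache, ToyLockSet *shared, unsigned target)
{
    bool        created;

    /* A cache hit is always trustworthy: the lock is held. */
    if (toy_set_contains(local_cache, target))
        return false;

    /* A miss proves nothing: another process may already hold it for us. */
    created = toy_set_add(shared, target);
    toy_set_add(local_cache, target);
    return created;
}
```

With one shared set and separate caches for leader and worker, the worker's first acquire of a target the leader already locked is a cache miss that finds the existing shared lock rather than creating a duplicate.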

Random observation: The global variable MyXactDidWrite would probably
need to move into shared memory when parallel workers eventually learn
to write.

--
Thomas Munro
http://www.enterprisedb.com

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Thomas Munro (#6)

1 attachment(s)

Re: SERIALIZABLE with parallel query

On Thu, Feb 16, 2017 at 6:19 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Specifically, DeleteChildTargetLocks() assumes it can walk
MySerializableXact->predicateLocks and throw away locks that are
covered by a new lock (ie throw away tuple locks because a covering
page lock has been acquired) without let or hindrance until it needs
to modify the locks themselves. That assumption doesn't hold up with
that last patch and will require a new kind of mutual exclusion. I
wonder if the solution is to introduce an LWLock into each
SERIALIZABLEXACT object, so DeleteChildTargetLocks() can prevent
workers from stepping on each others' toes during lock cleanup. An
alternative would be to start taking SerializablePredicateLockListLock
in exclusive rather than shared mode, but that seems unnecessarily
coarse.

Here is a patch to do that, for discussion. It adds an LWLock to each
SERIALIZABLEXACT, and acquires it after
SerializablePredicateLockListLock and before any predicate lock
partition lock. It doesn't bother with that if not in parallel mode,
or in the cases where SerializablePredicateLockListLock is held
exclusively. This prevents parallel query workers and the leader from
stepping on each other's toes when manipulating the predicate list.
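
The ordering rule in the paragraph above can be written down as a small decision function. This is only a model of the rule as stated in this message: the ToyLockKind enum and the toy_* functions are hypothetical, and no actual locks are taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum ToyLockMode
{
    TOY_SHARED,
    TOY_EXCLUSIVE
} ToyLockMode;

typedef enum ToyLockKind
{
    TOY_LIST_LOCK,              /* SerializablePredicateLockListLock */
    TOY_SXACT_LOCK,             /* the new per-SERIALIZABLEXACT lock */
    TOY_PARTITION_LOCK          /* a predicate lock partition lock */
} ToyLockKind;

/*
 * The per-sxact lock is only needed in parallel mode, and not when the
 * list lock is already held exclusively.
 */
bool
toy_need_sxact_lock(bool in_parallel_mode, ToyLockMode list_lock_mode)
{
    return in_parallel_mode && list_lock_mode != TOY_EXCLUSIVE;
}

/*
 * Fill 'order' with the locks to take, in acquisition order, and return
 * how many there are (2 or 3).
 */
size_t
toy_plan_acquisition(bool in_parallel_mode, ToyLockMode list_lock_mode,
                     ToyLockKind order[3])
{
    size_t      n = 0;

    assert(order != NULL);
    order[n++] = TOY_LIST_LOCK;
    if (toy_need_sxact_lock(in_parallel_mode, list_lock_mode))
        order[n++] = TOY_SXACT_LOCK;
    order[n++] = TOY_PARTITION_LOCK;
    return n;
}
```

Because every path takes the locks in the same relative order, adding the extra lock should not introduce a new lock-ordering deadlock among these three.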

The case in CheckTargetForConflictsIn is theoretical for now since we
don't support writing in parallel query yet. The case in
CreatePredicateLock is reachable by running a simple parallel
sequential scan. The case in DeleteChildTargetLocks is for when we've
acquired a new predicate lock that covers finer grained locks which
can be dropped; that is reachable the same way again. I don't think
it's required in ReleaseOneSerializableXact since it was already
called in several places with an sxact other than the caller's, and
deals with finished transactions.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

ssi-parallel-v3.patch (application/octet-stream)

diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index e8624fc..525177c 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -177,13 +177,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -235,16 +228,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         making it ineligible for parallel query.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 3e0ee87..ab2c3e6 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -26,6 +26,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate_internals.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -76,6 +77,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SERIALIZABLEXACT *parallel_master_serializablexact;
 
 	/* Entrypoint for parallel workers. */
 	parallel_worker_main_type entrypoint;
@@ -138,14 +140,6 @@ CreateParallelContext(parallel_worker_main_type entrypoint, int nworkers)
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.
-	 */
-	if (IsolationIsSerializable())
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -298,6 +292,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->parallel_master_serializablexact = GetSerializableXact();
 	fps->entrypoint = pcxt->entrypoint;
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
@@ -1093,6 +1088,9 @@ ParallelWorkerMain(Datum main_arg)
 	/* Set ParallelMasterBackendId so we know how to address temp relations. */
 	ParallelMasterBackendId = fps->parallel_master_backend_id;
 
+	/* Use the leader's SERIALIZABLEXACT. */
+	SetSerializableXact(fps->parallel_master_serializablexact);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3d33d46..bf507e4 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -230,14 +230,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -245,8 +237,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index ab81d94..fec9279 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -510,6 +510,7 @@ RegisterLWLockTranches(void)
 						  "predicate_lock_manager");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
 						  "parallel_query_dsa");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "sxact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 7aa719d..6d4180c 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'lock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	FirstPredicateLockMgrLock based partition locks
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target..
@@ -184,6 +192,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -1749,6 +1758,7 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	SHMQueueInit(&(sxact->predicateLocks));
 	SHMQueueElemInit(&(sxact->finishedLink));
 	sxact->flags = 0;
+	LWLockInitialize(&sxact->lock, LWTRANCHE_SXACT);
 	if (XactReadOnly)
 	{
 		sxact->flags |= SXACT_FLAG_READ_ONLY;
@@ -2031,6 +2041,14 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
 
 	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
 
+	if (IsInParallelMode())
+	{
+		Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+									LW_EXCLUSIVE) ||
+			   LWLockHeldByMeInMode(&MySerializableXact->lock,
+									LW_EXCLUSIVE));
+	}
+
 	/* Can't remove it until no locks at this target. */
 	if (!SHMQueueEmpty(&target->predicateLocks))
 		return;
@@ -2048,7 +2066,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2061,6 +2081,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2114,6 +2136,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2305,6 +2329,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2342,6 +2368,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2529,7 +2557,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2589,7 +2618,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2604,7 +2633,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3201,6 +3231,10 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
+	/* Only leader processes should release predicate locks. */
+	if (IsParallelWorker())
+		goto cleanup;
+
 	if (MySerializableXact == InvalidSerializableXact)
 	{
 		Assert(LocalPredicateLockHash == NULL);
@@ -3487,6 +3521,7 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
+cleanup:
 	/* Delete per-transaction lock table */
 	if (LocalPredicateLockHash != NULL)
 	{
@@ -4176,6 +4211,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->lock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4210,6 +4247,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->lock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4758,6 +4797,11 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->lock in parallel mode because there cannot be
+	 * any parallel workers running while we are preparing a transaction.
+	 */
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -4966,3 +5010,22 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Accessor to allow parallel leaders to export the current SERIALIZABLEXACT
+ * to parallel workers.
+ */
+SERIALIZABLEXACT *
+GetSerializableXact(void)
+{
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+SetSerializableXact(SERIALIZABLEXACT *sxact)
+{
+	MySerializableXact = sxact;
+}
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 8bd93c3..bf2f8db 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -212,6 +212,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_LOCK_MANAGER,
 	LWTRANCHE_PREDICATE_LOCK_MANAGER,
 	LWTRANCHE_PARALLEL_QUERY_DSA,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }	BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 408d94c..35b63ab 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,9 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	/* lock to protect predicateLocks list in parallel mode */
+	LWLock		lock;
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
@@ -474,5 +478,7 @@ typedef struct TwoPhasePredicateRecord
  * locking internals.
  */
 extern PredicateLockData *GetPredicateLockStatusData(void);
+extern SERIALIZABLEXACT *GetSerializableXact(void);
+extern void SetSerializableXact(SERIALIZABLEXACT *sxact);
 
 #endif   /* PREDICATE_INTERNALS_H */
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 0000000..f43aa6a
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 2606a27..1d69820 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -57,3 +57,4 @@ test: alter-table-3
 test: create-trigger
 test: async-notify
 test: timeouts
+test: serializable-parallel
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 0000000..0e7c2c7
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Thomas Munro (#7)

Re: SERIALIZABLE with parallel query

On Tue, Feb 21, 2017 at 12:55 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Feb 16, 2017 at 6:19 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Specifically, DeleteChildTargetLocks() assumes it can walk
MySerializableXact->predicateLocks and throw away locks that are
covered by a new lock (i.e. throw away tuple locks because a covering
page lock has been acquired) without let or hindrance until it needs
to modify the locks themselves. That assumption doesn't hold up with
that last patch and will require a new kind of mutual exclusion. I
wonder if the solution is to introduce an LWLock into each
SERIALIZABLEXACT object, so DeleteChildTargetLocks() can prevent
workers from stepping on each other's toes during lock cleanup. An
alternative would be to start taking SerializablePredicateLockListLock
in exclusive rather than shared mode, but that seems unnecessarily
coarse.

Here is a patch to do that, for discussion. It adds an LWLock to each
SERIALIZABLEXACT, and acquires it after SerializablePredicateLockListLock
and before any predicate lock partition lock. It doesn't bother with
that if not in parallel mode, or in the cases where
SerializablePredicateLockListLock is held exclusively. This prevents
parallel query workers and the leader from stepping on each other's toes
when manipulating the predicate lock list.
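
To make the acquisition order above concrete, here is a small
standalone sketch. It is not PostgreSQL code: the rank enum and the
rank_acquire()/rank_release() helpers are hypothetical names invented
for illustration, and they only track ordering rather than doing any
real locking. The idea is that the list lock is taken first, the
per-sxact lock second (parallel mode only), and a partition lock last.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks mirroring the acquisition order described above. */
enum lock_rank
{
	RANK_LIST_LOCK = 1,			/* SerializablePredicateLockListLock */
	RANK_SXACT_LOCK = 2,		/* per-SERIALIZABLEXACT lock */
	RANK_PARTITION_LOCK = 3		/* predicate lock partition lock */
};

#define MAX_HELD 8
static enum lock_rank held[MAX_HELD];
static int	nheld = 0;

/* Record an acquisition; return false if it violates the ordering. */
static bool
rank_acquire(enum lock_rank rank)
{
	if (nheld == MAX_HELD)
		return false;
	if (nheld > 0 && held[nheld - 1] >= rank)
		return false;
	held[nheld++] = rank;
	return true;
}

/* Release the most recently acquired lock, which must match 'rank'. */
static bool
rank_release(enum lock_rank rank)
{
	if (nheld == 0 || held[nheld - 1] != rank)
		return false;
	nheld--;
	return true;
}

/* Walk through the sequence used when creating a predicate lock. */
static bool
create_predicate_lock_order_ok(bool in_parallel_mode)
{
	bool		ok = rank_acquire(RANK_LIST_LOCK);

	if (in_parallel_mode)
		ok = ok && rank_acquire(RANK_SXACT_LOCK);
	ok = ok && rank_acquire(RANK_PARTITION_LOCK);

	ok = ok && rank_release(RANK_PARTITION_LOCK);
	if (in_parallel_mode)
		ok = ok && rank_release(RANK_SXACT_LOCK);
	ok = ok && rank_release(RANK_LIST_LOCK);
	return ok;
}
```

Taking the per-sxact lock while already holding a partition lock would
be rejected by rank_acquire() in this sketch, which is the kind of
inversion the fixed ordering is meant to rule out.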

The case in CheckTargetForConflictsIn is theoretical for now since we
don't support writing in parallel query yet. The case in
CreatePredicateLock is reachable by running a simple parallel
sequential scan. The case in DeleteChildTargetLocks is for when we've
acquired a new predicate lock that covers finer grained locks which
can be dropped; that is reachable the same way again. I don't think
it's required in ReleaseOneSerializableXact since it was already
called in several places with an sxact other than the caller's, and
deals with finished transactions.

I don't think I know enough about the serializable code to review
this, or at least not quickly, but it seems very cool if it works.
Have you checked what effect it has on shared memory consumption?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Robert Haas (#8)

1 attachment(s)

Re: SERIALIZABLE with parallel query

On Wed, Feb 22, 2017 at 2:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I don't think I know enough about the serializable code to review
this, or at least not quickly, but it seems very cool if it works.
Have you checked what effect it has on shared memory consumption?

I'm not sure how to test that. Kevin, can you provide any pointers to
the test workloads you guys used when developing SSI? In theory shmem
usage shouldn't change, since the predicate locks are shared by the
cooperating backends. It might be able to use a bit more by creating
finer grain locks in worker A that are already covered by coarse
grained locks acquired by worker B or something like that, but it
seems unlikely if they tend to scan disjoint sets of pages.
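
As a rough illustration of what "covered by" means here, the following
standalone sketch uses a simplified, hypothetical lock_target struct
(not the real PREDICATELOCKTARGETTAG) with relation, page and tuple
granularity. A redundant finer-grained lock is one for which
target_covers() would have returned true against a coarser lock that
another backend already created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified lock target: relation, page, or tuple. */
typedef enum
{
	TARGET_RELATION,
	TARGET_PAGE,
	TARGET_TUPLE
} target_level;

typedef struct
{
	target_level level;
	uint32_t	relation;
	uint32_t	page;			/* ignored for TARGET_RELATION */
	uint32_t	tuple;			/* ignored unless TARGET_TUPLE */
} lock_target;

/* True if holding 'coarse' makes a separate lock on 'fine' redundant. */
static bool
target_covers(const lock_target *coarse, const lock_target *fine)
{
	if (coarse->relation != fine->relation)
		return false;
	if (coarse->level >= fine->level)
		return false;			/* only strictly coarser targets cover */
	if (coarse->level == TARGET_RELATION)
		return true;
	/* coarse is a page lock and fine is a tuple lock */
	return coarse->page == fine->page;
}
```

If workers scan disjoint sets of pages, a page lock taken by one worker
never covers a tuple lock taken by another, so the redundant case does
not come up.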

Here is a rebased patch.

I should explain the included isolation test. It's almost the same as
the SERIALIZABLE variant that I submitted for
https://commitfest.postgresql.org/13/949/. That is a useful test here
because it's a serialisation anomaly that involves a read-only query.
In this version we run that query (s3r) in a parallel worker, and the
query plan comes out like this:

 Gather  (cost=1013.67..1013.87 rows=2 width=64)
   Workers Planned: 1
   Single Copy: true
   ->  Sort  (cost=13.67..13.67 rows=2 width=64)
         Sort Key: id
         ->  Bitmap Heap Scan on bank_account  (cost=8.32..13.66 rows=2 width=64)
               Recheck Cond: (id = ANY ('{X,Y}'::text[]))
               ->  Bitmap Index Scan on bank_account_pkey  (cost=0.00..8.32 rows=2 width=0)
                     Index Cond: (id = ANY ('{X,Y}'::text[]))

A dangerous cycle is detected, confirming that reads done by the
worker participate correctly in predicate locking and conflict
detection:

step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
ERROR:  could not serialize access due to read/write dependencies among transactions
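
For readers less familiar with SSI, the shape being detected in this
permutation is s3 --rw--> s2 --rw--> s1, with s1 committing first. The
standalone sketch below is a deliberately simplified model of that
rule as it is usually described in the SSI literature. It is not the
logic in predicate.c; the txn struct, the sequence numbers and
dangerous_structure() are all hypothetical, and treating s3 as
read-only is an assumption based on it having performed only reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one transaction for this sketch. */
typedef struct
{
	bool		committed;
	bool		read_only;
	int			commit_seq;		/* meaningful only if committed */
	int			snapshot_seq;	/* when the snapshot was taken */
} txn;

/*
 * Given t_in --rw--> pivot --rw--> t_out, return true if the pattern is
 * treated as dangerous under the simplified rules of this sketch.
 */
static bool
dangerous_structure(const txn *t_in, const txn *pivot, const txn *t_out)
{
	/* t_out must have committed, and before the other two. */
	if (!t_out->committed)
		return false;
	if (pivot->committed && pivot->commit_seq < t_out->commit_seq)
		return false;
	if (t_in->committed && t_in->commit_seq < t_out->commit_seq)
		return false;

	/* A read-only t_in only matters if it can see t_out's commit. */
	if (t_in->read_only)
		return t_out->commit_seq < t_in->snapshot_seq;
	return true;
}
```

With s1 committing at sequence 3 and s3 taking its snapshot at
sequence 4, the function reports a dangerous structure with s2 as the
pivot, matching the failure of s2wx above.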

It's probably too late for this WIP patch to get the kind of review
and testing it needs for PostgreSQL 10. That's OK, but I think it might
be a strategically good idea to get parallel SSI support (whether with
this or some other approach) into the tree before people start showing
up with parallel write patches.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

ssi-parallel-v4.patch (application/octet-stream)

diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index 2ea5c34..df8856e 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -177,13 +177,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -235,16 +228,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         making it ineligible for parallel query.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 3e0ee87..ab2c3e6 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -26,6 +26,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate_internals.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -76,6 +77,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SERIALIZABLEXACT *parallel_master_serializablexact;
 
 	/* Entrypoint for parallel workers. */
 	parallel_worker_main_type entrypoint;
@@ -138,14 +140,6 @@ CreateParallelContext(parallel_worker_main_type entrypoint, int nworkers)
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.
-	 */
-	if (IsolationIsSerializable())
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -298,6 +292,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->parallel_master_serializablexact = GetSerializableXact();
 	fps->entrypoint = pcxt->entrypoint;
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
@@ -1093,6 +1088,9 @@ ParallelWorkerMain(Datum main_arg)
 	/* Set ParallelMasterBackendId so we know how to address temp relations. */
 	ParallelMasterBackendId = fps->parallel_master_backend_id;
 
+	/* Use the leader's SERIALIZABLEXACT. */
+	SetSerializableXact(fps->parallel_master_serializablexact);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 02286d9..993e318 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -232,14 +232,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -247,8 +239,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3e13394..8ff9b83 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -511,6 +511,7 @@ RegisterLWLockTranches(void)
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
 						  "parallel_query_dsa");
 	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "sxact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 7aa719d..6d4180c 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'lock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	FirstPredicateLockMgrLock based partition locks
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target..
@@ -184,6 +192,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -1749,6 +1758,7 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	SHMQueueInit(&(sxact->predicateLocks));
 	SHMQueueElemInit(&(sxact->finishedLink));
 	sxact->flags = 0;
+	LWLockInitialize(&sxact->lock, LWTRANCHE_SXACT);
 	if (XactReadOnly)
 	{
 		sxact->flags |= SXACT_FLAG_READ_ONLY;
@@ -2031,6 +2041,14 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
 
 	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
 
+	if (IsInParallelMode())
+	{
+		Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+									LW_EXCLUSIVE) ||
+			   LWLockHeldByMeInMode(&MySerializableXact->lock,
+									LW_EXCLUSIVE));
+	}
+
 	/* Can't remove it until no locks at this target. */
 	if (!SHMQueueEmpty(&target->predicateLocks))
 		return;
@@ -2048,7 +2066,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2061,6 +2081,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2114,6 +2136,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2305,6 +2329,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2342,6 +2368,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2529,7 +2557,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2589,7 +2618,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2604,7 +2633,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3201,6 +3231,10 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
+	/* Only leader processes should release predicate locks. */
+	if (IsParallelWorker())
+		goto cleanup;
+
 	if (MySerializableXact == InvalidSerializableXact)
 	{
 		Assert(LocalPredicateLockHash == NULL);
@@ -3487,6 +3521,7 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
+cleanup:
 	/* Delete per-transaction lock table */
 	if (LocalPredicateLockHash != NULL)
 	{
@@ -4176,6 +4211,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->lock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4210,6 +4247,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->lock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4758,6 +4797,11 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->lock in parallel mode because there cannot be
+	 * any parallel workers running while we are preparing a transaction.
+	 */
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -4966,3 +5010,22 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Accessor to allow parallel leaders to export the current SERIALIZABLEXACT
+ * to parallel workers.
+ */
+SERIALIZABLEXACT *
+GetSerializableXact(void)
+{
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+SetSerializableXact(SERIALIZABLEXACT *sxact)
+{
+	MySerializableXact = sxact;
+}
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 0cd45bb..cd72014 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -213,6 +213,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PREDICATE_LOCK_MANAGER,
 	LWTRANCHE_PARALLEL_QUERY_DSA,
 	LWTRANCHE_TBM,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }	BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 408d94c..35b63ab 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,9 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	/* lock to protect predicateLocks list in parallel mode */
+	LWLock		lock;
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
@@ -474,5 +478,7 @@ typedef struct TwoPhasePredicateRecord
  * locking internals.
  */
 extern PredicateLockData *GetPredicateLockStatusData(void);
+extern SERIALIZABLEXACT *GetSerializableXact(void);
+extern void SetSerializableXact(SERIALIZABLEXACT *sxact);
 
 #endif   /* PREDICATE_INTERNALS_H */
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 0000000..f43aa6a
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 2606a27..1d69820 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -57,3 +57,4 @@ test: alter-table-3
 test: create-trigger
 test: async-notify
 test: timeouts
+test: serializable-parallel
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 0000000..0e7c2c7
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"
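
To spell out the reasoning in the spec's comments: the second permutation fails because s3's read adds two ordering constraints that close a loop. The following is only a rough sketch of that argument, not how predicate.c works. As far as I understand it, SSI does not build a full dependency graph; it looks for a pattern of adjacent rw-conflicts. All `demo_`/`DEMO_` names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three sessions in the spec. */
enum { DEMO_S1, DEMO_S2, DEMO_S3, DEMO_NXACTS };

/* g[a][b] is true when a must come before b in any equivalent serial order. */
typedef bool DemoGraph[DEMO_NXACTS][DEMO_NXACTS];

/* Depth-first search; state is 0 = unvisited, 1 = on the stack, 2 = done. */
static bool
demo_visit(DemoGraph g, int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < DEMO_NXACTS; next++)
	{
		if (!g[node][next])
			continue;
		if (state[next] == 1)
			return true;		/* reached a node already on the stack */
		if (state[next] == 0 && demo_visit(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

static bool
demo_has_cycle(DemoGraph g)
{
	int			state[DEMO_NXACTS] = {0};

	for (int i = 0; i < DEMO_NXACTS; i++)
		if (state[i] == 0 && demo_visit(g, i, state))
			return true;
	return false;
}

/* Build the ordering constraints for one permutation of the spec. */
static bool
demo_permutation_has_cycle(bool s3_reads)
{
	DemoGraph	g = {{false}};

	/* s2 read the old Y, which s1 then overwrote: s2 before s1. */
	g[DEMO_S2][DEMO_S1] = true;

	if (s3_reads)
	{
		/* s3 saw Y = 20 as committed by s1: s1 before s3. */
		g[DEMO_S1][DEMO_S3] = true;
		/* s3 read the old X, which s2 then overwrites: s3 before s2. */
		g[DEMO_S3][DEMO_S2] = true;
	}
	return demo_has_cycle(g);
}
```

With `s3_reads` false only the s2-before-s1 edge exists, so both writers can commit, as in the first permutation. With it true the edges form s1, s3, s2, s1, which is why s2's update has to fail.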

#10

Andres Freund

andres@anarazel.de

almost 9 years ago

In reply to: Thomas Munro (#9)

Re: SERIALIZABLE with parallel query

Hi,

On 2017-03-11 15:19:23 +1300, Thomas Munro wrote:

Here is a rebased patch.

It seems that this patch is still undergoing development, review and
performance evaluation. Therefore it seems like it'd be a bad idea to
try to get this into v10. Any arguments against moving this to the next
CF?

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Andres Freund (#10)

Re: SERIALIZABLE with parallel query

On Tue, Apr 4, 2017 at 6:41 AM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-03-11 15:19:23 +1300, Thomas Munro wrote:

Here is a rebased patch.

It seems that this patch is still undergoing development, review and
performance evaluation. Therefore it seems like it'd be a bad idea to
try to get this into v10. Any arguments against moving this to the next
CF?

None, and done.

It would be good to get some feedback from Kevin on whether this is a
reasonable approach, but considering that these data structures may
finish up being redesigned as part of the GSoC project[1], it may be
best to wait and see where that goes before doing anything. I'll
follow developments there, and if this patch remains relevant I'll
plan to do some more work on it including testing (possibly with the
RUBiS benchmark from Kevin and Dan's paper since it seems the most
likely to be able to really use parallelism) for PG11 CF1.

[1]: https://wiki.postgresql.org/wiki/GSoC_2017#Eliminate_O.28N.5E2.29_scaling_from_rw-conflict_tracking_in_serializable_transactions

--
Thomas Munro
http://www.enterprisedb.com


#12

Kevin Grittner

kgrittn@gmail.com

almost 9 years ago

In reply to: Thomas Munro (#9)

Re: SERIALIZABLE with parallel query

On Fri, Mar 10, 2017 at 8:19 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Wed, Feb 22, 2017 at 2:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I don't think I know enough about the serializable code to review
this, or at least not quickly, but it seems very cool if it works.
Have you checked what effect it has on shared memory consumption?

I'm not sure how to test that. Kevin, can you provide any pointers to
the test workloads you guys used when developing SSI?

During development there was first and foremost the set of tests
which wound up implemented in the isolation testing environment
developed by Heikki, although for most of the development cycle
these tests were run by a python tool written by Markus Wanner
(which was not as fast as Heikki's C-based tool, but provided a lot
more detail in case of failure -- so it was very nice to have until
we got down to polishing things).

The final stress test to chase out race conditions and the
performance benchmarks were all run by Dan Ports on big test
machines at MIT. I'm not sure how much I can say to elaborate on
what is in section 8 of this paper:

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

I seem to remember that he had some saturation run versions of the
"DBT-2++" tests, modified to monitor for serialization anomalies
missed by the implementation, which he sometimes ran for days at a
time on MIT testing machines. There were some very narrow race
conditions uncovered by this testing, which at high concurrency on a
16 core machine might hit a particular problem less often than once
per day.

I also remember that both Anssi Kääriäinen and Yamamoto Takahashi
did a lot of valuable testing of the patch and found problems that
we had missed. Perhaps they can describe their testing or make
other suggestions.

--
Kevin Grittner


#13

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Thomas Munro (#11)

1 attachment(s)

Re: SERIALIZABLE with parallel query

On Tue, Apr 4, 2017 at 8:25 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

... but considering that these data structures may
finish up being redesigned as part of the GSoC project[1], it may be
best to wait and see where that goes before doing anything. I'll
follow developments there, and if this patch remains relevant I'll
plan to do some more work on it including testing (possibly with the
RUBiS benchmark from Kevin and Dan's paper since it seems the most
likely to be able to really use parallelism) for PG11 CF1.

I've been keeping one eye on the GSoC project. That patch changes the
inConflicts and outConflicts data structures, but not the locking
protocol. This patch works by introducing per-SERIALIZABLEXACT
locking in the places where the code currently assumes that the
current backend is the only one that could modify a shared data
structure (namely MySerializableXact->predicateLocks), so that
MySerializableXact can be shared with workers. There doesn't seem to
be any incompatibility or dependency so far, so here's a rebased
patch. Testing needed.
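
For anyone who wants the locking pattern without reading the whole diff, here is a minimal standalone sketch of the idea. It is not PostgreSQL code and makes several assumptions: POSIX threads stand in for worker backends, a `pthread_rwlock_t` stands in for SerializablePredicateLockListLock, a `pthread_mutex_t` stands in for the new per-SERIALIZABLEXACT LWLock, and a plain counter stands in for the predicateLocks list. All `demo_`/`DEMO_` names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NWORKERS 4
#define DEMO_LOCKS_PER_WORKER 10000

/* Hypothetical stand-in for SERIALIZABLEXACT. */
typedef struct DemoSxact
{
	pthread_mutex_t lock;		/* plays the role of sxact->lock */
	int			npredlocks;		/* stands in for the predicateLocks list */
} DemoSxact;

/* Plays the role of SerializablePredicateLockListLock. */
static pthread_rwlock_t demo_list_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Plays the role of IsInParallelMode(). */
static bool demo_parallel_mode = true;

/*
 * Same shape as the patched CreatePredicateLock(): the global lock is taken
 * in shared mode, and the per-sxact lock is taken on top of it only when
 * other processes might be touching the same sxact.
 */
static void
demo_add_predicate_lock(DemoSxact *sxact)
{
	pthread_rwlock_rdlock(&demo_list_lock);
	if (demo_parallel_mode)
		pthread_mutex_lock(&sxact->lock);

	sxact->npredlocks++;

	if (demo_parallel_mode)
		pthread_mutex_unlock(&sxact->lock);
	pthread_rwlock_unlock(&demo_list_lock);
}

static void *
demo_worker_main(void *arg)
{
	for (int i = 0; i < DEMO_LOCKS_PER_WORKER; i++)
		demo_add_predicate_lock((DemoSxact *) arg);
	return NULL;
}

/* The "leader" shares one sxact with every worker; returns the final count. */
static int
demo_run(void)
{
	DemoSxact	sxact;
	pthread_t	workers[DEMO_NWORKERS];

	pthread_mutex_init(&sxact.lock, NULL);
	sxact.npredlocks = 0;

	for (int i = 0; i < DEMO_NWORKERS; i++)
	{
		int			rc = pthread_create(&workers[i], NULL,
										demo_worker_main, &sxact);

		assert(rc == 0);
		(void) rc;
	}
	for (int i = 0; i < DEMO_NWORKERS; i++)
		pthread_join(workers[i], NULL);

	pthread_mutex_destroy(&sxact.lock);
	return sxact.npredlocks;
}
```

Built with `-pthread`, `demo_run()` should return `DEMO_NWORKERS * DEMO_LOCKS_PER_WORKER`. With `demo_parallel_mode` set to false the workers hold only the shared lock, the increments race, and the result is no longer guaranteed. That is the unsynchronized list modification the per-sxact lock is meant to prevent.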

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

ssi-parallel-v5.patch (application/octet-stream)

diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index ff31e7537e6..fd49ef3b5c7 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -177,13 +177,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -235,16 +228,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         making it ineligible for parallel query.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 17b10383e44..7a24b5ec33b 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -27,6 +27,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate_internals.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -77,6 +78,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SERIALIZABLEXACT *parallel_master_serializablexact;
 
 	/* Mutex protects remaining fields. */
 	slock_t		mutex;
@@ -152,14 +154,6 @@ CreateParallelContext(const char *library_name, const char *function_name,
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.
-	 */
-	if (IsolationIsSerializable())
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -281,6 +275,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->parallel_master_serializablexact = GetSerializableXact();
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_FIXED, fps);
@@ -1089,6 +1084,9 @@ ParallelWorkerMain(Datum main_arg)
 	/* Set ParallelMasterBackendId so we know how to address temp relations. */
 	ParallelMasterBackendId = fps->parallel_master_backend_id;
 
+	/* Use the leader's SERIALIZABLEXACT. */
+	SetSerializableXact(fps->parallel_master_serializablexact);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2988c1181b9..6b213f9fe0a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -261,14 +261,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -276,8 +268,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 82a1cf5150b..21beb8d463c 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -494,7 +494,7 @@ RegisterLWLockTranches(void)
 
 	if (LWLockTrancheArray == NULL)
 	{
-		LWLockTranchesAllocated = 64;
+		LWLockTranchesAllocated = 128;
 		LWLockTrancheArray = (char **)
 			MemoryContextAllocZero(TopMemoryContext,
 								   LWLockTranchesAllocated * sizeof(char *));
@@ -511,6 +511,7 @@ RegisterLWLockTranches(void)
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
 						  "parallel_query_dsa");
 	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "sxact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index a4cb4d33add..f43cc68e78d 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'lock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	FirstPredicateLockMgrLock based partition locks
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target..
@@ -184,6 +192,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -1810,6 +1819,7 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	SHMQueueInit(&(sxact->predicateLocks));
 	SHMQueueElemInit(&(sxact->finishedLink));
 	sxact->flags = 0;
+	LWLockInitialize(&sxact->lock, LWTRANCHE_SXACT);
 	if (XactReadOnly)
 	{
 		sxact->flags |= SXACT_FLAG_READ_ONLY;
@@ -2092,6 +2102,14 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
 
 	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
 
+	if (IsInParallelMode())
+	{
+		Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+									LW_EXCLUSIVE) ||
+			   LWLockHeldByMeInMode(&MySerializableXact->lock,
+									LW_EXCLUSIVE));
+	}
+
 	/* Can't remove it until no locks at this target. */
 	if (!SHMQueueEmpty(&target->predicateLocks))
 		return;
@@ -2109,7 +2127,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2122,6 +2142,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2175,6 +2197,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2373,6 +2397,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2410,6 +2436,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2597,7 +2625,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2657,7 +2686,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2672,7 +2701,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3269,6 +3299,10 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
+	/* Only leader processes should release predicate locks. */
+	if (IsParallelWorker())
+		goto cleanup;
+
 	if (MySerializableXact == InvalidSerializableXact)
 	{
 		Assert(LocalPredicateLockHash == NULL);
@@ -3555,6 +3589,7 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
+cleanup:
 	/* Delete per-transaction lock table */
 	if (LocalPredicateLockHash != NULL)
 	{
@@ -4244,6 +4279,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->lock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4278,6 +4315,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->lock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4826,6 +4865,11 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->lock in parallel mode because there cannot be
+	 * any parallel workers running while we are preparing a transaction.
+	 */
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -5034,3 +5078,22 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Accessor to allow parallel leaders to export the current SERIALIZABLEXACT
+ * to parallel workers.
+ */
+SERIALIZABLEXACT *
+GetSerializableXact(void)
+{
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+SetSerializableXact(SERIALIZABLEXACT *sxact)
+{
+	MySerializableXact = sxact;
+}
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 3d16132c88f..d9640139ae5 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -213,6 +213,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PREDICATE_LOCK_MANAGER,
 	LWTRANCHE_PARALLEL_QUERY_DSA,
 	LWTRANCHE_TBM,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 89874a5c3b6..64560d4d3a4 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,9 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	/* lock to protect predicateLocks list in parallel mode */
+	LWLock		lock;
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
@@ -475,5 +479,7 @@ typedef struct TwoPhasePredicateRecord
 extern PredicateLockData *GetPredicateLockStatusData(void);
 extern int GetSafeSnapshotBlockingPids(int blocked_pid,
 							int *output, int output_size);
+extern SERIALIZABLEXACT *GetSerializableXact(void);
+extern void SetSerializableXact(SERIALIZABLEXACT *sxact);
 
 #endif							/* PREDICATE_INTERNALS_H */
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 00000000000..f43aa6a2990
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 32c965b2a02..e428357e772 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -62,3 +62,4 @@ test: sequence-ddl
 test: async-notify
 test: vacuum-reltuples
 test: timeouts
+test: serializable-parallel
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 00000000000..0e7c2c7c1fa
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"

#14

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Thomas Munro (#13)

1 attachment(s)

Re: SERIALIZABLE with parallel query

On Wed, Jun 28, 2017 at 11:21 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

[ssi-parallel-v5.patch]

Rebased.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

ssi-parallel-v6.patch (application/octet-stream)

From c626e1c0a366157d3c6efaf8c0cf1bc01fff85c3 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Fri, 1 Sep 2017 16:54:57 +1200
Subject: [PATCH] Enable SERIALIZABLE and parallel query to be used together.

Previously, the SERIALIZABLE isolation level prevented parallel query from
being used.  Allow the two features to be used together by sharing the
leader's SERIALIZABLEXACT with parallel workers.

Author: Thomas Munro
Reviewed-By: <this space could be yours>
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 doc/src/sgml/parallel.sgml                         | 17 -----
 src/backend/access/transam/parallel.c              | 14 ++---
 src/backend/optimizer/plan/planner.c               | 11 +---
 src/backend/storage/lmgr/lwlock.c                  |  3 +-
 src/backend/storage/lmgr/predicate.c               | 73 ++++++++++++++++++++--
 src/include/storage/lwlock.h                       |  1 +
 src/include/storage/predicate_internals.h          |  6 ++
 .../isolation/expected/serializable-parallel.out   | 44 +++++++++++++
 src/test/isolation/isolation_schedule              |  1 +
 .../isolation/specs/serializable-parallel.spec     | 48 ++++++++++++++
 10 files changed, 177 insertions(+), 41 deletions(-)

diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index 2a25f21eb4b..d62a204d522 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -191,13 +191,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -249,16 +242,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         making it ineligible for parallel query.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index ce1b907debd..d5b710f03f3 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -27,6 +27,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate_internals.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -77,6 +78,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SERIALIZABLEXACT *parallel_master_serializablexact;
 
 	/* Mutex protects remaining fields. */
 	slock_t		mutex;
@@ -152,14 +154,6 @@ CreateParallelContext(const char *library_name, const char *function_name,
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.
-	 */
-	if (IsolationIsSerializable())
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -281,6 +275,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->parallel_master_serializablexact = GetSerializableXact();
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_FIXED, fps);
@@ -1089,6 +1084,9 @@ ParallelWorkerMain(Datum main_arg)
 	/* Set ParallelMasterBackendId so we know how to address temp relations. */
 	ParallelMasterBackendId = fps->parallel_master_backend_id;
 
+	/* Use the leader's SERIALIZABLEXACT. */
+	SetSerializableXact(fps->parallel_master_serializablexact);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 966230256ea..4367af1ca53 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -261,14 +261,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -276,8 +268,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 82a1cf5150b..21beb8d463c 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -494,7 +494,7 @@ RegisterLWLockTranches(void)
 
 	if (LWLockTrancheArray == NULL)
 	{
-		LWLockTranchesAllocated = 64;
+		LWLockTranchesAllocated = 128;
 		LWLockTrancheArray = (char **)
 			MemoryContextAllocZero(TopMemoryContext,
 								   LWLockTranchesAllocated * sizeof(char *));
@@ -511,6 +511,7 @@ RegisterLWLockTranches(void)
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
 						  "parallel_query_dsa");
 	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "sxact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 6a6d9d6d5cc..d47f0eedcce 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'lock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	PredicateLockHashPartitionLock(hashcode)
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target.
@@ -186,6 +194,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -1825,6 +1834,7 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	SHMQueueInit(&(sxact->predicateLocks));
 	SHMQueueElemInit(&(sxact->finishedLink));
 	sxact->flags = 0;
+	LWLockInitialize(&sxact->lock, LWTRANCHE_SXACT);
 	if (XactReadOnly)
 	{
 		sxact->flags |= SXACT_FLAG_READ_ONLY;
@@ -2107,6 +2117,14 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
 
 	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
 
+	if (IsInParallelMode())
+	{
+		Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+									LW_EXCLUSIVE) ||
+			   LWLockHeldByMeInMode(&MySerializableXact->lock,
+									LW_EXCLUSIVE));
+	}
+
 	/* Can't remove it until no locks at this target. */
 	if (!SHMQueueEmpty(&target->predicateLocks))
 		return;
@@ -2124,7 +2142,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2137,6 +2157,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2190,6 +2212,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2388,6 +2412,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2425,6 +2451,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2612,7 +2640,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2672,7 +2701,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2687,7 +2716,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3284,6 +3314,10 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
+	/* Only leader processes should release predicate locks. */
+	if (IsParallelWorker())
+		goto cleanup;
+
 	if (MySerializableXact == InvalidSerializableXact)
 	{
 		Assert(LocalPredicateLockHash == NULL);
@@ -3570,6 +3604,7 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
+cleanup:
 	/* Delete per-transaction lock table */
 	if (LocalPredicateLockHash != NULL)
 	{
@@ -4259,6 +4294,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->lock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4293,6 +4330,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->lock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4841,6 +4880,11 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->lock in parallel mode because there cannot be
+	 * any parallel workers running while we are preparing a transaction.
+	 */
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -5049,3 +5093,22 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Accessor to allow parallel leaders to export the current SERIALIZABLEXACT
+ * to parallel workers.
+ */
+SERIALIZABLEXACT *
+GetSerializableXact(void)
+{
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+SetSerializableXact(SERIALIZABLEXACT *sxact)
+{
+	MySerializableXact = sxact;
+}
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 3d16132c88f..d9640139ae5 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -213,6 +213,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PREDICATE_LOCK_MANAGER,
 	LWTRANCHE_PARALLEL_QUERY_DSA,
 	LWTRANCHE_TBM,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 89874a5c3b6..64560d4d3a4 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,9 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	/* lock to protect predicateLocks list in parallel mode */
+	LWLock		lock;
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
@@ -475,5 +479,7 @@ typedef struct TwoPhasePredicateRecord
 extern PredicateLockData *GetPredicateLockStatusData(void);
 extern int GetSafeSnapshotBlockingPids(int blocked_pid,
 							int *output, int output_size);
+extern SERIALIZABLEXACT *GetSerializableXact(void);
+extern void SetSerializableXact(SERIALIZABLEXACT *sxact);
 
 #endif							/* PREDICATE_INTERNALS_H */
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 00000000000..f43aa6a2990
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 32c965b2a02..e428357e772 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -62,3 +62,4 @@ test: sequence-ddl
 test: async-notify
 test: vacuum-reltuples
 test: timeouts
+test: serializable-parallel
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 00000000000..0e7c2c7c1fa
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"
-- 
2.13.2

#15

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Thomas Munro (#14)

1 attachment(s)

Re: SERIALIZABLE with parallel query

On Fri, Sep 1, 2017 at 5:11 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Wed, Jun 28, 2017 at 11:21 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

[ssi-parallel-v5.patch]

Rebased.

Rebased again.
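
For anyone skimming the patch, here is a rough standalone sketch of the core idea: the leader hands workers a pointer to its own transaction object, and anyone touching the shared lock list takes an extra per-transaction lock only when workers may exist. It is not PostgreSQL code. DemoSxact, DemoWorkerArgs, demo_create_predicate_lock, demo_worker_main and demo_run are hypothetical names, threads stand in for worker backends, and a pthread mutex stands in for the LWLock that the patch takes in LW_EXCLUSIVE mode (the patch also holds SerializablePredicateLockListLock in shared mode, which is left out here).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_WORKERS 16

/* Hypothetical stand-in for SERIALIZABLEXACT; not the real struct. */
typedef struct DemoSxact
{
	pthread_mutex_t lock;		/* plays the role of the new sxact->lock */
	int			nlocks;			/* stands in for the predicateLocks list */
} DemoSxact;

/* What the leader hands to each worker (compare FixedParallelState). */
typedef struct DemoWorkerArgs
{
	DemoSxact  *sxact;			/* the leader's sxact, shared by pointer */
	int			nlocks_to_add;
	bool		in_parallel_mode;	/* stands in for IsInParallelMode() */
} DemoWorkerArgs;

/* Record one "predicate lock", locking only when workers may exist. */
static void
demo_create_predicate_lock(DemoSxact *sxact, bool in_parallel_mode)
{
	if (in_parallel_mode)
		pthread_mutex_lock(&sxact->lock);
	sxact->nlocks++;
	if (in_parallel_mode)
		pthread_mutex_unlock(&sxact->lock);
}

static void *
demo_worker_main(void *arg)
{
	DemoWorkerArgs *args = arg;
	int			i;

	for (i = 0; i < args->nlocks_to_add; i++)
		demo_create_predicate_lock(args->sxact, args->in_parallel_mode);

	/* Workers return without releasing anything; only the leader does. */
	return NULL;
}

/*
 * Run a leader plus up to DEMO_MAX_WORKERS workers against one shared
 * DemoSxact and return the number of locks recorded at the end.
 */
int
demo_run(int nworkers, int locks_per_worker)
{
	DemoSxact	sxact;
	DemoWorkerArgs args;
	pthread_t	workers[DEMO_MAX_WORKERS];
	int			started = 0;
	int			i;
	int			result;

	assert(locks_per_worker >= 0);
	if (nworkers > DEMO_MAX_WORKERS)
		nworkers = DEMO_MAX_WORKERS;

	pthread_mutex_init(&sxact.lock, NULL);
	sxact.nlocks = 0;
	args.sxact = &sxact;
	args.nlocks_to_add = locks_per_worker;
	args.in_parallel_mode = (nworkers > 0);

	for (i = 0; i < nworkers; i++)
	{
		if (pthread_create(&workers[i], NULL, demo_worker_main, &args) != 0)
			break;
		started++;
	}

	/* The leader takes part too, under the same locking rule. */
	for (i = 0; i < locks_per_worker; i++)
		demo_create_predicate_lock(&sxact, args.in_parallel_mode);

	/* The leader waits for the workers before tearing anything down. */
	for (i = 0; i < started; i++)
		pthread_join(workers[i], NULL);

	result = sxact.nlocks;
	pthread_mutex_destroy(&sxact.lock);
	return result;
}
```

With four workers that all start, demo_run(4, 1000) should return 5000 (four workers plus the leader, 1000 entries each). With zero workers the mutex is never taken, which mirrors the patch skipping sxact->lock outside parallel mode.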

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

ssi-parallel-v7.patch (application/octet-stream)

From b5bdc1e89c3c1a7213c411c5348c600aa6d60ca6 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Fri, 1 Sep 2017 16:54:57 +1200
Subject: [PATCH] Enable SERIALIZABLE and parallel query to be used together.

Previously, the SERIALIZABLE isolation level prevented parallel query from
being used.  Allow the two features to be used together by sharing the
leader's SERIALIZABLEXACT with parallel workers.

Author: Thomas Munro
Reviewed-By: <this space could be yours>
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 doc/src/sgml/parallel.sgml                         | 17 -----
 src/backend/access/transam/parallel.c              | 14 ++---
 src/backend/optimizer/plan/planner.c               | 11 +---
 src/backend/storage/lmgr/lwlock.c                  |  1 +
 src/backend/storage/lmgr/predicate.c               | 73 ++++++++++++++++++++--
 src/include/storage/lwlock.h                       |  1 +
 src/include/storage/predicate_internals.h          |  6 ++
 .../isolation/expected/serializable-parallel.out   | 44 +++++++++++++
 src/test/isolation/isolation_schedule              |  1 +
 .../isolation/specs/serializable-parallel.spec     | 48 ++++++++++++++
 10 files changed, 176 insertions(+), 40 deletions(-)
 create mode 100644 src/test/isolation/expected/serializable-parallel.out
 create mode 100644 src/test/isolation/specs/serializable-parallel.spec

diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index 2a25f21eb4b..d62a204d522 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -191,13 +191,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -249,16 +242,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         making it ineligible for parallel query.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 13c8ba3b196..5a64ead7516 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -28,6 +28,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate_internals.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -81,6 +82,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SERIALIZABLEXACT *parallel_master_serializablexact;
 
 	/* Mutex protects remaining fields. */
 	slock_t		mutex;
@@ -156,14 +158,6 @@ CreateParallelContext(const char *library_name, const char *function_name,
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.
-	 */
-	if (IsolationIsSerializable())
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -302,6 +296,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->parallel_master_serializablexact = GetSerializableXact();
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_FIXED, fps);
@@ -1124,6 +1119,9 @@ ParallelWorkerMain(Datum main_arg)
 	/* Set ParallelMasterBackendId so we know how to address temp relations. */
 	ParallelMasterBackendId = fps->parallel_master_backend_id;
 
+	/* Use the leader's SERIALIZABLEXACT. */
+	SetSerializableXact(fps->parallel_master_serializablexact);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 7f146d670cb..b4ae70b93a7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -261,14 +261,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -276,8 +268,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index f1060f96757..9a9d3fa4d50 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -517,6 +517,7 @@ RegisterLWLockTranches(void)
 	LWLockRegisterTranche(LWTRANCHE_SESSION_TYPMOD_TABLE,
 						  "session_typmod_table");
 	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "sxact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 251a359bffc..c3f16d6f7a7 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'lock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	PredicateLockHashPartitionLock(hashcode)
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target.
@@ -186,6 +194,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -1825,6 +1834,7 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	SHMQueueInit(&(sxact->predicateLocks));
 	SHMQueueElemInit(&(sxact->finishedLink));
 	sxact->flags = 0;
+	LWLockInitialize(&sxact->lock, LWTRANCHE_SXACT);
 	if (XactReadOnly)
 	{
 		sxact->flags |= SXACT_FLAG_READ_ONLY;
@@ -2107,6 +2117,14 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
 
 	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
 
+	if (IsInParallelMode())
+	{
+		Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+									LW_EXCLUSIVE) ||
+			   LWLockHeldByMeInMode(&MySerializableXact->lock,
+									LW_EXCLUSIVE));
+	}
+
 	/* Can't remove it until no locks at this target. */
 	if (!SHMQueueEmpty(&target->predicateLocks))
 		return;
@@ -2124,7 +2142,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2137,6 +2157,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2190,6 +2212,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2388,6 +2412,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2425,6 +2451,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2612,7 +2640,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2672,7 +2701,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2687,7 +2716,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3284,6 +3314,10 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
+	/* Only leader processes should release predicate locks. */
+	if (IsParallelWorker())
+		goto cleanup;
+
 	if (MySerializableXact == InvalidSerializableXact)
 	{
 		Assert(LocalPredicateLockHash == NULL);
@@ -3570,6 +3604,7 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
+cleanup:
 	/* Delete per-transaction lock table */
 	if (LocalPredicateLockHash != NULL)
 	{
@@ -4259,6 +4294,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->lock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4293,6 +4330,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->lock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4841,6 +4880,11 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->lock in parallel mode because there cannot be
+	 * any parallel workers running while we are preparing a transaction.
+	 */
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -5049,3 +5093,22 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Accessor to allow parallel leaders to export the current SERIALIZABLEXACT
+ * to parallel workers.
+ */
+SERIALIZABLEXACT *
+GetSerializableXact(void)
+{
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+SetSerializableXact(SERIALIZABLEXACT *sxact)
+{
+	MySerializableXact = sxact;
+}
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index f4c4aed7f91..50f63e22cef 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -216,6 +216,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_SESSION_RECORD_TABLE,
 	LWTRANCHE_SESSION_TYPMOD_TABLE,
 	LWTRANCHE_TBM,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 89874a5c3b6..64560d4d3a4 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,9 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	/* lock to protect predicateLocks list in parallel mode */
+	LWLock		lock;
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
@@ -475,5 +479,7 @@ typedef struct TwoPhasePredicateRecord
 extern PredicateLockData *GetPredicateLockStatusData(void);
 extern int GetSafeSnapshotBlockingPids(int blocked_pid,
 							int *output, int output_size);
+extern SERIALIZABLEXACT *GetSerializableXact(void);
+extern void SetSerializableXact(SERIALIZABLEXACT *sxact);
 
 #endif							/* PREDICATE_INTERNALS_H */
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 00000000000..f43aa6a2990
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 32c965b2a02..e428357e772 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -62,3 +62,4 @@ test: sequence-ddl
 test: async-notify
 test: vacuum-reltuples
 test: timeouts
+test: serializable-parallel
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 00000000000..0e7c2c7c1fa
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"
-- 
2.13.2

#16

Haribabu Kommi

kommi.haribabu@gmail.com

over 8 years ago

In reply to: Thomas Munro (#15)

Re: SERIALIZABLE with parallel query

On Tue, Sep 19, 2017 at 11:42 AM, Thomas Munro <
thomas.munro@enterprisedb.com> wrote:

On Fri, Sep 1, 2017 at 5:11 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Wed, Jun 28, 2017 at 11:21 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

[ssi-parallel-v5.patch]

Rebased.

Rebased again.

While testing this patch, I found a behavior difference when parallel
query is used, while experimenting with the test case provided in the
patch.

Note that I tested the V6 patch, but I don't think that this version
contains any fixes other than the rebase.

Test steps:

CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);

Session -1:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM bank_account WHERE id = 'Y';

Session -2:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET max_parallel_workers_per_gather = 2;
SET force_parallel_mode = on;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set enable_indexscan = off;
set enable_bitmapscan = off;

SELECT balance FROM bank_account WHERE id = 'X';

Session -1:

update bank_account set balance = 10 where id = 'X';

Session -2:

update bank_account set balance = 10 where id = 'Y';
ERROR: could not serialize access due to read/write dependencies among
transactions
DETAIL: Reason code: Canceled on identification as a pivot, during write.
HINT: The transaction might succeed if retried.

Without parallel query for the SELECT statement in session 2,
the UPDATE statement in session 2 succeeds.

Regards,
Hari Babu
Fujitsu Australia

#17

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Haribabu Kommi (#16)

Re: SERIALIZABLE with parallel query

On Tue, Sep 19, 2017 at 1:47 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

While testing this patch, I found a behavior difference when parallel
query is used, while experimenting with the test case provided in the
patch.

Note that I tested the V6 patch, but I don't think that this version
contains any fixes other than the rebase.

Test steps:

CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);

Session -1:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM bank_account WHERE id = 'Y';

Session -2:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET max_parallel_workers_per_gather = 2;
SET force_parallel_mode = on;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set enable_indexscan = off;
set enable_bitmapscan = off;

SELECT balance FROM bank_account WHERE id = 'X';

Session -1:

update bank_account set balance = 10 where id = 'X';

Session -2:

update bank_account set balance = 10 where id = 'Y';
ERROR: could not serialize access due to read/write dependencies among
transactions
DETAIL: Reason code: Canceled on identification as a pivot, during write.
HINT: The transaction might succeed if retried.

Without parallel query for the SELECT statement in session 2,
the UPDATE statement in session 2 succeeds.

Hi Haribabu,

Thanks for looking at this!

Yeah. The difference seems to be that session 2 chooses a Parallel
Seq Scan instead of an Index Scan when you flip all those GUCs into
parallelism-is-free mode. Seq Scan takes a table-level predicate lock
(see heap_beginscan_internal()). But if you continue your example in
non-parallel mode (patched or unpatched), you'll find that only one of
those transactions can commit successfully.

Using the fancy notation in the papers about this stuff where w1[x=42]
means "write by transaction 1 on object x with value 42", let's see if
there is an apparent sequential order of these transactions that makes
sense:

Actual order: r1[Y=0] r2[X=0] w1[X=10] w2[Y=10] ... some commit order ...
Apparent order A: r2[X=0] w2[Y=10] c2 r1[Y=0*] w1[X=10] c1 (*nonsense)
Apparent order B: r1[Y=0] w1[X=10] c1 r2[X=0*] w2[Y=10] c2 (*nonsense)

Both potential commit orders are nonsensical. I think what happened
in your example was that a Seq Scan allowed the SSI algorithm to
reject a transaction sooner. Instead of r2[X=0], the executor sees
r2[X=0,Y=0] (we scanned the whole table, as if we read all objects, in
this case X and Y, even though we only asked to read X). Then the SSI
algorithm is able to detect a "dangerous structure" at w2[Y=10],
instead of later at commit time.
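
The two apparent orders above can also be checked mechanically. What follows is a toy sketch in C, not PostgreSQL code: ToyXact, serial_order_matches() and is_serializable() are names made up for illustration. It assumes each transaction does one read followed by one write, and that X and Y both start at 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: the database holds just two objects. */
enum
{
	OBJ_X = 0,
	OBJ_Y = 1
};

typedef struct ToyXact
{
	int			read_obj;		/* object read first */
	int			observed;		/* value that read returned in the real schedule */
	int			write_obj;		/* object written afterwards */
	int			write_val;		/* value written */
} ToyXact;

/*
 * Replay "first" and then "second" one after the other against X = 0, Y = 0
 * and report whether each read would return the value actually observed.
 */
bool
serial_order_matches(const ToyXact *first, const ToyXact *second)
{
	int			db[2] = {0, 0};
	const ToyXact *order[2];
	int			i;

	assert(first != NULL && second != NULL);
	order[0] = first;
	order[1] = second;

	for (i = 0; i < 2; i++)
	{
		if (db[order[i]->read_obj] != order[i]->observed)
			return false;		/* the "nonsense" case marked with * above */
		db[order[i]->write_obj] = order[i]->write_val;
	}
	return true;
}

/* True if at least one serial order explains the observed reads. */
bool
is_serializable(const ToyXact *t1, const ToyXact *t2)
{
	return serial_order_matches(t1, t2) || serial_order_matches(t2, t1);
}
```

Feeding in transaction 1 as {OBJ_Y, 0, OBJ_X, 10} and transaction 2 as {OBJ_X, 0, OBJ_Y, 10} makes both orders fail, which is why only one of the two transactions can be allowed to commit.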

So I don't think this indicates a problem with the patch.

--
Thomas Munro
http://www.enterprisedb.com


#18

Haribabu Kommi

kommi.haribabu@gmail.com

over 8 years ago

In reply to: Thomas Munro (#17)

Re: SERIALIZABLE with parallel query

On Thu, Sep 21, 2017 at 4:13 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Tue, Sep 19, 2017 at 1:47 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

While testing this patch, I found a behavior difference when parallel
query is used, while experimenting with the test case provided in the
patch.

Note that I tested the V6 patch, but I don't think that this version
contains any fixes other than the rebase.

Test steps:

CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);

Session -1:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM bank_account WHERE id = 'Y';

Session -2:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET max_parallel_workers_per_gather = 2;
SET force_parallel_mode = on;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set enable_indexscan = off;
set enable_bitmapscan = off;

SELECT balance FROM bank_account WHERE id = 'X';

Session -1:

update bank_account set balance = 10 where id = 'X';

Session -2:

update bank_account set balance = 10 where id = 'Y';
ERROR: could not serialize access due to read/write dependencies among
transactions
DETAIL: Reason code: Canceled on identification as a pivot, during write.
HINT: The transaction might succeed if retried.

Without parallel query for the SELECT statement in session 2,
the UPDATE statement in session 2 succeeds.

Hi Thomas,

Yeah. The difference seems to be that session 2 chooses a Parallel
Seq Scan instead of an Index Scan when you flip all those GUCs into
parallelism-is-free mode. Seq Scan takes a table-level predicate lock
(see heap_beginscan_internal()). But if you continue your example in
non-parallel mode (patched or unpatched), you'll find that only one of
those transactions can commit successfully.

Yes, that's correct. Only one commit can be successful.

Using the fancy notation in the papers about this stuff where w1[x=42]
means "write by transaction 1 on object x with value 42", let's see if
there is an apparent sequential order of these transactions that makes
sense:

Actual order: r1[Y=0] r2[X=0] w1[X=10] w2[Y=10] ... some commit order ...
Apparent order A: r2[X=0] w2[Y=10] c2 r1[Y=0*] w1[X=10] c1 (*nonsense)
Apparent order B: r1[Y=0] w1[X=10] c1 r2[X=0*] w2[Y=10] c2 (*nonsense)

Both potential commit orders are nonsensical. I think what happened
in your example was that a Seq Scan allowed the SSI algorithm to
reject a transaction sooner. Instead of r2[X=0], the executor sees
r2[X=0,Y=0] (we scanned the whole table, as if we read all objects, in
this case X and Y, even though we only asked to read X). Then the SSI
algorithm is able to detect a "dangerous structure" at w2[Y=10],
instead of later at commit time.

Thanks for explaining in more detail; now I understand serialization a
bit better.

After I tune the GUCs to choose a sequential scan, I still don't get the
error in session 2 for the update operation, as I did with a parallel
sequential scan; session 2 even accepts many more commands, until S1
commits.

I am just thinking that with a parallel sequential scan under serializable
isolation, the user loses control over which session gets to commit. I may
be thinking of a rare scenario that never happens in practice.

I will continue my review on the latest patch and share any updates.

Regards,
Hari Babu
Fujitsu Australia

#19

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Haribabu Kommi (#18)

1 attachment(s)

Re: SERIALIZABLE with parallel query

On Mon, Sep 25, 2017 at 8:37 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

After I tune the GUCs to choose a sequential scan, I still don't get the
error in session 2 for the update operation, as I did with a parallel
sequential scan; session 2 even accepts many more commands, until S1
commits.

Hmm. Then this requires more explanation because I don't expect a
difference. I did some digging and realised that the error detail
message "Reason code: Canceled on identification as a pivot, during
write." was reached in a code path that requires
SxactIsPrepared(writer) and also MySerializableXact == writer, which
means that the process believes it is committing. Clearly something
is wrong. After some more digging I realised that
ParallelWorkerMain() calls EndParallelWorkerTransaction() which calls
CommitTransaction() which calls
PreCommit_CheckForSerializationFailure(). Since the worker is
connected to the leader's SERIALIZABLEXACT, that finishes up being
marked as preparing to commit (not true!), and then the leader gets
confused during that write, causing a serialization failure to be
raised sooner (though I can't explain why it should be raised then
anyway, but that's another topic). Oops. I think the fix here is
just not to do that in a worker (the worker's CommitTransaction()
doesn't really mean what it says).

Here's a version with a change that makes that conditional. This way
your test case behaves the same as non-parallel mode.
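
The failure mode and the fix can be modelled in a few lines of C. This is a toy sketch and not the patch itself: ToySxact, TOY_FLAG_PREPARED and the toy_* functions are hypothetical stand-ins, and the guard_workers parameter exists only so that the behaviour with and without the fix can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag meaning "this transaction is preparing to commit". */
#define TOY_FLAG_PREPARED 0x01

/* Stand-in for the state that leader and workers share. */
typedef struct ToySxact
{
	unsigned int flags;
} ToySxact;

/* Models the commit-time check, which marks the transaction as prepared. */
void
toy_precommit_check(ToySxact *sxact)
{
	assert(sxact != NULL);
	sxact->flags |= TOY_FLAG_PREPARED;
}

/*
 * Models the commit path.  With guard_workers = false every process runs the
 * commit-time check, including workers that are merely shutting down.  With
 * guard_workers = true a worker skips it and leaves the shared state alone.
 */
void
toy_commit_transaction(ToySxact *sxact, bool is_parallel_worker,
					   bool guard_workers)
{
	if (!(guard_workers && is_parallel_worker))
		toy_precommit_check(sxact);

	/* the rest of the commit work would follow here */
}

bool
toy_is_prepared(const ToySxact *sxact)
{
	assert(sxact != NULL);
	return (sxact->flags & TOY_FLAG_PREPARED) != 0;
}
```

Without the guard, a worker finishing its part of the query leaves the shared state flagged as prepared while the leader's transaction is still running; with the guard, only the leader's own commit sets the flag.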

I will continue my review on the latest patch and share any updates.

Thanks!

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

ssi-parallel-v8.patchapplication/octet-stream; name=ssi-parallel-v8.patchDownload

From 67a3b38f923a997e8add64feb7993104041c089c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Fri, 1 Sep 2017 16:54:57 +1200
Subject: [PATCH] Enable SERIALIZABLE and parallel query to be used together.

Previously, the SERIALIZABLE isolation level prevented parallel query from
being used.  Allow the two features to be used together by sharing the
leader's SERIALIZABLEXACT with parallel workers.

Author: Thomas Munro
Reviewed-By: Haribabu Kommi
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 doc/src/sgml/parallel.sgml                         | 17 -----
 src/backend/access/transam/parallel.c              | 14 ++---
 src/backend/access/transam/xact.c                  |  7 ++-
 src/backend/optimizer/plan/planner.c               | 11 +---
 src/backend/storage/lmgr/lwlock.c                  |  1 +
 src/backend/storage/lmgr/predicate.c               | 73 ++++++++++++++++++++--
 src/include/storage/lwlock.h                       |  1 +
 src/include/storage/predicate_internals.h          |  6 ++
 .../isolation/expected/serializable-parallel.out   | 44 +++++++++++++
 src/test/isolation/isolation_schedule              |  1 +
 .../isolation/specs/serializable-parallel.spec     | 48 ++++++++++++++
 11 files changed, 181 insertions(+), 42 deletions(-)
 create mode 100644 src/test/isolation/expected/serializable-parallel.out
 create mode 100644 src/test/isolation/specs/serializable-parallel.spec

diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index 2a25f21eb4b..d62a204d522 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -191,13 +191,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -249,16 +242,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         making it ineligible for parallel query.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 13c8ba3b196..5a64ead7516 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -28,6 +28,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate_internals.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -81,6 +82,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SERIALIZABLEXACT *parallel_master_serializablexact;
 
 	/* Mutex protects remaining fields. */
 	slock_t		mutex;
@@ -156,14 +158,6 @@ CreateParallelContext(const char *library_name, const char *function_name,
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.
-	 */
-	if (IsolationIsSerializable())
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -302,6 +296,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->parallel_master_serializablexact = GetSerializableXact();
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_FIXED, fps);
@@ -1124,6 +1119,9 @@ ParallelWorkerMain(Datum main_arg)
 	/* Set ParallelMasterBackendId so we know how to address temp relations. */
 	ParallelMasterBackendId = fps->parallel_master_backend_id;
 
+	/* Use the leader's SERIALIZABLEXACT. */
+	SetSerializableXact(fps->parallel_master_serializablexact);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 93dca7a72af..0d1d6422453 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2011,9 +2011,12 @@ CommitTransaction(void)
 	/*
 	 * Mark serializable transaction as complete for predicate locking
 	 * purposes.  This should be done as late as we can put it and still allow
-	 * errors to be raised for failure patterns found at commit.
+	 * errors to be raised for failure patterns found at commit.  This is not
+	 * appropriate in a parallel worker however, because we aren't committing
+	 * the leader's transaction and its serializable state will go on.
 	 */
-	PreCommit_CheckForSerializationFailure();
+	if (!is_parallel_worker)
+		PreCommit_CheckForSerializationFailure();
 
 	/*
 	 * Insert notifications sent by NOTIFY commands into the queue.  This
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 7f146d670cb..b4ae70b93a7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -261,14 +261,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -276,8 +268,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index f1060f96757..9a9d3fa4d50 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -517,6 +517,7 @@ RegisterLWLockTranches(void)
 	LWLockRegisterTranche(LWTRANCHE_SESSION_TYPMOD_TABLE,
 						  "session_typmod_table");
 	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "sxact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 251a359bffc..c3f16d6f7a7 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'lock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	PredicateLockHashPartitionLock(hashcode)
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target.
@@ -186,6 +194,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -1825,6 +1834,7 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	SHMQueueInit(&(sxact->predicateLocks));
 	SHMQueueElemInit(&(sxact->finishedLink));
 	sxact->flags = 0;
+	LWLockInitialize(&sxact->lock, LWTRANCHE_SXACT);
 	if (XactReadOnly)
 	{
 		sxact->flags |= SXACT_FLAG_READ_ONLY;
@@ -2107,6 +2117,14 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
 
 	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
 
+	if (IsInParallelMode())
+	{
+		Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+									LW_EXCLUSIVE) ||
+			   LWLockHeldByMeInMode(&MySerializableXact->lock,
+									LW_EXCLUSIVE));
+	}
+
 	/* Can't remove it until no locks at this target. */
 	if (!SHMQueueEmpty(&target->predicateLocks))
 		return;
@@ -2124,7 +2142,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2137,6 +2157,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2190,6 +2212,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2388,6 +2412,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2425,6 +2451,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2612,7 +2640,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2672,7 +2701,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2687,7 +2716,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3284,6 +3314,10 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
+	/* Only leader processes should release predicate locks. */
+	if (IsParallelWorker())
+		goto cleanup;
+
 	if (MySerializableXact == InvalidSerializableXact)
 	{
 		Assert(LocalPredicateLockHash == NULL);
@@ -3570,6 +3604,7 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
+cleanup:
 	/* Delete per-transaction lock table */
 	if (LocalPredicateLockHash != NULL)
 	{
@@ -4259,6 +4294,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->lock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4293,6 +4330,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->lock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4841,6 +4880,11 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->lock in parallel mode because there cannot be
+	 * any parallel workers running while we are preparing a transaction.
+	 */
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -5049,3 +5093,22 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Accessor to allow parallel leaders to export the current SERIALIZABLEXACT
+ * to parallel workers.
+ */
+SERIALIZABLEXACT *
+GetSerializableXact(void)
+{
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+SetSerializableXact(SERIALIZABLEXACT *sxact)
+{
+	MySerializableXact = sxact;
+}
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index f4c4aed7f91..50f63e22cef 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -216,6 +216,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_SESSION_RECORD_TABLE,
 	LWTRANCHE_SESSION_TYPMOD_TABLE,
 	LWTRANCHE_TBM,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 89874a5c3b6..64560d4d3a4 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,9 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	/* lock to protect predicateLocks list in parallel mode */
+	LWLock		lock;
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
@@ -475,5 +479,7 @@ typedef struct TwoPhasePredicateRecord
 extern PredicateLockData *GetPredicateLockStatusData(void);
 extern int GetSafeSnapshotBlockingPids(int blocked_pid,
 							int *output, int output_size);
+extern SERIALIZABLEXACT *GetSerializableXact(void);
+extern void SetSerializableXact(SERIALIZABLEXACT *sxact);
 
 #endif							/* PREDICATE_INTERNALS_H */
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 00000000000..f43aa6a2990
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 32c965b2a02..e428357e772 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -62,3 +62,4 @@ test: sequence-ddl
 test: async-notify
 test: vacuum-reltuples
 test: timeouts
+test: serializable-parallel
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 00000000000..0e7c2c7c1fa
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"
-- 
2.14.1

#20

Haribabu Kommi

kommi.haribabu@gmail.com

over 8 years ago

In reply to: Thomas Munro (#19)

Re: SERIALIZABLE with parallel query

On Mon, Sep 25, 2017 at 6:57 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Mon, Sep 25, 2017 at 8:37 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

Even after I tune the GUCs to choose a sequential scan, I still don't get
the error in session 2 for the update operation, as I used to with a
parallel sequential scan; it also accepts many more commands until S1
commits.

Hmm. Then this requires more explanation because I don't expect a
difference. I did some digging and realised that the error detail
message "Reason code: Canceled on identification as a pivot, during
write." was reached in a code path that requires
SxactIsPrepared(writer) and also MySerializableXact == writer, which
means that the process believes it is committing. Clearly something
is wrong. After some more digging I realised that
ParallelWorkerMain() calls EndParallelWorkerTransaction() which calls
CommitTransaction() which calls
PreCommit_CheckForSerializationFailure(). Since the worker is
connected to the leader's SERIALIZABLEXACT, that finishes up being
marked as preparing to commit (not true!), and then the leader gets
confused during that write, causing a serialization failure to be
raised sooner (though I can't explain why it should be raised then
anyway, but that's another topic). Oops. I think the fix here is
just not to do that in a worker (the worker's CommitTransaction()
doesn't really mean what it says).

Here's a version with a change that makes that conditional. This way
your test case behaves the same as non-parallel mode.
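
For illustration only (this is not code from the patch): a minimal
standalone sketch of the failure mode described above. The names
SketchSxact, SKETCH_FLAG_PREPARED and the sketch_* functions are
hypothetical stand-ins, not PostgreSQL APIs. It assumes only what the
explanation says: the worker points at the leader's SERIALIZABLEXACT, and
the worker's end-of-transaction path would mark that shared object as
preparing to commit unless the check is skipped in workers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the "prepared" flag on a SERIALIZABLEXACT. */
#define SKETCH_FLAG_PREPARED 0x01

/* Hypothetical stand-in for SERIALIZABLEXACT; only the flags matter here. */
typedef struct SketchSxact
{
	unsigned	flags;
} SketchSxact;

/* Per-process pointer, playing the role of MySerializableXact. */
static SketchSxact *my_sxact = NULL;

/* Plays the role of PreCommit_CheckForSerializationFailure(). */
static void
sketch_precommit_check(void)
{
	assert(my_sxact != NULL);
	my_sxact->flags |= SKETCH_FLAG_PREPARED;	/* "preparing to commit" */
}

/*
 * Plays the role of CommitTransaction().  With the guard enabled, a
 * parallel worker skips the pre-commit check, as the fix does.
 */
static void
sketch_commit_transaction(bool is_parallel_worker, bool guard_enabled)
{
	if (guard_enabled && is_parallel_worker)
		return;
	sketch_precommit_check();
}

/*
 * Simulate a worker that imports the leader's object and then ends its
 * own transaction.  Returns true if the leader's object was left marked
 * as prepared, which is the bug.
 */
static bool
sketch_worker_exit_marks_prepared(SketchSxact *leader_sxact,
								  bool guard_enabled)
{
	my_sxact = leader_sxact;	/* what SetSerializableXact() does */
	sketch_commit_transaction(true, guard_enabled);
	return (leader_sxact->flags & SKETCH_FLAG_PREPARED) != 0;
}
```

Without the guard the leader's object ends up flagged even though the
leader has not started committing; with it, the flags are left alone.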

With the latest patch, I didn't find any problems.

I will continue my review on the latest patch and share any updates.

Thanks!

The patch looks good, and I don't have any comments on the code.
The test that the patch adds does not generate a true parallelism
scenario; I feel it would be better to change the test so that it
generates a parallel sequential/index/bitmap scan.

Regards,
Hari Babu
Fujitsu Australia

#21

Haribabu Kommi

kommi.haribabu@gmail.com

about 8 years ago

In reply to: Haribabu Kommi (#20)

Re: [HACKERS] SERIALIZABLE with parallel query

On Tue, Sep 26, 2017 at 4:41 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Mon, Sep 25, 2017 at 6:57 PM, Thomas Munro <
thomas.munro@enterprisedb.com> wrote:

On Mon, Sep 25, 2017 at 8:37 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

Even after I tune the GUCs to choose a sequential scan, I still don't get
the error in session 2 for the update operation, as I used to with a
parallel sequential scan; it also accepts many more commands until S1
commits.

Hmm. Then this requires more explanation because I don't expect a
difference. I did some digging and realised that the error detail
message "Reason code: Canceled on identification as a pivot, during
write." was reached in a code path that requires
SxactIsPrepared(writer) and also MySerializableXact == writer, which
means that the process believes it is committing. Clearly something
is wrong. After some more digging I realised that
ParallelWorkerMain() calls EndParallelWorkerTransaction() which calls
CommitTransaction() which calls
PreCommit_CheckForSerializationFailure(). Since the worker is
connected to the leader's SERIALIZABLEXACT, that finishes up being
marked as preparing to commit (not true!), and then the leader gets
confused during that write, causing a serialization failure to be
raised sooner (though I can't explain why it should be raised then
anyway, but that's another topic). Oops. I think the fix here is
just not to do that in a worker (the worker's CommitTransaction()
doesn't really mean what it says).

Here's a version with a change that makes that conditional. This way
your test case behaves the same as non-parallel mode.

The patch looks good, and I don't have any comments on the code.
The test that the patch adds does not generate a true parallelism
scenario; I feel it would be better to change the test so that it
generates a parallel sequential/index/bitmap scan.

The latest patch is good. It lacks a test that verifies serializable
support with actual parallel workers, so if it breaks, it will be
difficult to notice.

Regards,
Hari Babu
Fujitsu Australia

#22

Michael Paquier

michael.paquier@gmail.com

about 8 years ago

In reply to: Haribabu Kommi (#21)

Re: [HACKERS] SERIALIZABLE with parallel query

On Fri, Nov 24, 2017 at 1:06 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

The latest patch is good. It lacks a test that verifies serializable
support with actual parallel workers, so if it breaks, it will be
difficult to notice.

Could this question be answered? The patch still applies so I am
moving it to next CF.
--
Michael

#23

Thomas Munro

thomas.munro@enterprisedb.com

about 8 years ago

In reply to: Michael Paquier (#22)

Re: [HACKERS] SERIALIZABLE with parallel query

On Thu, Nov 30, 2017 at 2:32 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Nov 24, 2017 at 1:06 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

The latest patch is good. It lacks a test that verifies serializable
support with actual parallel workers, so if it breaks, it will be
difficult to notice.

Could this question be answered? The patch still applies so I am
moving it to next CF.

Thanks. The answer is: It does run queries in two different
backends, proving that different backends associated with the same
session are correctly detecting conflicts and enabling the SSI
algorithm to work. But yeah, Haribabu is right that it doesn't ever
cause them to run simultaneously in a way that would cause the new
locking to contend (or break if the locking code is incorrect). I
have been unable to think of a good way to do that in a regression or
isolation test so far.
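
For illustration only, here is a single-process sketch of the lock
ordering that the new code follows in CreatePredicateLock(): take the
list lock in shared mode, then, only in parallel mode, the per-sxact lock
exclusively, and release in reverse order. SketchLock, SketchSxact and
the sketch_* functions are hypothetical toys that merely record lock
state; they are not LWLocks, and a toy like this cannot show real
contention between backends, which is exactly the part the isolation test
does not exercise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock: records only whether and how it is held. */
typedef enum
{
	SKETCH_UNLOCKED,
	SKETCH_SHARED,
	SKETCH_EXCLUSIVE
} SketchLockMode;

typedef struct SketchLock
{
	SketchLockMode mode;
} SketchLock;

/* Hypothetical stand-in for SERIALIZABLEXACT with its new 'lock' member. */
typedef struct SketchSxact
{
	SketchLock	lock;			/* protects the list in parallel mode */
	int			npredlocks;		/* stands in for the predicateLocks list */
} SketchSxact;

/* Stands in for SerializablePredicateLockListLock. */
static SketchLock list_lock;

static void
sketch_acquire(SketchLock *lock, SketchLockMode mode)
{
	assert(lock->mode == SKETCH_UNLOCKED);
	lock->mode = mode;
}

static void
sketch_release(SketchLock *lock)
{
	assert(lock->mode != SKETCH_UNLOCKED);
	lock->mode = SKETCH_UNLOCKED;
}

/* Add one entry to the transaction's list, following the lock ordering. */
static int
sketch_create_predicate_lock(SketchSxact *sxact, bool in_parallel_mode)
{
	sketch_acquire(&list_lock, SKETCH_SHARED);
	if (in_parallel_mode)
		sketch_acquire(&sxact->lock, SKETCH_EXCLUSIVE);

	/* In parallel mode the list may only change under the per-sxact lock. */
	assert(!in_parallel_mode || sxact->lock.mode == SKETCH_EXCLUSIVE);
	sxact->npredlocks++;

	if (in_parallel_mode)
		sketch_release(&sxact->lock);
	sketch_release(&list_lock);
	return sxact->npredlocks;
}
```

A real test of this code path would need two backends inside the
protected region at the same time, which the toy above does not attempt.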

--
Thomas Munro
http://www.enterprisedb.com

#24

Thomas Munro

thomas.munro@enterprisedb.com

about 8 years ago

In reply to: Thomas Munro (#23)

1 attachment(s)

Re: [HACKERS] SERIALIZABLE with parallel query

On Thu, Nov 30, 2017 at 2:44 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Nov 30, 2017 at 2:32 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Could this question be answered? The patch still applies so I am
moving it to next CF.

Rebased, 'cause it broke.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

ssi-parallel-v9.patch (application/octet-stream)

From 1506bd4cfd873df49021c24233b441e002fd09a0 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Fri, 1 Sep 2017 16:54:57 +1200
Subject: [PATCH] Enable SERIALIZABLE and parallel query to be used together.

Previously, the SERIALIZABLE isolation level prevented parallel query from
being used.  Allow the two features to be used together by sharing the
leader's SERIALIZABLEXACT with parallel workers.

Author: Thomas Munro
Reviewed-By: Haribabu Kommi
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 doc/src/sgml/parallel.sgml                         | 17 -----
 src/backend/access/transam/parallel.c              | 14 ++---
 src/backend/access/transam/xact.c                  |  7 ++-
 src/backend/optimizer/plan/planner.c               | 11 +---
 src/backend/storage/lmgr/lwlock.c                  |  1 +
 src/backend/storage/lmgr/predicate.c               | 73 ++++++++++++++++++++--
 src/include/storage/lwlock.h                       |  1 +
 src/include/storage/predicate_internals.h          |  6 ++
 .../isolation/expected/serializable-parallel.out   | 44 +++++++++++++
 src/test/isolation/isolation_schedule              |  1 +
 .../isolation/specs/serializable-parallel.spec     | 48 ++++++++++++++
 11 files changed, 181 insertions(+), 42 deletions(-)
 create mode 100644 src/test/isolation/expected/serializable-parallel.out
 create mode 100644 src/test/isolation/specs/serializable-parallel.spec

diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index f15a9233cbf..9507f1ae2ef 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -192,13 +192,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -241,16 +234,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         that may be suboptimal when run serially.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index d3431a7c306..f12e46ffdd6 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -28,6 +28,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate_internals.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -83,6 +84,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SERIALIZABLEXACT *parallel_master_serializablexact;
 
 	/* Mutex protects remaining fields. */
 	slock_t		mutex;
@@ -158,14 +160,6 @@ CreateParallelContext(const char *library_name, const char *function_name,
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.
-	 */
-	if (IsolationIsSerializable())
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -306,6 +300,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->parallel_master_serializablexact = GetSerializableXact();
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_FIXED, fps);
@@ -1137,6 +1132,9 @@ ParallelWorkerMain(Datum main_arg)
 	/* Set ParallelMasterBackendId so we know how to address temp relations. */
 	ParallelMasterBackendId = fps->parallel_master_backend_id;
 
+	/* Use the leader's SERIALIZABLEXACT. */
+	SetSerializableXact(fps->parallel_master_serializablexact);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 046898c6190..be31c1d917b 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2003,9 +2003,12 @@ CommitTransaction(void)
 	/*
 	 * Mark serializable transaction as complete for predicate locking
 	 * purposes.  This should be done as late as we can put it and still allow
-	 * errors to be raised for failure patterns found at commit.
+	 * errors to be raised for failure patterns found at commit.  This is not
+	 * appropriate in a parallel worker however, because we aren't committing
+	 * the leader's transaction and its serializable state will go on.
 	 */
-	PreCommit_CheckForSerializationFailure();
+	if (!is_parallel_worker)
+		PreCommit_CheckForSerializationFailure();
 
 	/*
 	 * Insert notifications sent by NOTIFY commands into the queue.  This
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e8bc15c35d2..9aa80e69b03 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -272,14 +272,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -287,8 +279,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 46f5c4277d4..fefb16dccd0 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -518,6 +518,7 @@ RegisterLWLockTranches(void)
 						  "session_typmod_table");
 	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "sxact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 251a359bffc..c3f16d6f7a7 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'lock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	PredicateLockHashPartitionLock(hashcode)
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target.
@@ -186,6 +194,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -1825,6 +1834,7 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	SHMQueueInit(&(sxact->predicateLocks));
 	SHMQueueElemInit(&(sxact->finishedLink));
 	sxact->flags = 0;
+	LWLockInitialize(&sxact->lock, LWTRANCHE_SXACT);
 	if (XactReadOnly)
 	{
 		sxact->flags |= SXACT_FLAG_READ_ONLY;
@@ -2107,6 +2117,14 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
 
 	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
 
+	if (IsInParallelMode())
+	{
+		Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+									LW_EXCLUSIVE) ||
+			   LWLockHeldByMeInMode(&MySerializableXact->lock,
+									LW_EXCLUSIVE));
+	}
+
 	/* Can't remove it until no locks at this target. */
 	if (!SHMQueueEmpty(&target->predicateLocks))
 		return;
@@ -2124,7 +2142,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2137,6 +2157,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2190,6 +2212,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2388,6 +2412,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2425,6 +2451,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2612,7 +2640,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2672,7 +2701,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2687,7 +2716,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3284,6 +3314,10 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
+	/* Only leader processes should release predicate locks. */
+	if (IsParallelWorker())
+		goto cleanup;
+
 	if (MySerializableXact == InvalidSerializableXact)
 	{
 		Assert(LocalPredicateLockHash == NULL);
@@ -3570,6 +3604,7 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
+cleanup:
 	/* Delete per-transaction lock table */
 	if (LocalPredicateLockHash != NULL)
 	{
@@ -4259,6 +4294,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->lock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4293,6 +4330,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->lock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4841,6 +4880,11 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->lock in parallel mode because there cannot be
+	 * any parallel workers running while we are preparing a transaction.
+	 */
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -5049,3 +5093,22 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Accessor to allow parallel leaders to export the current SERIALIZABLEXACT
+ * to parallel workers.
+ */
+SERIALIZABLEXACT *
+GetSerializableXact(void)
+{
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+SetSerializableXact(SERIALIZABLEXACT *sxact)
+{
+	MySerializableXact = sxact;
+}
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 460843d73e2..d58f9f3cd6a 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -217,6 +217,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_SESSION_TYPMOD_TABLE,
 	LWTRANCHE_TBM,
 	LWTRANCHE_PARALLEL_APPEND,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 89874a5c3b6..64560d4d3a4 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,9 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	/* lock to protect predicateLocks list in parallel mode */
+	LWLock		lock;
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
@@ -475,5 +479,7 @@ typedef struct TwoPhasePredicateRecord
 extern PredicateLockData *GetPredicateLockStatusData(void);
 extern int GetSafeSnapshotBlockingPids(int blocked_pid,
 							int *output, int output_size);
+extern SERIALIZABLEXACT *GetSerializableXact(void);
+extern void SetSerializableXact(SERIALIZABLEXACT *sxact);
 
 #endif							/* PREDICATE_INTERNALS_H */
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 00000000000..f43aa6a2990
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index e41b9164cd0..8486b3c0173 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -63,3 +63,4 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+test: serializable-parallel
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 00000000000..0e7c2c7c1fa
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"
-- 
2.15.0

#25

Haribabu Kommi

kommi.haribabu@gmail.com

about 8 years ago

In reply to: Thomas Munro (#24)

Re: [HACKERS] SERIALIZABLE with parallel query

On Fri, Dec 8, 2017 at 11:33 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Thu, Nov 30, 2017 at 2:44 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Nov 30, 2017 at 2:32 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Could this question be answered? The patch still applies so I am
moving it to next CF.

Rebased, 'cause it broke.

Thanks for explaining the difficulty of writing an isolation test for
serializable parallel query.

The committer can decide whether the existing test should be part of the
test suite or be removed; other than that, everything is fine, so I am
moving the patch to the "ready for committer" state.

Regards,
Hari Babu
Fujitsu Australia

#26

Thomas Munro

thomas.munro@enterprisedb.com

about 8 years ago

In reply to: Haribabu Kommi (#25)

Re: [HACKERS] SERIALIZABLE with parallel query

On Wed, Dec 13, 2017 at 2:09 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

Thanks for explaining the problem in generating an isolation test to
test serializable parallel query.

The committer can decide whether the existing test is fine to be part of
the test suite or should be removed; other than that, everything is fine.
So I am moving the patch into the "ready for committer" state.

Thank you! I will try to find a good benchmark that will really
exercise parallel query + SSI.

--
Thomas Munro
http://www.enterprisedb.com

#27

Thomas Munro

thomas.munro@enterprisedb.com

almost 8 years ago

In reply to: Thomas Munro (#26)

1 attachment(s)

Re: [HACKERS] SERIALIZABLE with parallel query

On Wed, Dec 13, 2017 at 5:30 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Wed, Dec 13, 2017 at 2:09 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

Thanks for explaining the problem in generating an isolation test to
test serializable parallel query.

The committer can decide whether the existing test is fine to be part of
the test suite or should be removed; other than that, everything is fine.
So I am moving the patch into the "ready for committer" state.

Thank you! I will try to find a good benchmark that will really
exercise parallel query + SSI.

This started crashing some time yesterday with an assertion failure in
the isolation tests after commit 2badb5af landed. Reordering of code
in parallel.c confused patch's fuzz heuristics, causing
SetSerializableXact() to be called too soon. Here's a fix for that.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

ssi-parallel-v10.patch (application/octet-stream)

From e155cc5d45799a0f4a26cf3ce3d01483fc467f27 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Fri, 1 Sep 2017 16:54:57 +1200
Subject: [PATCH] Enable SERIALIZABLE and parallel query to be used together.

Previously, the SERIALIZABLE isolation level prevented parallel query from
being used.  Allow the two features to be used together by sharing the
leader's SERIALIZABLEXACT with parallel workers.

Author: Thomas Munro
Reviewed-By: Haribabu Kommi
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 doc/src/sgml/parallel.sgml                         | 17 -----
 src/backend/access/transam/parallel.c              | 14 ++---
 src/backend/access/transam/xact.c                  |  7 ++-
 src/backend/optimizer/plan/planner.c               | 11 +---
 src/backend/storage/lmgr/lwlock.c                  |  1 +
 src/backend/storage/lmgr/predicate.c               | 73 ++++++++++++++++++++--
 src/include/storage/lwlock.h                       |  1 +
 src/include/storage/predicate_internals.h          |  6 ++
 .../isolation/expected/serializable-parallel.out   | 44 +++++++++++++
 src/test/isolation/isolation_schedule              |  1 +
 .../isolation/specs/serializable-parallel.spec     | 48 ++++++++++++++
 11 files changed, 181 insertions(+), 42 deletions(-)
 create mode 100644 src/test/isolation/expected/serializable-parallel.out
 create mode 100644 src/test/isolation/specs/serializable-parallel.spec

diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index f15a9233cbf..9507f1ae2ef 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -192,13 +192,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -241,16 +234,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         that may be suboptimal when run serially.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 54d9ea7be05..bcc92e9197f 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -29,6 +29,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate_internals.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -85,6 +86,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SERIALIZABLEXACT *parallel_master_serializablexact;
 
 	/* Mutex protects remaining fields. */
 	slock_t		mutex;
@@ -164,14 +166,6 @@ CreateParallelContext(const char *library_name, const char *function_name,
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.
-	 */
-	if (IsolationIsSerializable())
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -315,6 +309,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->parallel_master_serializablexact = GetSerializableXact();
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_FIXED, fps);
@@ -1239,6 +1234,9 @@ ParallelWorkerMain(Datum main_arg)
 	reindexspace = shm_toc_lookup(toc, PARALLEL_KEY_REINDEX_STATE, false);
 	RestoreReindexState(reindexspace);
 
+	/* Use the leader's SERIALIZABLEXACT. */
+	SetSerializableXact(fps->parallel_master_serializablexact);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index ea81f4b5de3..994abac7546 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2004,9 +2004,12 @@ CommitTransaction(void)
 	/*
 	 * Mark serializable transaction as complete for predicate locking
 	 * purposes.  This should be done as late as we can put it and still allow
-	 * errors to be raised for failure patterns found at commit.
+	 * errors to be raised for failure patterns found at commit.  This is not
+	 * appropriate in a parallel worker however, because we aren't committing
+	 * the leader's transaction and its serializable state will go on.
 	 */
-	PreCommit_CheckForSerializationFailure();
+	if (!is_parallel_worker)
+		PreCommit_CheckForSerializationFailure();
 
 	/*
 	 * Insert notifications sent by NOTIFY commands into the queue.  This
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 53870432ea7..8020cb53cb0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -272,14 +272,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -287,8 +279,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 71caac1a1f4..2e12049ff6b 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -520,6 +520,7 @@ RegisterLWLockTranches(void)
 						  "shared_tuplestore");
 	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "sxact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index d1ff2b1edcd..a3bfbb65ece 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'lock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	PredicateLockHashPartitionLock(hashcode)
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target.
@@ -186,6 +194,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -1825,6 +1834,7 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	SHMQueueInit(&(sxact->predicateLocks));
 	SHMQueueElemInit(&(sxact->finishedLink));
 	sxact->flags = 0;
+	LWLockInitialize(&sxact->lock, LWTRANCHE_SXACT);
 	if (XactReadOnly)
 	{
 		sxact->flags |= SXACT_FLAG_READ_ONLY;
@@ -2107,6 +2117,14 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
 
 	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
 
+	if (IsInParallelMode())
+	{
+		Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+									LW_EXCLUSIVE) ||
+			   LWLockHeldByMeInMode(&MySerializableXact->lock,
+									LW_EXCLUSIVE));
+	}
+
 	/* Can't remove it until no locks at this target. */
 	if (!SHMQueueEmpty(&target->predicateLocks))
 		return;
@@ -2124,7 +2142,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2137,6 +2157,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2190,6 +2212,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2388,6 +2412,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2425,6 +2451,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2612,7 +2640,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2672,7 +2701,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2687,7 +2716,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3284,6 +3314,10 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
+	/* Only leader processes should release predicate locks. */
+	if (IsParallelWorker())
+		goto cleanup;
+
 	if (MySerializableXact == InvalidSerializableXact)
 	{
 		Assert(LocalPredicateLockHash == NULL);
@@ -3570,6 +3604,7 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
+cleanup:
 	/* Delete per-transaction lock table */
 	if (LocalPredicateLockHash != NULL)
 	{
@@ -4259,6 +4294,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->lock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4293,6 +4330,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->lock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4841,6 +4880,11 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->lock in parallel mode because there cannot be
+	 * any parallel workers running while we are preparing a transaction.
+	 */
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -5049,3 +5093,22 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Accessor to allow parallel leaders to export the current SERIALIZABLEXACT
+ * to parallel workers.
+ */
+SERIALIZABLEXACT *
+GetSerializableXact(void)
+{
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+SetSerializableXact(SERIALIZABLEXACT *sxact)
+{
+	MySerializableXact = sxact;
+}
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index c21bfe2f666..b25c43fc6be 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -219,6 +219,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_SHARED_TUPLESTORE,
 	LWTRANCHE_TBM,
 	LWTRANCHE_PARALLEL_APPEND,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 0f736d37dff..a5d975a3f60 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,9 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	/* lock to protect predicateLocks list in parallel mode */
+	LWLock		lock;
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
@@ -475,5 +479,7 @@ typedef struct TwoPhasePredicateRecord
 extern PredicateLockData *GetPredicateLockStatusData(void);
 extern int GetSafeSnapshotBlockingPids(int blocked_pid,
 							int *output, int output_size);
+extern SERIALIZABLEXACT *GetSerializableXact(void);
+extern void SetSerializableXact(SERIALIZABLEXACT *sxact);
 
 #endif							/* PREDICATE_INTERNALS_H */
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 00000000000..f43aa6a2990
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 74d7d59546a..aed46d8d549 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,3 +66,4 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+test: serializable-parallel
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 00000000000..0e7c2c7c1fa
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"
-- 
2.15.1

#28

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Thomas Munro (#27)

Re: [HACKERS] SERIALIZABLE with parallel query

On Wed, Jan 24, 2018 at 7:39 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

This started crashing some time yesterday with an assertion failure in
the isolation tests after commit 2badb5af landed. Reordering of code
in parallel.c confused patch's fuzz heuristics, causing
SetSerializableXact() to be called too soon. Here's a fix for that.

I took a look at this today and thought it might be OK to commit,
modulo a few minor issues: (1) you didn't document the new tranche and
(2) I prefer to avoid if (blah) { Assert(thing) } in favor of
Assert(!blah || thing).
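
The two assertion styles in point (2) check the same condition. A minimal standalone sketch follows; the function names are hypothetical and the helpers return the truth value that PostgreSQL's Assert() would test, rather than calling Assert() itself. It shows that the single-expression form is a plain logical implication:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Truth value of the condition checked by
 *     if (in_parallel_mode) { Assert(lock_held); }
 * The inner condition only matters when the guard is true.
 */
static bool
guarded_check(bool in_parallel_mode, bool lock_held)
{
    if (in_parallel_mode)
        return lock_held;
    return true;
}

/*
 * The same condition written as one expression:
 *     Assert(!in_parallel_mode || lock_held);
 */
static bool
implication_check(bool in_parallel_mode, bool lock_held)
{
    return !in_parallel_mode || lock_held;
}

/* Returns true if both forms agree on all four input combinations. */
static bool
forms_agree(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            if (guarded_check(a != 0, b != 0) != implication_check(a != 0, b != 0))
                return false;
    return true;
}
```

One practical difference, as far as I understand it: in a non-assert build the single-expression form compiles away entirely, whereas the guarded form leaves an empty if statement behind.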

But then I got a little bit concerned about ReleasePredicateLocks().
Suppose that in the middle of a read-only transaction,
SXACT_FLAG_RO_SAFE becomes true. The next call to
SerializationNeededForRead in each process will call
ReleasePredicateLocks. In the workers, this won't do anything, so
we'll just keep coming back there. But in the leader, we'll go ahead
and do all that stuff, including MySerializableXact =
InvalidSerializableXact. But in the workers, it's still set. Maybe
that's OK, but I'm not sure that it's OK...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#29

Thomas Munro

thomas.munro@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#28)

1 attachment(s)

Re: [HACKERS] SERIALIZABLE with parallel query

On Fri, Jan 26, 2018 at 4:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I took a look at this today and thought it might be OK to commit,

Thank you for looking at this!

modulo a few minor issues: (1) you didn't document the new tranche and

Fixed.

(2) I prefer to avoid if (blah) { Assert(thing) } in favor of
Assert(!blah || thing).

Done.

But then I got a little bit concerned about ReleasePredicateLocks().
Suppose that in the middle of a read-only transaction,
SXACT_FLAG_RO_SAFE becomes true. The next call to
SerializationNeededForRead in each process will call
ReleasePredicateLocks. In the workers, this won't do anything, so
we'll just keep coming back there. But in the leader, we'll go ahead
and do all that stuff, including MySerializableXact =
InvalidSerializableXact. But in the workers, it's still set. Maybe
that's OK, but I'm not sure that it's OK...

Ouch. Yeah. It's not OK. If the leader gives up its
SERIALIZABLEXACT object early due to that safe-read-only optimisation,
the workers are left with a dangling pointer to a SERIALIZABLEXACT
object that has been pushed onto FinishedSerializableTransactions.
From there it will move to PredXact->availableTransactions and might
be handed out to some other transaction, so it is not safe to retain a
pointer to that object.
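
The hazard described above can be reproduced in miniature. The following is only an illustrative sketch: PoolSlot, pool_alloc() and pool_release() are hypothetical stand-ins for SERIALIZABLEXACT and its free list, not PostgreSQL code, and the pool has a single slot so that recycling is guaranteed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SERIALIZABLEXACT: one slot in a fixed pool. */
typedef struct PoolSlot
{
    int  owner_xid;     /* transaction currently using this slot */
    bool in_use;
} PoolSlot;

static PoolSlot pool[1];

/* Hand out the first free slot, or NULL if the pool is exhausted. */
static PoolSlot *
pool_alloc(int xid)
{
    for (size_t i = 0; i < sizeof(pool) / sizeof(pool[0]); i++)
    {
        if (!pool[i].in_use)
        {
            pool[i].in_use = true;
            pool[i].owner_xid = xid;
            return &pool[i];
        }
    }
    return NULL;
}

/* Return a slot to the pool so that it can be handed out again. */
static void
pool_release(PoolSlot *slot)
{
    slot->in_use = false;
}

/*
 * The leader shares its pointer with a worker, then releases the slot
 * early.  An unrelated transaction (xid 200) is given the same slot, so
 * the worker's retained pointer now describes the wrong transaction.
 */
static int
stale_pointer_owner(void)
{
    PoolSlot *leader = pool_alloc(100);
    PoolSlot *worker;

    assert(leader != NULL);
    worker = leader;            /* worker imports the leader's pointer */

    pool_release(leader);       /* leader gives up the object early */
    leader = NULL;              /* like MySerializableXact = Invalid... */

    (void) pool_alloc(200);     /* slot is recycled for someone else */

    return worker->owner_xid;
}
```

In this toy version the worker simply reads another transaction's xid; in the real system it could go on to modify that transaction's predicate lock list.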

The best solution I have come up with so far is to add a reference
count to SERIALIZABLEXACT. I toyed with putting the refcount into the
DSM instead, but then I ran into problems making that work when you
have a query with multiple Gather nodes. Since the refcount is in
SERIALIZABLEXACT I also had to add a generation counter so that I
could detect the case where you try to attach too late (the leader has
already errored out, the refcount has reached 0 and the
SERIALIZABLEXACT object has been recycled).
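
As a rough illustration of the idea in the paragraph above (not the code in the attached patch), a reference count plus a generation counter might fit together as follows. All names here (RcSxact, RcHandle, rc_share, rc_attach, rc_detach) are hypothetical, and the locking that a real shared-memory implementation would need is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared object with a lifetime managed by reference count. */
typedef struct RcSxact
{
    uint64_t generation;    /* bumped whenever the object is freed */
    int      refcount;      /* leader plus attached workers */
} RcSxact;

/* What the leader publishes to workers: a pointer plus the generation seen. */
typedef struct RcHandle
{
    RcSxact *sxact;
    uint64_t generation;
} RcHandle;

/* Leader: record the object and its current generation in a handle. */
static void
rc_share(RcSxact *sxact, RcHandle *handle)
{
    handle->sxact = sxact;
    handle->generation = sxact->generation;
}

/*
 * Worker: attach only if the object is still the one the leader shared.
 * A generation mismatch or zero refcount means we arrived too late.
 */
static bool
rc_attach(const RcHandle *handle)
{
    RcSxact *sxact = handle->sxact;

    if (sxact->generation != handle->generation || sxact->refcount == 0)
        return false;
    sxact->refcount++;
    return true;
}

/* Drop one reference; the last one out invalidates outstanding handles. */
static void
rc_detach(RcSxact *sxact)
{
    assert(sxact->refcount > 0);
    if (--sxact->refcount == 0)
        sxact->generation++;
}

/* Scenario: an early attach succeeds, a late attach after recycling fails. */
static bool
late_attach_is_refused(void)
{
    RcSxact  sxact = {.generation = 1, .refcount = 1};  /* leader's ref */
    RcHandle handle;
    bool     early;
    bool     late;

    rc_share(&sxact, &handle);
    early = rc_attach(&handle);     /* worker arrives in time */
    rc_detach(&sxact);              /* worker finishes */
    rc_detach(&sxact);              /* leader errors out; object freed */
    sxact.refcount = 1;             /* object recycled by another xact */
    late = rc_attach(&handle);      /* stale handle must be rejected */

    return early && !late;
}
```

The generation check is what distinguishes "the object is alive" from "an object is alive at this address": without it, a recycled object with a non-zero refcount would look attachable.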

The attached is a draft patch only, needing some testing and polish.
Brickbats, better ideas?

FWIW I also considered a couple of other ideas: (1) Keeping the
object alive on the FinishedSerializableTransactions list until the
leader's transaction is finished seems broken because we need to be
able to spill that list to the SLRU at any time, and if we somehow
made them sticky we could run out of memory. (2) Anything involving
the leader having sole control of the object lifetime seems
problematic... well, it might work if you disabled the
SXACT_FLAG_RO_SAFE optimisation so that ReleasePredicateLocks() always
happens after all workers have finished, but that seems like cheating.

PS I noticed that for BecomeLockGroupMember() we say "If we can't
join the lock group, the leader has gone away, so just exit quietly"
but for various other similar things we spew errors (the most commonly
seen one being "ERROR: could not map dynamic shared memory segment").
Intentional?

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

ssi-parallel-v11.patch (application/octet-stream)

From 7b153e82a9d73dae4eb1e2b639b37bc808359857 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Fri, 1 Sep 2017 16:54:57 +1200
Subject: [PATCH] Enable SERIALIZABLE and parallel query to be used together.

Previously, the SERIALIZABLE isolation level prevented parallel query from
being used.  Allow the two features to be used together by sharing the
leader's SERIALIZABLEXACT with parallel workers.

Remove the serializable_okay flag added to CreateParallelContext() by commit
9da0cc35284bdbe8d442d732963303ff0e0a40bc, because it's now redundant.

Author: Thomas Munro
Reviewed-By: Haribabu Kommi, Robert Haas
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 doc/src/sgml/monitoring.sgml                       |   7 +-
 doc/src/sgml/parallel.sgml                         |  17 --
 src/backend/access/nbtree/nbtsort.c                |   2 +-
 src/backend/access/transam/parallel.c              |  21 +--
 src/backend/access/transam/xact.c                  |   7 +-
 src/backend/executor/execParallel.c                |   2 +-
 src/backend/optimizer/plan/planner.c               |  11 +-
 src/backend/storage/lmgr/lwlock.c                  |   1 +
 src/backend/storage/lmgr/predicate.c               | 203 ++++++++++++++++++++-
 src/include/access/parallel.h                      |   3 +-
 src/include/storage/lwlock.h                       |   1 +
 src/include/storage/predicate.h                    |  14 ++
 src/include/storage/predicate_internals.h          |   7 +-
 .../isolation/expected/serializable-parallel.out   |  44 +++++
 src/test/isolation/isolation_schedule              |   1 +
 .../isolation/specs/serializable-parallel.spec     |  48 +++++
 16 files changed, 333 insertions(+), 56 deletions(-)
 create mode 100644 src/test/isolation/expected/serializable-parallel.out
 create mode 100644 src/test/isolation/specs/serializable-parallel.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e138d1ef076..64f60cbdc54 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -845,7 +845,7 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
 
       <tbody>
        <row>
-        <entry morerows="63"><literal>LWLock</literal></entry>
+        <entry morerows="64"><literal>LWLock</literal></entry>
         <entry><literal>ShmemIndexLock</literal></entry>
         <entry>Waiting to find or allocate space in shared memory.</entry>
        </row>
@@ -979,6 +979,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting to perform an operation on a list of locks held by
          serializable transactions.</entry>
         </row>
+        <row>
+         <entry><literal>sxact</literal></entry>
+         <entry>Waiting to perform an operation on a serializable transaction
+         in a parallel query.</entry>
+        </row>
         <row>
          <entry><literal>OldSerXidLock</literal></entry>
          <entry>Waiting to read or record conflicting serializable
diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index f15a9233cbf..9507f1ae2ef 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -192,13 +192,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -241,16 +234,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         that may be suboptimal when run serially.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 521ae6e5f77..98abcc6bf3a 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1207,7 +1207,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	EnterParallelMode();
 	Assert(request > 0);
 	pcxt = CreateParallelContext("postgres", "_bt_parallel_build_main",
-								 request, true);
+								 request);
 	scantuplesortstates = leaderparticipates ? request + 1 : request;
 
 	/*
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 9d4efc0f8fc..542d4fae2e4 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -30,6 +30,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -86,6 +87,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SerializableXactHandle serializable_xact_handle;
 
 	/* Mutex protects remaining fields. */
 	slock_t		mutex;
@@ -150,7 +152,7 @@ static void ParallelWorkerShutdown(int code, Datum arg);
  */
 ParallelContext *
 CreateParallelContext(const char *library_name, const char *function_name,
-					  int nworkers, bool serializable_okay)
+					  int nworkers)
 {
 	MemoryContext oldcontext;
 	ParallelContext *pcxt;
@@ -168,16 +170,6 @@ CreateParallelContext(const char *library_name, const char *function_name,
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.  Utility statement callers may ask us to ignore this
-	 * restriction because they're always able to safely ignore the fact that
-	 * SIREAD locks do not work with parallelism.
-	 */
-	if (IsolationIsSerializable() && !serializable_okay)
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -321,6 +313,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	ShareSerializableXact(&fps->serializable_xact_handle);
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_FIXED, fps);
@@ -886,6 +879,9 @@ DestroyParallelContext(ParallelContext *pcxt)
 		}
 	}
 
+	/* Stop sharing our serializable transaction with workers. */
+	UnshareSerializableXact();
+
 	/*
 	 * If we have allocated a shared memory segment, detach it.  This will
 	 * implicitly detach the error queues, and any other shared memory queues,
@@ -1384,6 +1380,9 @@ ParallelWorkerMain(Datum main_arg)
 	reindexspace = shm_toc_lookup(toc, PARALLEL_KEY_REINDEX_STATE, false);
 	RestoreReindexState(reindexspace);
 
+	/* Attach to the leader's serializable transaction, if SERIALIZABLE. */
+	AttachSerializableXact(&fps->serializable_xact_handle);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index dbaaf8e0053..52e48c3b60a 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2004,9 +2004,12 @@ CommitTransaction(void)
 	/*
 	 * Mark serializable transaction as complete for predicate locking
 	 * purposes.  This should be done as late as we can put it and still allow
-	 * errors to be raised for failure patterns found at commit.
+	 * errors to be raised for failure patterns found at commit.  This is not
+	 * appropriate in a parallel worker however, because we aren't committing
+	 * the leader's transaction and its serializable state will go on.
 	 */
-	PreCommit_CheckForSerializationFailure();
+	if (!is_parallel_worker)
+		PreCommit_CheckForSerializationFailure();
 
 	/*
 	 * Insert notifications sent by NOTIFY commands into the queue.  This
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 14b0b89463c..f8b72ebab99 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -592,7 +592,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pstmt_data = ExecSerializePlan(planstate->plan, estate);
 
 	/* Create a parallel context. */
-	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers, false);
+	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
 	pei->pcxt = pcxt;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd1447cc..c39e79a26d3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -292,14 +292,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -307,8 +299,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 233606b4141..8ef1f3f3c4c 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -520,6 +520,7 @@ RegisterLWLockTranches(void)
 						  "shared_tuplestore");
 	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "sxact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index d1ff2b1edcd..4c4c9a808b4 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'lock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	PredicateLockHashPartitionLock(hashcode)
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target.
@@ -186,6 +194,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -469,6 +478,7 @@ static void CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag);
 static void FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer);
 static void OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
 										SERIALIZABLEXACT *writer);
+static void CreateLocalPredicateLockHash(void);
 
 
 /*------------------------------------------------------------------------*/
@@ -1214,6 +1224,14 @@ InitPredicateLocks(void)
 		memset(PredXact->element, 0, requestSize);
 		for (i = 0; i < max_table_size; i++)
 		{
+			/*
+			 * The other members of SERIALIZABLEXACT are initialized when
+			 * objects are removed from the availableList, but "lock" and
+			 * "generation" are initialized up front only because
+			 * AttachSerializableXact() uses them to detect objects that have
+			 * been recycled (otherwise they'd be corrupted).
+			 */
+			LWLockInitialize(&PredXact->element[i].sxact.lock, LWTRANCHE_SXACT);
 			SHMQueueInsertBefore(&(PredXact->availableList),
 								 &(PredXact->element[i].link));
 		}
@@ -1679,6 +1697,17 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
 {
 	Assert(IsolationIsSerializable());
 
+	/*
+	 * If this is called by parallel.c in a parallel worker, we don't want to
+	 * create a SERIALIZABLEXACT just yet because the leader's
+	 * SERIALIZABLEXACT will be installed with AttachSerializableXact().  We
+	 * also don't want to reject SERIALIZABLE READ ONLY DEFERRABLE in this
+	 * case, because the leader has already determined that the snapshot it
+	 * has passed us is safe.  So there is nothing for us to do.
+	 */
+	if (IsParallelWorker())
+		return;
+
 	/*
 	 * We do not allow SERIALIZABLE READ ONLY DEFERRABLE transactions to
 	 * import snapshots, since there's no way to wait for a safe snapshot when
@@ -1712,7 +1741,6 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	VirtualTransactionId vxid;
 	SERIALIZABLEXACT *sxact,
 			   *othersxact;
-	HASHCTL		hash_ctl;
 
 	/* We only do this for serializable transactions.  Once. */
 	Assert(MySerializableXact == InvalidSerializableXact);
@@ -1825,6 +1853,7 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	SHMQueueInit(&(sxact->predicateLocks));
 	SHMQueueElemInit(&(sxact->finishedLink));
 	sxact->flags = 0;
+	sxact->refcount = -1;
 	if (XactReadOnly)
 	{
 		sxact->flags |= SXACT_FLAG_READ_ONLY;
@@ -1859,6 +1888,16 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 
 	LWLockRelease(SerializableXactHashLock);
 
+	CreateLocalPredicateLockHash();
+
+	return snapshot;
+}
+
+static void
+CreateLocalPredicateLockHash(void)
+{
+	HASHCTL		hash_ctl;
+
 	/* Initialize the backend-local hash table of parent locks */
 	Assert(LocalPredicateLockHash == NULL);
 	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
@@ -1868,8 +1907,6 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 										 max_predicate_locks_per_xact,
 										 &hash_ctl,
 										 HASH_ELEM | HASH_BLOBS);
-
-	return snapshot;
 }
 
 /*
@@ -2107,6 +2144,12 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
 
 	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
 
+	Assert(!IsInParallelMode() ||
+		   (LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								 LW_EXCLUSIVE) ||
+			LWLockHeldByMeInMode(&MySerializableXact->lock,
+								 LW_EXCLUSIVE)));
+
 	/* Can't remove it until no locks at this target. */
 	if (!SHMQueueEmpty(&target->predicateLocks))
 		return;
@@ -2124,7 +2167,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2137,6 +2182,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2190,6 +2237,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2388,6 +2437,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->lock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2425,6 +2476,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->lock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2612,7 +2665,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2672,7 +2726,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2687,7 +2741,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3272,6 +3327,7 @@ ReleasePredicateLocks(bool isCommit)
 				nextConflict,
 				possibleUnsafeConflict;
 	SERIALIZABLEXACT *roXact;
+	bool		local_cleanup_only;
 
 	/*
 	 * We can't trust XactReadOnly here, because a transaction which started
@@ -3290,6 +3346,45 @@ ReleasePredicateLocks(bool isCommit)
 		return;
 	}
 
+	/*
+	 * In parallel query, we use reference counting so that the last backend
+	 * to call ReleasePredicateLocks() actually frees the shared resources.
+	 * Usually that is the leader, but in the case of an error or a read-only
+	 * transaction whose SXACT_FLAG_RO_SAFE flag is set, it could be any
+	 * backend.
+	 */
+	local_cleanup_only = false;
+	LWLockAcquire(&MySerializableXact->lock, LW_EXCLUSIVE);
+	if (MySerializableXact->refcount != -1)
+	{
+		bool		last_to_detach = false;
+
+		Assert(MySerializableXact->refcount > 0);
+		if (--MySerializableXact->refcount == 0)
+		{
+			last_to_detach = true;
+			++MySerializableXact->generation;
+		}
+
+		/* If we're not last, we only clean up backend-local resources. */
+		if (!last_to_detach)
+			local_cleanup_only = true;
+	}
+	else if (IsParallelWorker())
+	{
+		/*
+		 * The leader made this SERIALIZABLEXACT non-shared in
+		 * DestroyParallelContext().  It can't become shared again while we're
+		 * running since the leader is now waiting for all workers to exit.
+		 * Workers only need to clean up their local state.
+		 */
+		local_cleanup_only = true;
+	}
+	LWLockRelease(&MySerializableXact->lock);
+
+	if (local_cleanup_only)
+		goto cleanup;
+
 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
 	Assert(!isCommit || SxactIsPrepared(MySerializableXact));
@@ -3319,8 +3414,8 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact->finishedBefore = ShmemVariableCache->nextXid;
 
 	/*
-	 * If it's not a commit it's a rollback, and we can clear our locks
-	 * immediately.
+	 * If it's not a commit it's either a rollback or a read-only transaction
+	 * flagged SXACT_FLAG_RO_SAFE, and we can clear our locks immediately.
 	 */
 	if (isCommit)
 	{
@@ -3567,6 +3662,7 @@ ReleasePredicateLocks(bool isCommit)
 	if (needToClear)
 		ClearOldPredicateLocks();
 
+cleanup:
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
@@ -4259,6 +4355,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->lock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4293,6 +4391,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->lock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4841,6 +4941,11 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->lock in parallel mode because there cannot be
+	 * any parallel workers running while we are preparing a transaction.
+	 */
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -5049,3 +5154,81 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Prepare to share the current SERIALIZABLEXACT with parallel workers,
+ * filling in a handle object that can be used by AttachSerializableXact() in
+ * a parallel worker.
+ */
+void
+ShareSerializableXact(SerializableXactHandle *handle)
+{
+	Assert(!IsParallelWorker());
+
+	if (MySerializableXact == InvalidSerializableXact)
+	{
+		handle->sxact = NULL;
+		handle->generation = 0;
+		return;
+	}
+
+	LWLockAcquire(&MySerializableXact->lock, LW_EXCLUSIVE);
+	Assert(MySerializableXact->refcount == -1);
+	MySerializableXact->refcount = 1;
+	handle->sxact = MySerializableXact;
+	handle->generation = MySerializableXact->generation;
+	LWLockRelease(&MySerializableXact->lock);
+}
+
+/*
+ * Stop sharing the current SERIALIZABLEXACT.
+ */
+void
+UnshareSerializableXact(void)
+{
+	Assert(!IsParallelWorker());
+
+	if (MySerializableXact == InvalidSerializableXact)
+		return;
+
+	LWLockAcquire(&MySerializableXact->lock, LW_EXCLUSIVE);
+	Assert(MySerializableXact->refcount > 0);
+	MySerializableXact->refcount = -1;
+	LWLockRelease(&MySerializableXact->lock);
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+AttachSerializableXact(SerializableXactHandle *handle)
+{
+
+	Assert(MySerializableXact == NULL);
+
+	if (handle->sxact != NULL)
+	{
+		MySerializableXact = (SERIALIZABLEXACT *) handle->sxact;
+
+		LWLockAcquire(&MySerializableXact->lock, LW_EXCLUSIVE);
+		if (MySerializableXact->generation != handle->generation)
+		{
+			/*
+			 * Everyone, including the leader, has already called
+			 * ReleasePredicateLocks().  An error must have occurred.  Since
+			 * the SERIALIZABLEXACT has been released and recycled, it's not
+			 * safe to dereference it.
+			 */
+			MySerializableXact = InvalidSerializableXact;
+			elog(ERROR, "could not attach to shared serializable transaction");
+		}
+		else
+		{
+			Assert(MySerializableXact->refcount > 0);
+			++MySerializableXact->refcount;
+		}
+		LWLockRelease(&MySerializableXact->lock);
+
+		CreateLocalPredicateLockHash();
+	}
+}
diff --git a/src/include/access/parallel.h b/src/include/access/parallel.h
index 025691fd82d..45e7fbb43f8 100644
--- a/src/include/access/parallel.h
+++ b/src/include/access/parallel.h
@@ -60,8 +60,7 @@ extern PGDLLIMPORT bool InitializingParallelWorker;
 #define		IsParallelWorker()		(ParallelWorkerNumber >= 0)
 
 extern ParallelContext *CreateParallelContext(const char *library_name,
-					  const char *function_name, int nworkers,
-					  bool serializable_okay);
+					  const char *function_name, int nworkers);
 extern void InitializeParallelDSM(ParallelContext *pcxt);
 extern void ReinitializeParallelDSM(ParallelContext *pcxt);
 extern void LaunchParallelWorkers(ParallelContext *pcxt);
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index c21bfe2f666..b25c43fc6be 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -219,6 +219,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_SHARED_TUPLESTORE,
 	LWTRANCHE_TBM,
 	LWTRANCHE_PARALLEL_APPEND,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 6a3464daa1e..5585a20c0c6 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -30,6 +30,15 @@ extern int	max_predicate_locks_per_page;
 /* Number of SLRU buffers to use for predicate locking */
 #define NUM_OLDSERXID_BUFFERS	16
 
+/*
+ * A handle used for sharing SERIALIZABLEXACT objects between the participants
+ * in a parallel query.
+ */
+typedef struct SerializableXactHandle
+{
+	void   *sxact;
+	int		generation;
+} SerializableXactHandle;
 
 /*
  * function prototypes
@@ -74,4 +83,9 @@ extern void PredicateLockTwoPhaseFinish(TransactionId xid, bool isCommit);
 extern void predicatelock_twophase_recover(TransactionId xid, uint16 info,
 							   void *recdata, uint32 len);
 
+/* parallel query support */
+extern void ShareSerializableXact(SerializableXactHandle *handle);
+extern void UnshareSerializableXact(void);
+extern void AttachSerializableXact(SerializableXactHandle *handle);
+
 #endif							/* PREDICATE_H */
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 0f736d37dff..c14a9fbe8ff 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,11 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	int			refcount;		/* reference count, in parallel mode */
+	int			generation;		/* generation counter to detect recycling */
+	LWLock		lock;			/* protects predicateLocks list, refcount
+								 * and generation in parallel mode */
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
@@ -467,7 +473,6 @@ typedef struct TwoPhasePredicateRecord
  */
 #define InvalidSerializableXact ((SERIALIZABLEXACT *) NULL)
 
-
 /*
  * Function definitions for functions needing awareness of predicate
  * locking internals.
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 00000000000..f43aa6a2990
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 74d7d59546a..aed46d8d549 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,3 +66,4 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+test: serializable-parallel
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 00000000000..0e7c2c7c1fa
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"
-- 
2.15.1
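
The reference-counting branch that the patch adds to ReleasePredicateLocks() can be modelled in isolation. What follows is only a minimal sketch, not the patch's code: the type SketchSxact and the function sketch_release() are hypothetical names, and the LWLock that the patch takes around these fields is omitted so that the decision logic stands alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields the patch adds to SERIALIZABLEXACT. */
typedef struct SketchSxact
{
	int			refcount;		/* -1 when not shared, otherwise > 0 */
	int			generation;		/* bumped when the last sharer detaches */
} SketchSxact;

/*
 * Model of the release decision.  Returns true if the caller should clean
 * up only its backend-local state, false if it must also free the shared
 * resources.  Locking is deliberately left out of this sketch.
 */
bool
sketch_release(SketchSxact *sxact, bool is_parallel_worker)
{
	if (sxact->refcount != -1)
	{
		/* Shared: only the last backend to detach does the full cleanup. */
		assert(sxact->refcount > 0);
		if (--sxact->refcount == 0)
		{
			++sxact->generation;	/* lets late attachers detect recycling */
			return false;
		}
		return true;
	}

	/*
	 * Not shared (any more).  A worker that gets here leaves the shared
	 * state to the leader; a leader or non-parallel backend frees it.
	 */
	return is_parallel_worker;
}
```

With a refcount of 2, the first caller gets true (local cleanup only) and the second gets false and bumps the generation. With a refcount of -1 the result depends only on whether the caller is a worker.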

#30

Thomas Munro

thomas.munro@enterprisedb.com

almost 8 years ago

In reply to: Thomas Munro (#29)

Re: [HACKERS] SERIALIZABLE with parallel query

On Fri, Feb 23, 2018 at 1:54 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

The attached is a draft patch only, needing some testing and polish.
Brickbats, better ideas?

Note: that version is broken for multiple Gather nodes, but that's
fixable. Comments on the general idea welcome.

--
Thomas Munro
http://www.enterprisedb.com

#31

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Thomas Munro (#29)

Re: [HACKERS] SERIALIZABLE with parallel query

On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

The best solution I have come up with so far is to add a reference
count to SERIALIZABLEXACT. I toyed with putting the refcount into the
DSM instead, but then I ran into problems making that work when you
have a query with multiple Gather nodes. Since the refcount is in
SERIALIZABLEXACT I also had to add a generation counter so that I
could detect the case where you try to attach too late (the leader has
already errored out, the refcount has reached 0 and the
SERIALIZABLEXACT object has been recycled).
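
The generation check described in the quoted paragraph can be sketched roughly as follows. This is an illustration under stated assumptions rather than the patch itself: DemoSxact, DemoHandle, demo_share() and demo_attach() are hypothetical names, locking is omitted, and a failed attach is reported with a boolean where the real code would raise an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the shared object and the handle passed to workers. */
typedef struct DemoSxact
{
	int			refcount;		/* -1 when not shared, otherwise > 0 */
	int			generation;		/* bumped each time the object is recycled */
} DemoSxact;

typedef struct DemoHandle
{
	DemoSxact  *sxact;			/* NULL if the leader is not SERIALIZABLE */
	int			generation;		/* generation observed when sharing began */
} DemoHandle;

/* Leader side: start sharing and remember the current generation. */
void
demo_share(DemoSxact *sxact, DemoHandle *handle)
{
	assert(sxact->refcount == -1);
	sxact->refcount = 1;
	handle->sxact = sxact;
	handle->generation = sxact->generation;
}

/*
 * Worker side: returns false if the object was recycled after the handle
 * was filled in, meaning the attach came too late.  Returns true otherwise,
 * including the case where there is nothing to attach to.
 */
bool
demo_attach(const DemoHandle *handle)
{
	DemoSxact  *sxact = handle->sxact;

	if (sxact == NULL)
		return true;
	if (sxact->generation != handle->generation)
		return false;			/* recycled: do not touch the refcount */
	assert(sxact->refcount > 0);
	++sxact->refcount;
	return true;
}
```

The point of keeping the generation in both places is that a late worker can compare the two values before touching the refcount, so it never increments a count that now belongs to some other transaction.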

I don't know whether that's safe or not. It certainly sounds like
it's solving one category of problem, but is that the only issue? If
some backends haven't noticed that we're safe, they might keep
acquiring SIREAD locks or doing other manipulations of shared state,
which maybe could cause confusion. I haven't looked into this deeply
enough to understand whether there's actually a possibility of trouble
there, but I can't rule it out off-hand.

One approach is to just disable this optimization for parallel query.
Being able to use SERIALIZABLE with parallel query is better than not
being able to do it, even if some optimizations are not applied in
that case. Of course making the optimizations work is better, but
we've got to be sure we're doing it right.

PS I noticed that for BecomeLockGroupMember() we say "If we can't
join the lock group, the leader has gone away, so just exit quietly"
but for various other similar things we spew errors (most commonly
seen one being "ERROR: could not map dynamic shared memory segment").
Intentional?

I suppose I thought that if we failed to map the dynamic shared memory
segment, it might be down to any one of several causes; whereas if we
fail to join the lock group, it must be because the leader has already
exited. There might be a flaw in that thinking, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#32

Amit Kapila

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Robert Haas (#31)

Re: [HACKERS] SERIALIZABLE with parallel query

On Thu, Feb 22, 2018 at 10:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro

PS I noticed that for BecomeLockGroupMember() we say "If we can't
join the lock group, the leader has gone away, so just exit quietly"
but for various other similar things we spew errors (most commonly
seen one being "ERROR: could not map dynamic shared memory segment").
Intentional?

I suppose I thought that if we failed to map the dynamic shared memory
segment, it might be down to any one of several causes; whereas if we
fail to join the lock group, it must be because the leader has already
exited. There might be a flaw in that thinking, though.

By the way, in which case can the leader exit early? As of now, we
wait for workers to end both before the query is finished and in error
cases.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#33

Thomas Munro

thomas.munro@enterprisedb.com

almost 8 years ago

In reply to: Amit Kapila (#32)

Re: [HACKERS] SERIALIZABLE with parallel query

On Fri, Feb 23, 2018 at 3:29 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Feb 22, 2018 at 10:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro

PS I noticed that for BecomeLockGroupMember() we say "If we can't
join the lock group, the leader has gone away, so just exit quietly"
but for various other similar things we spew errors (most commonly
seen one being "ERROR: could not map dynamic shared memory segment").
Intentional?

I suppose I thought that if we failed to map the dynamic shared memory
segment, it might be down to any one of several causes; whereas if we
fail to join the lock group, it must be because the leader has already
exited. There might be a flaw in that thinking, though.

By the way, in which case can the leader exit early? As of now, we
wait for workers to end both before the query is finished and in error
cases.

create table foo as select generate_series(1, 10)::int a;
alter table foo set (parallel_workers = 2);
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
select count(a / 0) from foo;

That reliably gives me:
ERROR: division by zero [from leader]
ERROR: could not map dynamic shared memory segment [from workers]

I thought this was coming from resource manager cleanup, but you're
right: that happens after we wait for all workers to finish. Perhaps
this is a race within DestroyParallelContext() itself: when it is
called by AtEOXact_Parallel() during an abort, it asks the postmaster
to SIGTERM the workers, then it immediately detaches from the DSM
segment, and then it waits for the worker to start up. The workers
unblock signals before they try to attach to the DSM segment, but
they don't CHECK_FOR_INTERRUPTS before they try to attach (and even if
they did it wouldn't solve anything).
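
A minimal sketch of the suspected interleaving, written as a deterministic
single-process model rather than real PostgreSQL code. All names here
(ModelState, leader_abort, worker_startup, the WorkerOutcome values) are
hypothetical stand-ins, and the model assumes the segment becomes
unmappable as soon as the leader detaches. It only illustrates why an
interrupt check before the attach would narrow the window without closing
it.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of a (modeled) worker's startup sequence. */
typedef enum
{
    WORKER_ATTACHED,              /* mapped the segment and carried on */
    WORKER_TERMINATED_BY_SIGTERM, /* noticed the pending SIGTERM first */
    WORKER_MAP_ERROR              /* "could not map dynamic shared memory segment" */
} WorkerOutcome;

/* Hypothetical stand-in for the state the leader and worker race on. */
typedef struct
{
    bool sigterm_pending;   /* leader has asked for the worker to be terminated */
    bool segment_exists;    /* leader is still attached to the DSM segment */
} ModelState;

/* Leader's abort path as described: request SIGTERM, then detach at once. */
static void
leader_abort(ModelState *s)
{
    s->sigterm_pending = true;
    s->segment_exists = false;
}

/*
 * Run the worker's startup steps, letting the leader abort just before step
 * 'abort_before_step': 0 = before the interrupt check, 1 = between the check
 * and the attach, 2 = only after the attach has succeeded.
 */
WorkerOutcome
worker_startup(int abort_before_step, bool check_interrupts)
{
    ModelState s = {false, true};

    if (abort_before_step == 0)
        leader_abort(&s);

    /* Step 0: optional interrupt check (assumed absent in the real code). */
    if (check_interrupts && s.sigterm_pending)
        return WORKER_TERMINATED_BY_SIGTERM;

    if (abort_before_step == 1)
        leader_abort(&s);

    /* Step 1: attach to the segment; fails if the leader already detached. */
    if (!s.segment_exists)
        return WORKER_MAP_ERROR;

    return WORKER_ATTACHED;
}
```

In this model, worker_startup(1, true) still ends in WORKER_MAP_ERROR: the
abort lands after the check but before the attach.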

I don't like the error much, though at least the root cause error is
logged first.

I don't immediately see how BecomeLockGroupMember() could have the
same kind of problem though, for the reason you said: the leader waits
for the workers to finish, so I'm not sure in which circumstances it
would cease to be the lock group leader while the workers are still
running.

--
Thomas Munro
http://www.enterprisedb.com

#34

Amit Kapila

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Thomas Munro (#33)

Re: [HACKERS] SERIALIZABLE with parallel query

On Fri, Feb 23, 2018 at 8:48 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Fri, Feb 23, 2018 at 3:29 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Feb 22, 2018 at 10:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro

PS I noticed that for BecomeLockGroupMember() we say "If we can't
join the lock group, the leader has gone away, so just exit quietly"
but for various other similar things we spew errors (most commonly
seen one being "ERROR: could not map dynamic shared memory segment").
Intentional?

I suppose I thought that if we failed to map the dynamic shared memory
segment, it might be down to any one of several causes; whereas if we
fail to join the lock group, it must be because the leader has already
exited. There might be a flaw in that thinking, though.

By the way, in which case can the leader exit early? As of now, we
wait for workers to end both before the query finishes and in error
cases.

create table foo as select generate_series(1, 10)::int a;
alter table foo set (parallel_workers = 2);
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
select count(a / 0) from foo;

That reliably gives me:
ERROR: division by zero [from leader]
ERROR: could not map dynamic shared memory segment [from workers]

I thought this was coming from resource manager cleanup, but you're
right: that happens after we wait for all workers to finish. Perhaps
this is a race within DestroyParallelContext() itself: when it is
called by AtEOXact_Parallel() during an abort, it asks the postmaster
to SIGTERM the workers, then it immediately detaches from the DSM
segment, and then it waits for the worker to start up.

I guess you mean to say worker waits to shutdown/exit. Why would it
wait for startup at that stage?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#35

Thomas Munro

thomas.munro@enterprisedb.com

almost 8 years ago

In reply to: Amit Kapila (#34)

Re: [HACKERS] SERIALIZABLE with parallel query

On Fri, Feb 23, 2018 at 7:56 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Feb 23, 2018 at 8:48 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Fri, Feb 23, 2018 at 3:29 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

By the way, in which case can the leader exit early? As of now, we
wait for workers to end both before the query finishes and in error
cases.

create table foo as select generate_series(1, 10)::int a;
alter table foo set (parallel_workers = 2);
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
select count(a / 0) from foo;

That reliably gives me:
ERROR: division by zero [from leader]
ERROR: could not map dynamic shared memory segment [from workers]

I thought this was coming from resource manager cleanup, but you're
right: that happens after we wait for all workers to finish. Perhaps
this is a race within DestroyParallelContext() itself: when it is
called by AtEOXact_Parallel() during an abort, it asks the postmaster
to SIGTERM the workers, then it immediately detaches from the DSM
segment, and then it waits for the worker to start up.

I guess you mean to say worker waits to shutdown/exit. Why would it
wait for startup at that stage?

Right, I meant to say shutdown/exit.

--
Thomas Munro
http://www.enterprisedb.com

#36

Thomas Munro

thomas.munro@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#31)

Re: [HACKERS] SERIALIZABLE with parallel query

On Fri, Feb 23, 2018 at 6:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

The best solution I have come up with so far is to add a reference
count to SERIALIZABLEXACT. I toyed with putting the refcount into the
DSM instead, but then I ran into problems making that work when you
have a query with multiple Gather nodes. Since the refcount is in
SERIALIZABLEXACT I also had to add a generation counter so that I
could detect the case where you try to attach too late (the leader has
already errored out, the refcount has reached 0 and the
SERIALIZABLEXACT object has been recycled).

I don't know whether that's safe or not. It certainly sounds like
it's solving one category of problem, but is that the only issue? If
some backends haven't noticed that we're safe, they might keep
acquiring SIREAD locks or doing other manipulations of shared state,
which maybe could cause confusion. I haven't looked into this deeply
enough to understand whether there's actually a possibility of trouble
there, but I can't rule it out off-hand.

After some testing, I think the refcount approach could be made to
work, but it seems quite complicated and there are some weird edge
cases that showed up that started to make it look like more trouble
than it was worth. One downside of refcounts is that you never get to
free the SERIALIZABLEXACT until the end of the transaction with
parallel_leader_participation = off.
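
For readers following along, here is a rough sketch of the
refcount-plus-generation idea discussed above. It is not taken from any
posted patch: SharedXact, XactHandle and the xact_* functions are
hypothetical names, the model is single-threaded, and a real version would
need to take a lock around every one of these operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a shared, recyclable SERIALIZABLEXACT slot. */
typedef struct
{
    uint32_t refcount;      /* backends currently attached */
    uint32_t generation;    /* bumped each time the slot is recycled */
} SharedXact;

/* What a leader might hand to workers: the slot plus its generation. */
typedef struct
{
    SharedXact *xact;
    uint32_t    generation;
} XactHandle;

/* Leader: start using a free slot and take the first reference. */
void
xact_init(SharedXact *x)
{
    assert(x->refcount == 0);
    x->refcount = 1;
}

/* Leader: capture a handle that remembers the slot's current generation. */
XactHandle
xact_share(SharedXact *x)
{
    XactHandle h = {x, x->generation};

    return h;
}

/*
 * Worker: attach, unless the slot has been recycled since the handle was
 * made.  A generation mismatch means the attach came too late.
 */
bool
xact_attach(XactHandle h)
{
    if (h.xact->generation != h.generation)
        return false;
    h.xact->refcount++;
    return true;
}

/* Any backend: drop a reference; the last one out recycles the slot. */
void
xact_release(SharedXact *x)
{
    assert(x->refcount > 0);
    if (--x->refcount == 0)
        x->generation++;
}
```

The generation check is what catches a worker attaching after the leader
has errored out and the slot has already been handed to another
transaction.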

I'm testing another version that is a lot simpler: like v10, it relies
on the knowledge that the leader's transaction will always end after
the workers have finished, but it handles the RO_SAFE optimisation by
keeping the SERIALIZABLEXACT alive but freeing its locks etc. More
soon.

#37

Thomas Munro

thomas.munro@enterprisedb.com

almost 8 years ago

In reply to: Thomas Munro (#36)

3 attachment(s)

Re: [HACKERS] SERIALIZABLE with parallel query

On Sat, Feb 24, 2018 at 12:04 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

I'm testing another version that is a lot simpler: like v10, it relies
on the knowledge that the leader's transaction will always end after
the workers have finished, but it handles the RO_SAFE optimisation by
keeping the SERIALIZABLEXACT alive but freeing its locks etc. More
soon.

I've now broken it into two patches.

Patch 0001 is like my original patch with some minor improvements,
except that it now disables the RO_SAFE optimisation completely in
parallel mode. In other words, it's the stupidest fix possible to the
problem you flagged up. I think the main questions to answer about
the 0001 patch are whether this new locking protocol is sufficient,
whether anything bad could happen as a result of lock
escalation/transfer, and whether the underlying assumption about the
SERIALIZABLEXACT's lifetime holds true (that the leader will never
call ReleasePredicateLocks() while a worker is still running).

There are a couple of easy incremental improvements that could be made
on top of that patch, but I didn't make them because I'm trying to be
conservative in the hope of landing at least the basic feature in
PostgreSQL 11. Namely:

1. We could still return false if we see SXACT_FLAG_RO_SAFE in
SerializationNeededForRead() (we just couldn't call
ReleasePredicateLocks()).

2. We could set MySerializableXact to InvalidSerializableXact in
worker backends so at least they'd benefit from the optimisation (we
just couldn't do that in the leader or it'd leak resources).

Patch 0002 aims a bit higher than those ideas. I wanted to make sure
that the leader wouldn't arbitrarily miss out on the optimisation, and
I also suspect that the optimisation might be contagious in the sense
that actually releasing sooner might cause the RO_SAFE flag to be set
on *other* transactions sooner. Patch 0002 works like this:

The first backend to observe the RO_SAFE flag 'partially releases' the
SERIALIZABLEXACT, so that the SERIALIZABLEXACT itself remains valid.
(The concept of 'partial release' already existed, but I'm using it in
a new way.) All backends clear their MySerializableXact variable so
that they drop to faster SI in their own time. The leader keeps a
copy of it in SavedSerializableXact, so that it can fully release it
at the end of the transaction when we know that no other backend has a
reference to it.
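
The scheme just described can be sketched roughly as follows. This is a toy
model, not code from the 0002 patch: ModelSxact, ModelBackend and the
model_* functions are hypothetical, the my_sxact and saved_sxact fields
merely mirror MySerializableXact and SavedSerializableXact, and all locking
is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for SERIALIZABLEXACT. */
typedef struct
{
    bool ro_safe;            /* models SXACT_FLAG_RO_SAFE */
    bool partially_released; /* locks dropped, object itself still valid */
    bool fully_released;     /* object handed back for reuse */
    int  npredicate_locks;
} ModelSxact;

/* Per-backend state; NULL models InvalidSerializableXact. */
typedef struct
{
    bool        is_leader;
    ModelSxact *my_sxact;
    ModelSxact *saved_sxact;    /* only ever set in the leader */
} ModelBackend;

/* Returns true if SSI checks are still needed for this backend's reads. */
bool
model_serialization_needed(ModelBackend *b)
{
    ModelSxact *sxact = b->my_sxact;

    if (sxact == NULL)
        return false;           /* already dropped to plain SI */
    if (!sxact->ro_safe)
        return true;

    /* The first observer drops the locks but leaves the object valid. */
    if (!sxact->partially_released)
    {
        sxact->npredicate_locks = 0;
        sxact->partially_released = true;
    }

    /* Every backend forgets its pointer; the leader keeps a copy. */
    if (b->is_leader)
        b->saved_sxact = sxact;
    b->my_sxact = NULL;
    return false;
}

/* Leader at end of transaction: no worker can still hold a reference. */
void
model_leader_end_of_xact(ModelBackend *leader)
{
    ModelSxact *sxact;

    assert(leader->is_leader);
    sxact = leader->my_sxact ? leader->my_sxact : leader->saved_sxact;
    if (sxact != NULL)
        sxact->fully_released = true;
    leader->my_sxact = NULL;
    leader->saved_sxact = NULL;
}
```

Whichever backend notices the flag first does the partial release; the
others only clear their own pointer, and the full release is left to the
leader.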

These patches survive hammering with a simple test that generates a
mixture of read only and read write parallel queries that hit the
interesting case (attached; this test helped me understand that the
refcount scheme I considered was going to be hard). I haven't
personally tried to measure the value of the optimisation (though I'm
pretty sure it exists, based on the VLDB paper and the knowledge that
REPEATABLE READ (what the optimisation effectively gives you) just has
to be faster than SERIALIZABLE 'cause I've seen all that code you get
to not run!). I'd like to propose the 0001 patch for now, but keep
the 0002 patch back for a bit as it's very new and needs more
feedback, if possible from Kevin and others involved in the SSI
project. Of course their input on the 0001 patch is also super
welcome.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

test2.py (text/x-python-script; charset=US-ASCII)

0001-Enable-parallel-query-with-SERIALIZABLE-isolatio-v12.patch (application/octet-stream)

From a3b019b9e36ec2212f7fefc915b7d961f3a22f1c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Fri, 1 Sep 2017 16:54:57 +1200
Subject: [PATCH 1/2] Enable parallel query with SERIALIZABLE isolation.

Previously, the SERIALIZABLE isolation level prevented parallel query from
being used.  Allow the two features to be used together by sharing the
leader's SERIALIZABLEXACT with parallel workers.

Remove the serializable_okay flag added to CreateParallelContext() by commit
9da0cc35284bdbe8d442d732963303ff0e0a40bc, because it's now redundant.

The optimization allowing SSI checks to be skipped after a certain point in
read-only transactions is disabled in parallel mode.  It could be implemented
in a later commit.

Author: Thomas Munro
Reviewed-By: Haribabu Kommi, Robert Haas
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 doc/src/sgml/monitoring.sgml                       |   7 +-
 doc/src/sgml/parallel.sgml                         |  17 ----
 src/backend/access/nbtree/nbtsort.c                |   2 +-
 src/backend/access/transam/parallel.c              |  18 ++--
 src/backend/access/transam/xact.c                  |  14 ++-
 src/backend/executor/execParallel.c                |   2 +-
 src/backend/optimizer/plan/planner.c               |  11 +--
 src/backend/storage/lmgr/lwlock.c                  |   1 +
 src/backend/storage/lmgr/predicate.c               | 109 ++++++++++++++++++---
 src/include/access/parallel.h                      |   3 +-
 src/include/storage/lwlock.h                       |   1 +
 src/include/storage/predicate.h                    |   9 ++
 src/include/storage/predicate_internals.h          |   4 +
 .../isolation/expected/serializable-parallel-2.out |  44 +++++++++
 .../isolation/expected/serializable-parallel.out   |  44 +++++++++
 src/test/isolation/isolation_schedule              |   2 +
 .../isolation/specs/serializable-parallel-2.spec   |  30 ++++++
 .../isolation/specs/serializable-parallel.spec     |  48 +++++++++
 18 files changed, 308 insertions(+), 58 deletions(-)
 create mode 100644 src/test/isolation/expected/serializable-parallel-2.out
 create mode 100644 src/test/isolation/expected/serializable-parallel.out
 create mode 100644 src/test/isolation/specs/serializable-parallel-2.spec
 create mode 100644 src/test/isolation/specs/serializable-parallel.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e138d1ef076..64f60cbdc54 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -845,7 +845,7 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
 
       <tbody>
        <row>
-        <entry morerows="63"><literal>LWLock</literal></entry>
+        <entry morerows="64"><literal>LWLock</literal></entry>
         <entry><literal>ShmemIndexLock</literal></entry>
         <entry>Waiting to find or allocate space in shared memory.</entry>
        </row>
@@ -979,6 +979,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting to perform an operation on a list of locks held by
          serializable transactions.</entry>
         </row>
+        <row>
+         <entry><literal>sxact</literal></entry>
+         <entry>Waiting to perform an operation on a serializable transaction
+         in a parallel query.</entry>
+        </row>
         <row>
          <entry><literal>OldSerXidLock</literal></entry>
          <entry>Waiting to read or record conflicting serializable
diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index f15a9233cbf..9507f1ae2ef 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -192,13 +192,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -241,16 +234,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         that may be suboptimal when run serially.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 521ae6e5f77..98abcc6bf3a 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1207,7 +1207,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	EnterParallelMode();
 	Assert(request > 0);
 	pcxt = CreateParallelContext("postgres", "_bt_parallel_build_main",
-								 request, true);
+								 request);
 	scantuplesortstates = leaderparticipates ? request + 1 : request;
 
 	/*
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 9d4efc0f8fc..cf0ee1499ab 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -30,6 +30,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -86,6 +87,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SerializableXactHandle serializable_xact_handle;
 
 	/* Mutex protects remaining fields. */
 	slock_t		mutex;
@@ -150,7 +152,7 @@ static void ParallelWorkerShutdown(int code, Datum arg);
  */
 ParallelContext *
 CreateParallelContext(const char *library_name, const char *function_name,
-					  int nworkers, bool serializable_okay)
+					  int nworkers)
 {
 	MemoryContext oldcontext;
 	ParallelContext *pcxt;
@@ -168,16 +170,6 @@ CreateParallelContext(const char *library_name, const char *function_name,
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.  Utility statement callers may ask us to ignore this
-	 * restriction because they're always able to safely ignore the fact that
-	 * SIREAD locks do not work with parallelism.
-	 */
-	if (IsolationIsSerializable() && !serializable_okay)
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -321,6 +313,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->serializable_xact_handle = ShareSerializableXact();
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_FIXED, fps);
@@ -1384,6 +1377,9 @@ ParallelWorkerMain(Datum main_arg)
 	reindexspace = shm_toc_lookup(toc, PARALLEL_KEY_REINDEX_STATE, false);
 	RestoreReindexState(reindexspace);
 
+	/* Attach to the leader's serializable transaction, if SERIALIZABLE. */
+	AttachSerializableXact(fps->serializable_xact_handle);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index dbaaf8e0053..dd486835867 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2004,9 +2004,12 @@ CommitTransaction(void)
 	/*
 	 * Mark serializable transaction as complete for predicate locking
 	 * purposes.  This should be done as late as we can put it and still allow
-	 * errors to be raised for failure patterns found at commit.
+	 * errors to be raised for failure patterns found at commit.  This is not
+	 * appropriate in a parallel worker however, because we aren't committing
+	 * the leader's transaction and its serializable state will live on.
 	 */
-	PreCommit_CheckForSerializationFailure();
+	if (!is_parallel_worker)
+		PreCommit_CheckForSerializationFailure();
 
 	/*
 	 * Insert notifications sent by NOTIFY commands into the queue.  This
@@ -2232,9 +2235,12 @@ PrepareTransaction(void)
 	/*
 	 * Mark serializable transaction as complete for predicate locking
 	 * purposes.  This should be done as late as we can put it and still allow
-	 * errors to be raised for failure patterns found at commit.
+	 * errors to be raised for failure patterns found at commit.  This is not
+	 * appropriate for parallel workers however, because we aren't committing
+	 * the leader's transaction and its serializable state will live on.
 	 */
-	PreCommit_CheckForSerializationFailure();
+	if (!IsParallelWorker())
+		PreCommit_CheckForSerializationFailure();
 
 	/* NOTIFY will be handled below */
 
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 14b0b89463c..f8b72ebab99 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -592,7 +592,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pstmt_data = ExecSerializePlan(planstate->plan, estate);
 
 	/* Create a parallel context. */
-	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers, false);
+	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
 	pei->pcxt = pcxt;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd1447cc..c39e79a26d3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -292,14 +292,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -307,8 +299,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 233606b4141..8ef1f3f3c4c 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -520,6 +520,7 @@ RegisterLWLockTranches(void)
 						  "shared_tuplestore");
 	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "sxact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 654eca4f3f5..3d844faa891 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'predicateLockListLock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	PredicateLockHashPartitionLock(hashcode)
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target.
@@ -186,6 +194,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -469,6 +478,7 @@ static void CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag);
 static void FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer);
 static void OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
 										SERIALIZABLEXACT *writer);
+static void CreateLocalPredicateLockHash(void);
 
 
 /*------------------------------------------------------------------------*/
@@ -522,8 +532,10 @@ SerializationNeededForRead(Relation relation, Snapshot snapshot)
 	 * A transaction is flagged as RO_SAFE if all concurrent R/W transactions
 	 * commit without having conflicts out to an earlier snapshot, thus
 	 * ensuring that no conflicts are possible for this transaction.
+	 *
+	 * This optimization is not yet supported in parallel mode.
 	 */
-	if (SxactIsROSafe(MySerializableXact))
+	if (SxactIsROSafe(MySerializableXact) && !IsInParallelMode())
 	{
 		ReleasePredicateLocks(false);
 		return false;
@@ -1214,6 +1226,8 @@ InitPredicateLocks(void)
 		memset(PredXact->element, 0, requestSize);
 		for (i = 0; i < max_table_size; i++)
 		{
+			LWLockInitialize(&PredXact->element[i].sxact.predicateLockListLock,
+							 LWTRANCHE_SXACT);
 			SHMQueueInsertBefore(&(PredXact->availableList),
 								 &(PredXact->element[i].link));
 		}
@@ -1679,6 +1693,17 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
 {
 	Assert(IsolationIsSerializable());
 
+	/*
+	 * If this is called by parallel.c in a parallel worker, we don't want to
+	 * create a SERIALIZABLEXACT just yet because the leader's
+	 * SERIALIZABLEXACT will be installed with AttachSerializableXact().  We
+	 * also don't want to reject SERIALIZABLE READ ONLY DEFERRABLE in this
+	 * case, because the leader has already determined that the snapshot it
+	 * has passed us is safe.  So there is nothing for us to do.
+	 */
+	if (IsParallelWorker())
+		return;
+
 	/*
 	 * We do not allow SERIALIZABLE READ ONLY DEFERRABLE transactions to
 	 * import snapshots, since there's no way to wait for a safe snapshot when
@@ -1712,7 +1737,6 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	VirtualTransactionId vxid;
 	SERIALIZABLEXACT *sxact,
 			   *othersxact;
-	HASHCTL		hash_ctl;
 
 	/* We only do this for serializable transactions.  Once. */
 	Assert(MySerializableXact == InvalidSerializableXact);
@@ -1859,6 +1883,16 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 
 	LWLockRelease(SerializableXactHashLock);
 
+	CreateLocalPredicateLockHash();
+
+	return snapshot;
+}
+
+static void
+CreateLocalPredicateLockHash(void)
+{
+	HASHCTL		hash_ctl;
+
 	/* Initialize the backend-local hash table of parent locks */
 	Assert(LocalPredicateLockHash == NULL);
 	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
@@ -1868,8 +1902,6 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 										 max_predicate_locks_per_xact,
 										 &hash_ctl,
 										 HASH_ELEM | HASH_BLOBS);
-
-	return snapshot;
 }
 
 /*
@@ -2124,7 +2156,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2137,6 +2171,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2190,6 +2226,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2388,6 +2426,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2425,6 +2465,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2612,7 +2654,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2672,7 +2715,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2687,7 +2730,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3290,6 +3334,10 @@ ReleasePredicateLocks(bool isCommit)
 		return;
 	}
 
+	/* Parallel workers mustn't release predicate locks. */
+	if (IsParallelWorker())
+		goto backend_local_cleanup;
+
 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
 	Assert(!isCommit || SxactIsPrepared(MySerializableXact));
@@ -3319,8 +3367,8 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact->finishedBefore = ShmemVariableCache->nextXid;
 
 	/*
-	 * If it's not a commit it's a rollback, and we can clear our locks
-	 * immediately.
+	 * If it's not a commit it's either a rollback or a read-only transaction
+	 * flagged SXACT_FLAG_RO_SAFE, and we can clear our locks immediately.
 	 */
 	if (isCommit)
 	{
@@ -3567,6 +3615,7 @@ ReleasePredicateLocks(bool isCommit)
 	if (needToClear)
 		ClearOldPredicateLocks();
 
+backend_local_cleanup:
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
@@ -4259,6 +4308,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->predicateLockListLock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4293,6 +4344,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->predicateLockListLock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4841,6 +4894,13 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->predicateLockListLock in parallel mode because
+	 * there cannot be any parallel workers running while we are preparing a
+	 * transaction.
+	 */
+	Assert(!IsParallelWorker() && !ParallelContextActive());
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -5049,3 +5109,30 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Prepare to share the current SERIALIZABLEXACT with parallel workers,
+ * filling in a handle object that can be used by AttachSerializableXact() in
+ * a parallel worker.
+ */
+SerializableXactHandle
+ShareSerializableXact(void)
+{
+	Assert(!IsParallelWorker());
+
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+AttachSerializableXact(SerializableXactHandle handle)
+{
+
+	Assert(MySerializableXact == InvalidSerializableXact);
+
+	MySerializableXact = (SERIALIZABLEXACT *) handle;
+	if (MySerializableXact != InvalidSerializableXact)
+		CreateLocalPredicateLockHash();
+}
diff --git a/src/include/access/parallel.h b/src/include/access/parallel.h
index 025691fd82d..45e7fbb43f8 100644
--- a/src/include/access/parallel.h
+++ b/src/include/access/parallel.h
@@ -60,8 +60,7 @@ extern PGDLLIMPORT bool InitializingParallelWorker;
 #define		IsParallelWorker()		(ParallelWorkerNumber >= 0)
 
 extern ParallelContext *CreateParallelContext(const char *library_name,
-					  const char *function_name, int nworkers,
-					  bool serializable_okay);
+					  const char *function_name, int nworkers);
 extern void InitializeParallelDSM(ParallelContext *pcxt);
 extern void ReinitializeParallelDSM(ParallelContext *pcxt);
 extern void LaunchParallelWorkers(ParallelContext *pcxt);
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index c21bfe2f666..b25c43fc6be 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -219,6 +219,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_SHARED_TUPLESTORE,
 	LWTRANCHE_TBM,
 	LWTRANCHE_PARALLEL_APPEND,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 6a3464daa1e..23f3acc3ce1 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -30,6 +30,11 @@ extern int	max_predicate_locks_per_page;
 /* Number of SLRU buffers to use for predicate locking */
 #define NUM_OLDSERXID_BUFFERS	16
 
+/*
+ * A handle used for sharing SERIALIZABLEXACT objects between the participants
+ * in a parallel query.
+ */
+typedef void *SerializableXactHandle;
 
 /*
  * function prototypes
@@ -74,4 +79,8 @@ extern void PredicateLockTwoPhaseFinish(TransactionId xid, bool isCommit);
 extern void predicatelock_twophase_recover(TransactionId xid, uint16 info,
 							   void *recdata, uint32 len);
 
+/* parallel query support */
+extern SerializableXactHandle ShareSerializableXact(void);
+extern void AttachSerializableXact(SerializableXactHandle handle);
+
 #endif							/* PREDICATE_H */
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 0f736d37dff..59eb49e57ee 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,9 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	LWLock		predicateLockListLock;	/* protects predicateLocks in parallel
+										 * mode */
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
diff --git a/src/test/isolation/expected/serializable-parallel-2.out b/src/test/isolation/expected/serializable-parallel-2.out
new file mode 100644
index 00000000000..9a693c4dc62
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel-2.out
@@ -0,0 +1,44 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1r s2r1 s1c s2r2 s2c
+step s1r: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s2r1: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s1c: COMMIT;
+step s2r2: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s2c: COMMIT;
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 00000000000..f43aa6a2990
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 74d7d59546a..36890d74b60 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,3 +66,5 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+test: serializable-parallel
+test: serializable-parallel-2
diff --git a/src/test/isolation/specs/serializable-parallel-2.spec b/src/test/isolation/specs/serializable-parallel-2.spec
new file mode 100644
index 00000000000..7f90f75d882
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel-2.spec
@@ -0,0 +1,30 @@
+# Exercise the case where a read-only serializable transaction has
+# SXACT_FLAG_RO_SAFE set in a parallel query.
+
+setup
+{
+	CREATE TABLE foo AS SELECT generate_series(1, 10)::int a;
+	ALTER TABLE foo SET (parallel_workers = 2);
+}
+
+teardown
+{
+	DROP TABLE foo;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1r"	{ SELECT * FROM foo; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
+			  SET parallel_setup_cost = 0;
+			  SET parallel_tuple_cost = 0;
+			}
+step "s2r1"	{ SELECT * FROM foo; }
+step "s2r2"	{ SELECT * FROM foo; }
+step "s2c"	{ COMMIT; }
+
+permutation "s1r" "s2r1" "s1c" "s2r2" "s2c"
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 00000000000..0e7c2c7c1fa
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"
-- 
2.15.1
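
The handle-passing scheme added by ShareSerializableXact() and AttachSerializableXact() in the diff above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not PostgreSQL code: ToySxact, ToyBackend, toy_share_sxact() and toy_attach_sxact() are hypothetical names, and both "backends" live in one process here, whereas the real SERIALIZABLEXACT sits in shared memory that leader and workers can all address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for SERIALIZABLEXACT; the single field is hypothetical. */
typedef struct ToySxact
{
	int			nlocks;
} ToySxact;

/* Opaque handle, mirroring "typedef void *SerializableXactHandle". */
typedef void *ToySxactHandle;

/* Toy stand-in for one backend's process-local state (hypothetical). */
typedef struct ToyBackend
{
	bool		is_worker;
	ToySxact   *my_sxact;		/* models MySerializableXact; NULL = invalid */
	bool		has_local_lock_hash;	/* models LocalPredicateLockHash */
} ToyBackend;

/* Leader side: publish the current transaction as an opaque handle. */
ToySxactHandle
toy_share_sxact(const ToyBackend *leader)
{
	assert(!leader->is_worker);
	return leader->my_sxact;
}

/* Worker side: adopt the leader's transaction, if there is one. */
void
toy_attach_sxact(ToyBackend *worker, ToySxactHandle handle)
{
	assert(worker->my_sxact == NULL);
	worker->my_sxact = (ToySxact *) handle;

	/* Only set up backend-local lock state when actually SERIALIZABLE. */
	if (worker->my_sxact != NULL)
		worker->has_local_lock_hash = true;
}
```

After a worker attaches, any predicate lock it records through its my_sxact pointer lands on the same object the leader sees, which is the point of sharing. When the leader is not running at SERIALIZABLE, the handle is NULL and the worker sets up nothing.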

0002-Enable-the-read-only-SERIALIZABLE-optimization-f-v12.patch (application/octet-stream)

From 865e3b77d91a83d0bf707e7024b394c278b2de52 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Sun, 25 Feb 2018 23:45:09 +1300
Subject: [PATCH 2/2] Enable the read-only SERIALIZABLE optimization for
 parallel query.

A SERIALIZABLEXACT can be marked as SXACT_FLAG_RO_SAFE by a concurrent session,
meaning that it is safe to throw away this SERIALIZABLEXACT and start behaving
like a REPEATABLE READ transaction.  The problem is that the leader and workers
are sharing the same SERIALIZABLEXACT so this must be coordinated carefully.
This commit solves that problem as follows:

The first backend to observe the SXACT_FLAG_RO_SAFE flag will 'partially
release' it, meaning that the conflicts and locks it holds can be released, but
the SERIALIZABLEXACT itself will remain active because other backends might
have a pointer to it.

Whenever any backend notices the SXACT_FLAG_RO_SAFE flag, it clears its own
MySerializableXact variable so that it can skip SSI checks for the rest of the
transaction.  In the special case of the leader process, it transfers the
SERIALIZABLEXACT to a new variable SavedSerializableXact, so that it can be
completely released at the end of the transaction after all workers have
exited.

Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 src/backend/storage/lmgr/predicate.c      | 136 ++++++++++++++++++++++++++----
 src/backend/utils/resowner/resowner.c     |   2 +-
 src/include/storage/predicate.h           |   2 +-
 src/include/storage/predicate_internals.h |   6 ++
 4 files changed, 127 insertions(+), 19 deletions(-)

diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 3d844faa891..0253f2d7d7e 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -170,7 +170,7 @@
  *		PredicateLockPageCombine(Relation relation, BlockNumber oldblkno,
  *								 BlockNumber newblkno)
  *		TransferPredicateLocksToHeapRelation(Relation relation)
- *		ReleasePredicateLocks(bool isCommit)
+ *		ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
  *
  * conflict detection (may also trigger rollback)
  *		CheckForSerializableConflictOut(bool visible, Relation relation,
@@ -288,6 +288,7 @@
 #define SxactIsDeferrableWaiting(sxact) (((sxact)->flags & SXACT_FLAG_DEFERRABLE_WAITING) != 0)
 #define SxactIsROSafe(sxact) (((sxact)->flags & SXACT_FLAG_RO_SAFE) != 0)
 #define SxactIsROUnsafe(sxact) (((sxact)->flags & SXACT_FLAG_RO_UNSAFE) != 0)
+#define SxactIsPartiallyReleased(sxact) (((sxact)->flags & SXACT_FLAG_PARTIALLY_RELEASED) != 0)
 
 /*
  * Compute the hash code associated with a PREDICATELOCKTARGETTAG.
@@ -422,6 +423,15 @@ static HTAB *LocalPredicateLockHash = NULL;
 static SERIALIZABLEXACT *MySerializableXact = InvalidSerializableXact;
 static bool MyXactDidWrite = false;
 
+/*
+ * The SXACT_FLAG_RO_SAFE optimization might lead us to release
+ * MySerializableXact early.  If that happens in a parallel query, the leader
+ * needs to defer the destruction of the SERIALIZABLEXACT until end of
+ * transaction, because the workers still have a reference to it.  In that
+ * case, the leader stores it here.
+ */
+static SERIALIZABLEXACT *SavedSerializableXact = InvalidSerializableXact;
+
 /* local functions */
 
 static SERIALIZABLEXACT *CreatePredXact(void);
@@ -532,12 +542,10 @@ SerializationNeededForRead(Relation relation, Snapshot snapshot)
 	 * A transaction is flagged as RO_SAFE if all concurrent R/W transactions
 	 * commit without having conflicts out to an earlier snapshot, thus
 	 * ensuring that no conflicts are possible for this transaction.
-	 *
-	 * This optimization is not yet supported in parallel mode.
 	 */
-	if (SxactIsROSafe(MySerializableXact) && !IsInParallelMode())
+	if (SxactIsROSafe(MySerializableXact))
 	{
-		ReleasePredicateLocks(false);
+		ReleasePredicateLocks(false, true);
 		return false;
 	}
 
@@ -1573,14 +1581,14 @@ GetSafeSnapshot(Snapshot origSnapshot)
 		ereport(DEBUG2,
 				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 				 errmsg("deferrable snapshot was unsafe; trying a new one")));
-		ReleasePredicateLocks(false);
+		ReleasePredicateLocks(false, false);
 	}
 
 	/*
 	 * Now we have a safe snapshot, so we don't need to do any further checks.
 	 */
 	Assert(SxactIsROSafe(MySerializableXact));
-	ReleasePredicateLocks(false);
+	ReleasePredicateLocks(false, true);
 
 	return snapshot;
 }
@@ -3307,9 +3315,17 @@ SetNewSxactGlobalXmin(void)
  * If this transaction is committing and is holding any predicate locks,
  * it must be added to a list of completed serializable transactions still
  * holding locks.
+ *
+ * If isReadOnlySafe is true, then predicate locks are being released before
+ * the end of the transaction because MySerializableXact has been determined
+ * to be RO_SAFE.  In non-parallel mode we can release it completely, but
+ * in parallel mode we partially release the SERIALIZABLEXACT and keep it
+ * around until the end of the transaction, allowing each backend to clear its
+ * MySerializableXact variable and benefit from the optimization in its own
+ * time.
  */
 void
-ReleasePredicateLocks(bool isCommit)
+ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 {
 	bool		needToClear;
 	RWConflict	conflict,
@@ -3328,22 +3344,93 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
+	/* We can't be both committing and releasing early due to RO_SAFE. */
+	Assert(!(isCommit && isReadOnlySafe));
+
+	/* Are we at the end of a transaction, that is, a commit or abort? */
+	if (!isReadOnlySafe)
+	{
+		/*
+		 * Parallel workers mustn't release predicate locks at the end of
+		 * their transaction.  The leader will do that at the end of its
+		 * transaction.
+		 */
+		if (IsParallelWorker())
+			goto backend_local_cleanup;
+
+		/*
+		 * By the time the leader in a parallel query reaches end of
+		 * transaction, it has waited for all workers to exit.
+		 */
+		Assert(!ParallelContextActive());
+
+		/*
+		 * If the leader in a parallel query earlier stashed a partially
+		 * released SERIALIZABLEXACT for final clean-up at end of transaction
+		 * (because workers might still have been accessing it), then it's
+		 * time to restore it.
+		 */
+		if (SavedSerializableXact != InvalidSerializableXact)
+		{
+			Assert(MySerializableXact == InvalidSerializableXact);
+			MySerializableXact = SavedSerializableXact;
+			SavedSerializableXact = InvalidSerializableXact;
+			Assert(SxactIsPartiallyReleased(MySerializableXact));
+		}
+	}
+
 	if (MySerializableXact == InvalidSerializableXact)
 	{
 		Assert(LocalPredicateLockHash == NULL);
 		return;
 	}
 
-	/* Parallel workers mustn't release predicate locks. */
-	if (IsParallelWorker())
-		goto backend_local_cleanup;
-
 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
+	/*
+	 * If the transaction is committing, but it has been partially released
+	 * already, then treat this as a rollback.  It was marked as rolled back.
+	 */
+	if (isCommit && SxactIsPartiallyReleased(MySerializableXact))
+		isCommit = false;
+
+	/*
+	 * If we're called in the middle of a transaction because we discovered
+	 * that the SXACT_FLAG_RO_SAFE flag was set, then we'll partially release
+	 * it (that is, release the predicate locks and conflicts, but not the
+	 * SERIALIZABLEXACT itself) if we're the first backend to have noticed.
+	 */
+	if (isReadOnlySafe && IsInParallelMode())
+	{
+		/*
+		 * The leader needs to stash a pointer to it, so that it can
+		 * completely release it at end-of-transaction.
+		 */
+		if (!IsParallelWorker())
+			SavedSerializableXact = MySerializableXact;
+
+		/*
+		 * The first backend to reach this condition will partially release
+		 * the SERIALIZABLEXACT.  All others will just clear their
+		 * backend-local state so that they stop doing SSI checks for the rest
+		 * of the transaction.
+		 */
+		if (SxactIsPartiallyReleased(MySerializableXact))
+		{
+			LWLockRelease(SerializableXactHashLock);
+			goto backend_local_cleanup;
+		}
+		else
+		{
+			MySerializableXact->flags |= SXACT_FLAG_PARTIALLY_RELEASED;
+			/* ... and proceed to perform the partial release below. */
+		}
+	}
 	Assert(!isCommit || SxactIsPrepared(MySerializableXact));
 	Assert(!isCommit || !SxactIsDoomed(MySerializableXact));
 	Assert(!SxactIsCommitted(MySerializableXact));
-	Assert(!SxactIsRolledBack(MySerializableXact));
+	Assert(SxactIsPartiallyReleased(MySerializableXact)
+		   || !SxactIsRolledBack(MySerializableXact));
 
 	/* may not be serializable during COMMIT/ROLLBACK PREPARED */
 	Assert(MySerializableXact->pid == 0 || IsolationIsSerializable());
@@ -3392,7 +3479,8 @@ ReleasePredicateLocks(bool isCommit)
 		 * cleanup. This means it should not be considered when calculating
 		 * SxactGlobalXmin.
 		 */
-		MySerializableXact->flags |= SXACT_FLAG_DOOMED;
+		if (!isReadOnlySafe)
+			MySerializableXact->flags |= SXACT_FLAG_DOOMED;
 		MySerializableXact->flags |= SXACT_FLAG_ROLLED_BACK;
 
 		/*
@@ -3588,7 +3676,8 @@ ReleasePredicateLocks(bool isCommit)
 	 * was launched.
 	 */
 	needToClear = false;
-	if (TransactionIdEquals(MySerializableXact->xmin, PredXact->SxactGlobalXmin))
+	if (!isReadOnlySafe &&
+		TransactionIdEquals(MySerializableXact->xmin, PredXact->SxactGlobalXmin))
 	{
 		Assert(PredXact->SxactGlobalXminCount > 0);
 		if (--(PredXact->SxactGlobalXminCount) == 0)
@@ -3607,8 +3696,16 @@ ReleasePredicateLocks(bool isCommit)
 		SHMQueueInsertBefore(FinishedSerializableTransactions,
 							 &MySerializableXact->finishedLink);
 
+	/*
+	 * If we're releasing a RO_SAFE transaction in parallel mode, we'll only
+	 * partially release it.  That's necessary because other backends may have
+	 * a reference to it.  The leader will release the SERIALIZABLEXACT itself
+	 * at the end of the transaction after workers have stopped running.
+	 */
 	if (!isCommit)
-		ReleaseOneSerializableXact(MySerializableXact, false, false);
+		ReleaseOneSerializableXact(MySerializableXact,
+								   isReadOnlySafe && IsInParallelMode(),
+								   false);
 
 	LWLockRelease(SerializableFinishedListLock);
 
@@ -3807,6 +3904,8 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 	 * them to OldCommittedSxact if summarize is true)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -3886,6 +3985,8 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 	 */
 	SHMQueueInit(&sxact->predicateLocks);
 
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 
 	sxidtag.xid = sxact->topXid;
@@ -4776,6 +4877,7 @@ PreCommit_CheckForSerializationFailure(void)
 	/* Check if someone else has already decided that we need to die */
 	if (SxactIsDoomed(MySerializableXact))
 	{
+		Assert(!SxactIsPartiallyReleased(MySerializableXact));
 		LWLockRelease(SerializableXactHashLock);
 		ereport(ERROR,
 				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
@@ -4973,7 +5075,7 @@ PredicateLockTwoPhaseFinish(TransactionId xid, bool isCommit)
 	MySerializableXact = sxid->myXact;
 	MyXactDidWrite = true;		/* conservatively assume that we wrote
 								 * something */
-	ReleasePredicateLocks(isCommit);
+	ReleasePredicateLocks(isCommit, false);
 }
 
 /*
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index e09a4f1ddb4..ab0523e90a5 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -551,7 +551,7 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
 			if (owner == TopTransactionResourceOwner)
 			{
 				ProcReleaseLocks(isCommit);
-				ReleasePredicateLocks(isCommit);
+				ReleasePredicateLocks(isCommit, false);
 			}
 		}
 		else
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 23f3acc3ce1..0925270b91e 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -61,7 +61,7 @@ extern void PredicateLockTuple(Relation relation, HeapTuple tuple, Snapshot snap
 extern void PredicateLockPageSplit(Relation relation, BlockNumber oldblkno, BlockNumber newblkno);
 extern void PredicateLockPageCombine(Relation relation, BlockNumber oldblkno, BlockNumber newblkno);
 extern void TransferPredicateLocksToHeapRelation(Relation relation);
-extern void ReleasePredicateLocks(bool isCommit);
+extern void ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe);
 
 /* conflict detection (may also trigger rollback) */
 extern void CheckForSerializableConflictOut(bool valid, Relation relation, HeapTuple tuple,
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 59eb49e57ee..04de63877d5 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -127,6 +127,12 @@ typedef struct SERIALIZABLEXACT
 #define SXACT_FLAG_RO_UNSAFE			0x00000100
 #define SXACT_FLAG_SUMMARY_CONFLICT_IN	0x00000200
 #define SXACT_FLAG_SUMMARY_CONFLICT_OUT 0x00000400
+/*
+ * The following flag means the transaction has been partially released
+ * already, but is being preserved because parallel workers might have a
+ * reference to it.  It'll be recycled by the leader at end-of-transaction.
+ */
+#define SXACT_FLAG_PARTIALLY_RELEASED	0x00000800
 
 /*
  * The following types are used to provide an ad hoc list for holding
-- 
2.15.1
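
The partial-release protocol described in the 0002 commit message above can be modelled in a few lines. The following is a simplified sketch, not the patch itself: ToySxact, ToyBackend and the toy_* functions and flags are hypothetical, all LWLock acquisition is omitted, and conflict lists are reduced to a single lock counter. It only shows who clears what, and when.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FLAG_RO_SAFE			0x01
#define TOY_FLAG_PARTIALLY_RELEASED	0x02

/* Toy shared transaction object (hypothetical fields). */
typedef struct ToySxact
{
	unsigned	flags;
	int			nlocks;			/* stands in for predicate locks + conflicts */
	bool		freed;			/* true once fully released */
} ToySxact;

/* Toy per-backend state (hypothetical). */
typedef struct ToyBackend
{
	bool		is_worker;
	ToySxact   *my_sxact;		/* models MySerializableXact */
	ToySxact   *saved_sxact;	/* models SavedSerializableXact (leader only) */
} ToyBackend;

/* Mid-transaction: this backend noticed RO_SAFE while in parallel mode. */
void
toy_release_ro_safe(ToyBackend *b)
{
	ToySxact   *sxact = b->my_sxact;

	if (sxact == NULL)
		return;
	assert(sxact->flags & TOY_FLAG_RO_SAFE);

	/* The leader remembers the object for final cleanup. */
	if (!b->is_worker)
		b->saved_sxact = sxact;

	/* Only the first backend to get here drops the locks. */
	if (!(sxact->flags & TOY_FLAG_PARTIALLY_RELEASED))
	{
		sxact->flags |= TOY_FLAG_PARTIALLY_RELEASED;
		sxact->nlocks = 0;
	}

	/* Every backend stops doing SSI checks from now on. */
	b->my_sxact = NULL;
}

/* End of transaction: workers only clear local state, the leader frees. */
void
toy_release_at_end(ToyBackend *b)
{
	if (b->is_worker)
	{
		b->my_sxact = NULL;
		return;
	}
	if (b->saved_sxact != NULL)
	{
		b->my_sxact = b->saved_sxact;
		b->saved_sxact = NULL;
	}
	if (b->my_sxact == NULL)
		return;
	b->my_sxact->nlocks = 0;
	b->my_sxact->freed = true;
	b->my_sxact = NULL;
}
```

In this model the shared object is never freed while a worker could still dereference it: a worker that notices the flag first releases the locks but leaves the object alive, and only the leader's end-of-transaction call marks it freed.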

#38

Thomas Munro

thomas.munro@enterprisedb.com

almost 8 years ago

In reply to: Thomas Munro (#37)

2 attachment(s)

Re: [HACKERS] SERIALIZABLE with parallel query

On Mon, Feb 26, 2018 at 6:37 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

> I've now broken it into two patches.

Rebased.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

0001-Enable-parallel-query-with-SERIALIZABLE-isolatio-v13.patch (application/octet-stream)

From 72609a154b5fe79536166f35581dca6e1ef2e260 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Fri, 1 Sep 2017 16:54:57 +1200
Subject: [PATCH 1/2] Enable parallel query with SERIALIZABLE isolation.

Previously, the SERIALIZABLE isolation level prevented parallel query from
being used.  Allow the two features to be used together by sharing the
leader's SERIALIZABLEXACT with parallel workers.

Remove the serializable_okay flag added to CreateParallelContext() by commit
9da0cc35284bdbe8d442d732963303ff0e0a40bc, because it's now redundant.

The optimization allowing SSI checks to be skipped after a certain point in
read-only transactions is disabled in parallel mode.  It could be implemented
in a later commit.

Author: Thomas Munro
Reviewed-By: Haribabu Kommi, Robert Haas
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 doc/src/sgml/monitoring.sgml                       |   5 +
 doc/src/sgml/parallel.sgml                         |  17 ----
 src/backend/access/nbtree/nbtsort.c                |   2 +-
 src/backend/access/transam/parallel.c              |  18 ++--
 src/backend/access/transam/xact.c                  |  14 ++-
 src/backend/executor/execParallel.c                |   2 +-
 src/backend/optimizer/plan/planner.c               |  11 +--
 src/backend/storage/lmgr/lwlock.c                  |   1 +
 src/backend/storage/lmgr/predicate.c               | 109 ++++++++++++++++++---
 src/include/access/parallel.h                      |   3 +-
 src/include/storage/lwlock.h                       |   1 +
 src/include/storage/predicate.h                    |   9 ++
 src/include/storage/predicate_internals.h          |   4 +
 .../isolation/expected/serializable-parallel-2.out |  44 +++++++++
 .../isolation/expected/serializable-parallel.out   |  44 +++++++++
 src/test/isolation/isolation_schedule              |   2 +
 .../isolation/specs/serializable-parallel-2.spec   |  30 ++++++
 .../isolation/specs/serializable-parallel.spec     |  48 +++++++++
 18 files changed, 307 insertions(+), 57 deletions(-)
 create mode 100644 src/test/isolation/expected/serializable-parallel-2.out
 create mode 100644 src/test/isolation/expected/serializable-parallel.out
 create mode 100644 src/test/isolation/specs/serializable-parallel-2.spec
 create mode 100644 src/test/isolation/specs/serializable-parallel.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3bc4de57d5a..daa961a3226 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -979,6 +979,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting to perform an operation on a list of locks held by
          serializable transactions.</entry>
         </row>
+        <row>
+         <entry><literal>sxact</literal></entry>
+         <entry>Waiting to perform an operation on a serializable transaction
+         in a parallel query.</entry>
+        </row>
         <row>
          <entry><literal>OldSerXidLock</literal></entry>
          <entry>Waiting to read or record conflicting serializable
diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index f15a9233cbf..9507f1ae2ef 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -192,13 +192,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -241,16 +234,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         that may be suboptimal when run serially.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index f0c276b52a1..c5807294ba8 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1207,7 +1207,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	EnterParallelMode();
 	Assert(request > 0);
 	pcxt = CreateParallelContext("postgres", "_bt_parallel_build_main",
-								 request, true);
+								 request);
 	scantuplesortstates = leaderparticipates ? request + 1 : request;
 
 	/*
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 9d4efc0f8fc..cf0ee1499ab 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -30,6 +30,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -86,6 +87,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SerializableXactHandle serializable_xact_handle;
 
 	/* Mutex protects remaining fields. */
 	slock_t		mutex;
@@ -150,7 +152,7 @@ static void ParallelWorkerShutdown(int code, Datum arg);
  */
 ParallelContext *
 CreateParallelContext(const char *library_name, const char *function_name,
-					  int nworkers, bool serializable_okay)
+					  int nworkers)
 {
 	MemoryContext oldcontext;
 	ParallelContext *pcxt;
@@ -168,16 +170,6 @@ CreateParallelContext(const char *library_name, const char *function_name,
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.  Utility statement callers may ask us to ignore this
-	 * restriction because they're always able to safely ignore the fact that
-	 * SIREAD locks do not work with parallelism.
-	 */
-	if (IsolationIsSerializable() && !serializable_okay)
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -321,6 +313,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->serializable_xact_handle = ShareSerializableXact();
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_FIXED, fps);
@@ -1384,6 +1377,9 @@ ParallelWorkerMain(Datum main_arg)
 	reindexspace = shm_toc_lookup(toc, PARALLEL_KEY_REINDEX_STATE, false);
 	RestoreReindexState(reindexspace);
 
+	/* Attach to the leader's serializable transaction, if SERIALIZABLE. */
+	AttachSerializableXact(fps->serializable_xact_handle);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index dbaaf8e0053..dd486835867 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2004,9 +2004,12 @@ CommitTransaction(void)
 	/*
 	 * Mark serializable transaction as complete for predicate locking
 	 * purposes.  This should be done as late as we can put it and still allow
-	 * errors to be raised for failure patterns found at commit.
+	 * errors to be raised for failure patterns found at commit.  This is not
+	 * appropriate in a parallel worker however, because we aren't committing
+	 * the leader's transaction and its serializable state will live on.
 	 */
-	PreCommit_CheckForSerializationFailure();
+	if (!is_parallel_worker)
+		PreCommit_CheckForSerializationFailure();
 
 	/*
 	 * Insert notifications sent by NOTIFY commands into the queue.  This
@@ -2232,9 +2235,12 @@ PrepareTransaction(void)
 	/*
 	 * Mark serializable transaction as complete for predicate locking
 	 * purposes.  This should be done as late as we can put it and still allow
-	 * errors to be raised for failure patterns found at commit.
+	 * errors to be raised for failure patterns found at commit.  This is not
+	 * appropriate for parallel workers however, because we aren't committing
+	 * the leader's transaction and its serializable state will live on.
 	 */
-	PreCommit_CheckForSerializationFailure();
+	if (!IsParallelWorker())
+		PreCommit_CheckForSerializationFailure();
 
 	/* NOTIFY will be handled below */
 
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 14b0b89463c..f8b72ebab99 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -592,7 +592,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pstmt_data = ExecSerializePlan(planstate->plan, estate);
 
 	/* Create a parallel context. */
-	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers, false);
+	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
 	pei->pcxt = pcxt;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index de1257d9c22..44e14d6a8b7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -289,14 +289,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -304,8 +296,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index a6fda81feb6..a11b4bebd61 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -521,6 +521,7 @@ RegisterLWLockTranches(void)
 	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN, "parallel_hash_join");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "sxact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 654eca4f3f5..617208c42cb 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'predicateLockListLock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	PredicateLockHashPartitionLock(hashcode)
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target.
@@ -186,6 +194,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -469,6 +478,7 @@ static void CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag);
 static void FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer);
 static void OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
 										SERIALIZABLEXACT *writer);
+static void CreateLocalPredicateLockHash(void);
 
 
 /*------------------------------------------------------------------------*/
@@ -522,8 +532,10 @@ SerializationNeededForRead(Relation relation, Snapshot snapshot)
 	 * A transaction is flagged as RO_SAFE if all concurrent R/W transactions
 	 * commit without having conflicts out to an earlier snapshot, thus
 	 * ensuring that no conflicts are possible for this transaction.
+	 *
+	 * This optimization is not yet supported in parallel mode.
 	 */
-	if (SxactIsROSafe(MySerializableXact))
+	if (SxactIsROSafe(MySerializableXact) && !IsInParallelMode())
 	{
 		ReleasePredicateLocks(false);
 		return false;
@@ -1214,6 +1226,8 @@ InitPredicateLocks(void)
 		memset(PredXact->element, 0, requestSize);
 		for (i = 0; i < max_table_size; i++)
 		{
+			LWLockInitialize(&PredXact->element[i].sxact.predicateLockListLock,
+							 LWTRANCHE_SXACT);
 			SHMQueueInsertBefore(&(PredXact->availableList),
 								 &(PredXact->element[i].link));
 		}
@@ -1679,6 +1693,17 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
 {
 	Assert(IsolationIsSerializable());
 
+	/*
+	 * If this is called by parallel.c in a parallel worker, we don't want to
+	 * create a SERIALIZABLEXACT just yet because the leader's
+	 * SERIALIZABLEXACT will be installed with AttachSerializableXact().  We
+	 * also don't want to reject SERIALIZABLE READ ONLY DEFERRABLE in this
+	 * case, because the leader has already determined that the snapshot it
+	 * has passed us is safe.  So there is nothing for us to do.
+	 */
+	if (IsParallelWorker())
+		return;
+
 	/*
 	 * We do not allow SERIALIZABLE READ ONLY DEFERRABLE transactions to
 	 * import snapshots, since there's no way to wait for a safe snapshot when
@@ -1712,7 +1737,6 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	VirtualTransactionId vxid;
 	SERIALIZABLEXACT *sxact,
 			   *othersxact;
-	HASHCTL		hash_ctl;
 
 	/* We only do this for serializable transactions.  Once. */
 	Assert(MySerializableXact == InvalidSerializableXact);
@@ -1859,6 +1883,16 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 
 	LWLockRelease(SerializableXactHashLock);
 
+	CreateLocalPredicateLockHash();
+
+	return snapshot;
+}
+
+static void
+CreateLocalPredicateLockHash(void)
+{
+	HASHCTL		hash_ctl;
+
 	/* Initialize the backend-local hash table of parent locks */
 	Assert(LocalPredicateLockHash == NULL);
 	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
@@ -1868,8 +1902,6 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 										 max_predicate_locks_per_xact,
 										 &hash_ctl,
 										 HASH_ELEM | HASH_BLOBS);
-
-	return snapshot;
 }
 
 /*
@@ -2124,7 +2156,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2137,6 +2171,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2190,6 +2226,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2388,6 +2426,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2425,6 +2465,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2612,7 +2654,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2672,7 +2715,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2687,7 +2730,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3290,6 +3334,10 @@ ReleasePredicateLocks(bool isCommit)
 		return;
 	}
 
+	/* Parallel workers mustn't release predicate locks. */
+	if (IsParallelWorker())
+		goto backend_local_cleanup;
+
 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
 	Assert(!isCommit || SxactIsPrepared(MySerializableXact));
@@ -3319,8 +3367,8 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact->finishedBefore = ShmemVariableCache->nextXid;
 
 	/*
-	 * If it's not a commit it's a rollback, and we can clear our locks
-	 * immediately.
+	 * If it's not a commit it's either a rollback or a read-only transaction
+	 * flagged SXACT_FLAG_RO_SAFE, and we can clear our locks immediately.
 	 */
 	if (isCommit)
 	{
@@ -3567,6 +3615,7 @@ ReleasePredicateLocks(bool isCommit)
 	if (needToClear)
 		ClearOldPredicateLocks();
 
+backend_local_cleanup:
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
@@ -4259,6 +4308,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->predicateLockListLock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4293,6 +4344,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->predicateLockListLock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4841,6 +4894,13 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->predicateLockListLock in parallel mode because
+	 * there cannot be any parallel workers running while we are preparing a
+	 * transaction.
+	 */
+	Assert(!IsParallelWorker() && !ParallelContextActive());
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -5049,3 +5109,30 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Prepare to share the current SERIALIZABLEXACT with parallel workers.
+ * Return a handle object that can be used by AttachSerializableXact() in a
+ * parallel worker.
+ */
+SerializableXactHandle
+ShareSerializableXact(void)
+{
+	Assert(!IsParallelWorker());
+
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+AttachSerializableXact(SerializableXactHandle handle)
+{
+
+	Assert(MySerializableXact == InvalidSerializableXact);
+
+	MySerializableXact = (SERIALIZABLEXACT *) handle;
+	if (MySerializableXact != InvalidSerializableXact)
+		CreateLocalPredicateLockHash();
+}
diff --git a/src/include/access/parallel.h b/src/include/access/parallel.h
index 025691fd82d..45e7fbb43f8 100644
--- a/src/include/access/parallel.h
+++ b/src/include/access/parallel.h
@@ -60,8 +60,7 @@ extern PGDLLIMPORT bool InitializingParallelWorker;
 #define		IsParallelWorker()		(ParallelWorkerNumber >= 0)
 
 extern ParallelContext *CreateParallelContext(const char *library_name,
-					  const char *function_name, int nworkers,
-					  bool serializable_okay);
+					  const char *function_name, int nworkers);
 extern void InitializeParallelDSM(ParallelContext *pcxt);
 extern void ReinitializeParallelDSM(ParallelContext *pcxt);
 extern void LaunchParallelWorkers(ParallelContext *pcxt);
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index c21bfe2f666..b25c43fc6be 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -219,6 +219,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_SHARED_TUPLESTORE,
 	LWTRANCHE_TBM,
 	LWTRANCHE_PARALLEL_APPEND,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 6a3464daa1e..23f3acc3ce1 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -30,6 +30,11 @@ extern int	max_predicate_locks_per_page;
 /* Number of SLRU buffers to use for predicate locking */
 #define NUM_OLDSERXID_BUFFERS	16
 
+/*
+ * A handle used for sharing SERIALIZABLEXACT objects between the participants
+ * in a parallel query.
+ */
+typedef void *SerializableXactHandle;
 
 /*
  * function prototypes
@@ -74,4 +79,8 @@ extern void PredicateLockTwoPhaseFinish(TransactionId xid, bool isCommit);
 extern void predicatelock_twophase_recover(TransactionId xid, uint16 info,
 							   void *recdata, uint32 len);
 
+/* parallel query support */
+extern SerializableXactHandle ShareSerializableXact(void);
+extern void AttachSerializableXact(SerializableXactHandle handle);
+
 #endif							/* PREDICATE_H */
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 0f736d37dff..59eb49e57ee 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,9 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	LWLock		predicateLockListLock;	/* protects predicateLocks in parallel
+										 * mode */
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
diff --git a/src/test/isolation/expected/serializable-parallel-2.out b/src/test/isolation/expected/serializable-parallel-2.out
new file mode 100644
index 00000000000..9a693c4dc62
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel-2.out
@@ -0,0 +1,44 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1r s2r1 s1c s2r2 s2c
+step s1r: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s2r1: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s1c: COMMIT;
+step s2r2: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s2c: COMMIT;
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 00000000000..f43aa6a2990
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 74d7d59546a..36890d74b60 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,3 +66,5 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+test: serializable-parallel
+test: serializable-parallel-2
diff --git a/src/test/isolation/specs/serializable-parallel-2.spec b/src/test/isolation/specs/serializable-parallel-2.spec
new file mode 100644
index 00000000000..7f90f75d882
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel-2.spec
@@ -0,0 +1,30 @@
+# Exercise the case where a read-only serializable transaction has
+# SXACT_FLAG_RO_SAFE set in a parallel query.
+
+setup
+{
+	CREATE TABLE foo AS SELECT generate_series(1, 10)::int a;
+	ALTER TABLE foo SET (parallel_workers = 2);
+}
+
+teardown
+{
+	DROP TABLE foo;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1r"	{ SELECT * FROM foo; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
+			  SET parallel_setup_cost = 0;
+			  SET parallel_tuple_cost = 0;
+			}
+step "s2r1"	{ SELECT * FROM foo; }
+step "s2r2"	{ SELECT * FROM foo; }
+step "s2c"	{ COMMIT; }
+
+permutation "s1r" "s2r1" "s1c" "s2r2" "s2c"
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 00000000000..0e7c2c7c1fa
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"
-- 
2.15.1
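
The locking rule this patch adds to predicate.c can be shown in miniature: the per-SERIALIZABLEXACT predicateLockListLock is taken around changes to the transaction's lock list only when in parallel mode. The following is a toy sketch, not PostgreSQL code. ToyLock, ToySharedXact and toy_add_predicate_lock are hypothetical names; the toy lock only counts acquisitions and provides no real mutual exclusion; the outer SerializablePredicateLockListLock that the real code takes first is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an LWLock: it only records usage. */
typedef struct ToyLock
{
    int holders;        /* 0 or 1 in this single-threaded toy */
    int acquisitions;   /* how many times the lock was taken */
} ToyLock;

static void
toy_acquire(ToyLock *lock)
{
    assert(lock->holders == 0);
    lock->holders++;
    lock->acquisitions++;
}

static void
toy_release(ToyLock *lock)
{
    assert(lock->holders == 1);
    lock->holders--;
}

/* Hypothetical stand-in for a SERIALIZABLEXACT shared by leader and workers. */
typedef struct ToySharedXact
{
    ToyLock list_lock;  /* models predicateLockListLock */
    int     nlocks;     /* models the length of the predicateLocks list */
} ToySharedXact;

/*
 * Add one entry to the transaction's lock list.  The per-transaction lock
 * is taken only when several backends may share the object, which is the
 * role played by the IsInParallelMode() test in the patch.
 */
static void
toy_add_predicate_lock(ToySharedXact *xact, bool in_parallel_mode)
{
    if (in_parallel_mode)
        toy_acquire(&xact->list_lock);

    xact->nlocks++;

    if (in_parallel_mode)
        toy_release(&xact->list_lock);
}
```

Outside parallel mode the extra lock is never touched, so the non-parallel path should cost the same as before; that appears to be the motivation for making the acquisition conditional.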

0002-Enable-the-read-only-SERIALIZABLE-optimization-f-v13.patch (application/octet-stream)

From e3b411cdd5ca5c8c3f6ccd73ae4c7d8ef1470903 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Sun, 25 Feb 2018 23:45:09 +1300
Subject: [PATCH 2/2] Enable the read-only SERIALIZABLE optimization for
 parallel query.

A SERIALIZABLEXACT can be marked as SXACT_FLAG_RO_SAFE by a concurrent session,
meaning that it is safe to throw away this SERIALIZABLEXACT and start behaving
like a REPEATABLE READ transaction.  The problem is that the leader and workers
are sharing the same SERIALIZABLEXACT so this must be coordinated carefully.
This commit solves that problem as follows:

The first backend to observe the SXACT_FLAG_RO_SAFE flag will 'partially
release' the SERIALIZABLEXACT, meaning that the conflicts and locks it holds
are released, but the SERIALIZABLEXACT itself will remain active because other
backends might have a pointer to it.

Whenever any backend notices the SXACT_FLAG_RO_SAFE flag, it clears its own
MySerializableXact variable so that it can skip SSI checks for the rest of the
transaction.  In the special case of the leader process, it transfers the
SERIALIZABLEXACT to a new variable SavedSerializableXact, so that it can be
completely released at the end of the transaction after all workers have
exited.

Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 src/backend/storage/lmgr/predicate.c      | 136 ++++++++++++++++++++++++++----
 src/backend/utils/resowner/resowner.c     |   2 +-
 src/include/storage/predicate.h           |   2 +-
 src/include/storage/predicate_internals.h |   6 ++
 4 files changed, 127 insertions(+), 19 deletions(-)
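
The coordination described in the commit message above can be sketched as a small state machine. This is a toy model under stated assumptions, not the patch's code. ToySxact, ToyBackend and toy_release_ro_safe are hypothetical names, the flag values are made up, and the toy is single-threaded, whereas the patch makes the equivalent decision while holding SerializableXactHashLock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy flags; the names echo the patch, the values are invented. */
#define TOY_FLAG_RO_SAFE            0x01u
#define TOY_FLAG_PARTIALLY_RELEASED 0x02u

/* Hypothetical stand-in for the shared SERIALIZABLEXACT. */
typedef struct ToySxact
{
    unsigned flags;
    int      nlocks;        /* predicate locks still held */
} ToySxact;

/* Hypothetical stand-in for one backend's local state. */
typedef struct ToyBackend
{
    bool      is_leader;
    ToySxact *my_sxact;     /* models MySerializableXact */
    ToySxact *saved_sxact;  /* models SavedSerializableXact (leader only) */
} ToyBackend;

/*
 * Called when a backend notices RO_SAFE in the middle of a transaction.
 * Returns true if this backend was the one that did the partial release.
 */
static bool
toy_release_ro_safe(ToyBackend *backend)
{
    ToySxact *sxact;
    bool      did_release = false;

    assert(backend != NULL);
    sxact = backend->my_sxact;

    if (sxact == NULL || !(sxact->flags & TOY_FLAG_RO_SAFE))
        return false;

    /* The leader keeps a pointer for end-of-transaction cleanup. */
    if (backend->is_leader)
        backend->saved_sxact = sxact;

    /* Only the first backend to arrive drops the locks. */
    if (!(sxact->flags & TOY_FLAG_PARTIALLY_RELEASED))
    {
        sxact->flags |= TOY_FLAG_PARTIALLY_RELEASED;
        sxact->nlocks = 0;
        did_release = true;
    }

    /* Every backend stops doing SSI checks from here on. */
    backend->my_sxact = NULL;
    return did_release;
}
```

In this model a worker that notices the flag first performs the partial release; when the leader notices later it only stashes the pointer and clears its local state, leaving the shared object to be released fully at end of transaction.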

diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 617208c42cb..a3e36081db8 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -170,7 +170,7 @@
  *		PredicateLockPageCombine(Relation relation, BlockNumber oldblkno,
  *								 BlockNumber newblkno)
  *		TransferPredicateLocksToHeapRelation(Relation relation)
- *		ReleasePredicateLocks(bool isCommit)
+ *		ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
  *
  * conflict detection (may also trigger rollback)
  *		CheckForSerializableConflictOut(bool visible, Relation relation,
@@ -288,6 +288,7 @@
 #define SxactIsDeferrableWaiting(sxact) (((sxact)->flags & SXACT_FLAG_DEFERRABLE_WAITING) != 0)
 #define SxactIsROSafe(sxact) (((sxact)->flags & SXACT_FLAG_RO_SAFE) != 0)
 #define SxactIsROUnsafe(sxact) (((sxact)->flags & SXACT_FLAG_RO_UNSAFE) != 0)
+#define SxactIsPartiallyReleased(sxact) (((sxact)->flags & SXACT_FLAG_PARTIALLY_RELEASED) != 0)
 
 /*
  * Compute the hash code associated with a PREDICATELOCKTARGETTAG.
@@ -422,6 +423,15 @@ static HTAB *LocalPredicateLockHash = NULL;
 static SERIALIZABLEXACT *MySerializableXact = InvalidSerializableXact;
 static bool MyXactDidWrite = false;
 
+/*
+ * The SXACT_FLAG_RO_SAFE optimization might lead us to release
+ * MySerializableXact early.  If that happens in a parallel query, the leader
+ * needs to defer the destruction of the SERIALIZABLEXACT until end of
+ * transaction, because the workers still have a reference to it.  In that
+ * case, the leader stores it here.
+ */
+static SERIALIZABLEXACT *SavedSerializableXact = InvalidSerializableXact;
+
 /* local functions */
 
 static SERIALIZABLEXACT *CreatePredXact(void);
@@ -532,12 +542,10 @@ SerializationNeededForRead(Relation relation, Snapshot snapshot)
 	 * A transaction is flagged as RO_SAFE if all concurrent R/W transactions
 	 * commit without having conflicts out to an earlier snapshot, thus
 	 * ensuring that no conflicts are possible for this transaction.
-	 *
-	 * This optimization is not yet supported in parallel mode.
 	 */
-	if (SxactIsROSafe(MySerializableXact) && !IsInParallelMode())
+	if (SxactIsROSafe(MySerializableXact))
 	{
-		ReleasePredicateLocks(false);
+		ReleasePredicateLocks(false, true);
 		return false;
 	}
 
@@ -1573,14 +1581,14 @@ GetSafeSnapshot(Snapshot origSnapshot)
 		ereport(DEBUG2,
 				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 				 errmsg("deferrable snapshot was unsafe; trying a new one")));
-		ReleasePredicateLocks(false);
+		ReleasePredicateLocks(false, false);
 	}
 
 	/*
 	 * Now we have a safe snapshot, so we don't need to do any further checks.
 	 */
 	Assert(SxactIsROSafe(MySerializableXact));
-	ReleasePredicateLocks(false);
+	ReleasePredicateLocks(false, true);
 
 	return snapshot;
 }
@@ -3307,9 +3315,17 @@ SetNewSxactGlobalXmin(void)
  * If this transaction is committing and is holding any predicate locks,
  * it must be added to a list of completed serializable transactions still
  * holding locks.
+ *
+ * If isReadOnlySafe is true, then predicate locks are being released before
+ * the end of the transaction because MySerializableXact has been determined
+ * to be RO_SAFE.  In non-parallel mode we can release it completely, but in
+ * parallel mode we partially release the SERIALIZABLEXACT and keep it
+ * around until the end of the transaction, allowing each backend to clear its
+ * MySerializableXact variable and benefit from the optimization in its own
+ * time.
  */
 void
-ReleasePredicateLocks(bool isCommit)
+ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 {
 	bool		needToClear;
 	RWConflict	conflict,
@@ -3328,22 +3344,93 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
+	/* We can't be both committing and releasing early due to RO_SAFE. */
+	Assert(!(isCommit && isReadOnlySafe));
+
+	/* Are we at the end of a transaction, that is, a commit or abort? */
+	if (!isReadOnlySafe)
+	{
+		/*
+		 * Parallel workers mustn't release predicate locks at the end of
+		 * their transaction.  The leader will do that at the end of its
+		 * transaction.
+		 */
+		if (IsParallelWorker())
+			goto backend_local_cleanup;
+
+		/*
+		 * By the time the leader in a parallel query reaches end of
+		 * transaction, it has waited for all workers to exit.
+		 */
+		Assert(!ParallelContextActive());
+
+		/*
+		 * If the leader in a parallel query earlier stashed a partially
+		 * released SERIALIZABLEXACT for final clean-up at end of transaction
+		 * (because workers might still have been accessing it), then it's
+		 * time to restore it.
+		 */
+		if (SavedSerializableXact != InvalidSerializableXact)
+		{
+			Assert(MySerializableXact == InvalidSerializableXact);
+			MySerializableXact = SavedSerializableXact;
+			SavedSerializableXact = InvalidSerializableXact;
+			Assert(SxactIsPartiallyReleased(MySerializableXact));
+		}
+	}
+
 	if (MySerializableXact == InvalidSerializableXact)
 	{
 		Assert(LocalPredicateLockHash == NULL);
 		return;
 	}
 
-	/* Parallel workers mustn't release predicate locks. */
-	if (IsParallelWorker())
-		goto backend_local_cleanup;
-
 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
+	/*
+	 * If the transaction is committing, but it has been partially released
+	 * already, then treat this as a rollback.  It was marked as rolled back.
+	 */
+	if (isCommit && SxactIsPartiallyReleased(MySerializableXact))
+		isCommit = false;
+
+	/*
+	 * If we're called in the middle of a transaction because we discovered
+	 * that the SXACT_FLAG_RO_SAFE flag was set, then we'll partially release
+	 * it (that is, release the predicate locks and conflicts, but not the
+	 * SERIALIZABLEXACT itself) if we're the first backend to have noticed.
+	 */
+	if (isReadOnlySafe && IsInParallelMode())
+	{
+		/*
+		 * The leader needs to stash a pointer to it, so that it can
+		 * completely release it at end-of-transaction.
+		 */
+		if (!IsParallelWorker())
+			SavedSerializableXact = MySerializableXact;
+
+		/*
+		 * The first backend to reach this condition will partially release
+		 * the SERIALIZABLEXACT.  All others will just clear their
+		 * backend-local state so that they stop doing SSI checks for the rest
+		 * of the transaction.
+		 */
+		if (SxactIsPartiallyReleased(MySerializableXact))
+		{
+			LWLockRelease(SerializableXactHashLock);
+			goto backend_local_cleanup;
+		}
+		else
+		{
+			MySerializableXact->flags |= SXACT_FLAG_PARTIALLY_RELEASED;
+			/* ... and proceed to perform the partial release below. */
+		}
+	}
 	Assert(!isCommit || SxactIsPrepared(MySerializableXact));
 	Assert(!isCommit || !SxactIsDoomed(MySerializableXact));
 	Assert(!SxactIsCommitted(MySerializableXact));
-	Assert(!SxactIsRolledBack(MySerializableXact));
+	Assert(SxactIsPartiallyReleased(MySerializableXact)
+		   || !SxactIsRolledBack(MySerializableXact));
 
 	/* may not be serializable during COMMIT/ROLLBACK PREPARED */
 	Assert(MySerializableXact->pid == 0 || IsolationIsSerializable());
@@ -3392,7 +3479,8 @@ ReleasePredicateLocks(bool isCommit)
 		 * cleanup. This means it should not be considered when calculating
 		 * SxactGlobalXmin.
 		 */
-		MySerializableXact->flags |= SXACT_FLAG_DOOMED;
+		if (!isReadOnlySafe)
+			MySerializableXact->flags |= SXACT_FLAG_DOOMED;
 		MySerializableXact->flags |= SXACT_FLAG_ROLLED_BACK;
 
 		/*
@@ -3588,7 +3676,8 @@ ReleasePredicateLocks(bool isCommit)
 	 * was launched.
 	 */
 	needToClear = false;
-	if (TransactionIdEquals(MySerializableXact->xmin, PredXact->SxactGlobalXmin))
+	if (!isReadOnlySafe &&
+		TransactionIdEquals(MySerializableXact->xmin, PredXact->SxactGlobalXmin))
 	{
 		Assert(PredXact->SxactGlobalXminCount > 0);
 		if (--(PredXact->SxactGlobalXminCount) == 0)
@@ -3607,8 +3696,16 @@ ReleasePredicateLocks(bool isCommit)
 		SHMQueueInsertBefore(FinishedSerializableTransactions,
 							 &MySerializableXact->finishedLink);
 
+	/*
+	 * If we're releasing a RO_SAFE transaction in parallel mode, we'll only
+	 * partially release it.  That's necessary because other backends may have
+	 * a reference to it.  The leader will release the SERIALIZABLEXACT itself
+	 * at the end of the transaction after workers have stopped running.
+	 */
 	if (!isCommit)
-		ReleaseOneSerializableXact(MySerializableXact, false, false);
+		ReleaseOneSerializableXact(MySerializableXact,
+								   isReadOnlySafe && IsInParallelMode(),
+								   false);
 
 	LWLockRelease(SerializableFinishedListLock);
 
@@ -3807,6 +3904,8 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 	 * them to OldCommittedSxact if summarize is true)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -3886,6 +3985,8 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 	 */
 	SHMQueueInit(&sxact->predicateLocks);
 
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 
 	sxidtag.xid = sxact->topXid;
@@ -4776,6 +4877,7 @@ PreCommit_CheckForSerializationFailure(void)
 	/* Check if someone else has already decided that we need to die */
 	if (SxactIsDoomed(MySerializableXact))
 	{
+		Assert(!SxactIsPartiallyReleased(MySerializableXact));
 		LWLockRelease(SerializableXactHashLock);
 		ereport(ERROR,
 				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
@@ -4973,7 +5075,7 @@ PredicateLockTwoPhaseFinish(TransactionId xid, bool isCommit)
 	MySerializableXact = sxid->myXact;
 	MyXactDidWrite = true;		/* conservatively assume that we wrote
 								 * something */
-	ReleasePredicateLocks(isCommit);
+	ReleasePredicateLocks(isCommit, false);
 }
 
 /*
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index e09a4f1ddb4..ab0523e90a5 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -551,7 +551,7 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
 			if (owner == TopTransactionResourceOwner)
 			{
 				ProcReleaseLocks(isCommit);
-				ReleasePredicateLocks(isCommit);
+				ReleasePredicateLocks(isCommit, false);
 			}
 		}
 		else
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 23f3acc3ce1..0925270b91e 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -61,7 +61,7 @@ extern void PredicateLockTuple(Relation relation, HeapTuple tuple, Snapshot snap
 extern void PredicateLockPageSplit(Relation relation, BlockNumber oldblkno, BlockNumber newblkno);
 extern void PredicateLockPageCombine(Relation relation, BlockNumber oldblkno, BlockNumber newblkno);
 extern void TransferPredicateLocksToHeapRelation(Relation relation);
-extern void ReleasePredicateLocks(bool isCommit);
+extern void ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe);
 
 /* conflict detection (may also trigger rollback) */
 extern void CheckForSerializableConflictOut(bool valid, Relation relation, HeapTuple tuple,
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 59eb49e57ee..04de63877d5 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -127,6 +127,12 @@ typedef struct SERIALIZABLEXACT
 #define SXACT_FLAG_RO_UNSAFE			0x00000100
 #define SXACT_FLAG_SUMMARY_CONFLICT_IN	0x00000200
 #define SXACT_FLAG_SUMMARY_CONFLICT_OUT 0x00000400
+/*
+ * The following flag means the transaction has been partially released
+ * already, but is being preserved because parallel workers might have a
+ * reference to it.  It'll be recycled by the leader at end-of-transaction.
+ */
+#define SXACT_FLAG_PARTIALLY_RELEASED	0x00000800
 
 /*
  * The following types are used to provide an ad hoc list for holding
-- 
2.15.1

#39

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Thomas Munro (#38)

Re: [HACKERS] SERIALIZABLE with parallel query

On Wed, Feb 28, 2018 at 11:35 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Mon, Feb 26, 2018 at 6:37 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

I've now broken it into two patches.

Rebased.

+SerializableXactHandle
+ShareSerializableXact(void)
+{
+    Assert(!IsParallelWorker());
+
+    return MySerializableXact;
+}

Uh, how's that OK? There's no rule that you can't create a
ParallelContext in a worker. Parallel query currently doesn't, so it
probably won't happen, but burying an assertion to that effect in the
predicate locking code doesn't seem nice.

Is "sxact" really the best (i.e. clearest) name we can come up with
for the lock tranche?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#40

Thomas Munro

thomas.munro@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#39)

Re: [HACKERS] SERIALIZABLE with parallel query

On Thu, Mar 8, 2018 at 10:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:

+SerializableXactHandle
+ShareSerializableXact(void)
+{
+    Assert(!IsParallelWorker());
+
+    return MySerializableXact;
+}
Uh, how's that OK? There's no rule that you can't create a
ParallelContext in a worker. Parallel query currently doesn't, so it
probably won't happen, but burying an assertion to that effect in the
predicate locking code doesn't seem nice.

Hmm. I suppose you could have a PARALLEL SAFE function that itself
launches parallel workers explicitly (not via parallel query), and
they should inherit the same SERIALIZABLEXACT from their parent and
that should all just work.
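
To make the inheritance idea concrete, here is a minimal sketch of a handle being passed from a parent to a worker, and from that worker to a worker of its own. Everything in it (MockBackend, mock_share(), mock_attach(), mock_launch()) is a hypothetical stand-in invented for illustration, not the patch's code or any PostgreSQL API; it only models the pointer hand-off and assumes the share step does not care whether the caller is itself a worker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the PostgreSQL definitions. */
typedef struct MockSerializableXact
{
	int			flags;
} MockSerializableXact;

typedef void *MockXactHandle;

/* One struct per simulated backend (per-process globals in a real server). */
typedef struct MockBackend
{
	MockSerializableXact *my_sxact; /* NULL stands for "no serializable xact" */
	int			worker_number;	/* -1 for a leader, >= 0 for a worker */
} MockBackend;

/* Any backend, leader or worker, may hand out its current handle. */
static MockXactHandle
mock_share(const MockBackend *parent)
{
	return parent->my_sxact;
}

/* A freshly started worker adopts the handle it was given. */
static void
mock_attach(MockBackend *child, MockXactHandle handle)
{
	assert(child->my_sxact == NULL);
	child->my_sxact = (MockSerializableXact *) handle;
}

/* Simulate launching a worker: initialize it, then share and attach. */
static void
mock_launch(const MockBackend *parent, MockBackend *child, int worker_number)
{
	child->my_sxact = NULL;
	child->worker_number = worker_number;
	mock_attach(child, mock_share(parent));
}
```

In this model, a worker launched by another worker ends up pointing at the same object as the original leader, and a parent that is not running a serializable transaction simply passes NULL along.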

Is "sxact" really the best (i.e. clearest) name we can come up with
for the lock tranche?

Yeah, needs a better name.

I have some lingering uncertainty about this patch and we're out of
time, so I moved it to PG12 CF1. Thanks Haribabu, Robert, Amit for
the reviews and comments so far.

--
Thomas Munro
http://www.enterprisedb.com

#41

Masahiko Sawada

sawada.mshk@gmail.com

over 7 years ago

In reply to: Thomas Munro (#40)

Re: [HACKERS] SERIALIZABLE with parallel query

On Fri, Mar 30, 2018 at 2:56 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Mar 8, 2018 at 10:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
+SerializableXactHandle
+ShareSerializableXact(void)
+{
+    Assert(!IsParallelWorker());
+
+    return MySerializableXact;
+}
Uh, how's that OK? There's no rule that you can't create a
ParallelContext in a worker. Parallel query currently doesn't, so it
probably won't happen, but burying an assertion to that effect in the
predicate locking code doesn't seem nice.
Hmm. I suppose you could have a PARALLEL SAFE function that itself
launches parallel workers explicitly (not via parallel query), and
they should inherit the same SERIALIZABLEXACT from their parent and
that should all just work.

Is "sxact" really the best (i.e. clearest) name we can come up with
for the lock tranche?

Yeah, needs a better name.

I have some lingering uncertainty about this patch and we're out of
time, so I moved it to PG12 CF1. Thanks Haribabu, Robert, Amit for
the reviews and comments so far.

I'd like to test and review these patches, but they seem to conflict
with current HEAD. Could you please rebase them?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#42

Thomas Munro

thomas.munro@enterprisedb.com

over 7 years ago

In reply to: Masahiko Sawada (#41)

2 attachment(s)

Re: [HACKERS] SERIALIZABLE with parallel query

On Thu, Jun 28, 2018 at 7:55 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'd like to test and review these patches, but they seem to conflict
with current HEAD. Could you please rebase them?

Hi Sawada-san,

Thanks! Rebased and attached. The only changes are: the LWLock
tranche is now shown as "serializable_xact" instead of "sxact" (hmm,
LWLock tranches have lower_case_names_with_underscores, but individual
LWLocks have CamelCaseName...), and ShareSerializableXact() no longer
does Assert(!IsParallelWorker()). These changes are based on the last
feedback from Robert.
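
For readers skimming the patch: the lock in the renamed tranche is the per-SERIALIZABLEXACT predicateLockListLock, which the attached patch takes exclusively, after the global list lock in shared mode, and only in parallel mode. The following is a rough single-threaded sketch of that acquisition order. The Mock* types and mock_* functions are hypothetical and only record lock state; they are not LWLocks and provide no real mutual exclusion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock model for illustration; it records a mode, nothing more. */
typedef enum
{
	MOCK_UNLOCKED,
	MOCK_SHARED,
	MOCK_EXCLUSIVE
} MockLockMode;

typedef struct MockLock
{
	MockLockMode mode;
} MockLock;

typedef struct MockSxact
{
	MockLock	predicate_lock_list_lock;	/* per-transaction lock */
	int			n_predicate_locks;	/* stands in for the lock list */
} MockSxact;

/* Stands in for the single global list lock shared by all transactions. */
static MockLock mock_global_list_lock = {MOCK_UNLOCKED};

static void
mock_acquire(MockLock *lock, MockLockMode mode)
{
	assert(lock->mode == MOCK_UNLOCKED);
	lock->mode = mode;
}

static void
mock_release(MockLock *lock)
{
	assert(lock->mode != MOCK_UNLOCKED);
	lock->mode = MOCK_UNLOCKED;
}

/*
 * Add one entry to the transaction's lock list.  Returns true if the
 * per-transaction lock was taken, which happens only in parallel mode.
 */
static bool
mock_add_predicate_lock(MockSxact *sxact, bool in_parallel_mode)
{
	bool		took_per_sxact_lock = false;

	mock_acquire(&mock_global_list_lock, MOCK_SHARED);
	if (in_parallel_mode)
	{
		mock_acquire(&sxact->predicate_lock_list_lock, MOCK_EXCLUSIVE);
		took_per_sxact_lock = true;
	}

	sxact->n_predicate_locks++;

	/* Release in the reverse order of acquisition. */
	if (in_parallel_mode)
		mock_release(&sxact->predicate_lock_list_lock);
	mock_release(&mock_global_list_lock);

	return took_per_sxact_lock;
}
```

The point of the conditional is that a backend running alone pays nothing extra, while backends sharing one SERIALIZABLEXACT serialize their changes to its lock list.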

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

0001-Enable-parallel-query-with-SERIALIZABLE-isolatio-v14.patch (application/octet-stream)

From cb183445e0fcd26ca337763f77b655c06bd50e79 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Fri, 1 Sep 2017 16:54:57 +1200
Subject: [PATCH 1/2] Enable parallel query with SERIALIZABLE isolation.

Previously, the SERIALIZABLE isolation level prevented parallel query from
being used.  Allow the two features to be used together by sharing the
leader's SERIALIZABLEXACT with parallel workers.

Remove the serializable_okay flag added to CreateParallelContext() by commit
9da0cc35284bdbe8d442d732963303ff0e0a40bc, because it's now redundant.

The optimization allowing SSI checks to be skipped after a certain point in
read-only transactions is disabled in parallel mode.  It could be implemented
in a later commit.

Author: Thomas Munro
Reviewed-By: Haribabu Kommi, Robert Haas
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 doc/src/sgml/monitoring.sgml                  |   5 +
 doc/src/sgml/parallel.sgml                    |  17 ---
 src/backend/access/nbtree/nbtsort.c           |   2 +-
 src/backend/access/transam/parallel.c         |  18 ++-
 src/backend/access/transam/xact.c             |  14 ++-
 src/backend/executor/execParallel.c           |   2 +-
 src/backend/optimizer/plan/planner.c          |  11 +-
 src/backend/storage/lmgr/lwlock.c             |   1 +
 src/backend/storage/lmgr/predicate.c          | 107 ++++++++++++++++--
 src/include/access/parallel.h                 |   3 +-
 src/include/storage/lwlock.h                  |   1 +
 src/include/storage/predicate.h               |   9 ++
 src/include/storage/predicate_internals.h     |   4 +
 .../expected/serializable-parallel-2.out      |  44 +++++++
 .../expected/serializable-parallel.out        |  44 +++++++
 src/test/isolation/isolation_schedule         |   2 +
 .../specs/serializable-parallel-2.spec        |  30 +++++
 .../specs/serializable-parallel.spec          |  48 ++++++++
 18 files changed, 305 insertions(+), 57 deletions(-)
 create mode 100644 src/test/isolation/expected/serializable-parallel-2.out
 create mode 100644 src/test/isolation/expected/serializable-parallel.out
 create mode 100644 src/test/isolation/specs/serializable-parallel-2.spec
 create mode 100644 src/test/isolation/specs/serializable-parallel.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c2adb22dff9..49ee5b59d5c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -979,6 +979,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting to perform an operation on a list of locks held by
          serializable transactions.</entry>
         </row>
+        <row>
+         <entry><literal>serializable_xact</literal></entry>
+         <entry>Waiting to perform an operation on a serializable transaction
+         in a parallel query.</entry>
+        </row>
         <row>
          <entry><literal>OldSerXidLock</literal></entry>
          <entry>Waiting to read or record conflicting serializable
diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index dd7834a763f..11b7ea6dd6e 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -192,13 +192,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -241,16 +234,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         that may be suboptimal when run serially.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 16f57557776..c4e1721e553 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1255,7 +1255,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	EnterParallelMode();
 	Assert(request > 0);
 	pcxt = CreateParallelContext("postgres", "_bt_parallel_build_main",
-								 request, true);
+								 request);
 	scantuplesortstates = leaderparticipates ? request + 1 : request;
 
 	/*
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 1d631b72755..b4f4be4df55 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -30,6 +30,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -86,6 +87,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SerializableXactHandle serializable_xact_handle;
 
 	/* Mutex protects remaining fields. */
 	slock_t		mutex;
@@ -150,7 +152,7 @@ static void ParallelWorkerShutdown(int code, Datum arg);
  */
 ParallelContext *
 CreateParallelContext(const char *library_name, const char *function_name,
-					  int nworkers, bool serializable_okay)
+					  int nworkers)
 {
 	MemoryContext oldcontext;
 	ParallelContext *pcxt;
@@ -168,16 +170,6 @@ CreateParallelContext(const char *library_name, const char *function_name,
 	if (dynamic_shared_memory_type == DSM_IMPL_NONE)
 		nworkers = 0;
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.  Utility statement callers may ask us to ignore this
-	 * restriction because they're always able to safely ignore the fact that
-	 * SIREAD locks do not work with parallelism.
-	 */
-	if (IsolationIsSerializable() && !serializable_okay)
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -321,6 +313,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->serializable_xact_handle = ShareSerializableXact();
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_FIXED, fps);
@@ -1385,6 +1378,9 @@ ParallelWorkerMain(Datum main_arg)
 	reindexspace = shm_toc_lookup(toc, PARALLEL_KEY_REINDEX_STATE, false);
 	RestoreReindexState(reindexspace);
 
+	/* Attach to the leader's serializable transaction, if SERIALIZABLE. */
+	AttachSerializableXact(fps->serializable_xact_handle);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8e6aef332cb..844baed96d6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2003,9 +2003,12 @@ CommitTransaction(void)
 	/*
 	 * Mark serializable transaction as complete for predicate locking
 	 * purposes.  This should be done as late as we can put it and still allow
-	 * errors to be raised for failure patterns found at commit.
+	 * errors to be raised for failure patterns found at commit.  This is not
+	 * appropriate in a parallel worker however, because we aren't committing
+	 * the leader's transaction and its serializable state will live on.
 	 */
-	PreCommit_CheckForSerializationFailure();
+	if (!is_parallel_worker)
+		PreCommit_CheckForSerializationFailure();
 
 	/*
 	 * Insert notifications sent by NOTIFY commands into the queue.  This
@@ -2231,9 +2234,12 @@ PrepareTransaction(void)
 	/*
 	 * Mark serializable transaction as complete for predicate locking
 	 * purposes.  This should be done as late as we can put it and still allow
-	 * errors to be raised for failure patterns found at commit.
+	 * errors to be raised for failure patterns found at commit.  This is not
+	 * appropriate for parallel workers however, because we aren't committing
+	 * the leader's transaction and its serializable state will live on.
 	 */
-	PreCommit_CheckForSerializationFailure();
+	if (!IsParallelWorker())
+		PreCommit_CheckForSerializationFailure();
 
 	/* NOTIFY will be handled below */
 
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 52f1a96db5f..9c1f334d90b 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -593,7 +593,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pstmt_data = ExecSerializePlan(planstate->plan, estate);
 
 	/* Create a parallel context. */
-	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers, false);
+	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
 	pei->pcxt = pcxt;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fd45c9767df..8dcd61096d7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -324,14 +324,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
@@ -339,8 +331,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index a6fda81feb6..3b47eb057f6 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -521,6 +521,7 @@ RegisterLWLockTranches(void)
 	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN, "parallel_hash_join");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "serializable_xact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index e8390311d03..5623b62753b 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'predicateLockListLock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	PredicateLockHashPartitionLock(hashcode)
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target.
@@ -186,6 +194,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -465,6 +474,7 @@ static void CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag);
 static void FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer);
 static void OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
 										SERIALIZABLEXACT *writer);
+static void CreateLocalPredicateLockHash(void);
 
 
 /*------------------------------------------------------------------------*/
@@ -518,8 +528,10 @@ SerializationNeededForRead(Relation relation, Snapshot snapshot)
 	 * A transaction is flagged as RO_SAFE if all concurrent R/W transactions
 	 * commit without having conflicts out to an earlier snapshot, thus
 	 * ensuring that no conflicts are possible for this transaction.
+	 *
+	 * This optimization is not yet supported in parallel mode.
 	 */
-	if (SxactIsROSafe(MySerializableXact))
+	if (SxactIsROSafe(MySerializableXact) && !IsInParallelMode())
 	{
 		ReleasePredicateLocks(false);
 		return false;
@@ -1168,6 +1180,8 @@ InitPredicateLocks(void)
 		memset(PredXact->element, 0, requestSize);
 		for (i = 0; i < max_table_size; i++)
 		{
+			LWLockInitialize(&PredXact->element[i].sxact.predicateLockListLock,
+							 LWTRANCHE_SXACT);
 			SHMQueueInsertBefore(&(PredXact->availableList),
 								 &(PredXact->element[i].link));
 		}
@@ -1633,6 +1647,17 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
 {
 	Assert(IsolationIsSerializable());
 
+	/*
+	 * If this is called by parallel.c in a parallel worker, we don't want to
+	 * create a SERIALIZABLEXACT just yet because the leader's
+	 * SERIALIZABLEXACT will be installed with AttachSerializableXact().  We
+	 * also don't want to reject SERIALIZABLE READ ONLY DEFERRABLE in this
+	 * case, because the leader has already determined that the snapshot it
+	 * has passed us is safe.  So there is nothing for us to do.
+	 */
+	if (IsParallelWorker())
+		return;
+
 	/*
 	 * We do not allow SERIALIZABLE READ ONLY DEFERRABLE transactions to
 	 * import snapshots, since there's no way to wait for a safe snapshot when
@@ -1666,7 +1691,6 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	VirtualTransactionId vxid;
 	SERIALIZABLEXACT *sxact,
 			   *othersxact;
-	HASHCTL		hash_ctl;
 
 	/* We only do this for serializable transactions.  Once. */
 	Assert(MySerializableXact == InvalidSerializableXact);
@@ -1813,6 +1837,16 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 
 	LWLockRelease(SerializableXactHashLock);
 
+	CreateLocalPredicateLockHash();
+
+	return snapshot;
+}
+
+static void
+CreateLocalPredicateLockHash(void)
+{
+	HASHCTL		hash_ctl;
+
 	/* Initialize the backend-local hash table of parent locks */
 	Assert(LocalPredicateLockHash == NULL);
 	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
@@ -1822,8 +1856,6 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 										 max_predicate_locks_per_xact,
 										 &hash_ctl,
 										 HASH_ELEM | HASH_BLOBS);
-
-	return snapshot;
 }
 
 /*
@@ -2078,7 +2110,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2091,6 +2125,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2144,6 +2180,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2342,6 +2380,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2379,6 +2419,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2566,7 +2608,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2626,7 +2669,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2641,7 +2684,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3244,6 +3288,10 @@ ReleasePredicateLocks(bool isCommit)
 		return;
 	}
 
+	/* Parallel workers mustn't release predicate locks. */
+	if (IsParallelWorker())
+		goto backend_local_cleanup;
+
 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
 	Assert(!isCommit || SxactIsPrepared(MySerializableXact));
@@ -3273,8 +3321,8 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact->finishedBefore = ShmemVariableCache->nextXid;
 
 	/*
-	 * If it's not a commit it's a rollback, and we can clear our locks
-	 * immediately.
+	 * If it's not a commit it's either a rollback or a read-only transaction
+	 * flagged SXACT_FLAG_RO_SAFE, and we can clear our locks immediately.
 	 */
 	if (isCommit)
 	{
@@ -3521,6 +3569,7 @@ ReleasePredicateLocks(bool isCommit)
 	if (needToClear)
 		ClearOldPredicateLocks();
 
+backend_local_cleanup:
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
@@ -4213,6 +4262,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->predicateLockListLock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4247,6 +4298,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->predicateLockListLock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4795,6 +4848,13 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->predicateLockListLock in parallel mode because
+	 * there cannot be any parallel workers running while we are preparing a
+	 * transaction.
+	 */
+	Assert(!IsParallelWorker() && !ParallelContextActive());
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -5003,3 +5063,28 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Prepare to share the current SERIALIZABLEXACT with parallel workers.
+ * Return a handle object that can be used by AttachSerializableXact() in a
+ * parallel worker.
+ */
+SerializableXactHandle
+ShareSerializableXact(void)
+{
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+AttachSerializableXact(SerializableXactHandle handle)
+{
+
+	Assert(MySerializableXact == InvalidSerializableXact);
+
+	MySerializableXact = (SERIALIZABLEXACT *) handle;
+	if (MySerializableXact != InvalidSerializableXact)
+		CreateLocalPredicateLockHash();
+}
diff --git a/src/include/access/parallel.h b/src/include/access/parallel.h
index 025691fd82d..45e7fbb43f8 100644
--- a/src/include/access/parallel.h
+++ b/src/include/access/parallel.h
@@ -60,8 +60,7 @@ extern PGDLLIMPORT bool InitializingParallelWorker;
 #define		IsParallelWorker()		(ParallelWorkerNumber >= 0)
 
 extern ParallelContext *CreateParallelContext(const char *library_name,
-					  const char *function_name, int nworkers,
-					  bool serializable_okay);
+					  const char *function_name, int nworkers);
 extern void InitializeParallelDSM(ParallelContext *pcxt);
 extern void ReinitializeParallelDSM(ParallelContext *pcxt);
 extern void LaunchParallelWorkers(ParallelContext *pcxt);
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index c21bfe2f666..b25c43fc6be 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -219,6 +219,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_SHARED_TUPLESTORE,
 	LWTRANCHE_TBM,
 	LWTRANCHE_PARALLEL_APPEND,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 6a3464daa1e..23f3acc3ce1 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -30,6 +30,11 @@ extern int	max_predicate_locks_per_page;
 /* Number of SLRU buffers to use for predicate locking */
 #define NUM_OLDSERXID_BUFFERS	16
 
+/*
+ * A handle used for sharing SERIALIZABLEXACT objects between the participants
+ * in a parallel query.
+ */
+typedef void *SerializableXactHandle;
 
 /*
  * function prototypes
@@ -74,4 +79,8 @@ extern void PredicateLockTwoPhaseFinish(TransactionId xid, bool isCommit);
 extern void predicatelock_twophase_recover(TransactionId xid, uint16 info,
 							   void *recdata, uint32 len);
 
+/* parallel query support */
+extern SerializableXactHandle ShareSerializableXact(void);
+extern void AttachSerializableXact(SerializableXactHandle handle);
+
 #endif							/* PREDICATE_H */
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 0f736d37dff..59eb49e57ee 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,9 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	LWLock		predicateLockListLock;	/* protects predicateLocks in parallel
+										 * mode */
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
diff --git a/src/test/isolation/expected/serializable-parallel-2.out b/src/test/isolation/expected/serializable-parallel-2.out
new file mode 100644
index 00000000000..9a693c4dc62
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel-2.out
@@ -0,0 +1,44 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1r s2r1 s1c s2r2 s2c
+step s1r: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s2r1: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s1c: COMMIT;
+step s2r2: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s2c: COMMIT;
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 00000000000..f43aa6a2990
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 0e997215a80..b8f8932e676 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -75,3 +75,5 @@ test: partition-key-update-1
 test: partition-key-update-2
 test: partition-key-update-3
 test: plpgsql-toast
+test: serializable-parallel
+test: serializable-parallel-2
diff --git a/src/test/isolation/specs/serializable-parallel-2.spec b/src/test/isolation/specs/serializable-parallel-2.spec
new file mode 100644
index 00000000000..7f90f75d882
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel-2.spec
@@ -0,0 +1,30 @@
+# Exercise the case where a read-only serializable transaction has
+# SXACT_FLAG_RO_SAFE set in a parallel query.
+
+setup
+{
+	CREATE TABLE foo AS SELECT generate_series(1, 10)::int a;
+	ALTER TABLE foo SET (parallel_workers = 2);
+}
+
+teardown
+{
+	DROP TABLE foo;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1r"	{ SELECT * FROM foo; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
+			  SET parallel_setup_cost = 0;
+			  SET parallel_tuple_cost = 0;
+			}
+step "s2r1"	{ SELECT * FROM foo; }
+step "s2r2"	{ SELECT * FROM foo; }
+step "s2c"	{ COMMIT; }
+
+permutation "s1r" "s2r1" "s1c" "s2r2" "s2c"
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 00000000000..0e7c2c7c1fa
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"
-- 
2.17.0

0002-Enable-the-read-only-SERIALIZABLE-optimization-f-v14.patch (application/octet-stream)

From 10ecd61bd4c6f9951b854479a96bb9cef647318e Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Sun, 25 Feb 2018 23:45:09 +1300
Subject: [PATCH 2/2] Enable the read-only SERIALIZABLE optimization for
 parallel query.

A SERIALIZABLEXACT can be marked as SXACT_FLAG_RO_SAFE by a concurrent session,
meaning that it is safe to throw away this SERIALIZABLEXACT and start behaving
like a REPEATABLE READ transaction.  The problem is that the leader and workers
are sharing the same SERIALIZABLEXACT so this must be coordinated carefully.
This commit solves that problem as follows:

The first backend to observe the SXACT_FLAG_RO_SAFE flag will 'partially
release' it, meaning that the conflicts and locks it holds can be released, but
the SERIALIZABLEXACT itself will remain active because other backends might
have a pointer to it.

Whenever any backend notices the SXACT_FLAG_RO_SAFE flag, it clears its own
MySerializableXact variable so that it can skip SSI checks for the rest of the
transaction.  In the special case of the leader process, it transfers the
SERIALIZABLEXACT to a new variable SavedSerializableXact, so that it can be
completely released at the end of the transaction after all workers have
exited.

Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 src/backend/storage/lmgr/predicate.c      | 136 +++++++++++++++++++---
 src/backend/utils/resowner/resowner.c     |   2 +-
 src/include/storage/predicate.h           |   2 +-
 src/include/storage/predicate_internals.h |   6 +
 4 files changed, 127 insertions(+), 19 deletions(-)

diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 5623b62753b..abebd6a4908 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -170,7 +170,7 @@
  *		PredicateLockPageCombine(Relation relation, BlockNumber oldblkno,
  *								 BlockNumber newblkno)
  *		TransferPredicateLocksToHeapRelation(Relation relation)
- *		ReleasePredicateLocks(bool isCommit)
+ *		ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
  *
  * conflict detection (may also trigger rollback)
  *		CheckForSerializableConflictOut(bool visible, Relation relation,
@@ -288,6 +288,7 @@
 #define SxactIsDeferrableWaiting(sxact) (((sxact)->flags & SXACT_FLAG_DEFERRABLE_WAITING) != 0)
 #define SxactIsROSafe(sxact) (((sxact)->flags & SXACT_FLAG_RO_SAFE) != 0)
 #define SxactIsROUnsafe(sxact) (((sxact)->flags & SXACT_FLAG_RO_UNSAFE) != 0)
+#define SxactIsPartiallyReleased(sxact) (((sxact)->flags & SXACT_FLAG_PARTIALLY_RELEASED) != 0)
 
 /*
  * Compute the hash code associated with a PREDICATELOCKTARGETTAG.
@@ -418,6 +419,15 @@ static HTAB *LocalPredicateLockHash = NULL;
 static SERIALIZABLEXACT *MySerializableXact = InvalidSerializableXact;
 static bool MyXactDidWrite = false;
 
+/*
+ * The SXACT_FLAG_RO_SAFE optimization might lead us to release
+ * MySerializableXact early.  If that happens in a parallel query, the leader
+ * needs to defer the destruction of the SERIALIZABLEXACT until end of
+ * transaction, because the workers still have a reference to it.  In that
+ * case, the leader stores it here.
+ */
+static SERIALIZABLEXACT *SavedSerializableXact = InvalidSerializableXact;
+
 /* local functions */
 
 static SERIALIZABLEXACT *CreatePredXact(void);
@@ -528,12 +538,10 @@ SerializationNeededForRead(Relation relation, Snapshot snapshot)
 	 * A transaction is flagged as RO_SAFE if all concurrent R/W transactions
 	 * commit without having conflicts out to an earlier snapshot, thus
 	 * ensuring that no conflicts are possible for this transaction.
-	 *
-	 * This optimization is not yet supported in parallel mode.
 	 */
-	if (SxactIsROSafe(MySerializableXact) && !IsInParallelMode())
+	if (SxactIsROSafe(MySerializableXact))
 	{
-		ReleasePredicateLocks(false);
+		ReleasePredicateLocks(false, true);
 		return false;
 	}
 
@@ -1527,14 +1535,14 @@ GetSafeSnapshot(Snapshot origSnapshot)
 		ereport(DEBUG2,
 				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 				 errmsg("deferrable snapshot was unsafe; trying a new one")));
-		ReleasePredicateLocks(false);
+		ReleasePredicateLocks(false, false);
 	}
 
 	/*
 	 * Now we have a safe snapshot, so we don't need to do any further checks.
 	 */
 	Assert(SxactIsROSafe(MySerializableXact));
-	ReleasePredicateLocks(false);
+	ReleasePredicateLocks(false, true);
 
 	return snapshot;
 }
@@ -3261,9 +3269,17 @@ SetNewSxactGlobalXmin(void)
  * If this transaction is committing and is holding any predicate locks,
  * it must be added to a list of completed serializable transactions still
  * holding locks.
+ *
+ * If isReadOnlySafe is true, then predicate locks are being released before
+ * the end of the transaction because MySerializableXact has been determined
+ * to be RO_SAFE.  In non-parallel mode we can release it completely, but
+ * in parallel mode we partially release the SERIALIZABLEXACT and keep it
+ * around until the end of the transaction, allowing each backend to clear its
+ * MySerializableXact variable and benefit from the optimization in its own
+ * time.
  */
 void
-ReleasePredicateLocks(bool isCommit)
+ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 {
 	bool		needToClear;
 	RWConflict	conflict,
@@ -3282,22 +3298,93 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
+	/* We can't be both committing and releasing early due to RO_SAFE. */
+	Assert(!(isCommit && isReadOnlySafe));
+
+	/* Are we at the end of a transaction, that is, a commit or abort? */
+	if (!isReadOnlySafe)
+	{
+		/*
+		 * Parallel workers mustn't release predicate locks at the end of
+		 * their transaction.  The leader will do that at the end of its
+		 * transaction.
+		 */
+		if (IsParallelWorker())
+			goto backend_local_cleanup;
+
+		/*
+		 * By the time the leader in a parallel query reaches end of
+		 * transaction, it has waited for all workers to exit.
+		 */
+		Assert(!ParallelContextActive());
+
+		/*
+		 * If the leader in a parallel query earler stashed a partially
+		 * released SERIALIZABLEXACT for final clean-up at end of transaction
+		 * (because workers might still have been accessing it), then it's
+		 * time to restore it.
+		 */
+		if (SavedSerializableXact != InvalidSerializableXact)
+		{
+			Assert(MySerializableXact == InvalidSerializableXact);
+			MySerializableXact = SavedSerializableXact;
+			SavedSerializableXact = InvalidSerializableXact;
+			Assert(SxactIsPartiallyReleased(MySerializableXact));
+		}
+	}
+
 	if (MySerializableXact == InvalidSerializableXact)
 	{
 		Assert(LocalPredicateLockHash == NULL);
 		return;
 	}
 
-	/* Parallel workers mustn't release predicate locks. */
-	if (IsParallelWorker())
-		goto backend_local_cleanup;
-
 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
+	/*
+	 * If the transaction is committing, but it has been partially released
+	 * already, then treat this as a roll back.  It was marked as rolled back.
+	 */
+	if (isCommit && SxactIsPartiallyReleased(MySerializableXact))
+		isCommit = false;
+
+	/*
+	 * If we're called in the middle of a transaction because we discovered
+	 * that the SXACT_FLAG_RO_SAFE flag was set, then we'll partially release
+	 * it (that is, release the predicate locks and conflicts, but not the
+	 * SERIALIZABLEXACT itself) if we're the first backend to have noticed.
+	 */
+	if (isReadOnlySafe && IsInParallelMode())
+	{
+		/*
+		 * The leader needs to stash a pointer to it, so that it can
+		 * completely release it at end-of-transaction.
+		 */
+		if (!IsParallelWorker())
+			SavedSerializableXact = MySerializableXact;
+
+		/*
+		 * The first backend to reach this condition will partially release
+		 * the SERIALIZABLEXACT.  All others will just clear their
+		 * backend-local state so that they stop doing SSI checks for the rest
+		 * of the transaction.
+		 */
+		if (SxactIsPartiallyReleased(MySerializableXact))
+		{
+			LWLockRelease(SerializableXactHashLock);
+			goto backend_local_cleanup;
+		}
+		else
+		{
+			MySerializableXact->flags |= SXACT_FLAG_PARTIALLY_RELEASED;
+			/* ... and proceed to perform the partial release below. */
+		}
+	}
 	Assert(!isCommit || SxactIsPrepared(MySerializableXact));
 	Assert(!isCommit || !SxactIsDoomed(MySerializableXact));
 	Assert(!SxactIsCommitted(MySerializableXact));
-	Assert(!SxactIsRolledBack(MySerializableXact));
+	Assert(SxactIsPartiallyReleased(MySerializableXact)
+		   || !SxactIsRolledBack(MySerializableXact));
 
 	/* may not be serializable during COMMIT/ROLLBACK PREPARED */
 	Assert(MySerializableXact->pid == 0 || IsolationIsSerializable());
@@ -3346,7 +3433,8 @@ ReleasePredicateLocks(bool isCommit)
 		 * cleanup. This means it should not be considered when calculating
 		 * SxactGlobalXmin.
 		 */
-		MySerializableXact->flags |= SXACT_FLAG_DOOMED;
+		if (!isReadOnlySafe)
+			MySerializableXact->flags |= SXACT_FLAG_DOOMED;
 		MySerializableXact->flags |= SXACT_FLAG_ROLLED_BACK;
 
 		/*
@@ -3542,7 +3630,8 @@ ReleasePredicateLocks(bool isCommit)
 	 * was launched.
 	 */
 	needToClear = false;
-	if (TransactionIdEquals(MySerializableXact->xmin, PredXact->SxactGlobalXmin))
+	if (!isReadOnlySafe &&
+		TransactionIdEquals(MySerializableXact->xmin, PredXact->SxactGlobalXmin))
 	{
 		Assert(PredXact->SxactGlobalXminCount > 0);
 		if (--(PredXact->SxactGlobalXminCount) == 0)
@@ -3561,8 +3650,16 @@ ReleasePredicateLocks(bool isCommit)
 		SHMQueueInsertBefore(FinishedSerializableTransactions,
 							 &MySerializableXact->finishedLink);
 
+	/*
+	 * If we're releasing a RO_SAFE transaction in parallel mode, we'll only
+	 * partially release it.  That's necessary because other backends may have
+	 * a reference to it.  The leader will release the SERIALIZABLEXACT itself
+	 * at the end of the transaction after workers have stopped running.
+	 */
 	if (!isCommit)
-		ReleaseOneSerializableXact(MySerializableXact, false, false);
+		ReleaseOneSerializableXact(MySerializableXact,
+								   isReadOnlySafe && IsInParallelMode(),
+								   false);
 
 	LWLockRelease(SerializableFinishedListLock);
 
@@ -3761,6 +3858,8 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 	 * them to OldCommittedSxact if summarize is true)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -3840,6 +3939,8 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 	 */
 	SHMQueueInit(&sxact->predicateLocks);
 
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 
 	sxidtag.xid = sxact->topXid;
@@ -4730,6 +4831,7 @@ PreCommit_CheckForSerializationFailure(void)
 	/* Check if someone else has already decided that we need to die */
 	if (SxactIsDoomed(MySerializableXact))
 	{
+		Assert(!SxactIsPartiallyReleased(MySerializableXact));
 		LWLockRelease(SerializableXactHashLock);
 		ereport(ERROR,
 				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
@@ -4927,7 +5029,7 @@ PredicateLockTwoPhaseFinish(TransactionId xid, bool isCommit)
 	MySerializableXact = sxid->myXact;
 	MyXactDidWrite = true;		/* conservatively assume that we wrote
 								 * something */
-	ReleasePredicateLocks(isCommit);
+	ReleasePredicateLocks(isCommit, false);
 }
 
 /*
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index bce021e1001..3e8829887f4 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -562,7 +562,7 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
 			if (owner == TopTransactionResourceOwner)
 			{
 				ProcReleaseLocks(isCommit);
-				ReleasePredicateLocks(isCommit);
+				ReleasePredicateLocks(isCommit, false);
 			}
 		}
 		else
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 23f3acc3ce1..0925270b91e 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -61,7 +61,7 @@ extern void PredicateLockTuple(Relation relation, HeapTuple tuple, Snapshot snap
 extern void PredicateLockPageSplit(Relation relation, BlockNumber oldblkno, BlockNumber newblkno);
 extern void PredicateLockPageCombine(Relation relation, BlockNumber oldblkno, BlockNumber newblkno);
 extern void TransferPredicateLocksToHeapRelation(Relation relation);
-extern void ReleasePredicateLocks(bool isCommit);
+extern void ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe);
 
 /* conflict detection (may also trigger rollback) */
 extern void CheckForSerializableConflictOut(bool valid, Relation relation, HeapTuple tuple,
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 59eb49e57ee..04de63877d5 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -127,6 +127,12 @@ typedef struct SERIALIZABLEXACT
 #define SXACT_FLAG_RO_UNSAFE			0x00000100
 #define SXACT_FLAG_SUMMARY_CONFLICT_IN	0x00000200
 #define SXACT_FLAG_SUMMARY_CONFLICT_OUT 0x00000400
+/*
+ * The following flag means the transaction has been partially released
+ * already, but is being preserved because parallel workers might have a
+ * reference to it.  It'll be recycled by the leader at end-of-transaction.
+ */
+#define SXACT_FLAG_PARTIALLY_RELEASED	0x00000800
 
 /*
  * The following types are used to provide an ad hoc list for holding
-- 
2.17.0
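
For readers following the 0002 commit message, the coordination it describes
can be modelled in a few lines of standalone C. This is only a sketch under
loud assumptions: ModelSxact, ModelBackend, model_release_ro_safe() and
model_release_at_end() are hypothetical names that do not exist in PostgreSQL,
the flag values are illustrative, and no locking is modelled (the patch makes
the "am I first?" decision while holding SerializableXactHashLock).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values; NOT the real SXACT_FLAG_* definitions. */
#define MODEL_FLAG_RO_SAFE            0x01u
#define MODEL_FLAG_PARTIALLY_RELEASED 0x02u

/* Hypothetical stand-in for the shared SERIALIZABLEXACT. */
typedef struct ModelSxact
{
	unsigned	flags;
	int			nlocks;		/* stand-in for predicate locks and conflicts */
	bool		freed;		/* stand-in for the final, complete release */
} ModelSxact;

/* Hypothetical stand-in for one backend's local state. */
typedef struct ModelBackend
{
	bool		is_worker;
	ModelSxact *my_sxact;		/* models MySerializableXact */
	ModelSxact *saved_sxact;	/* models SavedSerializableXact (leader) */
} ModelBackend;

/* A backend in parallel mode has noticed RO_SAFE in mid-transaction. */
static void
model_release_ro_safe(ModelBackend *b)
{
	ModelSxact *sxact = b->my_sxact;

	if (sxact == NULL)
		return;
	assert((sxact->flags & MODEL_FLAG_RO_SAFE) != 0);

	/* The leader stashes the pointer for the complete release later. */
	if (!b->is_worker)
		b->saved_sxact = sxact;

	/* Only the first backend to notice drops the locks and conflicts. */
	if ((sxact->flags & MODEL_FLAG_PARTIALLY_RELEASED) == 0)
	{
		sxact->flags |= MODEL_FLAG_PARTIALLY_RELEASED;
		sxact->nlocks = 0;
	}

	/* Every backend stops doing SSI checks from here on. */
	b->my_sxact = NULL;
}

/* End of transaction (commit or abort). */
static void
model_release_at_end(ModelBackend *b)
{
	/* Workers only clear their local state; they never free the object. */
	if (b->is_worker)
	{
		b->my_sxact = NULL;
		return;
	}

	/* The leader restores a stashed, partially released object. */
	if (b->saved_sxact != NULL)
	{
		assert(b->my_sxact == NULL);
		b->my_sxact = b->saved_sxact;
		b->saved_sxact = NULL;
	}

	if (b->my_sxact == NULL)
		return;

	b->my_sxact->nlocks = 0;
	b->my_sxact->freed = true;
	b->my_sxact = NULL;
}
```

In this model the shared object outlives every early release and is freed
exactly once, by the leader, which is the property the commit message is after.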

#43

Masahiko Sawada

sawada.mshk@gmail.com

over 7 years ago

In reply to: Thomas Munro (#42)

Re: [HACKERS] SERIALIZABLE with parallel query

On Fri, Jun 29, 2018 at 7:28 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Jun 28, 2018 at 7:55 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'd like to test and review this patches but they seem to conflict
with current HEAD. Could you please rebase them?

Hi Sawada-san,

Thanks! Rebased and attached. The only changes are: the LWLock
tranche is now shown as "serializable_xact" instead of "sxact" (hmm,
LWLock tranches have lower_case_names_with_underscores, but individual
LWLocks have CamelCaseName...), and ShareSerializableXact() no longer
does Assert(!IsParallelWorker()). These changes are based on the last
feedback from Robert.

Thank you! Will look at patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#44

Masahiko Sawada

sawada.mshk@gmail.com

over 7 years ago

In reply to: Masahiko Sawada (#43)

Re: [HACKERS] SERIALIZABLE with parallel query

On Mon, Jul 2, 2018 at 3:12 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jun 29, 2018 at 7:28 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Jun 28, 2018 at 7:55 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'd like to test and review this patches but they seem to conflict
with current HEAD. Could you please rebase them?

Hi Sawada-san,

Thanks! Rebased and attached. The only changes are: the LWLock
tranche is now shown as "serializable_xact" instead of "sxact" (hmm,
LWLock tranches have lower_case_names_with_underscores, but individual
LWLocks have CamelCaseName...), and ShareSerializableXact() no longer
does Assert(!IsParallelWorker()). These changes are based on the last
feedback from Robert.

Thank you! Will look at patches.

I looked at these patches. The latest patch builds without any
errors or warnings and passes all regression tests. I don't see any
critical bugs, but here are some random comments.

+               /*
+                * If the leader in a parallel query earler stashed a partially
+                * released SERIALIZABLEXACT for final clean-up at end
of transaction
+                * (because workers might still have been accessing
it), then it's
+                * time to restore it.
+                */

There is a typo.
s/earler/earlier/

----
Should we add a test to check that the write-skew[1] anomaly doesn't happen
even in parallel mode?

----
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock

Both "LWLock" and "lightweight lock" are used here. Maybe it's better to unify the spelling.

----
@@ -2231,9 +2234,12 @@ PrepareTransaction(void)
        /*
         * Mark serializable transaction as complete for predicate locking
         * purposes.  This should be done as late as we can put it and
still allow
-        * errors to be raised for failure patterns found at commit.
+        * errors to be raised for failure patterns found at commit.
This is not
+        * appropriate for parallel workers however, because we aren't
committing
+        * the leader's transaction and its serializable state will live on.
         */
-       PreCommit_CheckForSerializationFailure();
+       if (!IsParallelWorker())
+               PreCommit_CheckForSerializationFailure();

This code assumes that parallel workers could prepare a transaction. Is
that the expected behavior of parallel query? There is an assertion
!IsInParallelMode() at the beginning of that function though.

----
+    /*
+     * If the transaction is committing, but it has been partially released
+     * already, then treat this as a roll back.  It was marked as rolled back.
+     */
+    if (isCommit && SxactIsPartiallyReleased(MySerializableXact))
+        isCommit = false;
+

Isn't it better to add an assertion to check that
MySerializableXact->flags has the SXACT_FLAG_ROLLED_BACK flag set, for safety?

[1]: https://en.wikipedia.org/wiki/Snapshot_isolation#Definition

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#45

Kevin Grittner

kgrittn@gmail.com

over 7 years ago

In reply to: Masahiko Sawada (#44)

Re: [HACKERS] SERIALIZABLE with parallel query

After reviewing the thread and the current two patches, I agree with
Masahiko Sawada, and would add one adjustment to the coding: I would
prefer to break out the majority of the ReleasePredicateLocks function
into a static ReleasePredicateLocksMain (or similar) function and
eliminate the goto.
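
To make the suggested shape concrete, here is a minimal standalone sketch of
that refactoring, not the actual patch code. Everything in it is an assumption
for illustration: ModelState and its fields stand in for the backend-local
globals, release_predicate_locks_main() stands in for the proposed
ReleasePredicateLocksMain, and release_predicate_locks_local() is a
hypothetical helper for what currently lives under the backend_local_cleanup
label. The early-release (RO_SAFE) path is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the backend-local state involved. */
typedef struct ModelState
{
	bool		is_parallel_worker;	/* models IsParallelWorker() */
	bool		has_sxact;			/* MySerializableXact is valid */
	bool		has_local_hash;		/* LocalPredicateLockHash is non-NULL */
	int			shared_releases;	/* times the shared-state path ran */
} ModelState;

/* Bulk of the work: touches shared state, so never run by workers. */
static void
release_predicate_locks_main(ModelState *st, bool is_commit)
{
	(void) is_commit;			/* the real code branches on this */
	st->shared_releases++;
}

/* What the goto target did: forget backend-local state. */
static void
release_predicate_locks_local(ModelState *st)
{
	st->has_sxact = false;
	st->has_local_hash = false;
}

/* Thin wrapper: straight-line control flow, no goto needed. */
static void
model_release_predicate_locks(ModelState *st, bool is_commit)
{
	assert(st != NULL);

	if (!st->has_sxact)
		return;

	if (!st->is_parallel_worker)
		release_predicate_locks_main(st, is_commit);

	release_predicate_locks_local(st);
}
```

The point of the split is that the worker case becomes a skipped call rather
than a jump over several hundred lines.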

The optimization in patch 0002 is important. Previous benchmarks
showed that a fairly straightforward pgbench test scaled as well as
REPEATABLE READ when the optimization was present, but without it
performance fell off by up to 20% as the scale increased.

I will spend a few more days in testing and review, but figured I
should pass along "first impressions" now.

On Tue, Jul 10, 2018 at 8:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 2, 2018 at 3:12 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jun 29, 2018 at 7:28 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Jun 28, 2018 at 7:55 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'd like to test and review this patches but they seem to conflict
with current HEAD. Could you please rebase them?

Hi Sawada-san,

Thanks! Rebased and attached. The only changes are: the LWLock
tranche is now shown as "serializable_xact" instead of "sxact" (hmm,
LWLock tranches have lower_case_names_with_underscores, but individual
LWLocks have CamelCaseName...), and ShareSerializableXact() no longer
does Assert(!IsParallelWorker()). These changes are based on the last
feedback from Robert.

Thank you! Will look at patches.

I looked at this patches. The latest patch can build without any
errors and warnings and pass all regression tests. I don't see
critical bugs but there are random comments.
+               /*
+                * If the leader in a parallel query earler stashed a partially
+                * released SERIALIZABLEXACT for final clean-up at end
of transaction
+                * (because workers might still have been accessing
it), then it's
+                * time to restore it.
+                */
There is a typo.
s/earler/earlier/

----
Should we add test to check if write-skew[1] anomaly doesn't happen
even in parallel mode?
----
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock
There are LWLock and lightweight lock. Maybe it's better to unify the spelling.
----
@@ -2231,9 +2234,12 @@ PrepareTransaction(void)
/*
* Mark serializable transaction as complete for predicate locking
* purposes.  This should be done as late as we can put it and
still allow
-        * errors to be raised for failure patterns found at commit.
+        * errors to be raised for failure patterns found at commit.
This is not
+        * appropriate for parallel workers however, because we aren't
committing
+        * the leader's transaction and its serializable state will live on.
*/
-       PreCommit_CheckForSerializationFailure();
+       if (!IsParallelWorker())
+               PreCommit_CheckForSerializationFailure();
This code assumes that parallel workers could prepare transaction. Is
that expected behavior of parallel query? There is an assertion
!IsInParallelMode() at the beginning of that function though.
----
+    /*
+     * If the transaction is committing, but it has been partially released
+     * already, then treat this as a roll back.  It was marked as rolled back.
+     */
+    if (isCommit && SxactIsPartiallyReleased(MySerializableXact))
+        isCommit = false;
+
Isn't it better to add an assertion to check if
MySerializableXact->flags has SXACT_FLAG_ROLLED_BACK flag for safety?

[1] https://en.wikipedia.org/wiki/Snapshot_isolation#Definition

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

#46

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Kevin Grittner (#45)

Re: [HACKERS] SERIALIZABLE with parallel query

On Wed, Sep 19, 2018 at 04:50:40PM -0500, Kevin Grittner wrote:

I will spend a few more days in testing and review, but figured I
should pass along "first impressions" now.

Kevin, it seems that this patch is pending on your input. I have moved
this patch to the next CF for now.
--
Michael

#47

Thomas Munro

thomas.munro@enterprisedb.com

over 7 years ago

In reply to: Kevin Grittner (#45)

2 attachment(s)

Re: [HACKERS] SERIALIZABLE with parallel query

On Thu, Sep 20, 2018 at 9:50 AM Kevin Grittner <kgrittn@gmail.com> wrote:

On Tue, Jul 10, 2018 at 8:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I looked at this patches. The latest patch can build without any
errors and warnings and pass all regression tests. I don't see
critical bugs but there are random comments.

Thanks for the review! And sorry for my delayed response. Here is a
rebased patch, with changes as requested. I have replies also for
Kevin, see further down.

+               /*
+                * If the leader in a parallel query earler stashed a partially
+                * released SERIALIZABLEXACT for final clean-up at end
of transaction
+                * (because workers might still have been accessing
it), then it's
+                * time to restore it.
+                */

There is a typo.
s/earler/earlier/

Fixed.

----
Should we add test to check if write-skew[1] anomaly doesn't happen
even in parallel mode?

I suppose we could find another one of the existing specs that shows
write-skew and adapt it to run a read-only part of the transaction in
a parallel worker, but what would it prove that the proposed new test
doesn't prove already?

----
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring lightweight locks for the predicate lock or lock

Both "LWLock" and "lightweight lock" are used here. Maybe it's better to unify the spelling.

Done.
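
For readers following along, the lock ordering that this comment describes can be sketched as below. This is a minimal standalone model, not the patch's code: LockTrace, trace() and mini_create_predicate_lock() are hypothetical names, and the strings only record the order in which CreatePredicateLock() in the attached patch appears to take SerializablePredicateLockListLock, the per-sxact predicateLockListLock (parallel mode only) and the partition lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_EVENTS 8

/* Hypothetical recorder for the order of lock operations. */
typedef struct LockTrace
{
	const char *events[MAX_EVENTS];
	int			n;
} LockTrace;

static void
trace(LockTrace *t, const char *event)
{
	assert(t->n < MAX_EVENTS);
	t->events[t->n++] = event;
}

/*
 * Models the acquisition order only: the shared list lock first, then the
 * per-sxact lock if workers may share the SERIALIZABLEXACT, then the
 * partition lock.  Locks are released in the reverse order.
 */
static void
mini_create_predicate_lock(LockTrace *t, bool in_parallel_mode)
{
	trace(t, "acquire list_lock shared");
	if (in_parallel_mode)
		trace(t, "acquire sxact_lock exclusive");
	trace(t, "acquire partition_lock exclusive");

	/* ... the lock and target entries would be modified here ... */

	trace(t, "release partition_lock");
	if (in_parallel_mode)
		trace(t, "release sxact_lock");
	trace(t, "release list_lock");
}
```

Outside parallel mode the extra lock is never touched, so the non-parallel path in this model performs the same four operations as before.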

----
@@ -2231,9 +2234,12 @@ PrepareTransaction(void)
        /*
         * Mark serializable transaction as complete for predicate locking
         * purposes.  This should be done as late as we can put it and still allow
-        * errors to be raised for failure patterns found at commit.
+        * errors to be raised for failure patterns found at commit.  This is not
+        * appropriate for parallel workers however, because we aren't committing
+        * the leader's transaction and its serializable state will live on.
         */
-       PreCommit_CheckForSerializationFailure();
+       if (!IsParallelWorker())
+               PreCommit_CheckForSerializationFailure();

This code assumes that parallel workers could prepare a transaction. Is
that the expected behavior of parallel query? There is an assertion of
!IsInParallelMode() at the beginning of that function, though.

You are right. I made a change exactly like this in
CommitTransaction(), where it is necessary, but then somehow I managed
to apply that hunk to the identical code in PrepareTransaction() also,
where it is not necessary. Fixed.

----
+    /*
+     * If the transaction is committing, but it has been partially released
+     * already, then treat this as a roll back.  It was marked as rolled back.
+     */
+    if (isCommit && SxactIsPartiallyReleased(MySerializableXact))
+        isCommit = false;
+

Isn't it better to add an assertion to check if
MySerializableXact->flags has SXACT_FLAG_ROLLED_BACK flag for safety?

That can't hurt. Added.

I don't really like the fact that the flag contradicts reality here...
but it was already established that the read-only safe optimisation
calls ReleasePredicateLocks(false) which behaves like a rollback and
marks the SERIALIZABLEXACT that way. I don't have a better idea right
now.
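
To make the rule in that hunk concrete, here is a minimal standalone sketch, assuming simplified stand-ins for the real structures. MiniSxact, the MINI_FLAG_* values, mini_partial_release() and effective_is_commit() are all hypothetical names; the real flag bits and SxactIsPartiallyReleased() live in the patch and in predicate_internals.h. It only models how a commit is treated as a rollback once the transaction has already been partially released, with the assertion suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real values are defined by PostgreSQL. */
#define MINI_FLAG_ROLLED_BACK        0x01u
#define MINI_FLAG_PARTIALLY_RELEASED 0x02u

typedef struct MiniSxact
{
	uint32_t	flags;
} MiniSxact;

/* Models an early release that behaves like a rollback. */
static void
mini_partial_release(MiniSxact *sxact)
{
	sxact->flags |= MINI_FLAG_ROLLED_BACK | MINI_FLAG_PARTIALLY_RELEASED;
}

/*
 * Returns the isCommit value the rest of the release logic should use.
 * A partially released transaction must already be marked rolled back,
 * so a later commit is handled as a rollback.
 */
static bool
effective_is_commit(const MiniSxact *sxact, bool isCommit)
{
	if (isCommit && (sxact->flags & MINI_FLAG_PARTIALLY_RELEASED))
	{
		assert(sxact->flags & MINI_FLAG_ROLLED_BACK);
		return false;
	}
	return isCommit;
}
```

The assertion costs nothing in production builds and documents the invariant that the two flags are always set together on this path.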

On Thu, Sep 20, 2018 at 9:50 AM Kevin Grittner <kgrittn@gmail.com> wrote:

After reviewing the thread and the current two patches, I agree with
Masahiko Sawada, and I would prefer one adjustment to the coding:
breaking out the majority of the ReleasePredicateLocks function into a
static ReleasePredicateLocksMain (or similar) function and eliminating
the goto.

Hi Kevin,

Thanks for the review.

How about moving that bit of local-cleanup code from the end of the
function into a new static function ReleasePredicateLocksLocal(), so
that we can call it and then return, instead of the evil "goto"? Done
that way in the attached.
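
The shape of that refactoring can be sketched roughly as follows. This is a simplified standalone model, not the attached code: MiniBackend, mini_release() and mini_release_local() are hypothetical names standing in for the backend-local state, ReleasePredicateLocks() and ReleasePredicateLocksLocal(), and the shared clean-up is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the backend-local serializable state. */
typedef struct MiniBackend
{
	bool		is_parallel_worker;
	void	   *my_sxact;			/* models MySerializableXact */
	int			shared_releases;	/* counts shared clean-up runs */
} MiniBackend;

/* Backend-local clean-up only; touches no shared state. */
static void
mini_release_local(MiniBackend *b)
{
	b->my_sxact = NULL;
}

static void
mini_release(MiniBackend *b)
{
	if (b->my_sxact == NULL)
		return;					/* nothing to release */

	/* Workers must not release the leader's locks: call and return. */
	if (b->is_parallel_worker)
	{
		mini_release_local(b);
		return;
	}

	b->shared_releases++;		/* the shared clean-up would happen here */
	mini_release_local(b);
}
```

Both exits share the same helper, so the worker path needs no jump to a label at the end of the function.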

The optimization in patch 0002 is important. Previous benchmarks
showed that a fairly straightforward pgbench test scaled as well as
REPEATABLE READ when it was present, but that performance fell off by
up to 20% as the scale increased without it.

I will spend a few more days in testing and review, but figured I
should pass along "first impressions" now.

Thanks!

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

0001-Enable-parallel-query-with-SERIALIZABLE-isolatio-v15.patch (application/octet-stream)

From 4570aa56ed2de70cf6607f0580096a08bcad36b6 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Fri, 1 Sep 2017 16:54:57 +1200
Subject: [PATCH 1/2] Enable parallel query with SERIALIZABLE isolation.

Previously, the SERIALIZABLE isolation level prevented parallel query from
being used.  Allow the two features to be used together by sharing the
leader's SERIALIZABLEXACT with parallel workers.

Remove the serializable_okay flag added to CreateParallelContext() by commit
9da0cc35284bdbe8d442d732963303ff0e0a40bc, because it's now redundant.

The optimization allowing SSI checks to be skipped after a certain point in
read-only transactions is disabled in parallel mode.  It will be added in
a later commit.

Author: Thomas Munro
Reviewed-By: Haribabu Kommi, Robert Haas, Masahiko Sawada, Kevin Grittner
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 doc/src/sgml/monitoring.sgml                  |   5 +
 doc/src/sgml/parallel.sgml                    |  17 ---
 src/backend/access/nbtree/nbtsort.c           |   2 +-
 src/backend/access/transam/parallel.c         |  18 ++-
 src/backend/access/transam/xact.c             |   7 +-
 src/backend/executor/execParallel.c           |   2 +-
 src/backend/optimizer/plan/planner.c          |  11 +-
 src/backend/storage/lmgr/lwlock.c             |   1 +
 src/backend/storage/lmgr/predicate.c          | 116 ++++++++++++++++--
 src/include/access/parallel.h                 |   3 +-
 src/include/storage/lwlock.h                  |   1 +
 src/include/storage/predicate.h               |   9 ++
 src/include/storage/predicate_internals.h     |   4 +
 .../expected/serializable-parallel-2.out      |  44 +++++++
 .../expected/serializable-parallel.out        |  44 +++++++
 src/test/isolation/isolation_schedule         |   2 +
 .../specs/serializable-parallel-2.spec        |  30 +++++
 .../specs/serializable-parallel.spec          |  48 ++++++++
 18 files changed, 309 insertions(+), 55 deletions(-)
 create mode 100644 src/test/isolation/expected/serializable-parallel-2.out
 create mode 100644 src/test/isolation/expected/serializable-parallel.out
 create mode 100644 src/test/isolation/specs/serializable-parallel-2.spec
 create mode 100644 src/test/isolation/specs/serializable-parallel.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0484cfa77ad..f65cd3f8c91 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -979,6 +979,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting to perform an operation on a list of locks held by
          serializable transactions.</entry>
         </row>
+        <row>
+         <entry><literal>serializable_xact</literal></entry>
+         <entry>Waiting to perform an operation on a serializable transaction
+         in a parallel query.</entry>
+        </row>
         <row>
          <entry><literal>OldSerXidLock</literal></entry>
          <entry>Waiting to read or record conflicting serializable
diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index 1005e9fef4d..b0b03c54e5f 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -184,13 +184,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -233,16 +226,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         that may be suboptimal when run serially.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 16f57557776..c4e1721e553 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1255,7 +1255,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	EnterParallelMode();
 	Assert(request > 0);
 	pcxt = CreateParallelContext("postgres", "_bt_parallel_build_main",
-								 request, true);
+								 request);
 	scantuplesortstates = leaderparticipates ? request + 1 : request;
 
 	/*
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index cdaa32e29a4..bc86e9175d6 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -30,6 +30,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -87,6 +88,7 @@ typedef struct FixedParallelState
 	PGPROC	   *parallel_master_pgproc;
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
+	SerializableXactHandle serializable_xact_handle;
 
 	/* Mutex protects remaining fields. */
 	slock_t		mutex;
@@ -151,7 +153,7 @@ static void ParallelWorkerShutdown(int code, Datum arg);
  */
 ParallelContext *
 CreateParallelContext(const char *library_name, const char *function_name,
-					  int nworkers, bool serializable_okay)
+					  int nworkers)
 {
 	MemoryContext oldcontext;
 	ParallelContext *pcxt;
@@ -162,16 +164,6 @@ CreateParallelContext(const char *library_name, const char *function_name,
 	/* Number of workers should be non-negative. */
 	Assert(nworkers >= 0);
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.  Utility statement callers may ask us to ignore this
-	 * restriction because they're always able to safely ignore the fact that
-	 * SIREAD locks do not work with parallelism.
-	 */
-	if (IsolationIsSerializable() && !serializable_okay)
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -318,6 +310,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_pgproc = MyProc;
 	fps->parallel_master_pid = MyProcPid;
 	fps->parallel_master_backend_id = MyBackendId;
+	fps->serializable_xact_handle = ShareSerializableXact();
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_FIXED, fps);
@@ -1397,6 +1390,9 @@ ParallelWorkerMain(Datum main_arg)
 	relmapperspace = shm_toc_lookup(toc, PARALLEL_KEY_RELMAPPER_STATE, false);
 	RestoreRelationMap(relmapperspace);
 
+	/* Attach to the leader's serializable transaction, if SERIALIZABLE. */
+	AttachSerializableXact(fps->serializable_xact_handle);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 875be180fe4..a8b46f4461f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1984,9 +1984,12 @@ CommitTransaction(void)
 	/*
 	 * Mark serializable transaction as complete for predicate locking
 	 * purposes.  This should be done as late as we can put it and still allow
-	 * errors to be raised for failure patterns found at commit.
+	 * errors to be raised for failure patterns found at commit.  This is not
+	 * appropriate in a parallel worker however, because we aren't committing
+	 * the leader's transaction and its serializable state will live on.
 	 */
-	PreCommit_CheckForSerializationFailure();
+	if (!is_parallel_worker)
+		PreCommit_CheckForSerializationFailure();
 
 	/*
 	 * Insert notifications sent by NOTIFY commands into the queue.  This
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 7d8bd01994f..ac32d83b8a6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -607,7 +607,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pstmt_data = ExecSerializePlan(planstate->plan, estate);
 
 	/* Create a parallel context. */
-	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers, false);
+	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
 	pei->pcxt = pcxt;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 89625f4f5b1..a1f23f611d2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -335,22 +335,13 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index a6fda81feb6..3b47eb057f6 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -521,6 +521,7 @@ RegisterLWLockTranches(void)
 	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN, "parallel_hash_join");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "serializable_xact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index e8390311d03..a10b5dda86a 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'predicateLockListLock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	PredicateLockHashPartitionLock(hashcode)
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target.
@@ -186,6 +194,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -465,6 +474,8 @@ static void CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag);
 static void FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer);
 static void OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
 										SERIALIZABLEXACT *writer);
+static void CreateLocalPredicateLockHash(void);
+static void ReleasePredicateLocksLocal(void);
 
 
 /*------------------------------------------------------------------------*/
@@ -518,8 +529,10 @@ SerializationNeededForRead(Relation relation, Snapshot snapshot)
 	 * A transaction is flagged as RO_SAFE if all concurrent R/W transactions
 	 * commit without having conflicts out to an earlier snapshot, thus
 	 * ensuring that no conflicts are possible for this transaction.
+	 *
+	 * This optimization is not yet supported in parallel mode.
 	 */
-	if (SxactIsROSafe(MySerializableXact))
+	if (SxactIsROSafe(MySerializableXact) && !IsInParallelMode())
 	{
 		ReleasePredicateLocks(false);
 		return false;
@@ -1168,6 +1181,8 @@ InitPredicateLocks(void)
 		memset(PredXact->element, 0, requestSize);
 		for (i = 0; i < max_table_size; i++)
 		{
+			LWLockInitialize(&PredXact->element[i].sxact.predicateLockListLock,
+							 LWTRANCHE_SXACT);
 			SHMQueueInsertBefore(&(PredXact->availableList),
 								 &(PredXact->element[i].link));
 		}
@@ -1633,6 +1648,17 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
 {
 	Assert(IsolationIsSerializable());
 
+	/*
+	 * If this is called by parallel.c in a parallel worker, we don't want to
+	 * create a SERIALIZABLEXACT just yet because the leader's
+	 * SERIALIZABLEXACT will be installed with AttachSerializableXact().  We
+	 * also don't want to reject SERIALIZABLE READ ONLY DEFERRABLE in this
+	 * case, because the leader has already determined that the snapshot it
+	 * has passed us is safe.  So there is nothing for us to do.
+	 */
+	if (IsParallelWorker())
+		return;
+
 	/*
 	 * We do not allow SERIALIZABLE READ ONLY DEFERRABLE transactions to
 	 * import snapshots, since there's no way to wait for a safe snapshot when
@@ -1666,7 +1692,6 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	VirtualTransactionId vxid;
 	SERIALIZABLEXACT *sxact,
 			   *othersxact;
-	HASHCTL		hash_ctl;
 
 	/* We only do this for serializable transactions.  Once. */
 	Assert(MySerializableXact == InvalidSerializableXact);
@@ -1813,6 +1838,16 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 
 	LWLockRelease(SerializableXactHashLock);
 
+	CreateLocalPredicateLockHash();
+
+	return snapshot;
+}
+
+static void
+CreateLocalPredicateLockHash(void)
+{
+	HASHCTL		hash_ctl;
+
 	/* Initialize the backend-local hash table of parent locks */
 	Assert(LocalPredicateLockHash == NULL);
 	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
@@ -1822,8 +1857,6 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 										 max_predicate_locks_per_xact,
 										 &hash_ctl,
 										 HASH_ELEM | HASH_BLOBS);
-
-	return snapshot;
 }
 
 /*
@@ -2078,7 +2111,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring LWLocks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2091,6 +2126,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2144,6 +2181,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2342,6 +2381,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2379,6 +2420,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2566,7 +2609,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2626,7 +2670,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2641,7 +2685,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3244,6 +3289,13 @@ ReleasePredicateLocks(bool isCommit)
 		return;
 	}
 
+	/* Parallel workers mustn't release predicate locks. */
+	if (IsParallelWorker())
+	{
+		ReleasePredicateLocksLocal();
+		return;
+	}
+
 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
 	Assert(!isCommit || SxactIsPrepared(MySerializableXact));
@@ -3273,8 +3325,8 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact->finishedBefore = ShmemVariableCache->nextXid;
 
 	/*
-	 * If it's not a commit it's a rollback, and we can clear our locks
-	 * immediately.
+	 * If it's not a commit it's either a rollback or a read-only transaction
+	 * flagged SXACT_FLAG_RO_SAFE, and we can clear our locks immediately.
 	 */
 	if (isCommit)
 	{
@@ -3521,6 +3573,12 @@ ReleasePredicateLocks(bool isCommit)
 	if (needToClear)
 		ClearOldPredicateLocks();
 
+	ReleasePredicateLocksLocal();
+}
+
+static void
+ReleasePredicateLocksLocal(void)
+{
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
@@ -4213,6 +4271,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->predicateLockListLock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4247,6 +4307,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->predicateLockListLock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4795,6 +4857,13 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->predicateLockListLock in parallel mode because
+	 * there cannot be any parallel workers running while we are preparing a
+	 * transaction.
+	 */
+	Assert(!IsParallelWorker() && !ParallelContextActive());
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -5003,3 +5072,28 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Prepare to share the current SERIALIZABLEXACT with parallel workers.
+ * Return a handle object that can be used by AttachSerializableXact() in a
+ * parallel worker.
+ */
+SerializableXactHandle
+ShareSerializableXact(void)
+{
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+AttachSerializableXact(SerializableXactHandle handle)
+{
+
+	Assert(MySerializableXact == InvalidSerializableXact);
+
+	MySerializableXact = (SERIALIZABLEXACT *) handle;
+	if (MySerializableXact != InvalidSerializableXact)
+		CreateLocalPredicateLockHash();
+}
diff --git a/src/include/access/parallel.h b/src/include/access/parallel.h
index 025691fd82d..45e7fbb43f8 100644
--- a/src/include/access/parallel.h
+++ b/src/include/access/parallel.h
@@ -60,8 +60,7 @@ extern PGDLLIMPORT bool InitializingParallelWorker;
 #define		IsParallelWorker()		(ParallelWorkerNumber >= 0)
 
 extern ParallelContext *CreateParallelContext(const char *library_name,
-					  const char *function_name, int nworkers,
-					  bool serializable_okay);
+					  const char *function_name, int nworkers);
 extern void InitializeParallelDSM(ParallelContext *pcxt);
 extern void ReinitializeParallelDSM(ParallelContext *pcxt);
 extern void LaunchParallelWorkers(ParallelContext *pcxt);
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index c21bfe2f666..b25c43fc6be 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -219,6 +219,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_SHARED_TUPLESTORE,
 	LWTRANCHE_TBM,
 	LWTRANCHE_PARALLEL_APPEND,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 6a3464daa1e..23f3acc3ce1 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -30,6 +30,11 @@ extern int	max_predicate_locks_per_page;
 /* Number of SLRU buffers to use for predicate locking */
 #define NUM_OLDSERXID_BUFFERS	16
 
+/*
+ * A handle used for sharing SERIALIZABLEXACT objects between the participants
+ * in a parallel query.
+ */
+typedef void *SerializableXactHandle;
 
 /*
  * function prototypes
@@ -74,4 +79,8 @@ extern void PredicateLockTwoPhaseFinish(TransactionId xid, bool isCommit);
 extern void predicatelock_twophase_recover(TransactionId xid, uint16 info,
 							   void *recdata, uint32 len);
 
+/* parallel query support */
+extern SerializableXactHandle ShareSerializableXact(void);
+extern void AttachSerializableXact(SerializableXactHandle handle);
+
 #endif							/* PREDICATE_H */
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 0f736d37dff..59eb49e57ee 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,9 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	LWLock		predicateLockListLock;	/* protects predicateLocks in parallel
+										 * mode */
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
diff --git a/src/test/isolation/expected/serializable-parallel-2.out b/src/test/isolation/expected/serializable-parallel-2.out
new file mode 100644
index 00000000000..9a693c4dc62
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel-2.out
@@ -0,0 +1,44 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1r s2r1 s1c s2r2 s2c
+step s1r: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s2r1: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s1c: COMMIT;
+step s2r2: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s2c: COMMIT;
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 00000000000..f43aa6a2990
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index c23b401225d..61b0835854b 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -78,3 +78,5 @@ test: partition-key-update-3
 test: partition-key-update-4
 test: plpgsql-toast
 test: truncate-conflict
+test: serializable-parallel
+test: serializable-parallel-2
diff --git a/src/test/isolation/specs/serializable-parallel-2.spec b/src/test/isolation/specs/serializable-parallel-2.spec
new file mode 100644
index 00000000000..7f90f75d882
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel-2.spec
@@ -0,0 +1,30 @@
+# Exercise the case where a read-only serializable transaction has
+# SXACT_FLAG_RO_SAFE set in a parallel query.
+
+setup
+{
+	CREATE TABLE foo AS SELECT generate_series(1, 10)::int a;
+	ALTER TABLE foo SET (parallel_workers = 2);
+}
+
+teardown
+{
+	DROP TABLE foo;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1r"	{ SELECT * FROM foo; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
+			  SET parallel_setup_cost = 0;
+			  SET parallel_tuple_cost = 0;
+			}
+step "s2r1"	{ SELECT * FROM foo; }
+step "s2r2"	{ SELECT * FROM foo; }
+step "s2c"	{ COMMIT; }
+
+permutation "s1r" "s2r1" "s1c" "s2r2" "s2c"
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 00000000000..0e7c2c7c1fa
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"
-- 
2.17.1 (Apple Git-112)

0002-Enable-the-read-only-SERIALIZABLE-optimization-f-v15.patch (application/octet-stream)

From 7999b4ae4274bd787b825a92414de54a11ed3f90 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Sun, 25 Feb 2018 23:45:09 +1300
Subject: [PATCH 2/2] Enable the read-only SERIALIZABLE optimization for
 parallel query.

A SERIALIZABLEXACT can be marked as SXACT_FLAG_RO_SAFE by a concurrent session,
meaning that it is safe to throw away this SERIALIZABLEXACT and start behaving
like a REPEATABLE READ transaction.  The problem is that the leader and workers
are sharing the same SERIALIZABLEXACT so this must be coordinated carefully.
This commit solves that problem as follows:

The first backend to observe the SXACT_FLAG_RO_SAFE flag will 'partially
release' it, meaning that the conflicts and locks it holds can be released, but
the SERIALIZABLEXACT itself will remain active because other backends might
have a pointer to it.

Whenever any backend notices the SXACT_FLAG_RO_SAFE flag, it clears its own
MySerializableXact variable so that it can skip SSI checks for the rest of the
transaction.  In the special case of the leader process, it transfers the
SERIALIZABLEXACT to a new variable SavedSerializableXact, so that it can be
completely released at the end of the transaction after all workers have
exited.

Author: Thomas Munro
Reviewed-by: Kevin Grittner, Masahiko Sawada
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 src/backend/storage/lmgr/predicate.c      | 141 +++++++++++++++++++---
 src/backend/utils/resowner/resowner.c     |   2 +-
 src/include/storage/predicate.h           |   2 +-
 src/include/storage/predicate_internals.h |   6 +
 4 files changed, 130 insertions(+), 21 deletions(-)

diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index a10b5dda86a..200d968cd5c 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -170,7 +170,7 @@
  *		PredicateLockPageCombine(Relation relation, BlockNumber oldblkno,
  *								 BlockNumber newblkno)
  *		TransferPredicateLocksToHeapRelation(Relation relation)
- *		ReleasePredicateLocks(bool isCommit)
+ *		ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
  *
  * conflict detection (may also trigger rollback)
  *		CheckForSerializableConflictOut(bool visible, Relation relation,
@@ -288,6 +288,7 @@
 #define SxactIsDeferrableWaiting(sxact) (((sxact)->flags & SXACT_FLAG_DEFERRABLE_WAITING) != 0)
 #define SxactIsROSafe(sxact) (((sxact)->flags & SXACT_FLAG_RO_SAFE) != 0)
 #define SxactIsROUnsafe(sxact) (((sxact)->flags & SXACT_FLAG_RO_UNSAFE) != 0)
+#define SxactIsPartiallyReleased(sxact) (((sxact)->flags & SXACT_FLAG_PARTIALLY_RELEASED) != 0)
 
 /*
  * Compute the hash code associated with a PREDICATELOCKTARGETTAG.
@@ -418,6 +419,15 @@ static HTAB *LocalPredicateLockHash = NULL;
 static SERIALIZABLEXACT *MySerializableXact = InvalidSerializableXact;
 static bool MyXactDidWrite = false;
 
+/*
+ * The SXACT_FLAG_RO_SAFE optimization might lead us to release
+ * MySerializableXact early.  If that happens in a parallel query, the leader
+ * needs to defer the destruction of the SERIALIZABLEXACT until end of
+ * transaction, because the workers still have a reference to it.  In that
+ * case, the leader stores it here.
+ */
+static SERIALIZABLEXACT *SavedSerializableXact = InvalidSerializableXact;
+
 /* local functions */
 
 static SERIALIZABLEXACT *CreatePredXact(void);
@@ -529,12 +539,10 @@ SerializationNeededForRead(Relation relation, Snapshot snapshot)
 	 * A transaction is flagged as RO_SAFE if all concurrent R/W transactions
 	 * commit without having conflicts out to an earlier snapshot, thus
 	 * ensuring that no conflicts are possible for this transaction.
-	 *
-	 * This optimization is not yet supported in parallel mode.
 	 */
-	if (SxactIsROSafe(MySerializableXact) && !IsInParallelMode())
+	if (SxactIsROSafe(MySerializableXact))
 	{
-		ReleasePredicateLocks(false);
+		ReleasePredicateLocks(false, true);
 		return false;
 	}
 
@@ -1528,14 +1536,14 @@ GetSafeSnapshot(Snapshot origSnapshot)
 		ereport(DEBUG2,
 				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 				 errmsg("deferrable snapshot was unsafe; trying a new one")));
-		ReleasePredicateLocks(false);
+		ReleasePredicateLocks(false, false);
 	}
 
 	/*
 	 * Now we have a safe snapshot, so we don't need to do any further checks.
 	 */
 	Assert(SxactIsROSafe(MySerializableXact));
-	ReleasePredicateLocks(false);
+	ReleasePredicateLocks(false, true);
 
 	return snapshot;
 }
@@ -3262,9 +3270,17 @@ SetNewSxactGlobalXmin(void)
  * If this transaction is committing and is holding any predicate locks,
  * it must be added to a list of completed serializable transactions still
  * holding locks.
+ *
+ * If isReadOnlySafe is true, then predicate locks are being released before
+ * the end of the transaction because MySerializableXact has been determined
+ * to be RO_SAFE.  In non-parallel mode we can release it completely, but in
+ * parallel mode we partially release the SERIALIZABLEXACT and keep it
+ * around until the end of the transaction, allowing each backend to clear its
+ * MySerializableXact variable and benefit from the optimization in its own
+ * time.
  */
 void
-ReleasePredicateLocks(bool isCommit)
+ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 {
 	bool		needToClear;
 	RWConflict	conflict,
@@ -3283,25 +3299,97 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
-	if (MySerializableXact == InvalidSerializableXact)
+	/* We can't be both committing and releasing early due to RO_SAFE. */
+	Assert(!(isCommit && isReadOnlySafe));
+
+	/* Are we at the end of a transaction, that is, a commit or abort? */
+	if (!isReadOnlySafe)
 	{
-		Assert(LocalPredicateLockHash == NULL);
-		return;
+		/*
+		 * Parallel workers mustn't release predicate locks at the end of
+		 * their transaction.  The leader will do that at the end of its
+		 * transaction.
+		 */
+		if (IsParallelWorker())
+		{
+			ReleasePredicateLocksLocal();
+			return;
+		}
+
+		/*
+		 * By the time the leader in a parallel query reaches end of
+		 * transaction, it has waited for all workers to exit.
+		 */
+		Assert(!ParallelContextActive());
+
+		/*
+		 * If the leader in a parallel query earlier stashed a partially
+		 * released SERIALIZABLEXACT for final clean-up at end of transaction
+		 * (because workers might still have been accessing it), then it's
+		 * time to restore it.
+		 */
+		if (SavedSerializableXact != InvalidSerializableXact)
+		{
+			Assert(MySerializableXact == InvalidSerializableXact);
+			MySerializableXact = SavedSerializableXact;
+			SavedSerializableXact = InvalidSerializableXact;
+			Assert(SxactIsPartiallyReleased(MySerializableXact));
+		}
 	}
 
-	/* Parallel workers mustn't release predicate locks. */
-	if (IsParallelWorker())
+	if (MySerializableXact == InvalidSerializableXact)
 	{
-		ReleasePredicateLocksLocal();
+		Assert(LocalPredicateLockHash == NULL);
 		return;
 	}
 
 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
+	/*
+	 * If the transaction is committing but has already been partially released,
+	 * treat this as a rollback: the partial release marked it as rolled back.
+	 */
+	if (isCommit && SxactIsPartiallyReleased(MySerializableXact))
+		isCommit = false;
+
+	/*
+	 * If we're called in the middle of a transaction because we discovered
+	 * that the SXACT_FLAG_RO_SAFE flag was set, then we'll partially release
+	 * it (that is, release the predicate locks and conflicts, but not the
+	 * SERIALIZABLEXACT itself) if we're the first backend to have noticed.
+	 */
+	if (isReadOnlySafe && IsInParallelMode())
+	{
+		/*
+		 * The leader needs to stash a pointer to it, so that it can
+		 * completely release it at end-of-transaction.
+		 */
+		if (!IsParallelWorker())
+			SavedSerializableXact = MySerializableXact;
+
+		/*
+		 * The first backend to reach this condition will partially release
+		 * the SERIALIZABLEXACT.  All others will just clear their
+		 * backend-local state so that they stop doing SSI checks for the rest
+		 * of the transaction.
+		 */
+		if (SxactIsPartiallyReleased(MySerializableXact))
+		{
+			LWLockRelease(SerializableXactHashLock);
+			ReleasePredicateLocksLocal();
+			return;
+		}
+		else
+		{
+			MySerializableXact->flags |= SXACT_FLAG_PARTIALLY_RELEASED;
+			/* ... and proceed to perform the partial release below. */
+		}
+	}
 	Assert(!isCommit || SxactIsPrepared(MySerializableXact));
 	Assert(!isCommit || !SxactIsDoomed(MySerializableXact));
 	Assert(!SxactIsCommitted(MySerializableXact));
-	Assert(!SxactIsRolledBack(MySerializableXact));
+	Assert(SxactIsPartiallyReleased(MySerializableXact)
+		   || !SxactIsRolledBack(MySerializableXact));
 
 	/* may not be serializable during COMMIT/ROLLBACK PREPARED */
 	Assert(MySerializableXact->pid == 0 || IsolationIsSerializable());
@@ -3350,7 +3438,8 @@ ReleasePredicateLocks(bool isCommit)
 		 * cleanup. This means it should not be considered when calculating
 		 * SxactGlobalXmin.
 		 */
-		MySerializableXact->flags |= SXACT_FLAG_DOOMED;
+		if (!isReadOnlySafe)
+			MySerializableXact->flags |= SXACT_FLAG_DOOMED;
 		MySerializableXact->flags |= SXACT_FLAG_ROLLED_BACK;
 
 		/*
@@ -3546,7 +3635,8 @@ ReleasePredicateLocks(bool isCommit)
 	 * was launched.
 	 */
 	needToClear = false;
-	if (TransactionIdEquals(MySerializableXact->xmin, PredXact->SxactGlobalXmin))
+	if (!isReadOnlySafe &&
+		TransactionIdEquals(MySerializableXact->xmin, PredXact->SxactGlobalXmin))
 	{
 		Assert(PredXact->SxactGlobalXminCount > 0);
 		if (--(PredXact->SxactGlobalXminCount) == 0)
@@ -3565,8 +3655,16 @@ ReleasePredicateLocks(bool isCommit)
 		SHMQueueInsertBefore(FinishedSerializableTransactions,
 							 &MySerializableXact->finishedLink);
 
+	/*
+	 * If we're releasing a RO_SAFE transaction in parallel mode, we'll only
+	 * partially release it.  That's necessary because other backends may have
+	 * a reference to it.  The leader will release the SERIALIZABLEXACT itself
+	 * at the end of the transaction after workers have stopped running.
+	 */
 	if (!isCommit)
-		ReleaseOneSerializableXact(MySerializableXact, false, false);
+		ReleaseOneSerializableXact(MySerializableXact,
+								   isReadOnlySafe && IsInParallelMode(),
+								   false);
 
 	LWLockRelease(SerializableFinishedListLock);
 
@@ -3770,6 +3868,8 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 	 * them to OldCommittedSxact if summarize is true)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -3849,6 +3949,8 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 	 */
 	SHMQueueInit(&sxact->predicateLocks);
 
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 
 	sxidtag.xid = sxact->topXid;
@@ -4739,6 +4841,7 @@ PreCommit_CheckForSerializationFailure(void)
 	/* Check if someone else has already decided that we need to die */
 	if (SxactIsDoomed(MySerializableXact))
 	{
+		Assert(!SxactIsPartiallyReleased(MySerializableXact));
 		LWLockRelease(SerializableXactHashLock);
 		ereport(ERROR,
 				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
@@ -4936,7 +5039,7 @@ PredicateLockTwoPhaseFinish(TransactionId xid, bool isCommit)
 	MySerializableXact = sxid->myXact;
 	MyXactDidWrite = true;		/* conservatively assume that we wrote
 								 * something */
-	ReleasePredicateLocks(isCommit);
+	ReleasePredicateLocks(isCommit, false);
 }
 
 /*
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index 211833da02c..74f80a0942a 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -565,7 +565,7 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
 			if (owner == TopTransactionResourceOwner)
 			{
 				ProcReleaseLocks(isCommit);
-				ReleasePredicateLocks(isCommit);
+				ReleasePredicateLocks(isCommit, false);
 			}
 		}
 		else
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 23f3acc3ce1..0925270b91e 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -61,7 +61,7 @@ extern void PredicateLockTuple(Relation relation, HeapTuple tuple, Snapshot snap
 extern void PredicateLockPageSplit(Relation relation, BlockNumber oldblkno, BlockNumber newblkno);
 extern void PredicateLockPageCombine(Relation relation, BlockNumber oldblkno, BlockNumber newblkno);
 extern void TransferPredicateLocksToHeapRelation(Relation relation);
-extern void ReleasePredicateLocks(bool isCommit);
+extern void ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe);
 
 /* conflict detection (may also trigger rollback) */
 extern void CheckForSerializableConflictOut(bool valid, Relation relation, HeapTuple tuple,
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 59eb49e57ee..04de63877d5 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -127,6 +127,12 @@ typedef struct SERIALIZABLEXACT
 #define SXACT_FLAG_RO_UNSAFE			0x00000100
 #define SXACT_FLAG_SUMMARY_CONFLICT_IN	0x00000200
 #define SXACT_FLAG_SUMMARY_CONFLICT_OUT 0x00000400
+/*
+ * The following flag means the transaction has been partially released
+ * already, but is being preserved because parallel workers might have a
+ * reference to it.  It'll be recycled by the leader at end-of-transaction.
+ */
+#define SXACT_FLAG_PARTIALLY_RELEASED	0x00000800
 
 /*
  * The following types are used to provide an ad hoc list for holding
-- 
2.17.1 (Apple Git-112)
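
For readers following the partial-release protocol described in the 0002
commit message, here is a minimal standalone model of it.  It is only a
sketch, not code from the patch: ModelSxact, ModelBackend,
model_release_ro_safe() and model_release_at_end() are hypothetical names,
the flag values are illustrative, a single counter stands in for the
predicate locks and conflicts, and the model is single-threaded, whereas the
patch makes the "am I first?" decision while holding
SerializableXactHashLock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the real values live in predicate_internals.h. */
#define MODEL_FLAG_RO_SAFE            0x1u
#define MODEL_FLAG_PARTIALLY_RELEASED 0x2u

/* Hypothetical stand-in for the shared SERIALIZABLEXACT. */
typedef struct ModelSxact
{
	unsigned	flags;
	int			npredlocks;		/* stands in for predicate locks + conflicts */
	bool		freed;			/* true once the object itself is recycled */
} ModelSxact;

/* Hypothetical stand-in for one backend's local state. */
typedef struct ModelBackend
{
	bool		is_worker;
	ModelSxact *my_sxact;		/* models MySerializableXact */
	ModelSxact *saved_sxact;	/* models SavedSerializableXact (leader) */
} ModelBackend;

/*
 * Mid-transaction release in parallel mode after noticing RO_SAFE.
 * Returns true if this backend was the first to notice and therefore
 * performed the partial release.
 */
bool
model_release_ro_safe(ModelBackend *b)
{
	ModelSxact *sxact = b->my_sxact;
	bool		first;

	if (sxact == NULL)
		return false;			/* already stopped doing SSI checks */
	assert(sxact->flags & MODEL_FLAG_RO_SAFE);

	/* The leader stashes the pointer for final clean-up at end of xact. */
	if (!b->is_worker)
		b->saved_sxact = sxact;

	first = (sxact->flags & MODEL_FLAG_PARTIALLY_RELEASED) == 0;
	if (first)
	{
		/* Drop locks and conflicts, but keep the object itself alive. */
		sxact->flags |= MODEL_FLAG_PARTIALLY_RELEASED;
		sxact->npredlocks = 0;
	}

	/* Every backend clears its local pointer and skips SSI checks. */
	b->my_sxact = NULL;
	return first;
}

/* End-of-transaction release: only the leader recycles the object. */
void
model_release_at_end(ModelBackend *b)
{
	if (b->is_worker)
	{
		b->my_sxact = NULL;		/* workers only drop backend-local state */
		return;
	}
	if (b->saved_sxact != NULL)
	{
		assert(b->my_sxact == NULL);
		b->my_sxact = b->saved_sxact;
		b->saved_sxact = NULL;
	}
	if (b->my_sxact == NULL)
		return;
	b->my_sxact->npredlocks = 0;
	b->my_sxact->freed = true;
	b->my_sxact = NULL;
}
```

In this model, whichever backend calls model_release_ro_safe() first drops
the locks; later callers only clear their own pointer.  The shared object is
marked freed only when the leader calls model_release_at_end(), which
mirrors the rule that workers may still hold a reference until the leader
reaches end of transaction.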

#48

Thomas Munro

thomas.munro@enterprisedb.com

over 7 years ago

In reply to: Thomas Munro (#47)

2 attachment(s)

Re: [HACKERS] SERIALIZABLE with parallel query

On Tue, Oct 2, 2018 at 4:53 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

> Thanks for the review! And sorry for my delayed response. Here is a
> rebased patch, with changes as requested.

Rebased.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

0001-Enable-parallel-query-with-SERIALIZABLE-isolatio-v16.patch (application/octet-stream)

From c01b6c2d997d54f8b1f46ef3602538eb70ebb68a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Fri, 1 Sep 2017 16:54:57 +1200
Subject: [PATCH 1/2] Enable parallel query with SERIALIZABLE isolation.

Previously, the SERIALIZABLE isolation level prevented parallel query from
being used.  Allow the two features to be used together by sharing the
leader's SERIALIZABLEXACT with parallel workers.

Remove the serializable_okay flag added to CreateParallelContext() by commit
9da0cc35284bdbe8d442d732963303ff0e0a40bc, because it's now redundant.

The optimization allowing SSI checks to be skipped after a certain point in
read-only transactions is disabled in parallel mode.  It will be added in
a later commit.

Author: Thomas Munro
Reviewed-By: Haribabu Kommi, Robert Haas, Masahiko Sawada, Kevin Grittner
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 doc/src/sgml/monitoring.sgml                  |   5 +
 doc/src/sgml/parallel.sgml                    |  17 ---
 src/backend/access/nbtree/nbtsort.c           |   2 +-
 src/backend/access/transam/parallel.c         |  18 ++-
 src/backend/access/transam/xact.c             |   7 +-
 src/backend/executor/execParallel.c           |   2 +-
 src/backend/optimizer/plan/planner.c          |  11 +-
 src/backend/storage/lmgr/lwlock.c             |   1 +
 src/backend/storage/lmgr/predicate.c          | 116 ++++++++++++++++--
 src/include/access/parallel.h                 |   3 +-
 src/include/storage/lwlock.h                  |   1 +
 src/include/storage/predicate.h               |   9 ++
 src/include/storage/predicate_internals.h     |   4 +
 .../expected/serializable-parallel-2.out      |  44 +++++++
 .../expected/serializable-parallel.out        |  44 +++++++
 src/test/isolation/isolation_schedule         |   2 +
 .../specs/serializable-parallel-2.spec        |  30 +++++
 .../specs/serializable-parallel.spec          |  48 ++++++++
 18 files changed, 309 insertions(+), 55 deletions(-)
 create mode 100644 src/test/isolation/expected/serializable-parallel-2.out
 create mode 100644 src/test/isolation/expected/serializable-parallel.out
 create mode 100644 src/test/isolation/specs/serializable-parallel-2.spec
 create mode 100644 src/test/isolation/specs/serializable-parallel.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0484cfa77ad..f65cd3f8c91 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -979,6 +979,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting to perform an operation on a list of locks held by
          serializable transactions.</entry>
         </row>
+        <row>
+         <entry><literal>serializable_xact</literal></entry>
+         <entry>Waiting to perform an operation on a serializable transaction
+         in a parallel query.</entry>
+        </row>
         <row>
          <entry><literal>OldSerXidLock</literal></entry>
          <entry>Waiting to read or record conflicting serializable
diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index 1005e9fef4d..b0b03c54e5f 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -184,13 +184,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         using a very large number of processes.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This is
-        a limitation of the current implementation.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -233,16 +226,6 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         that may be suboptimal when run serially.
       </para>
     </listitem>
-
-    <listitem>
-      <para>
-        The transaction isolation level is serializable.  This situation
-        does not normally arise, because parallel query plans are not
-        generated when the transaction isolation level is serializable.
-        However, it can happen if the transaction isolation level is changed to
-        serializable after the plan is generated and before it is executed.
-      </para>
-    </listitem>
   </itemizedlist>
  </sect1>
 
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 16f57557776..c4e1721e553 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1255,7 +1255,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	EnterParallelMode();
 	Assert(request > 0);
 	pcxt = CreateParallelContext("postgres", "_bt_parallel_build_main",
-								 request, true);
+								 request);
 	scantuplesortstates = leaderparticipates ? request + 1 : request;
 
 	/*
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 84197192ec2..d228fc870fe 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -31,6 +31,7 @@
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "storage/ipc.h"
+#include "storage/predicate.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -91,6 +92,7 @@ typedef struct FixedParallelState
 	BackendId	parallel_master_backend_id;
 	TimestampTz xact_ts;
 	TimestampTz stmt_ts;
+	SerializableXactHandle serializable_xact_handle;
 
 	/* Mutex protects remaining fields. */
 	slock_t		mutex;
@@ -155,7 +157,7 @@ static void ParallelWorkerShutdown(int code, Datum arg);
  */
 ParallelContext *
 CreateParallelContext(const char *library_name, const char *function_name,
-					  int nworkers, bool serializable_okay)
+					  int nworkers)
 {
 	MemoryContext oldcontext;
 	ParallelContext *pcxt;
@@ -166,16 +168,6 @@ CreateParallelContext(const char *library_name, const char *function_name,
 	/* Number of workers should be non-negative. */
 	Assert(nworkers >= 0);
 
-	/*
-	 * If we are running under serializable isolation, we can't use parallel
-	 * workers, at least not until somebody enhances that mechanism to be
-	 * parallel-aware.  Utility statement callers may ask us to ignore this
-	 * restriction because they're always able to safely ignore the fact that
-	 * SIREAD locks do not work with parallelism.
-	 */
-	if (IsolationIsSerializable() && !serializable_okay)
-		nworkers = 0;
-
 	/* We might be running in a short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
 
@@ -327,6 +319,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	fps->parallel_master_backend_id = MyBackendId;
 	fps->xact_ts = GetCurrentTransactionStartTimestamp();
 	fps->stmt_ts = GetCurrentStatementStartTimestamp();
+	fps->serializable_xact_handle = ShareSerializableXact();
 	SpinLockInit(&fps->mutex);
 	fps->last_xlog_end = 0;
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_FIXED, fps);
@@ -1426,6 +1419,9 @@ ParallelWorkerMain(Datum main_arg)
 										false);
 	RestoreEnumBlacklist(enumblacklistspace);
 
+	/* Attach to the leader's serializable transaction, if SERIALIZABLE. */
+	AttachSerializableXact(fps->serializable_xact_handle);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 6cd00d9aaaf..d4ddfc7a348 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2019,9 +2019,12 @@ CommitTransaction(void)
 	/*
 	 * Mark serializable transaction as complete for predicate locking
 	 * purposes.  This should be done as late as we can put it and still allow
-	 * errors to be raised for failure patterns found at commit.
+	 * errors to be raised for failure patterns found at commit.  This is not
+	 * appropriate in a parallel worker however, because we aren't committing
+	 * the leader's transaction and its serializable state will live on.
 	 */
-	PreCommit_CheckForSerializationFailure();
+	if (!is_parallel_worker)
+		PreCommit_CheckForSerializationFailure();
 
 	/*
 	 * Insert notifications sent by NOTIFY commands into the queue.  This
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 13ef232d39b..e6e36934efa 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -606,7 +606,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	pstmt_data = ExecSerializePlan(planstate->plan, estate);
 
 	/* Create a parallel context. */
-	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers, false);
+	pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
 	pei->pcxt = pcxt;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c729a99f8b6..ee0d68df712 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -334,22 +334,13 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
 	 * trying to create their own parallel workers.
-	 *
-	 * We can't use parallelism in serializable mode because the predicate
-	 * locking code is not parallel-aware.  It's not catastrophic if someone
-	 * tries to run a parallel plan in serializable mode; it just won't get
-	 * any workers and will run serially.  But it seems like a good heuristic
-	 * to assume that the same serialization level will be in effect at plan
-	 * time and execution time, so don't generate a parallel plan if we're in
-	 * serializable mode.
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		!IsParallelWorker() &&
-		!IsolationIsSerializable())
+		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index a6fda81feb6..3b47eb057f6 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -521,6 +521,7 @@ RegisterLWLockTranches(void)
 	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN, "parallel_hash_join");
+	LWLockRegisterTranche(LWTRANCHE_SXACT, "serializable_xact");
 
 	/* Register named tranches. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index e8390311d03..a10b5dda86a 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -97,7 +97,9 @@
  *		- All transactions share this single lock (with no partitioning).
  *		- There is never a need for a process other than the one running
  *			an active transaction to walk the list of locks held by that
- *			transaction.
+ *			transaction, except parallel query workers sharing the leader's
+ *			transaction.  In the parallel case, an extra per-sxact lock is
+ *			taken; see below.
  *		- It is relatively infrequent that another process needs to
  *			modify the list for a transaction, but it does happen for such
  *			things as index page splits for pages with predicate locks and
@@ -116,6 +118,12 @@
  *			than its own active transaction must acquire an exclusive
  *			lock.
  *
+ *	SERIALIZABLEXACT's member 'predicateLockListLock'
+ *		- Protects the linked list of locks held by a transaction.  Only
+ *			needed for parallel mode, where multiple backends share the
+ *			same SERIALIZABLEXACT object.  Not needed if
+ *			SerializablePredicateLockListLock is held exclusively.
+ *
  *	PredicateLockHashPartitionLock(hashcode)
  *		- The same lock protects a target, all locks on that target, and
  *			the linked list of locks on the target.
@@ -186,6 +194,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -465,6 +474,8 @@ static void CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag);
 static void FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer);
 static void OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
 										SERIALIZABLEXACT *writer);
+static void CreateLocalPredicateLockHash(void);
+static void ReleasePredicateLocksLocal(void);
 
 
 /*------------------------------------------------------------------------*/
@@ -518,8 +529,10 @@ SerializationNeededForRead(Relation relation, Snapshot snapshot)
 	 * A transaction is flagged as RO_SAFE if all concurrent R/W transactions
 	 * commit without having conflicts out to an earlier snapshot, thus
 	 * ensuring that no conflicts are possible for this transaction.
+	 *
+	 * This optimization is not yet supported in parallel mode.
 	 */
-	if (SxactIsROSafe(MySerializableXact))
+	if (SxactIsROSafe(MySerializableXact) && !IsInParallelMode())
 	{
 		ReleasePredicateLocks(false);
 		return false;
@@ -1168,6 +1181,8 @@ InitPredicateLocks(void)
 		memset(PredXact->element, 0, requestSize);
 		for (i = 0; i < max_table_size; i++)
 		{
+			LWLockInitialize(&PredXact->element[i].sxact.predicateLockListLock,
+							 LWTRANCHE_SXACT);
 			SHMQueueInsertBefore(&(PredXact->availableList),
 								 &(PredXact->element[i].link));
 		}
@@ -1633,6 +1648,17 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
 {
 	Assert(IsolationIsSerializable());
 
+	/*
+	 * If this is called by parallel.c in a parallel worker, we don't want to
+	 * create a SERIALIZABLEXACT just yet because the leader's
+	 * SERIALIZABLEXACT will be installed with AttachSerializableXact().  We
+	 * also don't want to reject SERIALIZABLE READ ONLY DEFERRABLE in this
+	 * case, because the leader has already determined that the snapshot it
+	 * has passed us is safe.  So there is nothing for us to do.
+	 */
+	if (IsParallelWorker())
+		return;
+
 	/*
 	 * We do not allow SERIALIZABLE READ ONLY DEFERRABLE transactions to
 	 * import snapshots, since there's no way to wait for a safe snapshot when
@@ -1666,7 +1692,6 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	VirtualTransactionId vxid;
 	SERIALIZABLEXACT *sxact,
 			   *othersxact;
-	HASHCTL		hash_ctl;
 
 	/* We only do this for serializable transactions.  Once. */
 	Assert(MySerializableXact == InvalidSerializableXact);
@@ -1813,6 +1838,16 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 
 	LWLockRelease(SerializableXactHashLock);
 
+	CreateLocalPredicateLockHash();
+
+	return snapshot;
+}
+
+static void
+CreateLocalPredicateLockHash(void)
+{
+	HASHCTL		hash_ctl;
+
 	/* Initialize the backend-local hash table of parent locks */
 	Assert(LocalPredicateLockHash == NULL);
 	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
@@ -1822,8 +1857,6 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 										 max_predicate_locks_per_xact,
 										 &hash_ctl,
 										 HASH_ELEM | HASH_BLOBS);
-
-	return snapshot;
 }
 
 /*
@@ -2078,7 +2111,9 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * This implementation is assuming that the usage of each target tag field
  * is uniform.  No need to make this hard if we don't have to.
  *
- * We aren't acquiring lightweight locks for the predicate lock or lock
+ * We acquire an LWLock in the case of parallel mode, because worker
+ * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
+ * we aren't acquiring LWLocks for the predicate lock or lock
  * target structures associated with this transaction unless we're going
  * to modify them, because no other process is permitted to modify our
  * locks.
@@ -2091,6 +2126,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -2144,6 +2181,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 		predlock = nextpredlock;
 	}
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2342,6 +2381,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
 	/* Make sure that the target is represented. */
@@ -2379,6 +2420,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	}
 
 	LWLockRelease(partitionLock);
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 }
 
@@ -2566,7 +2609,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
@@ -2626,7 +2670,7 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
  * covers it, or if we are absolutely certain that no one will need to
  * refer to that lock in the future.
  *
- * Caller must hold SerializablePredicateLockListLock.
+ * Caller must hold SerializablePredicateLockListLock exclusively.
  */
 static bool
 TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
@@ -2641,7 +2685,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
+								LW_EXCLUSIVE));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3244,6 +3289,13 @@ ReleasePredicateLocks(bool isCommit)
 		return;
 	}
 
+	/* Parallel workers mustn't release predicate locks. */
+	if (IsParallelWorker())
+	{
+		ReleasePredicateLocksLocal();
+		return;
+	}
+
 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
 	Assert(!isCommit || SxactIsPrepared(MySerializableXact));
@@ -3273,8 +3325,8 @@ ReleasePredicateLocks(bool isCommit)
 	MySerializableXact->finishedBefore = ShmemVariableCache->nextXid;
 
 	/*
-	 * If it's not a commit it's a rollback, and we can clear our locks
-	 * immediately.
+	 * If it's not a commit it's either a rollback or a read-only transaction
+	 * flagged SXACT_FLAG_RO_SAFE, and we can clear our locks immediately.
 	 */
 	if (isCommit)
 	{
@@ -3521,6 +3573,12 @@ ReleasePredicateLocks(bool isCommit)
 	if (needToClear)
 		ClearOldPredicateLocks();
 
+	ReleasePredicateLocksLocal();
+}
+
+static void
+ReleasePredicateLocksLocal(void)
+{
 	MySerializableXact = InvalidSerializableXact;
 	MyXactDidWrite = false;
 
@@ -4213,6 +4271,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		PREDICATELOCK *rmpredlock;
 
 		LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+		if (IsInParallelMode())
+			LWLockAcquire(&MySerializableXact->predicateLockListLock, LW_EXCLUSIVE);
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
@@ -4247,6 +4307,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 
 		LWLockRelease(SerializableXactHashLock);
 		LWLockRelease(partitionLock);
+		if (IsInParallelMode())
+			LWLockRelease(&MySerializableXact->predicateLockListLock);
 		LWLockRelease(SerializablePredicateLockListLock);
 
 		if (rmpredlock != NULL)
@@ -4795,6 +4857,13 @@ AtPrepare_PredicateLocks(void)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 
+	/*
+	 * No need to take sxact->predicateLockListLock in parallel mode because
+	 * there cannot be any parallel workers running while we are preparing a
+	 * transaction.
+	 */
+	Assert(!IsParallelWorker() && !ParallelContextActive());
+
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -5003,3 +5072,28 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
 	}
 }
+
+/*
+ * Prepare to share the current SERIALIZABLEXACT with parallel workers.
+ * Return a handle object that can be used by AttachSerializableXact() in a
+ * parallel worker.
+ */
+SerializableXactHandle
+ShareSerializableXact(void)
+{
+	return MySerializableXact;
+}
+
+/*
+ * Allow parallel workers to import the leader's SERIALIZABLEXACT.
+ */
+void
+AttachSerializableXact(SerializableXactHandle handle)
+{
+
+	Assert(MySerializableXact == InvalidSerializableXact);
+
+	MySerializableXact = (SERIALIZABLEXACT *) handle;
+	if (MySerializableXact != InvalidSerializableXact)
+		CreateLocalPredicateLockHash();
+}
diff --git a/src/include/access/parallel.h b/src/include/access/parallel.h
index 025691fd82d..45e7fbb43f8 100644
--- a/src/include/access/parallel.h
+++ b/src/include/access/parallel.h
@@ -60,8 +60,7 @@ extern PGDLLIMPORT bool InitializingParallelWorker;
 #define		IsParallelWorker()		(ParallelWorkerNumber >= 0)
 
 extern ParallelContext *CreateParallelContext(const char *library_name,
-					  const char *function_name, int nworkers,
-					  bool serializable_okay);
+					  const char *function_name, int nworkers);
 extern void InitializeParallelDSM(ParallelContext *pcxt);
 extern void ReinitializeParallelDSM(ParallelContext *pcxt);
 extern void LaunchParallelWorkers(ParallelContext *pcxt);
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index c21bfe2f666..b25c43fc6be 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -219,6 +219,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_SHARED_TUPLESTORE,
 	LWTRANCHE_TBM,
 	LWTRANCHE_PARALLEL_APPEND,
+	LWTRANCHE_SXACT,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 6a3464daa1e..23f3acc3ce1 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -30,6 +30,11 @@ extern int	max_predicate_locks_per_page;
 /* Number of SLRU buffers to use for predicate locking */
 #define NUM_OLDSERXID_BUFFERS	16
 
+/*
+ * A handle used for sharing SERIALIZABLEXACT objects between the participants
+ * in a parallel query.
+ */
+typedef void *SerializableXactHandle;
 
 /*
  * function prototypes
@@ -74,4 +79,8 @@ extern void PredicateLockTwoPhaseFinish(TransactionId xid, bool isCommit);
 extern void predicatelock_twophase_recover(TransactionId xid, uint16 info,
 							   void *recdata, uint32 len);
 
+/* parallel query support */
+extern SerializableXactHandle ShareSerializableXact(void);
+extern void AttachSerializableXact(SerializableXactHandle handle);
+
 #endif							/* PREDICATE_H */
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 0f736d37dff..59eb49e57ee 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -15,6 +15,7 @@
 #define PREDICATE_INTERNALS_H
 
 #include "storage/lock.h"
+#include "storage/lwlock.h"
 
 /*
  * Commit number.
@@ -91,6 +92,9 @@ typedef struct SERIALIZABLEXACT
 	SHM_QUEUE	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	LWLock		predicateLockListLock;	/* protects predicateLocks in parallel
+										 * mode */
+
 	/*
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
diff --git a/src/test/isolation/expected/serializable-parallel-2.out b/src/test/isolation/expected/serializable-parallel-2.out
new file mode 100644
index 00000000000..9a693c4dc62
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel-2.out
@@ -0,0 +1,44 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1r s2r1 s1c s2r2 s2c
+step s1r: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s2r1: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s1c: COMMIT;
+step s2r2: SELECT * FROM foo;
+a              
+
+1              
+2              
+3              
+4              
+5              
+6              
+7              
+8              
+9              
+10             
+step s2c: COMMIT;
diff --git a/src/test/isolation/expected/serializable-parallel.out b/src/test/isolation/expected/serializable-parallel.out
new file mode 100644
index 00000000000..f43aa6a2990
--- /dev/null
+++ b/src/test/isolation/expected/serializable-parallel.out
@@ -0,0 +1,44 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s2wx s2c s3c
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+step s2c: COMMIT;
+step s3c: COMMIT;
+
+starting permutation: s2rx s2ry s1ry s1wy s1c s3r s3c s2wx
+step s2rx: SELECT balance FROM bank_account WHERE id = 'X';
+balance        
+
+0              
+step s2ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1ry: SELECT balance FROM bank_account WHERE id = 'Y';
+balance        
+
+0              
+step s1wy: UPDATE bank_account SET balance = 20 WHERE id = 'Y';
+step s1c: COMMIT;
+step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id;
+id             balance        
+
+X              0              
+Y              20             
+step s3c: COMMIT;
+step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index dd57a96e788..812b361e58a 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -79,3 +79,5 @@ test: partition-key-update-3
 test: partition-key-update-4
 test: plpgsql-toast
 test: truncate-conflict
+test: serializable-parallel
+test: serializable-parallel-2
diff --git a/src/test/isolation/specs/serializable-parallel-2.spec b/src/test/isolation/specs/serializable-parallel-2.spec
new file mode 100644
index 00000000000..7f90f75d882
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel-2.spec
@@ -0,0 +1,30 @@
+# Exercise the case where a read-only serializable transaction has
+# SXACT_FLAG_RO_SAFE set in a parallel query.
+
+setup
+{
+	CREATE TABLE foo AS SELECT generate_series(1, 10)::int a;
+	ALTER TABLE foo SET (parallel_workers = 2);
+}
+
+teardown
+{
+	DROP TABLE foo;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1r"	{ SELECT * FROM foo; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
+			  SET parallel_setup_cost = 0;
+			  SET parallel_tuple_cost = 0;
+			}
+step "s2r1"	{ SELECT * FROM foo; }
+step "s2r2"	{ SELECT * FROM foo; }
+step "s2c"	{ COMMIT; }
+
+permutation "s1r" "s2r1" "s1c" "s2r2" "s2c"
diff --git a/src/test/isolation/specs/serializable-parallel.spec b/src/test/isolation/specs/serializable-parallel.spec
new file mode 100644
index 00000000000..0e7c2c7c1fa
--- /dev/null
+++ b/src/test/isolation/specs/serializable-parallel.spec
@@ -0,0 +1,48 @@
+# The example from the paper "A read-only transaction anomaly under snapshot
+# isolation"[1].
+#
+# Here we test that serializable snapshot isolation (SERIALIZABLE) doesn't
+# suffer from the anomaly, because s2 is aborted upon detection of a cycle.
+# In this case the read only query s3 happens to be running in a parallel
+# worker.
+#
+# [1] http://www.cs.umb.edu/~poneil/ROAnom.pdf
+
+setup
+{
+	CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
+	INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
+}
+
+teardown
+{
+	DROP TABLE bank_account;
+}
+
+session "s1"
+setup 		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s1ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s1wy"	{ UPDATE bank_account SET balance = 20 WHERE id = 'Y'; }
+step "s1c" 	{ COMMIT; }
+
+session "s2"
+setup		{ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; }
+step "s2rx"	{ SELECT balance FROM bank_account WHERE id = 'X'; }
+step "s2ry"	{ SELECT balance FROM bank_account WHERE id = 'Y'; }
+step "s2wx"	{ UPDATE bank_account SET balance = -11 WHERE id = 'X'; }
+step "s2c"	{ COMMIT; }
+
+session "s3"
+setup		{
+			  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+			  SET max_parallel_workers_per_gather = 2;
+			  SET force_parallel_mode = on;
+			}
+step "s3r"	{ SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y') ORDER BY id; }
+step "s3c"	{ COMMIT; }
+
+# without s3, s1 and s2 commit
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
+
+# once s3 observes the data committed by s1, a cycle is created and s2 aborts
+permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s3r" "s3c" "s2wx"
-- 
2.17.1 (Apple Git-112)
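
As a rough illustration of the hand-off that the 0001 patch above introduces (ShareSerializableXact() in the leader, AttachSerializableXact() in each worker, and the rule that a worker only drops its backend-local state at end of transaction), here is a toy single-process model in C. Everything prefixed Toy/toy_ is hypothetical and none of it is PostgreSQL code. The real SERIALIZABLEXACT lives in shared memory and is protected by LWLocks, which this sketch leaves out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for SERIALIZABLEXACT: it only counts predicate locks. */
typedef struct ToySxact
{
	int			npredlocks;		/* locks recorded against the shared object */
	bool		released;		/* set once the leader has released it */
} ToySxact;

/* Opaque handle, mirroring the patch's "typedef void *" handle idea. */
typedef void *ToySxactHandle;

/* Per-process state for one participant in the parallel query. */
typedef struct ToyBackend
{
	ToySxact   *my_sxact;		/* plays the role of MySerializableXact */
	bool		is_worker;		/* plays the role of IsParallelWorker() */
	bool		has_local_hash; /* plays the role of LocalPredicateLockHash */
} ToyBackend;

/* Leader side: expose the current sxact as an opaque handle. */
ToySxactHandle
toy_share_sxact(const ToyBackend *leader)
{
	return leader->my_sxact;
}

/* Worker side: adopt the leader's sxact; NULL means "not serializable". */
void
toy_attach_sxact(ToyBackend *worker, ToySxactHandle handle)
{
	assert(worker->my_sxact == NULL);
	worker->my_sxact = (ToySxact *) handle;
	if (worker->my_sxact != NULL)
		worker->has_local_hash = true;
}

/* Any participant may add predicate locks to the shared sxact. */
void
toy_add_predicate_lock(ToyBackend *b)
{
	if (b->my_sxact != NULL)
		b->my_sxact->npredlocks++;
}

/*
 * End of transaction.  A worker only forgets its local state; only the
 * leader releases the locks held by the shared sxact.
 */
void
toy_release(ToyBackend *b)
{
	if (b->my_sxact == NULL)
		return;
	if (!b->is_worker)
	{
		b->my_sxact->npredlocks = 0;
		b->my_sxact->released = true;
	}
	b->my_sxact = NULL;
	b->has_local_hash = false;
}
```

For example, attaching a worker to the leader's handle, adding one lock from each side and then calling toy_release() on the worker leaves both locks in place. Only the leader's later toy_release() clears them.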

0002-Enable-the-read-only-SERIALIZABLE-optimization-f-v16.patch (application/octet-stream)

From abcba6cc0bdbfc2ecb302b1e40c14917e8f18e61 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Sun, 25 Feb 2018 23:45:09 +1300
Subject: [PATCH 2/2] Enable the read-only SERIALIZABLE optimization for
 parallel query.

A SERIALIZABLEXACT can be marked as SXACT_FLAG_RO_SAFE by a concurrent session,
meaning that it is safe to throw away this SERIALIZABLEXACT and start behaving
like a REPEATABLE READ transaction.  The problem is that the leader and workers
are sharing the same SERIALIZABLEXACT so this must be coordinated carefully.
This commit solves that problem as follows:

The first backend to observe the SXACT_FLAG_RO_SAFE flag will 'partially
release' it, meaning that the conflicts and locks it holds can be released, but
the SERIALIZABLEXACT itself will remain active because other backends might
have a pointer to it.

Whenever any backend notices the SXACT_FLAG_RO_SAFE flag, it clears its own
MySerializableXact variable so that it can skip SSI checks for the rest of the
transaction.  In the special case of the leader process, it transfers the
SERIALIZABLEXACT to a new variable SavedSerializableXact, so that it can be
completely released at the end of the transaction after all workers have
exited.

Author: Thomas Munro
Reviewed-by: Kevin Grittner, Masahiko Sawada
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
---
 src/backend/storage/lmgr/predicate.c      | 141 +++++++++++++++++++---
 src/backend/utils/resowner/resowner.c     |   2 +-
 src/include/storage/predicate.h           |   2 +-
 src/include/storage/predicate_internals.h |   6 +
 4 files changed, 130 insertions(+), 21 deletions(-)

diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index a10b5dda86a..200d968cd5c 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -170,7 +170,7 @@
  *		PredicateLockPageCombine(Relation relation, BlockNumber oldblkno,
  *								 BlockNumber newblkno)
  *		TransferPredicateLocksToHeapRelation(Relation relation)
- *		ReleasePredicateLocks(bool isCommit)
+ *		ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
  *
  * conflict detection (may also trigger rollback)
  *		CheckForSerializableConflictOut(bool visible, Relation relation,
@@ -288,6 +288,7 @@
 #define SxactIsDeferrableWaiting(sxact) (((sxact)->flags & SXACT_FLAG_DEFERRABLE_WAITING) != 0)
 #define SxactIsROSafe(sxact) (((sxact)->flags & SXACT_FLAG_RO_SAFE) != 0)
 #define SxactIsROUnsafe(sxact) (((sxact)->flags & SXACT_FLAG_RO_UNSAFE) != 0)
+#define SxactIsPartiallyReleased(sxact) (((sxact)->flags & SXACT_FLAG_PARTIALLY_RELEASED) != 0)
 
 /*
  * Compute the hash code associated with a PREDICATELOCKTARGETTAG.
@@ -418,6 +419,15 @@ static HTAB *LocalPredicateLockHash = NULL;
 static SERIALIZABLEXACT *MySerializableXact = InvalidSerializableXact;
 static bool MyXactDidWrite = false;
 
+/*
+ * The SXACT_FLAG_RO_SAFE optimization might lead us to release
+ * MySerializableXact early.  If that happens in a parallel query, the leader
+ * needs to defer the destruction of the SERIALIZABLEXACT until end of
+ * transaction, because the workers still have a reference to it.  In that
+ * case, the leader stores it here.
+ */
+static SERIALIZABLEXACT *SavedSerializableXact = InvalidSerializableXact;
+
 /* local functions */
 
 static SERIALIZABLEXACT *CreatePredXact(void);
@@ -529,12 +539,10 @@ SerializationNeededForRead(Relation relation, Snapshot snapshot)
 	 * A transaction is flagged as RO_SAFE if all concurrent R/W transactions
 	 * commit without having conflicts out to an earlier snapshot, thus
 	 * ensuring that no conflicts are possible for this transaction.
-	 *
-	 * This optimization is not yet supported in parallel mode.
 	 */
-	if (SxactIsROSafe(MySerializableXact) && !IsInParallelMode())
+	if (SxactIsROSafe(MySerializableXact))
 	{
-		ReleasePredicateLocks(false);
+		ReleasePredicateLocks(false, true);
 		return false;
 	}
 
@@ -1528,14 +1536,14 @@ GetSafeSnapshot(Snapshot origSnapshot)
 		ereport(DEBUG2,
 				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 				 errmsg("deferrable snapshot was unsafe; trying a new one")));
-		ReleasePredicateLocks(false);
+		ReleasePredicateLocks(false, false);
 	}
 
 	/*
 	 * Now we have a safe snapshot, so we don't need to do any further checks.
 	 */
 	Assert(SxactIsROSafe(MySerializableXact));
-	ReleasePredicateLocks(false);
+	ReleasePredicateLocks(false, true);
 
 	return snapshot;
 }
@@ -3262,9 +3270,17 @@ SetNewSxactGlobalXmin(void)
  * If this transaction is committing and is holding any predicate locks,
  * it must be added to a list of completed serializable transactions still
  * holding locks.
+ *
+ * If isReadOnlySafe is true, then predicate locks are being released before
+ * the end of the transaction because MySerializableXact has been determined
+ * to be RO_SAFE.  In non-parallel mode we can release it completely, but in
+ * parallel mode we partially release the SERIALIZABLEXACT and keep it
+ * around until the end of the transaction, allowing each backend to clear its
+ * MySerializableXact variable and benefit from the optimization in its own
+ * time.
  */
 void
-ReleasePredicateLocks(bool isCommit)
+ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 {
 	bool		needToClear;
 	RWConflict	conflict,
@@ -3283,25 +3299,97 @@ ReleasePredicateLocks(bool isCommit)
 	 */
 	bool		topLevelIsDeclaredReadOnly;
 
-	if (MySerializableXact == InvalidSerializableXact)
+	/* We can't be both committing and releasing early due to RO_SAFE. */
+	Assert(!(isCommit && isReadOnlySafe));
+
+	/* Are we at the end of a transaction, that is, a commit or abort? */
+	if (!isReadOnlySafe)
 	{
-		Assert(LocalPredicateLockHash == NULL);
-		return;
+		/*
+		 * Parallel workers mustn't release predicate locks at the end of
+		 * their transaction.  The leader will do that at the end of its
+		 * transaction.
+		 */
+		if (IsParallelWorker())
+		{
+			ReleasePredicateLocksLocal();
+			return;
+		}
+
+		/*
+		 * By the time the leader in a parallel query reaches end of
+		 * transaction, it has waited for all workers to exit.
+		 */
+		Assert(!ParallelContextActive());
+
+		/*
+		 * If the leader in a parallel query earlier stashed a partially
+		 * released SERIALIZABLEXACT for final clean-up at end of transaction
+		 * (because workers might still have been accessing it), then it's
+		 * time to restore it.
+		 */
+		if (SavedSerializableXact != InvalidSerializableXact)
+		{
+			Assert(MySerializableXact == InvalidSerializableXact);
+			MySerializableXact = SavedSerializableXact;
+			SavedSerializableXact = InvalidSerializableXact;
+			Assert(SxactIsPartiallyReleased(MySerializableXact));
+		}
 	}
 
-	/* Parallel workers mustn't release predicate locks. */
-	if (IsParallelWorker())
+	if (MySerializableXact == InvalidSerializableXact)
 	{
-		ReleasePredicateLocksLocal();
+		Assert(LocalPredicateLockHash == NULL);
 		return;
 	}
 
 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
 
+	/*
+	 * If the transaction is committing, but it has been partially released
+	 * already, then treat this as a roll back.  It was marked as rolled back.
+	 */
+	if (isCommit && SxactIsPartiallyReleased(MySerializableXact))
+		isCommit = false;
+
+	/*
+	 * If we're called in the middle of a transaction because we discovered
+	 * that the SXACT_FLAG_RO_SAFE flag was set, then we'll partially release
+	 * it (that is, release the predicate locks and conflicts, but not the
+	 * SERIALIZABLEXACT itself) if we're the first backend to have noticed.
+	 */
+	if (isReadOnlySafe && IsInParallelMode())
+	{
+		/*
+		 * The leader needs to stash a pointer to it, so that it can
+		 * completely release it at end-of-transaction.
+		 */
+		if (!IsParallelWorker())
+			SavedSerializableXact = MySerializableXact;
+
+		/*
+		 * The first backend to reach this condition will partially release
+		 * the SERIALIZABLEXACT.  All others will just clear their
+		 * backend-local state so that they stop doing SSI checks for the rest
+		 * of the transaction.
+		 */
+		if (SxactIsPartiallyReleased(MySerializableXact))
+		{
+			LWLockRelease(SerializableXactHashLock);
+			ReleasePredicateLocksLocal();
+			return;
+		}
+		else
+		{
+			MySerializableXact->flags |= SXACT_FLAG_PARTIALLY_RELEASED;
+			/* ... and proceed to perform the partial release below. */
+		}
+	}
 	Assert(!isCommit || SxactIsPrepared(MySerializableXact));
 	Assert(!isCommit || !SxactIsDoomed(MySerializableXact));
 	Assert(!SxactIsCommitted(MySerializableXact));
-	Assert(!SxactIsRolledBack(MySerializableXact));
+	Assert(SxactIsPartiallyReleased(MySerializableXact)
+		   || !SxactIsRolledBack(MySerializableXact));
 
 	/* may not be serializable during COMMIT/ROLLBACK PREPARED */
 	Assert(MySerializableXact->pid == 0 || IsolationIsSerializable());
@@ -3350,7 +3438,8 @@ ReleasePredicateLocks(bool isCommit)
 		 * cleanup. This means it should not be considered when calculating
 		 * SxactGlobalXmin.
 		 */
-		MySerializableXact->flags |= SXACT_FLAG_DOOMED;
+		if (!isReadOnlySafe)
+			MySerializableXact->flags |= SXACT_FLAG_DOOMED;
 		MySerializableXact->flags |= SXACT_FLAG_ROLLED_BACK;
 
 		/*
@@ -3546,7 +3635,8 @@ ReleasePredicateLocks(bool isCommit)
 	 * was launched.
 	 */
 	needToClear = false;
-	if (TransactionIdEquals(MySerializableXact->xmin, PredXact->SxactGlobalXmin))
+	if (!isReadOnlySafe &&
+		TransactionIdEquals(MySerializableXact->xmin, PredXact->SxactGlobalXmin))
 	{
 		Assert(PredXact->SxactGlobalXminCount > 0);
 		if (--(PredXact->SxactGlobalXminCount) == 0)
@@ -3565,8 +3655,16 @@ ReleasePredicateLocks(bool isCommit)
 		SHMQueueInsertBefore(FinishedSerializableTransactions,
 							 &MySerializableXact->finishedLink);
 
+	/*
+	 * If we're releasing a RO_SAFE transaction in parallel mode, we'll only
+	 * partially release it.  That's necessary because other backends may have
+	 * a reference to it.  The leader will release the SERIALIZABLEXACT itself
+	 * at the end of the transaction after workers have stopped running.
+	 */
 	if (!isCommit)
-		ReleaseOneSerializableXact(MySerializableXact, false, false);
+		ReleaseOneSerializableXact(MySerializableXact,
+								   isReadOnlySafe && IsInParallelMode(),
+								   false);
 
 	LWLockRelease(SerializableFinishedListLock);
 
@@ -3770,6 +3868,8 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 	 * them to OldCommittedSxact if summarize is true)
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+	if (IsInParallelMode())
+		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(sxact->predicateLocks),
 					 &(sxact->predicateLocks),
@@ -3849,6 +3949,8 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 	 */
 	SHMQueueInit(&sxact->predicateLocks);
 
+	if (IsInParallelMode())
+		LWLockRelease(&sxact->predicateLockListLock);
 	LWLockRelease(SerializablePredicateLockListLock);
 
 	sxidtag.xid = sxact->topXid;
@@ -4739,6 +4841,7 @@ PreCommit_CheckForSerializationFailure(void)
 	/* Check if someone else has already decided that we need to die */
 	if (SxactIsDoomed(MySerializableXact))
 	{
+		Assert(!SxactIsPartiallyReleased(MySerializableXact));
 		LWLockRelease(SerializableXactHashLock);
 		ereport(ERROR,
 				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
@@ -4936,7 +5039,7 @@ PredicateLockTwoPhaseFinish(TransactionId xid, bool isCommit)
 	MySerializableXact = sxid->myXact;
 	MyXactDidWrite = true;		/* conservatively assume that we wrote
 								 * something */
-	ReleasePredicateLocks(isCommit);
+	ReleasePredicateLocks(isCommit, false);
 }
 
 /*
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index 211833da02c..74f80a0942a 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -565,7 +565,7 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
 			if (owner == TopTransactionResourceOwner)
 			{
 				ProcReleaseLocks(isCommit);
-				ReleasePredicateLocks(isCommit);
+				ReleasePredicateLocks(isCommit, false);
 			}
 		}
 		else
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 23f3acc3ce1..0925270b91e 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -61,7 +61,7 @@ extern void PredicateLockTuple(Relation relation, HeapTuple tuple, Snapshot snap
 extern void PredicateLockPageSplit(Relation relation, BlockNumber oldblkno, BlockNumber newblkno);
 extern void PredicateLockPageCombine(Relation relation, BlockNumber oldblkno, BlockNumber newblkno);
 extern void TransferPredicateLocksToHeapRelation(Relation relation);
-extern void ReleasePredicateLocks(bool isCommit);
+extern void ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe);
 
 /* conflict detection (may also trigger rollback) */
 extern void CheckForSerializableConflictOut(bool valid, Relation relation, HeapTuple tuple,
diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 59eb49e57ee..04de63877d5 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -127,6 +127,12 @@ typedef struct SERIALIZABLEXACT
 #define SXACT_FLAG_RO_UNSAFE			0x00000100
 #define SXACT_FLAG_SUMMARY_CONFLICT_IN	0x00000200
 #define SXACT_FLAG_SUMMARY_CONFLICT_OUT 0x00000400
+/*
+ * The following flag means the transaction has been partially released
+ * already, but is being preserved because parallel workers might have a
+ * reference to it.  It'll be recycled by the leader at end-of-transaction.
+ */
+#define SXACT_FLAG_PARTIALLY_RELEASED	0x00000800
 
 /*
  * The following types are used to provide an ad hoc list for holding
-- 
2.17.1 (Apple Git-112)
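
The partial-release protocol described in the 0002 commit message above can be modelled roughly as follows. This is a toy, single-threaded C sketch under loud assumptions: the Toy/toy_ names and TOY_FLAG_* values are hypothetical, locking is omitted, and "releasing" is reduced to zeroing a counter. It only aims to show the ordering: the first participant to notice RO_SAFE performs the partial release, later ones just clear their own pointer, and the leader stashes the pointer so that it can finish the job at end of transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, not the real SXACT_FLAG_* values. */
#define TOY_FLAG_RO_SAFE			0x01
#define TOY_FLAG_PARTIALLY_RELEASED 0x02

typedef struct ToySharedXact
{
	unsigned	flags;
	int			npredlocks;		/* predicate locks still held */
	bool		freed;			/* true once completely released */
} ToySharedXact;

typedef struct ToyParticipant
{
	ToySharedXact *my_xact;		/* plays the role of MySerializableXact */
	ToySharedXact *saved_xact;	/* plays the role of SavedSerializableXact */
	bool		is_worker;
} ToyParticipant;

/*
 * Called when a participant checks for RO_SAFE in the middle of a parallel
 * query.  Returns true if this call performed the partial release.
 */
bool
toy_notice_ro_safe(ToyParticipant *p)
{
	ToySharedXact *x = p->my_xact;
	bool		first;

	if (x == NULL || !(x->flags & TOY_FLAG_RO_SAFE))
		return false;

	/* The leader remembers the object for final clean-up. */
	if (!p->is_worker)
		p->saved_xact = x;

	/* Only the first participant to get here drops the locks. */
	first = !(x->flags & TOY_FLAG_PARTIALLY_RELEASED);
	if (first)
	{
		x->flags |= TOY_FLAG_PARTIALLY_RELEASED;
		x->npredlocks = 0;
	}

	/* Everyone stops doing SSI checks for the rest of the transaction. */
	p->my_xact = NULL;
	return first;
}

/* Leader's end of transaction, after all workers have exited. */
void
toy_leader_end_of_xact(ToyParticipant *leader)
{
	assert(!leader->is_worker);

	/* Restore a stashed, partially released object if there is one. */
	if (leader->saved_xact != NULL)
	{
		assert(leader->my_xact == NULL);
		leader->my_xact = leader->saved_xact;
		leader->saved_xact = NULL;
	}

	if (leader->my_xact != NULL)
	{
		leader->my_xact->npredlocks = 0;
		leader->my_xact->freed = true;
		leader->my_xact = NULL;
	}
}
```

In this model a worker that notices the flag first returns true and drops the locks, a leader that notices it afterwards returns false but still records saved_xact, and toy_leader_end_of_xact() is the only place where the object is marked freed.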

#49

Kevin Grittner

kgrittn@gmail.com

over 7 years ago

In reply to: Thomas Munro (#48)

Re: [HACKERS] SERIALIZABLE with parallel query

On Mon, Oct 8, 2018 at 9:40 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

> Rebased.

It applies and builds clean, it passed make world with cassert and TAP
tests, and I can't see any remaining flaws. This is true both of just
the 0001 v16 patch and that with 0002 v16 applied on top of it.

It would be great if someone with a big test machine could stress test
and benchmark this versus current production versions.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

#50

Thomas Munro

thomas.munro@gmail.com

almost 7 years ago

In reply to: Kevin Grittner (#49)

Re: [HACKERS] SERIALIZABLE with parallel query

On Thu, Oct 11, 2018 at 10:15 AM Kevin Grittner <kgrittn@gmail.com> wrote:

> It applies and builds clean, it passed make world with cassert and TAP
> tests, and I can't see any remaining flaws. This is true both of just
> the 0001 v16 patch and that with 0002 v16 applied on top of it.

Thanks. I'd like to commit this soon.

> It would be great if someone with a big test machine could stress test
> and benchmark this versus current production versions.

Hmm. I can't compare it with current production versions directly
since SERIALIZABLE + parallel query wasn't possible before. I could
compare it against lower isolation levels or non-parallel query, but
those tests don't seem to tell us anything we don't already know:
SERIALIZABLE slows some stuff down, parallel query speeds some stuff
up. As for stress-testing, most benchmarks are either good for testing
parallelism (TPC-H etc) but don't do any writes, or good for testing
writes (TPC-B etc) but don't do any parallelism. I'm going to
experiment with the "SIBENCH" approach from the Cahill paper and see
where that leads.

--
Thomas Munro
https://enterprisedb.com

#51

Thomas Munro

thomas.munro@gmail.com

almost 7 years ago

In reply to: Thomas Munro (#50)

Re: [HACKERS] SERIALIZABLE with parallel query

On Mon, Mar 4, 2019 at 10:17 AM Thomas Munro <thomas.munro@gmail.com> wrote:

> On Thu, Oct 11, 2018 at 10:15 AM Kevin Grittner <kgrittn@gmail.com> wrote:
>
> > It applies and builds clean, it passed make world with cassert and TAP
> > tests, and I can't see any remaining flaws. This is true both of just
> > the 0001 v16 patch and that with 0002 v16 applied on top of it.
>
> Thanks. I'd like to commit this soon.

I did a round of testing under load and some printf-debugging to
convince myself that the SXACT_FLAG_RO_SAFE handling really is
exercised by serializable-parallel-2.spec and behaving as expected,
along with some more testing by hand, and pushed this.

To generate load I used a knock-off of sibench[1], run as eg
./petit-sibench --rows 10000 --threads 8 --ssi, against a server
running with -c min_parallel_table_scan_size=128kB -c
parallel_setup_cost=0 -c max_worker_processes=16 -c
max_parallel_workers=16.

[1]: https://github.com/macdice/petit-sibench/blob/master/petit-sibench.c

--
Thomas Munro
https://enterprisedb.com