FlexLocks

Started by Robert Haasabout 14 years ago35 messages

robertmhaas@gmail.com

about 14 years ago

2 attachment(s)

I've been noodling around with various methods of reducing
ProcArrayLock contention and (after many false starts) I think I've
finally found one that works really well. I apologize in advance if
this makes your head explode; I think that the design I have hear is
solid, but it represents a significant and invasive overhaul of the
LWLock system - I think for the better, but you'll have to be the
judge. I'll start with the performance numbers (from that good ol'
32-core Nate Boley system), where I build from commit
f1585362856d4da17113ba2e4ba46cf83cba0cf2, with and without the
attached patches, and then ran pgbench on logged and unlogged tables
with various numbers of clients, with shared_buffers = 8GB,
maintenance_work_mem = 1GB, synchronous_commit = off,
checkpoint_segments = 300, checkpoint_timeout = 15min,
checkpoint_completion_target = 0.9, wal_writer_delay = 20ms. The
numbers below are (as usual) the median of three-five minute runs at
scale factor 100. The lines starting with "m" and a number are that
number of clients on unpatched master, and the lines starting with "f"
are that number of clients with this patch set.

The really big win here is unlogged tables at 32 clients, where
throughput has *doubled* and now scales *better than linearly* as
compared with the single-client results.

== Unlogged Tables ==
m01 tps = 679.737639 (including connections establishing)
f01 tps = 668.275270 (including connections establishing)
m08 tps = 4771.757193 (including connections establishing)
f08 tps = 4867.520049 (including connections establishing)
m32 tps = 10736.232426 (including connections establishing)
f32 tps = 21303.295441 (including connections establishing)
m80 tps = 7829.989887 (including connections establishing)
f80 tps = 19835.231438 (including connections establishing)

== Permanent Tables ==
m01 tps = 634.424125 (including connections establishing)
f01 tps = 633.450405 (including connections establishing)
m08 tps = 4544.781551 (including connections establishing)
f08 tps = 4556.298219 (including connections establishing)
m32 tps = 9902.844302 (including connections establishing)
f32 tps = 11028.745881 (including connections establishing)
m80 tps = 7467.437442 (including connections establishing)
f80 tps = 11909.738232 (including connections establishing)

A couple of other interesting things buried in these numbers:

1. Permanent tables don't derive nearly as much benefit as unlogged
tables. I believe that this is because, for permanent tables, the
major bottleneck is WALInsertLock. Fixing ProcArrayLock squeezes out
a healthy 10%, but we'll have to make significant performance on
WALInsertLock to get anywhere close to linear scaling.
2. The drop-off between 32 clients and 80 clients is greatly reduced
with this patch set; indeed, for permanent tables, tps increased
slightly between 32 and 80 clients. I believe that small decrease for
unlogged tables is likely due to the fact that by 80 tables,
WALInsertLock starts to become a contention point, due to the need to
insert the commit records.

In terms of the actual patches, it's been bugging me for a while that
the LWLock code contains a lot of infrastructure that's not easily
reusable by other parts of the system. So the first of the two
attached patches, flexlock-v1.patch, separates the LWLock code into an
upper layer and a lower layer. The lower layer I called "FlexLocks",
and it's designed to allow a variety of locking implementations to be
built on top of it and reuse as much of the basic infrastructure as I
could figure out how to make reusable without hurting performance too
much. LWLocks become the anchor client of the FlexLock system; in
essence, most of flexlock.c is code that was removed from lwlock.c.
The second patch, procarraylock.c, uses that infrastructure to define
a new type of FlexLock specifically for ProcArrayLock. It basically
works like a regular LWLock, except that it has a special operation to
optimize ProcArrayEndTransaction(). In the uncontended case, instead
of acquiring and releasing the lock, it just grabs the lock, observes
that there is no contention, clears the critical PGPROC fields (which
isn't noticeably slower than updating the state of the lock would be)
and releases the spin lock. There's then no need to reacquire the
spinlock to "release" the lock; we're done. In the contended case,
the backend wishing to end adds itself to a queue of ending
transactions. When ProcArrayLock is released, the last person out
clears the PGPROC structures for all the waiters and wakes them all
up; they don't need to reacquire the lock, because the work they
wished to perform while holding it is already done. Thus, in the
*worst* case, ending transactions only need to acquire the spinlock
protecting ProcArrayLock half as often (once instead of twice), and in
the best case (where backends have to keep retrying only to repeatedly
fail to get the lock) it's far better than that.

Of course, there are ways that this could be implemented without the
FlexLock stuff, if people don't like this solution. Myself, I find it
quite elegant (though there are certainly arguable points in there
where the code could probably be improved), but then again, I wrote
it. For what it's worth, I believe that there are other places where
the FlexLock infrastructure could be helpful. In this case, the new
ProcArrayLock is very specific to what ProcArrayLock actually does,
and can't be really reused for anything else. But I've had a thought
that we might want to have a type of FlexLock that contains an LSN.
The lock holder advances the LSN and can then release everyone who was
waiting for a value <= that LSN without them needing to reacquire the
lock. This could be useful for things like WALWriteLock, and sync
rep. Also, I think there might be interesting applications for buffer
locks, perhaps by having a lock type that manages both content locks
and pins. Alternatively, if we want to support CRCs, it might be
useful to have a third buffer lock mode in between shared and
exclusive. SX would conflict with itself and with exclusive but not
with shared, and would be required to write out the page or set hint
bits but not just to examine tuples; this could be used to ensure that
the page doesn't change (thus invalidating the CRC) while the write is
in progress. I'm not necessarily saying that any of these particular
things are what we want to do, just throwing out the idea that we may
want a variety of lock types that are similar to lightweight locks but
with subtly different behavior, yet with common infrastructure for
error handling and wait queue management.

Anyway, this is all up for discussion, argument, etc. - but here are
the patches. Comments, idea, thoughts, code review, and/or testing
are appreciated.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

flexlock-v1.patchapplication/octet-stream; name=flexlock-v1.patchDownload

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 8dc3054..51b24d0 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -105,7 +105,7 @@ typedef struct pgssEntry
  */
 typedef struct pgssSharedState
 {
-	LWLockId	lock;			/* protects hashtable search/modification */
+	FlexLockId	lock;			/* protects hashtable search/modification */
 	int			query_size;		/* max query length in bytes */
 } pgssSharedState;
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d1e628f..8517b36 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6199,14 +6199,14 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
      </varlistentry>
 
      <varlistentry>
-      <term><varname>trace_lwlocks</varname> (<type>boolean</type>)</term>
+      <term><varname>trace_flexlocks</varname> (<type>boolean</type>)</term>
       <indexterm>
-       <primary><varname>trace_lwlocks</> configuration parameter</primary>
+       <primary><varname>trace_flexlocks</> configuration parameter</primary>
       </indexterm>
       <listitem>
        <para>
-        If on, emit information about lightweight lock usage.  Lightweight
-        locks are intended primarily to provide mutual exclusion of access
+        If on, emit information about FlexLock usage.  FlexLocks
+        are intended primarily to provide mutual exclusion of access
         to shared-memory data structures.
        </para>
        <para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b9dc1d2..98ed0d3 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1724,49 +1724,49 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
       or kilobytes of memory used for an internal sort.</entry>
     </row>
     <row>
-     <entry>lwlock-acquire</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock has been acquired.
-      arg0 is the LWLock's ID.
-      arg1 is the requested lock mode, either exclusive or shared.</entry>
+     <entry>flexlock-acquire</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock has been acquired.
+      arg0 is the FlexLock's ID.
+      arg1 is the requested lock mode.</entry>
     </row>
     <row>
-     <entry>lwlock-release</entry>
-     <entry>(LWLockId)</entry>
-     <entry>Probe that fires when an LWLock has been released (but note
+     <entry>flexlock-release</entry>
+     <entry>(FlexLockId)</entry>
+     <entry>Probe that fires when a FlexLock has been released (but note
       that any released waiters have not yet been awakened).
-      arg0 is the LWLock's ID.</entry>
+      arg0 is the FlexLock's ID.</entry>
     </row>
     <row>
-     <entry>lwlock-wait-start</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock was not immediately available and
+     <entry>flexlock-wait-start</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock was not immediately available and
       a server process has begun to wait for the lock to become available.
-      arg0 is the LWLock's ID.
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
-     <entry>lwlock-wait-done</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
+     <entry>flexlock-wait-done</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
      <entry>Probe that fires when a server process has been released from its
-      wait for an LWLock (it does not actually have the lock yet).
-      arg0 is the LWLock's ID.
+      wait for an FlexLock (it does not actually have the lock yet).
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
-     <entry>lwlock-condacquire</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock was successfully acquired when the
-      caller specified no waiting.
-      arg0 is the LWLock's ID.
+     <entry>flexlock-condacquire</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock was successfully acquired when
+      the caller specified no waiting.
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
-     <entry>lwlock-condacquire-fail</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock was not successfully acquired when
-      the caller specified no waiting.
-      arg0 is the LWLock's ID.
+     <entry>flexlock-condacquire-fail</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock was not successfully acquired
+      when the caller specified no waiting.
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
@@ -1813,11 +1813,11 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
      <entry>unsigned int</entry>
     </row>
     <row>
-     <entry>LWLockId</entry>
+     <entry>FlexLockId</entry>
      <entry>int</entry>
     </row>
     <row>
-     <entry>LWLockMode</entry>
+     <entry>FlexLockMode</entry>
      <entry>int</entry>
     </row>
     <row>
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index f7caa34..09d5862 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -151,7 +151,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(bool));		/* page_dirty[] */
 	sz += MAXALIGN(nslots * sizeof(int));		/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));		/* page_lru_count[] */
-	sz += MAXALIGN(nslots * sizeof(LWLockId));	/* buffer_locks[] */
+	sz += MAXALIGN(nslots * sizeof(FlexLockId));		/* buffer_locks[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -161,7 +161,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLockId ctllock, const char *subdir)
+			  FlexLockId ctllock, const char *subdir)
 {
 	SlruShared	shared;
 	bool		found;
@@ -202,8 +202,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		offset += MAXALIGN(nslots * sizeof(int));
 		shared->page_lru_count = (int *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(int));
-		shared->buffer_locks = (LWLockId *) (ptr + offset);
-		offset += MAXALIGN(nslots * sizeof(LWLockId));
+		shared->buffer_locks = (FlexLockId *) (ptr + offset);
+		offset += MAXALIGN(nslots * sizeof(FlexLockId));
 
 		if (nlsns > 0)
 		{
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 477982d..d5d1ee9 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -318,9 +318,9 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 	gxact->proc.roleId = owner;
 	gxact->proc.inCommit = false;
 	gxact->proc.vacuumFlags = 0;
-	gxact->proc.lwWaiting = false;
-	gxact->proc.lwExclusive = false;
-	gxact->proc.lwWaitLink = NULL;
+	gxact->proc.flWaitResult = 0;
+	gxact->proc.flWaitMode = 0;
+	gxact->proc.flWaitLink = NULL;
 	gxact->proc.waitLock = NULL;
 	gxact->proc.waitProcLock = NULL;
 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index c151d3b..19b708c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2248,7 +2248,7 @@ AbortTransaction(void)
 	 * Releasing LW locks is critical since we might try to grab them again
 	 * while cleaning up!
 	 */
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 
 	/* Clean up buffer I/O and buffer context locks, too */
 	AbortBufferIO();
@@ -4138,7 +4138,7 @@ AbortSubTransaction(void)
 	 * FIXME This may be incorrect --- Are there some locks we should keep?
 	 * Buffer locks, for example?  I don't think so but I'm not sure.
 	 */
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 
 	AbortBufferIO();
 	UnlockBuffers();
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 6bf2421..9ceee91 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -562,13 +562,13 @@ bootstrap_signals(void)
  * Begin shutdown of an auxiliary process.	This is approximately the equivalent
  * of ShutdownPostgres() in postinit.c.  We can't run transactions in an
  * auxiliary process, so most of the work of AbortTransaction() is not needed,
- * but we do need to make sure we've released any LWLocks we are holding.
+ * but we do need to make sure we've released any flex locks we are holding.
  * (This is only critical during an error exit.)
  */
 static void
 ShutdownAuxiliaryProcess(int code, Datum arg)
 {
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index cacedab..f33f573 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -176,9 +176,10 @@ BackgroundWriterMain(void)
 		/*
 		 * These operations are really just a minimal subset of
 		 * AbortTransaction().	We don't have very many resources to worry
-		 * about in bgwriter, but we do have LWLocks, buffers, and temp files.
+		 * about in bgwriter, but we do have flex locks, buffers, and temp
+		 * files.
 		 */
-		LWLockReleaseAll();
+		FlexLockReleaseAll();
 		AbortBufferIO();
 		UnlockBuffers();
 		/* buffer pins are released here: */
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e9ae1e8..2f1e8b3 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -281,9 +281,10 @@ CheckpointerMain(void)
 		/*
 		 * These operations are really just a minimal subset of
 		 * AbortTransaction().	We don't have very many resources to worry
-		 * about in checkpointer, but we do have LWLocks, buffers, and temp files.
+		 * about in checkpointer, but we do have flex locks, buffers, and temp
+		 * files.
 		 */
-		LWLockReleaseAll();
+		FlexLockReleaseAll();
 		AbortBufferIO();
 		UnlockBuffers();
 		/* buffer pins are released here: */
@@ -1109,7 +1110,7 @@ CompactCheckpointerRequestQueue()
 	bool	   *skip_slot;
 
 	/* must hold BgWriterCommLock in exclusive mode */
-	Assert(LWLockHeldByMe(BgWriterCommLock));
+	Assert(FlexLockHeldByMe(BgWriterCommLock));
 
 	/* Initialize temporary hash table */
 	MemSet(&ctl, 0, sizeof(ctl));
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6758083..14b4368 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -109,6 +109,7 @@
 #include "postmaster/syslogger.h"
 #include "replication/walsender.h"
 #include "storage/fd.h"
+#include "storage/flexlock_internals.h"
 #include "storage/ipc.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
@@ -404,8 +405,6 @@ typedef struct
 typedef int InheritableSocket;
 #endif
 
-typedef struct LWLock LWLock;	/* ugly kluge */
-
 /*
  * Structure contains all variables passed to exec:ed backends
  */
@@ -426,7 +425,7 @@ typedef struct
 	slock_t    *ShmemLock;
 	VariableCache ShmemVariableCache;
 	Backend    *ShmemBackendArray;
-	LWLock	   *LWLockArray;
+	FlexLock   *FlexLockArray;
 	slock_t    *ProcStructLock;
 	PROC_HDR   *ProcGlobal;
 	PGPROC	   *AuxiliaryProcs;
@@ -4675,7 +4674,6 @@ MaxLivePostmasterChildren(void)
  * functions
  */
 extern slock_t *ShmemLock;
-extern LWLock *LWLockArray;
 extern slock_t *ProcStructLock;
 extern PGPROC *AuxiliaryProcs;
 extern PMSignalData *PMSignalState;
@@ -4720,7 +4718,7 @@ save_backend_variables(BackendParameters *param, Port *port,
 	param->ShmemVariableCache = ShmemVariableCache;
 	param->ShmemBackendArray = ShmemBackendArray;
 
-	param->LWLockArray = LWLockArray;
+	param->FlexLockArray = FlexLockArray;
 	param->ProcStructLock = ProcStructLock;
 	param->ProcGlobal = ProcGlobal;
 	param->AuxiliaryProcs = AuxiliaryProcs;
@@ -4943,7 +4941,7 @@ restore_backend_variables(BackendParameters *param, Port *port)
 	ShmemVariableCache = param->ShmemVariableCache;
 	ShmemBackendArray = param->ShmemBackendArray;
 
-	LWLockArray = param->LWLockArray;
+	FlexLockArray = param->FlexLockArray;
 	ProcStructLock = param->ProcStructLock;
 	ProcGlobal = param->ProcGlobal;
 	AuxiliaryProcs = param->AuxiliaryProcs;
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 157728e..587443d 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -167,9 +167,9 @@ WalWriterMain(void)
 		/*
 		 * These operations are really just a minimal subset of
 		 * AbortTransaction().	We don't have very many resources to worry
-		 * about in walwriter, but we do have LWLocks, and perhaps buffers?
+		 * about in walwriter, but we do have flex locks, and perhaps buffers?
 		 */
-		LWLockReleaseAll();
+		FlexLockReleaseAll();
 		AbortBufferIO();
 		UnlockBuffers();
 		/* buffer pins are released here: */
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e59af33..73b4cfb 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -141,7 +141,7 @@ PrefetchBuffer(Relation reln, ForkNumber forkNum, BlockNumber blockNum)
 	{
 		BufferTag	newTag;		/* identity of requested block */
 		uint32		newHash;	/* hash value for newTag */
-		LWLockId	newPartitionLock;	/* buffer partition lock for it */
+		FlexLockId	newPartitionLock;	/* buffer partition lock for it */
 		int			buf_id;
 
 		/* create a tag so we can lookup the buffer */
@@ -512,10 +512,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 {
 	BufferTag	newTag;			/* identity of requested block */
 	uint32		newHash;		/* hash value for newTag */
-	LWLockId	newPartitionLock;		/* buffer partition lock for it */
+	FlexLockId	newPartitionLock;		/* buffer partition lock for it */
 	BufferTag	oldTag;			/* previous identity of selected buffer */
 	uint32		oldHash;		/* hash value for oldTag */
-	LWLockId	oldPartitionLock;		/* buffer partition lock for it */
+	FlexLockId	oldPartitionLock;		/* buffer partition lock for it */
 	BufFlags	oldFlags;
 	int			buf_id;
 	volatile BufferDesc *buf;
@@ -855,7 +855,7 @@ InvalidateBuffer(volatile BufferDesc *buf)
 {
 	BufferTag	oldTag;
 	uint32		oldHash;		/* hash value for oldTag */
-	LWLockId	oldPartitionLock;		/* buffer partition lock for it */
+	FlexLockId	oldPartitionLock;		/* buffer partition lock for it */
 	BufFlags	oldFlags;
 
 	/* Save the original buffer tag before dropping the spinlock */
@@ -965,7 +965,7 @@ MarkBufferDirty(Buffer buffer)
 
 	Assert(PrivateRefCount[buffer - 1] > 0);
 	/* unfortunately we can't check if the lock is held exclusively */
-	Assert(LWLockHeldByMe(bufHdr->content_lock));
+	Assert(FlexLockHeldByMe(bufHdr->content_lock));
 
 	LockBufHdr(bufHdr);
 
@@ -1134,8 +1134,8 @@ UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
 	if (PrivateRefCount[b] == 0)
 	{
 		/* I'd better not still hold any locks on the buffer */
-		Assert(!LWLockHeldByMe(buf->content_lock));
-		Assert(!LWLockHeldByMe(buf->io_in_progress_lock));
+		Assert(!FlexLockHeldByMe(buf->content_lock));
+		Assert(!FlexLockHeldByMe(buf->io_in_progress_lock));
 
 		LockBufHdr(buf);
 
@@ -2310,7 +2310,7 @@ SetBufferCommitInfoNeedsSave(Buffer buffer)
 
 	Assert(PrivateRefCount[buffer - 1] > 0);
 	/* here, either share or exclusive lock is OK */
-	Assert(LWLockHeldByMe(bufHdr->content_lock));
+	Assert(FlexLockHeldByMe(bufHdr->content_lock));
 
 	/*
 	 * This routine might get called many times on the same page, if we are
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 56c0bd8..02ee8d8 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -113,7 +113,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, SUBTRANSShmemSize());
 		size = add_size(size, TwoPhaseShmemSize());
 		size = add_size(size, MultiXactShmemSize());
-		size = add_size(size, LWLockShmemSize());
+		size = add_size(size, FlexLockShmemSize());
 		size = add_size(size, ProcArrayShmemSize());
 		size = add_size(size, BackendStatusShmemSize());
 		size = add_size(size, SInvalShmemSize());
@@ -179,7 +179,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	 * needed for InitShmemIndex.
 	 */
 	if (!IsUnderPostmaster)
-		CreateLWLocks();
+		CreateFlexLocks();
 
 	/*
 	 * Set up shmem.c index hashtable
diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile
index e12a854..3730e51 100644
--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/storage/lmgr
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o predicate.o
+OBJS = flexlock.o lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o \
+	predicate.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/storage/lmgr/flexlock.c b/src/backend/storage/lmgr/flexlock.c
new file mode 100644
index 0000000..7f657b3
--- /dev/null
+++ b/src/backend/storage/lmgr/flexlock.c
@@ -0,0 +1,353 @@
+/*-------------------------------------------------------------------------
+ *
+ * flexlock.c
+ *	  Low-level routines for managing flex locks.
+ *
+ * Flex locks are intended primarily to provide mutual exclusion of access
+ * to shared-memory data structures.  Most, but not all, flex locks are
+ * lightweight locks (LWLocks).  This file contains support routines that
+ * are used for all types of flex locks, including lwlocks.  User-level
+ * locking should be done with the full lock manager --- which depends on
+ * LWLocks to protect its shared state.
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/lmgr/flexlock.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "access/clog.h"
+#include "access/multixact.h"
+#include "access/subtrans.h"
+#include "commands/async.h"
+#include "storage/flexlock_internals.h"
+#include "storage/lwlock.h"
+#include "storage/predicate.h"
+#include "storage/proc.h"
+#include "storage/spin.h"
+#include "utils/elog.h"
+
+/*
+ * We use this structure to keep track of flex locks held, for release
+ * during error recovery.  The maximum size could be determined at runtime
+ * if necessary, but it seems unlikely that more than a few locks could
+ * ever be held simultaneously.
+ */
+#define MAX_SIMUL_FLEXLOCKS	100
+
+int	num_held_flexlocks = 0;
+FlexLockId held_flexlocks[MAX_SIMUL_FLEXLOCKS];
+
+static int	lock_addin_request = 0;
+static bool lock_addin_request_allowed = true;
+
+#ifdef LOCK_DEBUG
+bool		Trace_flexlocks = false;
+#endif
+
+/*
+ * This points to the array of FlexLocks in shared memory.  Backends inherit
+ * the pointer by fork from the postmaster (except in the EXEC_BACKEND case,
+ * where we have special measures to pass it down).
+ */
+FlexLockPadded *FlexLockArray = NULL;
+
+/* We use the ShmemLock spinlock to protect LWLockAssign */
+extern slock_t *ShmemLock;
+
+static void FlexLockInit(FlexLock *flex, char locktype);
+
+/*
+ * Compute number of FlexLocks to allocate.
+ */
+int
+NumFlexLocks(void)
+{
+	int			numLocks;
+
+	/*
+	 * Possibly this logic should be spread out among the affected modules,
+	 * the same way that shmem space estimation is done.  But for now, there
+	 * are few enough users of FlexLocks that we can get away with just keeping
+	 * the knowledge here.
+	 */
+
+	/* Predefined FlexLocks */
+	numLocks = (int) NumFixedFlexLocks;
+
+	/* bufmgr.c needs two for each shared buffer */
+	numLocks += 2 * NBuffers;
+
+	/* proc.c needs one for each backend or auxiliary process */
+	numLocks += MaxBackends + NUM_AUXILIARY_PROCS;
+
+	/* clog.c needs one per CLOG buffer */
+	numLocks += NUM_CLOG_BUFFERS;
+
+	/* subtrans.c needs one per SubTrans buffer */
+	numLocks += NUM_SUBTRANS_BUFFERS;
+
+	/* multixact.c needs two SLRU areas */
+	numLocks += NUM_MXACTOFFSET_BUFFERS + NUM_MXACTMEMBER_BUFFERS;
+
+	/* async.c needs one per Async buffer */
+	numLocks += NUM_ASYNC_BUFFERS;
+
+	/* predicate.c needs one per old serializable xid buffer */
+	numLocks += NUM_OLDSERXID_BUFFERS;
+
+	/*
+	 * Add any requested by loadable modules; for backwards-compatibility
+	 * reasons, allocate at least NUM_USER_DEFINED_FLEXLOCKS of them even if
+	 * there are no explicit requests.
+	 */
+	lock_addin_request_allowed = false;
+	numLocks += Max(lock_addin_request, NUM_USER_DEFINED_FLEXLOCKS);
+
+	return numLocks;
+}
+
+
+/*
+ * RequestAddinFlexLocks
+ *		Request that extra FlexLocks be allocated for use by
+ *		a loadable module.
+ *
+ * This is only useful if called from the _PG_init hook of a library that
+ * is loaded into the postmaster via shared_preload_libraries.	Once
+ * shared memory has been allocated, calls will be ignored.  (We could
+ * raise an error, but it seems better to make it a no-op, so that
+ * libraries containing such calls can be reloaded if needed.)
+ */
+void
+RequestAddinFlexLocks(int n)
+{
+	if (IsUnderPostmaster || !lock_addin_request_allowed)
+		return;					/* too late */
+	lock_addin_request += n;
+}
+
+
+/*
+ * Compute shmem space needed for FlexLocks.
+ */
+Size
+FlexLockShmemSize(void)
+{
+	Size		size;
+	int			numLocks = NumFlexLocks();
+
+	/* Space for the FlexLock array. */
+	size = mul_size(numLocks, FLEX_LOCK_BYTES);
+
+	/* Space for dynamic allocation counter, plus room for alignment. */
+	size = add_size(size, 2 * sizeof(int) + FLEX_LOCK_BYTES);
+
+	return size;
+}
+
+/*
+ * Allocate shmem space for FlexLocks and initialize the locks.
+ */
+void
+CreateFlexLocks(void)
+{
+	int			numLocks = NumFlexLocks();
+	Size		spaceLocks = FlexLockShmemSize();
+	FlexLockPadded *lock;
+	int		   *FlexLockCounter;
+	char	   *ptr;
+	int			id;
+
+	/* Allocate and zero space */
+	ptr = (char *) ShmemAlloc(spaceLocks);
+	memset(ptr, 0, spaceLocks);
+
+	/* Leave room for dynamic allocation counter */
+	ptr += 2 * sizeof(int);
+
+	/* Ensure desired alignment of FlexLock array */
+	ptr += FLEX_LOCK_BYTES - ((uintptr_t) ptr) % FLEX_LOCK_BYTES;
+
+	FlexLockArray = (FlexLockPadded *) ptr;
+
+	/* All of the "fixed" FlexLocks are LWLocks. */
+	for (id = 0, lock = FlexLockArray; id < NumFixedFlexLocks; id++, lock++)
+		FlexLockInit(&lock->flex, FLEXLOCK_TYPE_LWLOCK);
+
+	/*
+	 * Initialize the dynamic-allocation counter, which is stored just before
+	 * the first FlexLock.
+	 */
+	FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+	FlexLockCounter[0] = (int) NumFixedFlexLocks;
+	FlexLockCounter[1] = numLocks;
+}
+
+/*
+ * FlexLockAssign - assign a dynamically-allocated FlexLock number
+ *
+ * We interlock this using the same spinlock that is used to protect
+ * ShmemAlloc().  Interlocking is not really necessary during postmaster
+ * startup, but it is needed if any user-defined code tries to allocate
+ * LWLocks after startup.
+ */
+FlexLockId
+FlexLockAssign(char locktype)
+{
+	FlexLockId	result;
+
+	/* use volatile pointer to prevent code rearrangement */
+	volatile int *FlexLockCounter;
+
+	FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+	SpinLockAcquire(ShmemLock);
+	if (FlexLockCounter[0] >= FlexLockCounter[1])
+	{
+		SpinLockRelease(ShmemLock);
+		elog(ERROR, "no more FlexLockIds available");
+	}
+	result = (FlexLockId) (FlexLockCounter[0]++);
+	SpinLockRelease(ShmemLock);
+
+	FlexLockInit(&FlexLockArray[result].flex, locktype);
+
+	return result;
+}
+
+/*
+ * Initialize a FlexLock.
+ */
+static void
+FlexLockInit(FlexLock *flex, char locktype)
+{
+	SpinLockInit(&flex->mutex);
+	flex->releaseOK = true;
+	flex->locktype = locktype;
+	/*
+	 * We might need to think a little harder about what should happen here
+	 * if some future type of FlexLock requires more initialization than this.
+	 * For now, this will suffice.
+	 */
+}
+
+/*
+ * Remove lock from list of locks held.  Usually, but not always, it will
+ * be the latest-acquired lock; so search array backwards.
+ */
+void
+FlexLockRemember(FlexLockId id)
+{
+	if (num_held_flexlocks >= MAX_SIMUL_FLEXLOCKS)
+		elog(PANIC, "too many FlexLocks taken");
+	held_flexlocks[num_held_flexlocks++] = id;
+}
+
+/*
+ * Remove lock from list of locks held.  Usually, but not always, it will
+ * be the latest-acquired lock; so search array backwards.
+ */
+void
+FlexLockForget(FlexLockId id)
+{
+	int			i;
+
+	for (i = num_held_flexlocks; --i >= 0;)
+	{
+		if (id == held_flexlocks[i])
+			break;
+	}
+	if (i < 0)
+		elog(ERROR, "lock %d is not held", (int) id);
+	num_held_flexlocks--;
+	for (; i < num_held_flexlocks; i++)
+		held_flexlocks[i] = held_flexlocks[i + 1];
+}
+
+/*
+ * FlexLockWait - wait until awakened
+ *
+ * Since we share the process wait semaphore with the regular lock manager
+ * and ProcWaitForSignal, and we may need to acquire a FlexLock while one of
+ * those is pending, it is possible that we get awakened for a reason other
+ * than being signaled by a FlexLock release.  If so, loop back and wait again.
+ *
+ * Returns the number of "extra" waits absorbed so that, once we've gotten the
+ * FlexLock, we can re-increment the sema by the number of additional signals
+ * received, so that the lock manager or signal manager will see the received
+ * signal when it next waits.
+ */
+int
+FlexLockWait(FlexLockId id, int mode)
+{
+	int		extraWaits = 0;
+
+	FlexLockDebug("LWLockAcquire", id, "waiting");
+	TRACE_POSTGRESQL_FLEXLOCK_WAIT_START(id, mode);
+
+	for (;;)
+   	{
+		/* "false" means cannot accept cancel/die interrupt here. */
+		PGSemaphoreLock(&MyProc->sem, false);
+		/*
+		 * FLEXTODO: I think we should return this, instead of ignoring it.
+		 * Any non-zero value means "wake up".
+		 */
+		if (MyProc->flWaitResult)
+			break;
+		extraWaits++;
+   	}
+
+	TRACE_POSTGRESQL_FLEXLOCK_WAIT_DONE(id, mode);
+	FlexLockDebug("LWLockAcquire", id, "awakened");
+
+	return extraWaits;
+}
+
+/*
+ * FlexLockReleaseAll - release all currently-held locks
+ *
+ * Used to clean up after ereport(ERROR). An important difference between this
+ * function and retail LWLockRelease calls is that InterruptHoldoffCount is
+ * unchanged by this operation.  This is necessary since InterruptHoldoffCount
+ * has been set to an appropriate level earlier in error recovery. We could
+ * decrement it below zero if we allow it to drop for each released lock!
+ */
+void
+FlexLockReleaseAll(void)
+{
+	while (num_held_flexlocks > 0)
+	{
+		HOLD_INTERRUPTS();		/* match the upcoming RESUME_INTERRUPTS */
+
+		/*
+		 * FLEXTODO: When we have multiple types of flex locks, this will
+		 * need to call the appropriate release function for each lock type.
+		 */
+		LWLockRelease(held_flexlocks[num_held_flexlocks - 1]);
+	}
+}
+
+/*
+ * FlexLockHeldByMe - test whether my process currently holds a lock
+ *
+ * This is meant as debug support only.  We do not consider the lock mode.
+ */
+bool
+FlexLockHeldByMe(FlexLockId id)
+{
+	int			i;
+
+	for (i = 0; i < num_held_flexlocks; i++)
+	{
+		if (held_flexlocks[i] == id)
+			return true;
+	}
+	return false;
+}
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 905502f..adc5fd9 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -591,7 +591,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	bool		found;
 	ResourceOwner owner;
 	uint32		hashcode;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	int			status;
 	bool		log_lock = false;
 
@@ -1546,7 +1546,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	LOCALLOCK  *locallock;
 	LOCK	   *lock;
 	PROCLOCK   *proclock;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	bool		wakeupNeeded;
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
@@ -1912,7 +1912,7 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	 */
 	for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
 	{
-		LWLockId	partitionLock = FirstLockMgrLock + partition;
+		FlexLockId	partitionLock = FirstLockMgrLock + partition;
 		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
 
 		proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
@@ -2197,7 +2197,7 @@ static bool
 FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag,
 					  uint32 hashcode)
 {
-	LWLockId		partitionLock = LockHashPartitionLock(hashcode);
+	FlexLockId		partitionLock = LockHashPartitionLock(hashcode);
 	Oid				relid = locktag->locktag_field2;
 	uint32			i;
 
@@ -2281,7 +2281,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	LockMethod		lockMethodTable = LockMethods[DEFAULT_LOCKMETHOD];
 	LOCKTAG		   *locktag = &locallock->tag.lock;
 	PROCLOCK	   *proclock = NULL;
-	LWLockId		partitionLock = LockHashPartitionLock(locallock->hashcode);
+	FlexLockId		partitionLock = LockHashPartitionLock(locallock->hashcode);
 	Oid				relid = locktag->locktag_field2;
 	uint32			f;
 
@@ -2382,7 +2382,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
 	SHM_QUEUE  *procLocks;
 	PROCLOCK   *proclock;
 	uint32		hashcode;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	int			count = 0;
 	int			fast_count = 0;
 
@@ -2593,7 +2593,7 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 	PROCLOCKTAG proclocktag;
 	uint32		hashcode;
 	uint32		proclock_hashcode;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	bool		wakeupNeeded;
 
 	hashcode = LockTagHashCode(locktag);
@@ -2827,7 +2827,7 @@ PostPrepare_Locks(TransactionId xid)
 	 */
 	for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
 	{
-		LWLockId	partitionLock = FirstLockMgrLock + partition;
+		FlexLockId	partitionLock = FirstLockMgrLock + partition;
 		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
 
 		proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
@@ -3342,7 +3342,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 	uint32		hashcode;
 	uint32		proclock_hashcode;
 	int			partition;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	LockMethod	lockMethodTable;
 
 	Assert(len == sizeof(TwoPhaseLockRecord));
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 079eb29..e3cebb2 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -21,74 +21,23 @@
  */
 #include "postgres.h"
 
-#include "access/clog.h"
-#include "access/multixact.h"
-#include "access/subtrans.h"
-#include "commands/async.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "storage/flexlock_internals.h"
 #include "storage/ipc.h"
-#include "storage/predicate.h"
 #include "storage/proc.h"
 #include "storage/spin.h"
 
-
-/* We use the ShmemLock spinlock to protect LWLockAssign */
-extern slock_t *ShmemLock;
-
-
 typedef struct LWLock
 {
-	slock_t		mutex;			/* Protects LWLock and queue of PGPROCs */
-	bool		releaseOK;		/* T if ok to release waiters */
+	FlexLock	flex;			/* common FlexLock infrastructure */
 	char		exclusive;		/* # of exclusive holders (0 or 1) */
 	int			shared;			/* # of shared holders (0..MaxBackends) */
-	PGPROC	   *head;			/* head of list of waiting PGPROCs */
-	PGPROC	   *tail;			/* tail of list of waiting PGPROCs */
-	/* tail is undefined when head is NULL */
 } LWLock;
 
-/*
- * All the LWLock structs are allocated as an array in shared memory.
- * (LWLockIds are indexes into the array.)	We force the array stride to
- * be a power of 2, which saves a few cycles in indexing, but more
- * importantly also ensures that individual LWLocks don't cross cache line
- * boundaries.	This reduces cache contention problems, especially on AMD
- * Opterons.  (Of course, we have to also ensure that the array start
- * address is suitably aligned.)
- *
- * LWLock is between 16 and 32 bytes on all known platforms, so these two
- * cases are sufficient.
- */
-#define LWLOCK_PADDED_SIZE	(sizeof(LWLock) <= 16 ? 16 : 32)
-
-typedef union LWLockPadded
-{
-	LWLock		lock;
-	char		pad[LWLOCK_PADDED_SIZE];
-} LWLockPadded;
-
-/*
- * This points to the array of LWLocks in shared memory.  Backends inherit
- * the pointer by fork from the postmaster (except in the EXEC_BACKEND case,
- * where we have special measures to pass it down).
- */
-NON_EXEC_STATIC LWLockPadded *LWLockArray = NULL;
-
-
-/*
- * We use this structure to keep track of locked LWLocks for release
- * during error recovery.  The maximum size could be determined at runtime
- * if necessary, but it seems unlikely that more than a few locks could
- * ever be held simultaneously.
- */
-#define MAX_SIMUL_LWLOCKS	100
-
-static int	num_held_lwlocks = 0;
-static LWLockId held_lwlocks[MAX_SIMUL_LWLOCKS];
-
-static int	lock_addin_request = 0;
-static bool lock_addin_request_allowed = true;
+#define	LWLockPointer(lockid) \
+	(AssertMacro(FlexLockArray[lockid].flex.locktype == FLEXLOCK_TYPE_LWLOCK), \
+	 (volatile LWLock *) &FlexLockArray[lockid])
 
 #ifdef LWLOCK_STATS
 static int	counts_for_pid = 0;
@@ -98,27 +47,17 @@ static int *block_counts;
 #endif
 
 #ifdef LOCK_DEBUG
-bool		Trace_lwlocks = false;
-
 inline static void
-PRINT_LWDEBUG(const char *where, LWLockId lockid, const volatile LWLock *lock)
+PRINT_LWDEBUG(const char *where, FlexLockId lockid, const volatile LWLock *lock)
 {
-	if (Trace_lwlocks)
+	if (Trace_flexlocks)
 		elog(LOG, "%s(%d): excl %d shared %d head %p rOK %d",
 			 where, (int) lockid,
-			 (int) lock->exclusive, lock->shared, lock->head,
-			 (int) lock->releaseOK);
-}
-
-inline static void
-LOG_LWDEBUG(const char *where, LWLockId lockid, const char *msg)
-{
-	if (Trace_lwlocks)
-		elog(LOG, "%s(%d): %s", where, (int) lockid, msg);
+			 (int) lock->exclusive, lock->shared, lock->flex.head,
+			 (int) lock->flex.releaseOK);
 }
 #else							/* not LOCK_DEBUG */
 #define PRINT_LWDEBUG(a,b,c)
-#define LOG_LWDEBUG(a,b,c)
 #endif   /* LOCK_DEBUG */
 
 #ifdef LWLOCK_STATS
@@ -127,8 +66,8 @@ static void
 print_lwlock_stats(int code, Datum arg)
 {
 	int			i;
-	int		   *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	int			numLocks = LWLockCounter[1];
+	int		   *FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+	int			numLocks = FlexLockCounter[1];
 
 	/* Grab an LWLock to keep different backends from mixing reports */
 	LWLockAcquire(0, LW_EXCLUSIVE);
@@ -145,173 +84,15 @@ print_lwlock_stats(int code, Datum arg)
 }
 #endif   /* LWLOCK_STATS */
 
-
-/*
- * Compute number of LWLocks to allocate.
- */
-int
-NumLWLocks(void)
-{
-	int			numLocks;
-
-	/*
-	 * Possibly this logic should be spread out among the affected modules,
-	 * the same way that shmem space estimation is done.  But for now, there
-	 * are few enough users of LWLocks that we can get away with just keeping
-	 * the knowledge here.
-	 */
-
-	/* Predefined LWLocks */
-	numLocks = (int) NumFixedLWLocks;
-
-	/* bufmgr.c needs two for each shared buffer */
-	numLocks += 2 * NBuffers;
-
-	/* proc.c needs one for each backend or auxiliary process */
-	numLocks += MaxBackends + NUM_AUXILIARY_PROCS;
-
-	/* clog.c needs one per CLOG buffer */
-	numLocks += NUM_CLOG_BUFFERS;
-
-	/* subtrans.c needs one per SubTrans buffer */
-	numLocks += NUM_SUBTRANS_BUFFERS;
-
-	/* multixact.c needs two SLRU areas */
-	numLocks += NUM_MXACTOFFSET_BUFFERS + NUM_MXACTMEMBER_BUFFERS;
-
-	/* async.c needs one per Async buffer */
-	numLocks += NUM_ASYNC_BUFFERS;
-
-	/* predicate.c needs one per old serializable xid buffer */
-	numLocks += NUM_OLDSERXID_BUFFERS;
-
-	/*
-	 * Add any requested by loadable modules; for backwards-compatibility
-	 * reasons, allocate at least NUM_USER_DEFINED_LWLOCKS of them even if
-	 * there are no explicit requests.
-	 */
-	lock_addin_request_allowed = false;
-	numLocks += Max(lock_addin_request, NUM_USER_DEFINED_LWLOCKS);
-
-	return numLocks;
-}
-
-
-/*
- * RequestAddinLWLocks
- *		Request that extra LWLocks be allocated for use by
- *		a loadable module.
- *
- * This is only useful if called from the _PG_init hook of a library that
- * is loaded into the postmaster via shared_preload_libraries.	Once
- * shared memory has been allocated, calls will be ignored.  (We could
- * raise an error, but it seems better to make it a no-op, so that
- * libraries containing such calls can be reloaded if needed.)
- */
-void
-RequestAddinLWLocks(int n)
-{
-	if (IsUnderPostmaster || !lock_addin_request_allowed)
-		return;					/* too late */
-	lock_addin_request += n;
-}
-
-
-/*
- * Compute shmem space needed for LWLocks.
- */
-Size
-LWLockShmemSize(void)
-{
-	Size		size;
-	int			numLocks = NumLWLocks();
-
-	/* Space for the LWLock array. */
-	size = mul_size(numLocks, sizeof(LWLockPadded));
-
-	/* Space for dynamic allocation counter, plus room for alignment. */
-	size = add_size(size, 2 * sizeof(int) + LWLOCK_PADDED_SIZE);
-
-	return size;
-}
-
-
-/*
- * Allocate shmem space for LWLocks and initialize the locks.
- */
-void
-CreateLWLocks(void)
-{
-	int			numLocks = NumLWLocks();
-	Size		spaceLocks = LWLockShmemSize();
-	LWLockPadded *lock;
-	int		   *LWLockCounter;
-	char	   *ptr;
-	int			id;
-
-	/* Allocate space */
-	ptr = (char *) ShmemAlloc(spaceLocks);
-
-	/* Leave room for dynamic allocation counter */
-	ptr += 2 * sizeof(int);
-
-	/* Ensure desired alignment of LWLock array */
-	ptr += LWLOCK_PADDED_SIZE - ((uintptr_t) ptr) % LWLOCK_PADDED_SIZE;
-
-	LWLockArray = (LWLockPadded *) ptr;
-
-	/*
-	 * Initialize all LWLocks to "unlocked" state
-	 */
-	for (id = 0, lock = LWLockArray; id < numLocks; id++, lock++)
-	{
-		SpinLockInit(&lock->lock.mutex);
-		lock->lock.releaseOK = true;
-		lock->lock.exclusive = 0;
-		lock->lock.shared = 0;
-		lock->lock.head = NULL;
-		lock->lock.tail = NULL;
-	}
-
-	/*
-	 * Initialize the dynamic-allocation counter, which is stored just before
-	 * the first LWLock.
-	 */
-	LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	LWLockCounter[0] = (int) NumFixedLWLocks;
-	LWLockCounter[1] = numLocks;
-}
-
-
 /*
- * LWLockAssign - assign a dynamically-allocated LWLock number
- *
- * We interlock this using the same spinlock that is used to protect
- * ShmemAlloc().  Interlocking is not really necessary during postmaster
- * startup, but it is needed if any user-defined code tries to allocate
- * LWLocks after startup.
+ * LWLockAssign - initialize a new lwlock and return its ID
  */
-LWLockId
+FlexLockId
 LWLockAssign(void)
 {
-	LWLockId	result;
-
-	/* use volatile pointer to prevent code rearrangement */
-	volatile int *LWLockCounter;
-
-	LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	SpinLockAcquire(ShmemLock);
-	if (LWLockCounter[0] >= LWLockCounter[1])
-	{
-		SpinLockRelease(ShmemLock);
-		elog(ERROR, "no more LWLockIds available");
-	}
-	result = (LWLockId) (LWLockCounter[0]++);
-	SpinLockRelease(ShmemLock);
-	return result;
+	return FlexLockAssign(FLEXLOCK_TYPE_LWLOCK);
 }
 
-
 /*
  * LWLockAcquire - acquire a lightweight lock in the specified mode
  *
@@ -320,9 +101,9 @@ LWLockAssign(void)
  * Side effect: cancel/die interrupts are held off until lock release.
  */
 void
-LWLockAcquire(LWLockId lockid, LWLockMode mode)
+LWLockAcquire(FlexLockId lockid, LWLockMode mode)
 {
-	volatile LWLock *lock = &(LWLockArray[lockid].lock);
+	volatile LWLock *lock = LWLockPointer(lockid);
 	PGPROC	   *proc = MyProc;
 	bool		retry = false;
 	int			extraWaits = 0;
@@ -333,8 +114,8 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 	/* Set up local count state first time through in a given process */
 	if (counts_for_pid != MyProcPid)
 	{
-		int		   *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-		int			numLocks = LWLockCounter[1];
+		int		   *FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+		int			numLocks = FlexLockCounter[1];
 
 		sh_acquire_counts = calloc(numLocks, sizeof(int));
 		ex_acquire_counts = calloc(numLocks, sizeof(int));
@@ -356,10 +137,6 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 	 */
 	Assert(!(proc == NULL && IsUnderPostmaster));
 
-	/* Ensure we will have room to remember the lock */
-	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
-		elog(ERROR, "too many LWLocks taken");
-
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -388,11 +165,11 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 		bool		mustwait;
 
 		/* Acquire mutex.  Time spent holding mutex should be short! */
-		SpinLockAcquire(&lock->mutex);
+		SpinLockAcquire(&lock->flex.mutex);
 
 		/* If retrying, allow LWLockRelease to release waiters again */
 		if (retry)
-			lock->releaseOK = true;
+			lock->flex.releaseOK = true;
 
 		/* If I can get the lock, do so quickly. */
 		if (mode == LW_EXCLUSIVE)
@@ -419,72 +196,30 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 		if (!mustwait)
 			break;				/* got the lock */
 
-		/*
-		 * Add myself to wait queue.
-		 *
-		 * If we don't have a PGPROC structure, there's no way to wait. This
-		 * should never occur, since MyProc should only be null during shared
-		 * memory initialization.
-		 */
-		if (proc == NULL)
-			elog(PANIC, "cannot wait without a PGPROC structure");
-
-		proc->lwWaiting = true;
-		proc->lwExclusive = (mode == LW_EXCLUSIVE);
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		/* Add myself to wait queue. */
+		FlexLockJoinWaitQueue(lock, (int) mode);
 
 		/* Can release the mutex now */
-		SpinLockRelease(&lock->mutex);
-
-		/*
-		 * Wait until awakened.
-		 *
-		 * Since we share the process wait semaphore with the regular lock
-		 * manager and ProcWaitForSignal, and we may need to acquire an LWLock
-		 * while one of those is pending, it is possible that we get awakened
-		 * for a reason other than being signaled by LWLockRelease. If so,
-		 * loop back and wait again.  Once we've gotten the LWLock,
-		 * re-increment the sema by the number of additional signals received,
-		 * so that the lock manager or signal manager will see the received
-		 * signal when it next waits.
-		 */
-		LOG_LWDEBUG("LWLockAcquire", lockid, "waiting");
+		SpinLockRelease(&lock->flex.mutex);
+
+		/* Wait until awakened. */
+		extraWaits += FlexLockWait(lockid, mode);
 
 #ifdef LWLOCK_STATS
 		block_counts[lockid]++;
 #endif
 
-		TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);
-
-		for (;;)
-		{
-			/* "false" means cannot accept cancel/die interrupt here. */
-			PGSemaphoreLock(&proc->sem, false);
-			if (!proc->lwWaiting)
-				break;
-			extraWaits++;
-		}
-
-		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode);
-
-		LOG_LWDEBUG("LWLockAcquire", lockid, "awakened");
-
 		/* Now loop back and try to acquire lock again. */
 		retry = true;
 	}
 
 	/* We are done updating shared state of the lock itself. */
-	SpinLockRelease(&lock->mutex);
+	SpinLockRelease(&lock->flex.mutex);
 
-	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(lockid, mode);
+	TRACE_POSTGRESQL_FLEXLOCK_ACQUIRE(lockid, mode);
 
 	/* Add lock to list of locks held by this backend */
-	held_lwlocks[num_held_lwlocks++] = lockid;
+	FlexLockRemember(lockid);
 
 	/*
 	 * Fix the process wait semaphore's count for any absorbed wakeups.
@@ -501,17 +236,13 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
  * If successful, cancel/die interrupts are held off until lock release.
  */
 bool
-LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
+LWLockConditionalAcquire(FlexLockId lockid, LWLockMode mode)
 {
-	volatile LWLock *lock = &(LWLockArray[lockid].lock);
+	volatile LWLock *lock = LWLockPointer(lockid);
 	bool		mustwait;
 
 	PRINT_LWDEBUG("LWLockConditionalAcquire", lockid, lock);
 
-	/* Ensure we will have room to remember the lock */
-	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
-		elog(ERROR, "too many LWLocks taken");
-
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -520,7 +251,7 @@ LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
 	HOLD_INTERRUPTS();
 
 	/* Acquire mutex.  Time spent holding mutex should be short! */
-	SpinLockAcquire(&lock->mutex);
+	SpinLockAcquire(&lock->flex.mutex);
 
 	/* If I can get the lock, do so quickly. */
 	if (mode == LW_EXCLUSIVE)
@@ -545,20 +276,20 @@ LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
 	}
 
 	/* We are done updating shared state of the lock itself. */
-	SpinLockRelease(&lock->mutex);
+	SpinLockRelease(&lock->flex.mutex);
 
 	if (mustwait)
 	{
 		/* Failed to get lock, so release interrupt holdoff */
 		RESUME_INTERRUPTS();
-		LOG_LWDEBUG("LWLockConditionalAcquire", lockid, "failed");
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(lockid, mode);
+		FlexLockDebug("LWLockConditionalAcquire", lockid, "failed");
+		TRACE_POSTGRESQL_FLEXLOCK_CONDACQUIRE_FAIL(lockid, mode);
 	}
 	else
 	{
 		/* Add lock to list of locks held by this backend */
-		held_lwlocks[num_held_lwlocks++] = lockid;
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(lockid, mode);
+		FlexLockRemember(lockid);
+		TRACE_POSTGRESQL_FLEXLOCK_CONDACQUIRE(lockid, mode);
 	}
 
 	return !mustwait;
@@ -568,32 +299,18 @@ LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
  * LWLockRelease - release a previously acquired lock
  */
 void
-LWLockRelease(LWLockId lockid)
+LWLockRelease(FlexLockId lockid)
 {
-	volatile LWLock *lock = &(LWLockArray[lockid].lock);
+	volatile LWLock *lock = LWLockPointer(lockid);
 	PGPROC	   *head;
 	PGPROC	   *proc;
-	int			i;
 
 	PRINT_LWDEBUG("LWLockRelease", lockid, lock);
 
-	/*
-	 * Remove lock from list of locks held.  Usually, but not always, it will
-	 * be the latest-acquired lock; so search array backwards.
-	 */
-	for (i = num_held_lwlocks; --i >= 0;)
-	{
-		if (lockid == held_lwlocks[i])
-			break;
-	}
-	if (i < 0)
-		elog(ERROR, "lock %d is not held", (int) lockid);
-	num_held_lwlocks--;
-	for (; i < num_held_lwlocks; i++)
-		held_lwlocks[i] = held_lwlocks[i + 1];
+	FlexLockForget(lockid);
 
 	/* Acquire mutex.  Time spent holding mutex should be short! */
-	SpinLockAcquire(&lock->mutex);
+	SpinLockAcquire(&lock->flex.mutex);
 
 	/* Release my hold on lock */
 	if (lock->exclusive > 0)
@@ -610,10 +327,10 @@ LWLockRelease(LWLockId lockid)
 	 * if someone has already awakened waiters that haven't yet acquired the
 	 * lock.
 	 */
-	head = lock->head;
+	head = lock->flex.head;
 	if (head != NULL)
 	{
-		if (lock->exclusive == 0 && lock->shared == 0 && lock->releaseOK)
+		if (lock->exclusive == 0 && lock->shared == 0 && lock->flex.releaseOK)
 		{
 			/*
 			 * Remove the to-be-awakened PGPROCs from the queue.  If the front
@@ -621,17 +338,17 @@ LWLockRelease(LWLockId lockid)
 			 * as many waiters as want shared access.
 			 */
 			proc = head;
-			if (!proc->lwExclusive)
+			if (proc->flWaitMode != LW_EXCLUSIVE)
 			{
-				while (proc->lwWaitLink != NULL &&
-					   !proc->lwWaitLink->lwExclusive)
-					proc = proc->lwWaitLink;
+				while (proc->flWaitLink != NULL &&
+					   proc->flWaitLink->flWaitMode != LW_EXCLUSIVE)
+					proc = proc->flWaitLink;
 			}
 			/* proc is now the last PGPROC to be released */
-			lock->head = proc->lwWaitLink;
-			proc->lwWaitLink = NULL;
+			lock->flex.head = proc->flWaitLink;
+			proc->flWaitLink = NULL;
 			/* prevent additional wakeups until retryer gets to run */
-			lock->releaseOK = false;
+			lock->flex.releaseOK = false;
 		}
 		else
 		{
@@ -641,20 +358,20 @@ LWLockRelease(LWLockId lockid)
 	}
 
 	/* We are done updating shared state of the lock itself. */
-	SpinLockRelease(&lock->mutex);
+	SpinLockRelease(&lock->flex.mutex);
 
-	TRACE_POSTGRESQL_LWLOCK_RELEASE(lockid);
+	TRACE_POSTGRESQL_FLEXLOCK_RELEASE(lockid);
 
 	/*
 	 * Awaken any waiters I removed from the queue.
 	 */
 	while (head != NULL)
 	{
-		LOG_LWDEBUG("LWLockRelease", lockid, "release waiter");
+		FlexLockDebug("LWLockRelease", lockid, "release waiter");
 		proc = head;
-		head = proc->lwWaitLink;
-		proc->lwWaitLink = NULL;
-		proc->lwWaiting = false;
+		head = proc->flWaitLink;
+		proc->flWaitLink = NULL;
+		proc->flWaitResult = 1;		/* any non-zero value will do */
 		PGSemaphoreUnlock(&proc->sem);
 	}
 
@@ -663,44 +380,3 @@ LWLockRelease(LWLockId lockid)
 	 */
 	RESUME_INTERRUPTS();
 }
-
-
-/*
- * LWLockReleaseAll - release all currently-held locks
- *
- * Used to clean up after ereport(ERROR). An important difference between this
- * function and retail LWLockRelease calls is that InterruptHoldoffCount is
- * unchanged by this operation.  This is necessary since InterruptHoldoffCount
- * has been set to an appropriate level earlier in error recovery. We could
- * decrement it below zero if we allow it to drop for each released lock!
- */
-void
-LWLockReleaseAll(void)
-{
-	while (num_held_lwlocks > 0)
-	{
-		HOLD_INTERRUPTS();		/* match the upcoming RESUME_INTERRUPTS */
-
-		LWLockRelease(held_lwlocks[num_held_lwlocks - 1]);
-	}
-}
-
-
-/*
- * LWLockHeldByMe - test whether my process currently holds a lock
- *
- * This is meant as debug support only.  We do not distinguish whether the
- * lock is held shared or exclusive.
- */
-bool
-LWLockHeldByMe(LWLockId lockid)
-{
-	int			i;
-
-	for (i = 0; i < num_held_lwlocks; i++)
-	{
-		if (held_lwlocks[i] == lockid)
-			return true;
-	}
-	return false;
-}
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 345f6f5..02ef963 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -239,7 +239,7 @@
 #define PredicateLockHashPartition(hashcode) \
 	((hashcode) % NUM_PREDICATELOCK_PARTITIONS)
 #define PredicateLockHashPartitionLock(hashcode) \
-	((LWLockId) (FirstPredicateLockMgrLock + PredicateLockHashPartition(hashcode)))
+	((FlexLockId) (FirstPredicateLockMgrLock + PredicateLockHashPartition(hashcode)))
 
 #define NPREDICATELOCKTARGETENTS() \
 	mul_size(max_predicate_locks_per_xact, add_size(MaxBackends, max_prepared_xacts))
@@ -1840,7 +1840,7 @@ PageIsPredicateLocked(Relation relation, BlockNumber blkno)
 {
 	PREDICATELOCKTARGETTAG targettag;
 	uint32		targettaghash;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	PREDICATELOCKTARGET *target;
 
 	SET_PREDICATELOCKTARGETTAG_PAGE(targettag,
@@ -1972,7 +1972,7 @@ RemoveScratchTarget(bool lockheld)
 {
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(FlexLockHeldByMe(SerializablePredicateLockListLock));
 
 	if (!lockheld)
 		LWLockAcquire(ScratchPartitionLock, LW_EXCLUSIVE);
@@ -1993,7 +1993,7 @@ RestoreScratchTarget(bool lockheld)
 {
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(FlexLockHeldByMe(SerializablePredicateLockListLock));
 
 	if (!lockheld)
 		LWLockAcquire(ScratchPartitionLock, LW_EXCLUSIVE);
@@ -2015,7 +2015,7 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
 {
 	PREDICATELOCKTARGET *rmtarget;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(FlexLockHeldByMe(SerializablePredicateLockListLock));
 
 	/* Can't remove it until no locks at this target. */
 	if (!SHMQueueEmpty(&target->predicateLocks))
@@ -2073,7 +2073,7 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 		if (TargetTagIsCoveredBy(oldtargettag, *newtargettag))
 		{
 			uint32		oldtargettaghash;
-			LWLockId	partitionLock;
+			FlexLockId	partitionLock;
 			PREDICATELOCK *rmpredlock;
 
 			oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
@@ -2285,7 +2285,7 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	PREDICATELOCKTARGET *target;
 	PREDICATELOCKTAG locktag;
 	PREDICATELOCK *lock;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	bool		found;
 
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
@@ -2518,8 +2518,8 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	PREDICATELOCK *nextpredlock;
 	bool		found;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
-	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
+	Assert(FlexLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(FlexLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
 	predlock = (PREDICATELOCK *)
 		SHMQueueNext(&(target->predicateLocks),
@@ -2586,14 +2586,14 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 								  bool removeOld)
 {
 	uint32		oldtargettaghash;
-	LWLockId	oldpartitionLock;
+	FlexLockId	oldpartitionLock;
 	PREDICATELOCKTARGET *oldtarget;
 	uint32		newtargettaghash;
-	LWLockId	newpartitionLock;
+	FlexLockId	newpartitionLock;
 	bool		found;
 	bool		outOfShmem = false;
 
-	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
+	Assert(FlexLockHeldByMe(SerializablePredicateLockListLock));
 
 	oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
 	newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
@@ -3125,7 +3125,7 @@ SetNewSxactGlobalXmin(void)
 {
 	SERIALIZABLEXACT *sxact;
 
-	Assert(LWLockHeldByMe(SerializableXactHashLock));
+	Assert(FlexLockHeldByMe(SerializableXactHashLock));
 
 	PredXact->SxactGlobalXmin = InvalidTransactionId;
 	PredXact->SxactGlobalXminCount = 0;
@@ -3578,7 +3578,7 @@ ClearOldPredicateLocks(void)
 			PREDICATELOCKTARGET *target;
 			PREDICATELOCKTARGETTAG targettag;
 			uint32		targettaghash;
-			LWLockId	partitionLock;
+			FlexLockId	partitionLock;
 
 			tag = predlock->tag;
 			target = tag.myTarget;
@@ -3637,7 +3637,7 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 
 	Assert(sxact != NULL);
 	Assert(SxactIsRolledBack(sxact) || SxactIsCommitted(sxact));
-	Assert(LWLockHeldByMe(SerializableFinishedListLock));
+	Assert(FlexLockHeldByMe(SerializableFinishedListLock));
 
 	/*
 	 * First release all the predicate locks held by this xact (or transfer
@@ -3656,7 +3656,7 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 		PREDICATELOCKTARGET *target;
 		PREDICATELOCKTARGETTAG targettag;
 		uint32		targettaghash;
-		LWLockId	partitionLock;
+		FlexLockId	partitionLock;
 
 		nextpredlock = (PREDICATELOCK *)
 			SHMQueueNext(&(sxact->predicateLocks),
@@ -4034,7 +4034,7 @@ static void
 CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 {
 	uint32		targettaghash;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	PREDICATELOCKTARGET *target;
 	PREDICATELOCK *predlock;
 	PREDICATELOCK *mypredlock = NULL;
@@ -4427,7 +4427,7 @@ OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
 	bool		failure;
 	RWConflict	conflict;
 
-	Assert(LWLockHeldByMe(SerializableXactHashLock));
+	Assert(FlexLockHeldByMe(SerializableXactHashLock));
 
 	failure = false;
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index eda3a98..57da345 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -325,9 +325,9 @@ InitProcess(void)
 	/* NB -- autovac launcher intentionally does not set IS_AUTOVACUUM */
 	if (IsAutoVacuumWorkerProcess())
 		MyProc->vacuumFlags |= PROC_IS_AUTOVACUUM;
-	MyProc->lwWaiting = false;
-	MyProc->lwExclusive = false;
-	MyProc->lwWaitLink = NULL;
+	MyProc->flWaitResult = 0;
+	MyProc->flWaitMode = 0;
+	MyProc->flWaitLink = NULL;
 	MyProc->waitLock = NULL;
 	MyProc->waitProcLock = NULL;
 #ifdef USE_ASSERT_CHECKING
@@ -479,9 +479,9 @@ InitAuxiliaryProcess(void)
 	MyProc->roleId = InvalidOid;
 	MyProc->inCommit = false;
 	MyProc->vacuumFlags = 0;
-	MyProc->lwWaiting = false;
-	MyProc->lwExclusive = false;
-	MyProc->lwWaitLink = NULL;
+	MyProc->flWaitMode = 0;
+	MyProc->flWaitResult = 0;
+	MyProc->flWaitLink = NULL;
 	MyProc->waitLock = NULL;
 	MyProc->waitProcLock = NULL;
 #ifdef USE_ASSERT_CHECKING
@@ -607,7 +607,7 @@ IsWaitingForLock(void)
 void
 LockWaitCancel(void)
 {
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 
 	/* Nothing to do if we weren't waiting for a lock */
 	if (lockAwaited == NULL)
@@ -718,11 +718,11 @@ ProcKill(int code, Datum arg)
 #endif
 
 	/*
-	 * Release any LW locks I am holding.  There really shouldn't be any, but
-	 * it's cheap to check again before we cut the knees off the LWLock
+	 * Release any felx locks I am holding.  There really shouldn't be any, but
+	 * it's cheap to check again before we cut the knees off the flex lock
 	 * facility by releasing our PGPROC ...
 	 */
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 
 	/* Release ownership of the process's latch, too */
 	DisownLatch(&MyProc->procLatch);
@@ -779,8 +779,8 @@ AuxiliaryProcKill(int code, Datum arg)
 
 	Assert(MyProc == auxproc);
 
-	/* Release any LW locks I am holding (see notes above) */
-	LWLockReleaseAll();
+	/* Release any flex locks I am holding (see notes above) */
+	FlexLockReleaseAll();
 
 	/* Release ownership of the process's latch, too */
 	DisownLatch(&MyProc->procLatch);
@@ -865,7 +865,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	LOCK	   *lock = locallock->lock;
 	PROCLOCK   *proclock = locallock->proclock;
 	uint32		hashcode = locallock->hashcode;
-	LWLockId	partitionLock = LockHashPartitionLock(hashcode);
+	FlexLockId	partitionLock = LockHashPartitionLock(hashcode);
 	PROC_QUEUE *waitQueue = &(lock->waitProcs);
 	LOCKMASK	myHeldLocks = MyProc->heldLocks;
 	bool		early_deadlock = false;
diff --git a/src/backend/utils/misc/check_guc b/src/backend/utils/misc/check_guc
index 293fb03..1a19e36 100755
--- a/src/backend/utils/misc/check_guc
+++ b/src/backend/utils/misc/check_guc
@@ -19,7 +19,7 @@
 INTENTIONALLY_NOT_INCLUDED="autocommit debug_deadlocks \
 is_superuser lc_collate lc_ctype lc_messages lc_monetary lc_numeric lc_time \
 pre_auth_delay role seed server_encoding server_version server_version_int \
-session_authorization trace_lock_oidmin trace_lock_table trace_locks trace_lwlocks \
+session_authorization trace_lock_oidmin trace_lock_table trace_locks trace_flexlocks \
 trace_notify trace_userlocks transaction_isolation transaction_read_only \
 zero_damaged_pages"
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index da7b6d4..52de233 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -59,6 +59,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/flexlock_internals.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
 #include "storage/predicate.h"
@@ -1071,12 +1072,12 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 	{
-		{"trace_lwlocks", PGC_SUSET, DEVELOPER_OPTIONS,
+		{"trace_flexlocks", PGC_SUSET, DEVELOPER_OPTIONS,
 			gettext_noop("No description available."),
 			NULL,
 			GUC_NOT_IN_SAMPLE
 		},
-		&Trace_lwlocks,
+		&Trace_flexlocks,
 		false,
 		NULL, NULL, NULL
 	},
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 71c5ab0..5b9cfe6 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -15,8 +15,8 @@
  * in probe definitions, as they cause compilation errors on Mac OS X 10.5.
  */
 #define LocalTransactionId unsigned int
-#define LWLockId int
-#define LWLockMode int
+#define FlexLockId int
+#define FlexLockMode int
 #define LOCKMODE int
 #define BlockNumber unsigned int
 #define Oid unsigned int
@@ -29,12 +29,12 @@ provider postgresql {
 	probe transaction__commit(LocalTransactionId);
 	probe transaction__abort(LocalTransactionId);
 
-	probe lwlock__acquire(LWLockId, LWLockMode);
-	probe lwlock__release(LWLockId);
-	probe lwlock__wait__start(LWLockId, LWLockMode);
-	probe lwlock__wait__done(LWLockId, LWLockMode);
-	probe lwlock__condacquire(LWLockId, LWLockMode);
-	probe lwlock__condacquire__fail(LWLockId, LWLockMode);
+	probe flexlock__acquire(FlexLockId, FlexLockMode);
+	probe flexlock__release(FlexLockId);
+	probe flexlock__wait__start(FlexLockId, FlexLockMode);
+	probe flexlock__wait__done(FlexLockId, FlexLockMode);
+	probe flexlock__condacquire(FlexLockId, FlexLockMode);
+	probe flexlock__condacquire__fail(FlexLockId, FlexLockMode);
 
 	probe lock__wait__start(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
 	probe lock__wait__done(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index e48743f..680a87f 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -55,7 +55,7 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLockId	ControlLock;
+	FlexLockId	ControlLock;
 
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
@@ -69,7 +69,7 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int		   *page_number;
 	int		   *page_lru_count;
-	LWLockId   *buffer_locks;
+	FlexLockId *buffer_locks;
 
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
@@ -136,7 +136,7 @@ typedef SlruCtlData *SlruCtl;
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLockId ctllock, const char *subdir);
+			  FlexLockId ctllock, const char *subdir);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int pageno);
 extern int SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 				  TransactionId xid);
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 6c8e312..d3b74db 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -49,9 +49,9 @@
 #define SEQ_MINVALUE	(-SEQ_MAXVALUE)
 
 /*
- * Number of spare LWLocks to allocate for user-defined add-on code.
+ * Number of spare FlexLocks to allocate for user-defined add-on code.
  */
-#define NUM_USER_DEFINED_LWLOCKS	4
+#define NUM_USER_DEFINED_FLEXLOCKS	4
 
 /*
  * Define this if you want to allow the lo_import and lo_export SQL
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index b7d4ea5..ac7f665 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -103,7 +103,7 @@ typedef struct buftag
 #define BufTableHashPartition(hashcode) \
 	((hashcode) % NUM_BUFFER_PARTITIONS)
 #define BufMappingPartitionLock(hashcode) \
-	((LWLockId) (FirstBufMappingLock + BufTableHashPartition(hashcode)))
+	((FlexLockId) (FirstBufMappingLock + BufTableHashPartition(hashcode)))
 
 /*
  *	BufferDesc -- shared descriptor/state data for a single shared buffer.
@@ -143,8 +143,8 @@ typedef struct sbufdesc
 	int			buf_id;			/* buffer's index number (from 0) */
 	int			freeNext;		/* link in freelist chain */
 
-	LWLockId	io_in_progress_lock;	/* to wait for I/O to complete */
-	LWLockId	content_lock;	/* to lock access to buffer contents */
+	FlexLockId	io_in_progress_lock;	/* to wait for I/O to complete */
+	FlexLockId	content_lock;	/* to lock access to buffer contents */
 } BufferDesc;
 
 #define BufferDescriptorGetBuffer(bdesc) ((bdesc)->buf_id + 1)
diff --git a/src/include/storage/flexlock.h b/src/include/storage/flexlock.h
new file mode 100644
index 0000000..612c21a
--- /dev/null
+++ b/src/include/storage/flexlock.h
@@ -0,0 +1,102 @@
+/*-------------------------------------------------------------------------
+ *
+ * flexlock.h
+ *	  Flex lock manager
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/flexlock.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FLEXLOCK_H
+#define FLEXLOCK_H
+
+/*
+ * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
+ * here, but we need them to set up enum FlexLockId correctly, and having
+ * this file include lock.h or bufmgr.h would be backwards.
+ */
+
+/* Number of partitions of the shared buffer mapping hashtable */
+#define NUM_BUFFER_PARTITIONS  16
+
+/* Number of partitions the shared lock tables are divided into */
+#define LOG2_NUM_LOCK_PARTITIONS  4
+#define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)
+
+/* Number of partitions the shared predicate lock tables are divided into */
+#define LOG2_NUM_PREDICATELOCK_PARTITIONS  4
+#define NUM_PREDICATELOCK_PARTITIONS  (1 << LOG2_NUM_PREDICATELOCK_PARTITIONS)
+
+/*
+ * We have a number of predefined FlexLocks, plus a bunch of locks that are
+ * dynamically assigned (e.g., for shared buffers).  The FlexLock structures
+ * live in shared memory (since they contain shared data) and are identified
+ * by values of this enumerated type.  We abuse the notion of an enum somewhat
+ * by allowing values not listed in the enum declaration to be assigned.
+ * The extra value MaxDynamicFlexLock is there to keep the compiler from
+ * deciding that the enum can be represented as char or short ...
+ *
+ * If you remove a lock, please replace it with a placeholder. This retains
+ * the lock numbering, which is helpful for DTrace and other external
+ * debugging scripts.
+ */
+typedef enum FlexLockId
+{
+	BufFreelistLock,
+	ShmemIndexLock,
+	OidGenLock,
+	XidGenLock,
+	ProcArrayLock,
+	SInvalReadLock,
+	SInvalWriteLock,
+	WALInsertLock,
+	WALWriteLock,
+	ControlFileLock,
+	CheckpointLock,
+	CLogControlLock,
+	SubtransControlLock,
+	MultiXactGenLock,
+	MultiXactOffsetControlLock,
+	MultiXactMemberControlLock,
+	RelCacheInitLock,
+	BgWriterCommLock,
+	TwoPhaseStateLock,
+	TablespaceCreateLock,
+	BtreeVacuumLock,
+	AddinShmemInitLock,
+	AutovacuumLock,
+	AutovacuumScheduleLock,
+	SyncScanLock,
+	RelationMappingLock,
+	AsyncCtlLock,
+	AsyncQueueLock,
+	SerializableXactHashLock,
+	SerializableFinishedListLock,
+	SerializablePredicateLockListLock,
+	OldSerXidLock,
+	SyncRepLock,
+	/* Individual lock IDs end here */
+	FirstBufMappingLock,
+	FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
+	FirstPredicateLockMgrLock = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
+
+	/* must be last except for MaxDynamicFlexLock: */
+	NumFixedFlexLocks = FirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS,
+
+	MaxDynamicFlexLock = 1000000000
+} FlexLockId;
+
+/* Shared memory setup. */
+extern int	NumFlexLocks(void);
+extern Size FlexLockShmemSize(void);
+extern void RequestAddinFlexLocks(int n);
+extern void CreateFlexLocks(void);
+
+/* Error recovery and debugging support functions. */
+extern void FlexLockReleaseAll(void);
+extern bool FlexLockHeldByMe(FlexLockId id);
+
+#endif   /* FLEXLOCK_H */
diff --git a/src/include/storage/flexlock_internals.h b/src/include/storage/flexlock_internals.h
new file mode 100644
index 0000000..5f78da7
--- /dev/null
+++ b/src/include/storage/flexlock_internals.h
@@ -0,0 +1,88 @@
+/*-------------------------------------------------------------------------
+ *
+ * flexlock_internals.h
+ *	  Flex lock internals.  Only files which implement a FlexLock
+ *    type should need to include this.  Merging this with flexlock.h
+ *    creates a circular header dependency, but even if it didn't, this
+ *    is cleaner.
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/flexlock_internals.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FLEXLOCK_INTERNALS_H
+#define FLEXLOCK_INTERNALS_H
+
+#include "pg_trace.h"
+#include "storage/flexlock.h"
+#include "storage/proc.h"
+#include "storage/s_lock.h"
+
+/*
+ * Individual FlexLock implementations each get this many bytes to store
+ * its state; of course, a given implementation could also allocate additional
+ * shmem elsewhere, but we provide this many bytes within the array.  The
+ * header fields common to all FlexLock types are included in this number.
+ * A power of two should probably be chosen, to avoid alignment issues and
+ * cache line splitting.  It might be useful to increase this on systems where
+ * a cache line is more than 64 bytes in size.
+ */
+#define FLEX_LOCK_BYTES		64
+
+typedef struct FlexLock
+{
+	char		locktype;		/* see FLEXLOCK_TYPE_* constants */
+	slock_t		mutex;			/* Protects FlexLock state and wait queues */
+	bool		releaseOK;		/* T if ok to release waiters */
+	PGPROC	   *head;			/* head of list of waiting PGPROCs */
+	PGPROC	   *tail;			/* tail of list of waiting PGPROCs */
+	/* tail is undefined when head is NULL */
+} FlexLock;
+
+#define FLEXLOCK_TYPE_LWLOCK			'l'
+
+typedef union FlexLockPadded
+{
+	FlexLock	flex;
+	char		pad[FLEX_LOCK_BYTES];
+} FlexLockPadded;
+
+extern FlexLockPadded *FlexLockArray;
+
+extern FlexLockId FlexLockAssign(char locktype);
+extern void FlexLockRemember(FlexLockId id);
+extern void FlexLockForget(FlexLockId id);
+extern int FlexLockWait(FlexLockId id, int mode);
+
+/*
+ * We must join the wait queue while holding the spinlock, so we define this
+ * as a macro, for speed.
+ */
+#define FlexLockJoinWaitQueue(lock, mode) \
+	do { \
+		Assert(MyProc != NULL); \
+		MyProc->flWaitResult = 0; \
+		MyProc->flWaitMode = mode; \
+		MyProc->flWaitLink = NULL; \
+		if (lock->flex.head == NULL) \
+			lock->flex.head = MyProc; \
+		else \
+			lock->flex.tail->flWaitLink = MyProc; \
+		lock->flex.tail = MyProc; \
+	} while (0)
+
+#ifdef LOCK_DEBUG
+extern bool	Trace_flexlocks;
+#define FlexLockDebug(where, id, msg) \
+	do { \
+		if (Trace_flexlocks) \
+			elog(LOG, "%s(%d): %s", where, (int) id, msg); \
+	} while (0)
+#else
+#define FlexLockDebug(where, id, msg)
+#endif
+
+#endif   /* FLEXLOCK_H */
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index e106ad5..ba87db2 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -471,7 +471,7 @@ typedef enum
 #define LockHashPartition(hashcode) \
 	((hashcode) % NUM_LOCK_PARTITIONS)
 #define LockHashPartitionLock(hashcode) \
-	((LWLockId) (FirstLockMgrLock + LockHashPartition(hashcode)))
+	((FlexLockId) (FirstLockMgrLock + LockHashPartition(hashcode)))
 
 
 /*
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 438a48d..69c72f1 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -14,82 +14,7 @@
 #ifndef LWLOCK_H
 #define LWLOCK_H
 
-/*
- * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
- * here, but we need them to set up enum LWLockId correctly, and having
- * this file include lock.h or bufmgr.h would be backwards.
- */
-
-/* Number of partitions of the shared buffer mapping hashtable */
-#define NUM_BUFFER_PARTITIONS  16
-
-/* Number of partitions the shared lock tables are divided into */
-#define LOG2_NUM_LOCK_PARTITIONS  4
-#define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)
-
-/* Number of partitions the shared predicate lock tables are divided into */
-#define LOG2_NUM_PREDICATELOCK_PARTITIONS  4
-#define NUM_PREDICATELOCK_PARTITIONS  (1 << LOG2_NUM_PREDICATELOCK_PARTITIONS)
-
-/*
- * We have a number of predefined LWLocks, plus a bunch of LWLocks that are
- * dynamically assigned (e.g., for shared buffers).  The LWLock structures
- * live in shared memory (since they contain shared data) and are identified
- * by values of this enumerated type.  We abuse the notion of an enum somewhat
- * by allowing values not listed in the enum declaration to be assigned.
- * The extra value MaxDynamicLWLock is there to keep the compiler from
- * deciding that the enum can be represented as char or short ...
- *
- * If you remove a lock, please replace it with a placeholder. This retains
- * the lock numbering, which is helpful for DTrace and other external
- * debugging scripts.
- */
-typedef enum LWLockId
-{
-	BufFreelistLock,
-	ShmemIndexLock,
-	OidGenLock,
-	XidGenLock,
-	ProcArrayLock,
-	SInvalReadLock,
-	SInvalWriteLock,
-	WALInsertLock,
-	WALWriteLock,
-	ControlFileLock,
-	CheckpointLock,
-	CLogControlLock,
-	SubtransControlLock,
-	MultiXactGenLock,
-	MultiXactOffsetControlLock,
-	MultiXactMemberControlLock,
-	RelCacheInitLock,
-	BgWriterCommLock,
-	TwoPhaseStateLock,
-	TablespaceCreateLock,
-	BtreeVacuumLock,
-	AddinShmemInitLock,
-	AutovacuumLock,
-	AutovacuumScheduleLock,
-	SyncScanLock,
-	RelationMappingLock,
-	AsyncCtlLock,
-	AsyncQueueLock,
-	SerializableXactHashLock,
-	SerializableFinishedListLock,
-	SerializablePredicateLockListLock,
-	OldSerXidLock,
-	SyncRepLock,
-	/* Individual lock IDs end here */
-	FirstBufMappingLock,
-	FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
-	FirstPredicateLockMgrLock = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
-
-	/* must be last except for MaxDynamicLWLock: */
-	NumFixedLWLocks = FirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS,
-
-	MaxDynamicLWLock = 1000000000
-} LWLockId;
-
+#include "storage/flexlock.h"
 
 typedef enum LWLockMode
 {
@@ -97,22 +22,9 @@ typedef enum LWLockMode
 	LW_SHARED
 } LWLockMode;
 
-
-#ifdef LOCK_DEBUG
-extern bool Trace_lwlocks;
-#endif
-
-extern LWLockId LWLockAssign(void);
-extern void LWLockAcquire(LWLockId lockid, LWLockMode mode);
-extern bool LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode);
-extern void LWLockRelease(LWLockId lockid);
-extern void LWLockReleaseAll(void);
-extern bool LWLockHeldByMe(LWLockId lockid);
-
-extern int	NumLWLocks(void);
-extern Size LWLockShmemSize(void);
-extern void CreateLWLocks(void);
-
-extern void RequestAddinLWLocks(int n);
+extern FlexLockId LWLockAssign(void);
+extern void LWLockAcquire(FlexLockId lockid, LWLockMode mode);
+extern bool LWLockConditionalAcquire(FlexLockId lockid, LWLockMode mode);
+extern void LWLockRelease(FlexLockId lockid);
 
 #endif   /* LWLOCK_H */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 6e798b1..7e8630d 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -114,10 +114,10 @@ struct PGPROC
 	 */
 	bool		recoveryConflictPending;
 
-	/* Info about LWLock the process is currently waiting for, if any. */
-	bool		lwWaiting;		/* true if waiting for an LW lock */
-	bool		lwExclusive;	/* true if waiting for exclusive access */
-	struct PGPROC *lwWaitLink;	/* next waiter for same LW lock */
+	/* Info about FlexLock the process is currently waiting for, if any. */
+	int			flWaitResult;	/* result of wait, or 0 if still waiting */
+	int			flWaitMode;		/* lock mode sought */
+	struct PGPROC *flWaitLink;	/* next waiter for same FlexLock */
 
 	/* Info about lock the process is currently waiting for, if any. */
 	/* waitLock and waitProcLock are NULL if not currently waiting. */
@@ -147,7 +147,7 @@ struct PGPROC
 	struct XidCache subxids;	/* cache for subtransaction XIDs */
 
 	/* Per-backend LWLock.  Protects fields below. */
-	LWLockId	backendLock;	/* protects the fields below */
+	FlexLockId	backendLock;	/* protects the fields below */
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	uint64		fpLockBits;		/* lock modes held for each fast-path slot */

procarraylock-v1.patchapplication/octet-stream; name=procarraylock-v1.patchDownload

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 32985a4..d6bba6f 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -40,6 +40,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "utils/acl.h"
 #include "utils/attoptcache.h"
 #include "utils/datum.h"
@@ -222,9 +223,9 @@ analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
 	/*
 	 * OK, let's do it.  First let other backends know I'm in ANALYZE.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	MyProc->vacuumFlags |= PROC_IN_ANALYZE;
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/*
 	 * Do the normal non-recursive ANALYZE.
@@ -249,9 +250,9 @@ analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
 	 * Reset my PGPROC flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	MyProc->vacuumFlags &= ~PROC_IN_ANALYZE;
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index f42504c..823dab9 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -39,6 +39,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
@@ -892,11 +893,11 @@ vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool do_toast, bool for_wraparound)
 		 * MyProc->xid/xmin, else OldestXmin might appear to go backwards,
 		 * which is probably Not Good.
 		 */
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		ProcArrayLockAcquire(PAL_EXCLUSIVE);
 		MyProc->vacuumFlags |= PROC_IN_VACUUM;
 		if (for_wraparound)
 			MyProc->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 	}
 
 	/*
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 1a48485..39c5080 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -52,6 +52,7 @@
 #include "access/twophase.h"
 #include "miscadmin.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/snapmgr.h"
@@ -254,7 +255,7 @@ ProcArrayAdd(PGPROC *proc)
 {
 	ProcArrayStruct *arrayP = procArray;
 
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	if (arrayP->numProcs >= arrayP->maxProcs)
 	{
@@ -263,7 +264,7 @@ ProcArrayAdd(PGPROC *proc)
 		 * fixed supply of PGPROC structs too, and so we should have failed
 		 * earlier.)
 		 */
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 		ereport(FATAL,
 				(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
 				 errmsg("sorry, too many clients already")));
@@ -272,7 +273,7 @@ ProcArrayAdd(PGPROC *proc)
 	arrayP->procs[arrayP->numProcs] = proc;
 	arrayP->numProcs++;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -297,7 +298,7 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
 		DisplayXidCache();
 #endif
 
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	if (TransactionIdIsValid(latestXid))
 	{
@@ -321,13 +322,13 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
 			arrayP->procs[index] = arrayP->procs[arrayP->numProcs - 1];
 			arrayP->procs[arrayP->numProcs - 1] = NULL; /* for debugging */
 			arrayP->numProcs--;
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 			return;
 		}
 	}
 
 	/* Ooops */
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	elog(LOG, "failed to find proc %p in ProcArray", proc);
 }
@@ -351,54 +352,15 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 {
 	if (TransactionIdIsValid(latestXid))
 	{
-		/*
-		 * We must lock ProcArrayLock while clearing proc->xid, so that we do
-		 * not exit the set of "running" transactions while someone else is
-		 * taking a snapshot.  See discussion in
-		 * src/backend/access/transam/README.
-		 */
-		Assert(TransactionIdIsValid(proc->xid));
-
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
-
-		proc->xid = InvalidTransactionId;
-		proc->lxid = InvalidLocalTransactionId;
-		proc->xmin = InvalidTransactionId;
-		/* must be cleared with xid/xmin: */
-		proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
-		proc->inCommit = false; /* be sure this is cleared in abort */
-		proc->recoveryConflictPending = false;
-
-		/* Clear the subtransaction-XID cache too while holding the lock */
-		proc->subxids.nxids = 0;
-		proc->subxids.overflowed = false;
-
-		/* Also advance global latestCompletedXid while holding the lock */
-		if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
-								  latestXid))
-			ShmemVariableCache->latestCompletedXid = latestXid;
-
-		LWLockRelease(ProcArrayLock);
+		Assert(proc == MyProc);
+		ProcArrayLockClearTransaction(latestXid);		
 	}
 	else
-	{
-		/*
-		 * If we have no XID, we don't need to lock, since we won't affect
-		 * anyone else's calculation of a snapshot.  We might change their
-		 * estimate of global xmin, but that's OK.
-		 */
-		Assert(!TransactionIdIsValid(proc->xid));
-
-		proc->lxid = InvalidLocalTransactionId;
 		proc->xmin = InvalidTransactionId;
-		/* must be cleared with xid/xmin: */
-		proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
-		proc->inCommit = false; /* be sure this is cleared in abort */
-		proc->recoveryConflictPending = false;
 
-		Assert(proc->subxids.nxids == 0);
-		Assert(proc->subxids.overflowed == false);
-	}
+	proc->lxid = InvalidLocalTransactionId;
+	proc->inCommit = false; /* be sure this is cleared in abort */
+	proc->recoveryConflictPending = false;
 }
 
 
@@ -528,7 +490,7 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 	/*
 	 * Nobody else is running yet, but take locks anyhow
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	/*
 	 * KnownAssignedXids is sorted so we cannot just add the xids, we have to
@@ -635,7 +597,7 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 	Assert(TransactionIdIsNormal(ShmemVariableCache->latestCompletedXid));
 	Assert(TransactionIdIsValid(ShmemVariableCache->nextXid));
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	KnownAssignedXidsDisplay(trace_recovery(DEBUG3));
 	if (standbyState == STANDBY_SNAPSHOT_READY)
@@ -690,7 +652,7 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
 	/*
 	 * Uses same locking as transaction commit
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	/*
 	 * Remove subxids from known-assigned-xacts.
@@ -703,7 +665,7 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
 	if (TransactionIdPrecedes(procArray->lastOverflowedXid, max_xid))
 		procArray->lastOverflowedXid = max_xid;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -795,7 +757,7 @@ TransactionIdIsInProgress(TransactionId xid)
 					 errmsg("out of memory")));
 	}
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	/*
 	 * Now that we have the lock, we can check latestCompletedXid; if the
@@ -803,7 +765,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	 */
 	if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid, xid))
 	{
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 		xc_by_latest_xid_inc();
 		return true;
 	}
@@ -829,7 +791,7 @@ TransactionIdIsInProgress(TransactionId xid)
 		 */
 		if (TransactionIdEquals(pxid, xid))
 		{
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 			xc_by_main_xid_inc();
 			return true;
 		}
@@ -851,7 +813,7 @@ TransactionIdIsInProgress(TransactionId xid)
 
 			if (TransactionIdEquals(cxid, xid))
 			{
-				LWLockRelease(ProcArrayLock);
+				ProcArrayLockRelease();
 				xc_by_child_xid_inc();
 				return true;
 			}
@@ -879,7 +841,7 @@ TransactionIdIsInProgress(TransactionId xid)
 
 		if (KnownAssignedXidExists(xid))
 		{
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 			xc_by_known_assigned_inc();
 			return true;
 		}
@@ -895,7 +857,7 @@ TransactionIdIsInProgress(TransactionId xid)
 			nxids = KnownAssignedXidsGet(xids, xid);
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/*
 	 * If none of the relevant caches overflowed, we know the Xid is not
@@ -961,7 +923,7 @@ TransactionIdIsActive(TransactionId xid)
 	if (TransactionIdPrecedes(xid, RecentXmin))
 		return false;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (i = 0; i < arrayP->numProcs; i++)
 	{
@@ -983,7 +945,7 @@ TransactionIdIsActive(TransactionId xid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1046,7 +1008,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 	/* Cannot look for individual databases during recovery */
 	Assert(allDbs || !RecoveryInProgress());
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	/*
 	 * We initialize the MIN() calculation with latestCompletedXid + 1. This
@@ -1099,7 +1061,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 		 */
 		TransactionId kaxmin = KnownAssignedXidsGetOldestXmin();
 
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 
 		if (TransactionIdIsNormal(kaxmin) &&
 			TransactionIdPrecedes(kaxmin, result))
@@ -1110,7 +1072,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 		/*
 		 * No other information needed, so release the lock immediately.
 		 */
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 
 		/*
 		 * Compute the cutoff XID by subtracting vacuum_defer_cleanup_age,
@@ -1239,7 +1201,7 @@ GetSnapshotData(Snapshot snapshot)
 	 * It is sufficient to get shared lock on ProcArrayLock, even if we are
 	 * going to set MyProc->xmin.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	/* xmax is always latestCompletedXid + 1 */
 	xmax = ShmemVariableCache->latestCompletedXid;
@@ -1375,7 +1337,7 @@ GetSnapshotData(Snapshot snapshot)
 	if (!TransactionIdIsValid(MyProc->xmin))
 		MyProc->xmin = TransactionXmin = xmin;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/*
 	 * Update globalxmin to include actual process xids.  This is a slightly
@@ -1432,7 +1394,7 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
 		return false;
 
 	/* Get lock so source xact can't end while we're doing this */
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1476,7 +1438,7 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
 		break;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1550,7 +1512,7 @@ GetRunningTransactionData(void)
 	 * Ensure that no xids enter or leave the procarray while we obtain
 	 * snapshot.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 	LWLockAcquire(XidGenLock, LW_SHARED);
 
 	latestCompletedXid = ShmemVariableCache->latestCompletedXid;
@@ -1611,7 +1573,7 @@ GetRunningTransactionData(void)
 	CurrentRunningXacts->latestCompletedXid = latestCompletedXid;
 
 	/* We don't release XidGenLock here, the caller is responsible for that */
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	Assert(TransactionIdIsValid(CurrentRunningXacts->nextXid));
 	Assert(TransactionIdIsValid(CurrentRunningXacts->oldestRunningXid));
@@ -1644,7 +1606,7 @@ GetOldestActiveTransactionId(void)
 
 	Assert(!RecoveryInProgress());
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	oldestRunningXid = ShmemVariableCache->nextXid;
 
@@ -1672,7 +1634,7 @@ GetOldestActiveTransactionId(void)
 		 */
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return oldestRunningXid;
 }
@@ -1705,7 +1667,7 @@ GetTransactionsInCommit(TransactionId **xids_p)
 	xids = (TransactionId *) palloc(arrayP->maxProcs * sizeof(TransactionId));
 	nxids = 0;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1718,7 +1680,7 @@ GetTransactionsInCommit(TransactionId **xids_p)
 			xids[nxids++] = pxid;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	*xids_p = xids;
 	return nxids;
@@ -1740,7 +1702,7 @@ HaveTransactionsInCommit(TransactionId *xids, int nxids)
 	ProcArrayStruct *arrayP = procArray;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1766,7 +1728,7 @@ HaveTransactionsInCommit(TransactionId *xids, int nxids)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1788,7 +1750,7 @@ BackendPidGetProc(int pid)
 	if (pid == 0)				/* never match dummy PGPROCs */
 		return NULL;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1801,7 +1763,7 @@ BackendPidGetProc(int pid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1829,7 +1791,7 @@ BackendXidGetPid(TransactionId xid)
 	if (xid == InvalidTransactionId)	/* never match invalid xid */
 		return 0;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1842,7 +1804,7 @@ BackendXidGetPid(TransactionId xid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1897,7 +1859,7 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
 	vxids = (VirtualTransactionId *)
 		palloc(sizeof(VirtualTransactionId) * arrayP->maxProcs);
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1933,7 +1895,7 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	*nvxids = count;
 	return vxids;
@@ -1992,7 +1954,7 @@ GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
 					 errmsg("out of memory")));
 	}
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2025,7 +1987,7 @@ GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/* add the terminator */
 	vxids[count].backendId = InvalidBackendId;
@@ -2046,7 +2008,7 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
 	int			index;
 	pid_t		pid = 0;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2072,7 +2034,7 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return pid;
 }
@@ -2146,7 +2108,7 @@ CountDBBackends(Oid databaseid)
 	int			count = 0;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2159,7 +2121,7 @@ CountDBBackends(Oid databaseid)
 			count++;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return count;
 }
@@ -2175,7 +2137,7 @@ CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending)
 	pid_t		pid = 0;
 
 	/* tell all backends to die */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2200,7 +2162,7 @@ CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -2213,7 +2175,7 @@ CountUserBackends(Oid roleid)
 	int			count = 0;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2225,7 +2187,7 @@ CountUserBackends(Oid roleid)
 			count++;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return count;
 }
@@ -2273,7 +2235,7 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 
 		*nbackends = *nprepared = 0;
 
-		LWLockAcquire(ProcArrayLock, LW_SHARED);
+		ProcArrayLockAcquire(PAL_SHARED);
 
 		for (index = 0; index < arrayP->numProcs; index++)
 		{
@@ -2297,7 +2259,7 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 			}
 		}
 
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 
 		if (!found)
 			return false;		/* no conflicting backends, so done */
@@ -2350,7 +2312,7 @@ XidCacheRemoveRunningXids(TransactionId xid,
 	 * to abort subtransactions, but pending closer analysis we'd best be
 	 * conservative.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	/*
 	 * Under normal circumstances xid and xids[] will be in increasing order,
@@ -2398,7 +2360,7 @@ XidCacheRemoveRunningXids(TransactionId xid,
 							  latestXid))
 		ShmemVariableCache->latestCompletedXid = latestXid;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 #ifdef XIDCACHE_DEBUG
@@ -2565,7 +2527,7 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 	/*
 	 * Uses same locking as transaction commit
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	KnownAssignedXidsRemoveTree(xid, nsubxids, subxids);
 
@@ -2574,7 +2536,7 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 							  max_xid))
 		ShmemVariableCache->latestCompletedXid = max_xid;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -2584,9 +2546,9 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 void
 ExpireAllKnownAssignedTransactionIds(void)
 {
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	KnownAssignedXidsRemovePreceding(InvalidTransactionId);
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -2596,9 +2558,9 @@ ExpireAllKnownAssignedTransactionIds(void)
 void
 ExpireOldKnownAssignedTransactionIds(TransactionId xid)
 {
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	KnownAssignedXidsRemovePreceding(xid);
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 
@@ -2820,7 +2782,7 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 	{
 		/* must hold lock to compress */
 		if (!exclusive_lock)
-			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+			ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 		KnownAssignedXidsCompress(true);
 
@@ -2828,7 +2790,7 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 		/* note: we no longer care about the tail pointer */
 
 		if (!exclusive_lock)
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 
 		/*
 		 * If it still won't fit then we're out of memory
diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile
index 3730e51..27eaa97 100644
--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = flexlock.o lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o \
-	predicate.o
+	procarraylock.o predicate.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/storage/lmgr/flexlock.c b/src/backend/storage/lmgr/flexlock.c
index 7f657b3..c88bd24 100644
--- a/src/backend/storage/lmgr/flexlock.c
+++ b/src/backend/storage/lmgr/flexlock.c
@@ -30,6 +30,7 @@
 #include "storage/lwlock.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
+#include "storage/procarraylock.h"
 #include "storage/spin.h"
 #include "utils/elog.h"
 
@@ -177,9 +178,14 @@ CreateFlexLocks(void)
 
 	FlexLockArray = (FlexLockPadded *) ptr;
 
-	/* All of the "fixed" FlexLocks are LWLocks. */
+	/* All of the "fixed" FlexLocks are LWLocks - except ProcArrayLock. */
 	for (id = 0, lock = FlexLockArray; id < NumFixedFlexLocks; id++, lock++)
-		FlexLockInit(&lock->flex, FLEXLOCK_TYPE_LWLOCK);
+	{
+		if (id == ProcArrayLock)
+			FlexLockInit(&lock->flex, FLEXLOCK_TYPE_PROCARRAYLOCK);
+		else
+			FlexLockInit(&lock->flex, FLEXLOCK_TYPE_LWLOCK);
+	}
 
 	/*
 	 * Initialize the dynamic-allocation counter, which is stored just before
@@ -324,13 +330,20 @@ FlexLockReleaseAll(void)
 {
 	while (num_held_flexlocks > 0)
 	{
+		FlexLockId	id;
+		FlexLock   *flex;
+
 		HOLD_INTERRUPTS();		/* match the upcoming RESUME_INTERRUPTS */
 
-		/*
-		 * FLEXTODO: When we have multiple types of flex locks, this will
-		 * need to call the appropriate release function for each lock type.
-		 */
-		LWLockRelease(held_flexlocks[num_held_flexlocks - 1]);
+		id = held_flexlocks[num_held_flexlocks - 1];
+		flex = &FlexLockArray[id].flex;
+		if (flex->locktype == FLEXLOCK_TYPE_LWLOCK)
+			LWLockRelease(id);
+		else
+		{
+			Assert(id == ProcArrayLock);
+			ProcArrayLockRelease();
+		}
 	}
 }
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 57da345..510a4c2 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -45,6 +45,7 @@
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "storage/procsignal.h"
 #include "storage/spin.h"
 #include "utils/timestamp.h"
@@ -1046,7 +1047,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		{
 			PGPROC	   *autovac = GetBlockingAutoVacuumPgproc();
 
-			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+			ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 			/*
 			 * Only do it if the worker is not working to protect against Xid
@@ -1062,7 +1063,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 					 pid);
 
 				/* don't hold the lock across the kill() syscall */
-				LWLockRelease(ProcArrayLock);
+				ProcArrayLockRelease();
 
 				/* send the autovacuum worker Back to Old Kent Road */
 				if (kill(pid, SIGINT) < 0)
@@ -1074,7 +1075,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 				}
 			}
 			else
-				LWLockRelease(ProcArrayLock);
+				ProcArrayLockRelease();
 
 			/* prevent signal from being resent more than once */
 			allow_autovacuum_cancel = false;
diff --git a/src/backend/storage/lmgr/procarraylock.c b/src/backend/storage/lmgr/procarraylock.c
new file mode 100644
index 0000000..6838ed6
--- /dev/null
+++ b/src/backend/storage/lmgr/procarraylock.c
@@ -0,0 +1,341 @@
+/*-------------------------------------------------------------------------
+ *
+ * procarraylock.c
+ *	  Lock management for the ProcArray
+ *
+ * Because the ProcArray data structure is highly trafficked, it is
+ * critical that mutual exclusion for ProcArray options be as efficient
+ * as possible.  A particular problem is transaction end (commit or abort)
+ * which cannot be done in parallel with snapshot acquisition.  We
+ * therefore include some special hacks to deal with this case efficiently.
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/lmgr/procarraylock.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "pg_trace.h"
+#include "access/transam.h"
+#include "storage/flexlock_internals.h"
+#include "storage/ipc.h"
+#include "storage/procarraylock.h"
+#include "storage/proc.h"
+#include "storage/spin.h"
+
+typedef struct ProcArrayLockStruct
+{
+	FlexLock	flex;			/* common FlexLock infrastructure */
+	char		exclusive;		/* # of exclusive holders (0 or 1) */
+	int			shared;			/* # of shared holders (0..MaxBackends) */
+	PGPROC	   *ending;			/* transactions wishing to clear state */
+	TransactionId	latest_ending_xid;	/* latest ending XID */
+} ProcArrayLockStruct;
+
+/* There is only one ProcArrayLock. */
+#define	ProcArrayLockPointer() \
+	(AssertMacro(FlexLockArray[ProcArrayLock].flex.locktype == \
+		FLEXLOCK_TYPE_PROCARRAYLOCK), \
+	 (volatile ProcArrayLockStruct *) &FlexLockArray[ProcArrayLock])
+
+/*
+ * ProcArrayLockAcquire - acquire a lightweight lock in the specified mode
+ *
+ * If the lock is not available, sleep until it is.
+ *
+ * Side effect: cancel/die interrupts are held off until lock release.
+ */
+void
+ProcArrayLockAcquire(ProcArrayLockMode mode)
+{
+	volatile ProcArrayLockStruct *lock = ProcArrayLockPointer();
+	PGPROC	   *proc = MyProc;
+	bool		retry = false;
+	int			extraWaits = 0;
+
+	/*
+	 * We can't wait if we haven't got a PGPROC.  This should only occur
+	 * during bootstrap or shared memory initialization.  Put an Assert here
+	 * to catch unsafe coding practices.
+	 */
+	Assert(!(proc == NULL && IsUnderPostmaster));
+
+	/*
+	 * Lock out cancel/die interrupts until we exit the code section protected
+	 * by the ProcArrayLock.  This ensures that interrupts will not interfere
+     * with manipulations of data structures in shared memory.
+	 */
+	HOLD_INTERRUPTS();
+
+	/*
+	 * Loop here to try to acquire lock after each time we are signaled by
+	 * ProcArrayLockRelease.  See comments in LWLockAcquire for an explanation
+	 * of why do we not attempt to hand off the lock directly.
+	 */
+	for (;;)
+	{
+		bool		mustwait;
+
+		/* Acquire mutex.  Time spent holding mutex should be short! */
+		SpinLockAcquire(&lock->flex.mutex);
+
+		/* If retrying, allow LWLockRelease to release waiters again */
+		if (retry)
+			lock->flex.releaseOK = true;
+
+		/* If I can get the lock, do so quickly. */
+		if (mode == PAL_EXCLUSIVE)
+		{
+			if (lock->exclusive == 0 && lock->shared == 0)
+			{
+				lock->exclusive++;
+				mustwait = false;
+			}
+			else
+				mustwait = true;
+		}
+		else
+		{
+			if (lock->exclusive == 0)
+			{
+				lock->shared++;
+				mustwait = false;
+			}
+			else
+				mustwait = true;
+		}
+
+		if (!mustwait)
+			break;				/* got the lock */
+
+		/* Add myself to wait queue. */
+		FlexLockJoinWaitQueue(lock, (int) mode);
+
+		/* Can release the mutex now */
+		SpinLockRelease(&lock->flex.mutex);
+
+		/* Wait until awakened. */
+		extraWaits += FlexLockWait(ProcArrayLock, mode);
+
+		/* Now loop back and try to acquire lock again. */
+		retry = true;
+	}
+
+	/* We are done updating shared state of the lock itself. */
+	SpinLockRelease(&lock->flex.mutex);
+
+	TRACE_POSTGRESQL_FLEXLOCK_ACQUIRE(lockid, mode);
+
+	/* Add lock to list of locks held by this backend */
+	FlexLockRemember(ProcArrayLock);
+
+	/*
+	 * Fix the process wait semaphore's count for any absorbed wakeups.
+	 */
+	while (extraWaits-- > 0)
+		PGSemaphoreUnlock(&proc->sem);
+}
+
+/*
+ * ProcArrayLockClearTransaction - safely clear transaction details
+ *
+ * This can't be done while ProcArrayLock is held, but it's so fast that
+ * we can afford to do it while holding the spinlock, rather than acquiring
+ * and releasing the lock.
+ */
+void
+ProcArrayLockClearTransaction(TransactionId latestXid)
+{
+	volatile ProcArrayLockStruct *lock = ProcArrayLockPointer();
+	PGPROC	   *proc = MyProc;
+	int			extraWaits = 0;
+	bool		mustwait;
+
+	HOLD_INTERRUPTS();
+
+	/* Acquire mutex.  Time spent holding mutex should be short! */
+	SpinLockAcquire(&lock->flex.mutex);
+
+	if (lock->exclusive == 0 && lock->shared == 0)
+	{
+		{
+			volatile PGPROC *vproc = proc;
+			/* If there are no lockers, clar the critical PGPROC fields. */
+			vproc->xid = InvalidTransactionId;
+	        vproc->xmin = InvalidTransactionId;
+	        /* must be cleared with xid/xmin: */
+	        vproc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
+			vproc->subxids.nxids = 0;
+			vproc->subxids.overflowed = false;
+		}
+		mustwait = false;
+
+        /* Also advance global latestCompletedXid while holding the lock */
+        if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
+                                  latestXid))
+            ShmemVariableCache->latestCompletedXid = latestXid;
+	}
+	else
+	{
+		/* Rats, must wait. */
+		proc->flWaitLink = lock->ending;
+		lock->ending = proc;
+		if (!TransactionIdIsValid(lock->latest_ending_xid) ||
+				TransactionIdPrecedes(lock->latest_ending_xid, latestXid)) 
+			lock->latest_ending_xid = latestXid;
+		mustwait = true;
+	}
+
+	/* Can release the mutex now */
+	SpinLockRelease(&lock->flex.mutex);
+
+	/*
+	 * If we were not able to perfom the operation immediately, we must wait.
+	 * But we need not retry after being awoken, because the last lock holder
+	 * to release the lock will do the work first, on our behalf.
+	 */
+	if (mustwait)
+	{
+		extraWaits += FlexLockWait(ProcArrayLock, 2);
+		while (extraWaits-- > 0)
+			PGSemaphoreUnlock(&proc->sem);
+	}
+
+	RESUME_INTERRUPTS();
+}
+
+/*
+ * ProcArrayLockRelease - release a previously acquired lock
+ */
+void
+ProcArrayLockRelease(void)
+{
+	volatile ProcArrayLockStruct *lock = ProcArrayLockPointer();
+	PGPROC	   *head;
+	PGPROC	   *ending = NULL;
+	PGPROC	   *proc;
+
+	FlexLockForget(ProcArrayLock);
+
+	/* Acquire mutex.  Time spent holding mutex should be short! */
+	SpinLockAcquire(&lock->flex.mutex);
+
+	/* Release my hold on lock */
+	if (lock->exclusive > 0)
+		lock->exclusive--;
+	else
+	{
+		Assert(lock->shared > 0);
+		lock->shared--;
+	}
+
+	/*
+	 * If the lock is now free, but there are some transactions trying to
+	 * end, we must clear the critical PGPROC fields for them, and save a
+	 * list of them so we can wake them up.
+	 */
+	if (lock->exclusive == 0 && lock->shared == 0 && lock->ending != NULL)
+	{
+		volatile PGPROC *vproc;
+
+		ending = lock->ending;
+		vproc = ending;
+
+		while (vproc != NULL)
+		{
+        	vproc->xid = InvalidTransactionId;
+	        vproc->xmin = InvalidTransactionId;
+	        /* must be cleared with xid/xmin: */
+	        vproc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
+			vproc->subxids.nxids = 0;
+			vproc->subxids.overflowed = false;
+			vproc = vproc->flWaitLink;
+		}
+
+		/* Also advance global latestCompletedXid */
+		if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
+								  lock->latest_ending_xid))
+			ShmemVariableCache->latestCompletedXid = lock->latest_ending_xid;
+
+		/* Reset lock state. */
+		lock->ending = NULL;
+		lock->latest_ending_xid = InvalidTransactionId;
+	}
+
+	/*
+	 * See if I need to awaken any waiters.  If I released a non-last shared
+	 * hold, there cannot be anything to do.  Also, do not awaken any waiters
+	 * if someone has already awakened waiters that haven't yet acquired the
+	 * lock.
+	 */
+	head = lock->flex.head;
+	if (head != NULL)
+	{
+		if (lock->exclusive == 0 && lock->shared == 0 && lock->flex.releaseOK)
+		{
+			/*
+			 * Remove the to-be-awakened PGPROCs from the queue.  If the front
+			 * waiter wants exclusive lock, awaken him only. Otherwise awaken
+			 * as many waiters as want shared access.
+			 */
+			proc = head;
+			if (proc->flWaitMode != LW_EXCLUSIVE)
+			{
+				while (proc->flWaitLink != NULL &&
+					   proc->flWaitLink->flWaitMode != LW_EXCLUSIVE)
+					proc = proc->flWaitLink;
+			}
+			/* proc is now the last PGPROC to be released */
+			lock->flex.head = proc->flWaitLink;
+			proc->flWaitLink = NULL;
+			/* prevent additional wakeups until retryer gets to run */
+			lock->flex.releaseOK = false;
+		}
+		else
+		{
+			/* lock is still held, can't awaken anything */
+			head = NULL;
+		}
+	}
+
+	/* We are done updating shared state of the lock itself. */
+	SpinLockRelease(&lock->flex.mutex);
+
+	TRACE_POSTGRESQL_FLEXLOCK_RELEASE(lockid);
+
+	/*
+	 * Awaken any waiters I removed from the queue.
+	 */
+	while (head != NULL)
+	{
+		FlexLockDebug("LWLockRelease", lockid, "release waiter");
+		proc = head;
+		head = proc->flWaitLink;
+		proc->flWaitLink = NULL;
+		proc->flWaitResult = 1;		/* any non-zero value will do */
+		PGSemaphoreUnlock(&proc->sem);
+	}
+
+	/*
+	 * Also awaken any processes whose critical PGPROC fields I cleared
+	 */
+	while (ending != NULL)
+	{
+		FlexLockDebug("LWLockRelease", lockid, "release ending");
+		proc = ending;
+		ending = proc->flWaitLink;
+		proc->flWaitLink = NULL;
+		proc->flWaitResult = 1;		/* any non-zero value will do */
+		PGSemaphoreUnlock(&proc->sem);
+	}
+
+	/*
+	 * Now okay to allow cancel/die interrupts.
+	 */
+	RESUME_INTERRUPTS();
+}
diff --git a/src/include/storage/flexlock_internals.h b/src/include/storage/flexlock_internals.h
index 5f78da7..d1bca45 100644
--- a/src/include/storage/flexlock_internals.h
+++ b/src/include/storage/flexlock_internals.h
@@ -43,6 +43,7 @@ typedef struct FlexLock
 } FlexLock;
 
 #define FLEXLOCK_TYPE_LWLOCK			'l'
+#define FLEXLOCK_TYPE_PROCARRAYLOCK		'p'
 
 typedef union FlexLockPadded
 {
diff --git a/src/include/storage/procarraylock.h b/src/include/storage/procarraylock.h
new file mode 100644
index 0000000..678ca6f
--- /dev/null
+++ b/src/include/storage/procarraylock.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * procarraylock.h
+ *	  Lock management for the ProcArray
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/lwlock.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PROCARRAYLOCK_H
+#define PROCARRAYLOCK_H
+
+#include "storage/flexlock.h"
+
+typedef enum ProcArrayLockMode
+{
+	PAL_EXCLUSIVE,
+	PAL_SHARED
+} ProcArrayLockMode;
+
+extern void ProcArrayLockAcquire(ProcArrayLockMode mode);
+extern void ProcArrayLockClearTransaction(TransactionId latestXid);
+extern void ProcArrayLockRelease(void);
+
+#endif   /* PROCARRAYLOCK_H */

Kevin Grittner

Kevin.Grittner@wicourts.gov

about 14 years ago

In reply to: Robert Haas (#1)

Re: FlexLocks

Robert Haas <robertmhaas@gmail.com> wrote:

I'm not necessarily saying that any of these particular
things are what we want to do, just throwing out the idea that we
may want a variety of lock types that are similar to lightweight
locks but with subtly different behavior, yet with common
infrastructure for error handling and wait queue management.

The locking in the clog area is pretty funky. I bet we could craft
a special flavor of FlexLock to make that cleaner. And I would be
surprised if some creative thinking didn't yield a far better FL
scheme for SSI than we can manage with existing LW locks.

Your description makes sense to me, and your numbers prove the value
of the concept. Whether there's anything in the implementation I
would quibble about will take some review time.

-Kevin

Simon Riggs

simon@2ndQuadrant.com

about 14 years ago

In reply to: Robert Haas (#1)

Re: FlexLocks

On Tue, Nov 15, 2011 at 1:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:

It basically
works like a regular LWLock, except that it has a special operation to
optimize ProcArrayEndTransaction(). In the uncontended case, instead
of acquiring and releasing the lock, it just grabs the lock, observes
that there is no contention, clears the critical PGPROC fields (which
isn't noticeably slower than updating the state of the lock would be)
and releases the spin lock. There's then no need to reacquire the
spinlock to "release" the lock; we're done. In the contended case,
the backend wishing to end adds itself to a queue of ending
transactions. When ProcArrayLock is released, the last person out
clears the PGPROC structures for all the waiters and wakes them all
up; they don't need to reacquire the lock, because the work they
wished to perform while holding it is already done. Thus, in the
*worst* case, ending transactions only need to acquire the spinlock
protecting ProcArrayLock half as often (once instead of twice), and in
the best case (where backends have to keep retrying only to repeatedly
fail to get the lock) it's far better than that.

Which is the same locking avoidance technique we already use for sync
rep and for the new group commit patch.

I've been saying for some time that we should use the same technique
for ProcArray and clog also, so we only need to queue once rather than
queue three times at end of each transaction.

I'm not really enthused by the idea of completely rewriting lwlocks
for this. Seems like specialised code is likely to be best, as well as
having less collateral damage.

With that in mind, should we try to fuse the group commit with the
procarraylock approach, so we just queue once and get woken when all
the activities have been handled? If the first woken proc performs the
actions then wakes people further down the queue it could work quite
well.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Robert Haas

robertmhaas@gmail.com

about 14 years ago

In reply to: Simon Riggs (#3)

Re: FlexLocks

On Tue, Nov 15, 2011 at 1:40 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Which is the same locking avoidance technique we already use for sync
rep and for the new group commit patch.

Yep...

I've been saying for some time that we should use the same technique
for ProcArray and clog also, so we only need to queue once rather than
queue three times at end of each transaction.

I'm not really enthused by the idea of completely rewriting lwlocks
for this. Seems like specialised code is likely to be best, as well as
having less collateral damage.

Well, the problem that I have with that is that we're going to end up
with a lot of specialized code, particularly around error recovery.
This code doesn't remove the need for ProcArrayLock to be taken in
exclusive mode, and I don't think there IS any easy way to remove the
need for that to happen sometimes. So we have to deal with the
possibility that an ERROR might occur while we hold the lock, which
means we have to release the lock and clean up our state. That means
every place that has a call to LWLockReleaseAll() will now also need
to cleanup ProperlyCleanUpProcArrayLockStuff(). And the next time we
need to add some kind of specialized lock, we'll need to do the same
thing again. It seems to me that that rapidly gets unmanageable, not
to mention *slow*. We need some kind of common infrastructure for
releasing locks, and this is an attempt to create such a thing. I'm
not opposed to doing it some other way, but I think doing each one as
a one-off isn't going to work out very well.

Also, in this particular case, I really do want shared and exclusive
locks on ProcArrayLock to stick around; I just want one additional
operation as well. It's a lot less duplicated code to do that this
way than it is to write something from scratch. The FlexLock patch
may look big, but it's mostly renaming and rearranging; it's really
not adding much code.

With that in mind, should we try to fuse the group commit with the
procarraylock approach, so we just queue once and get woken when all
the activities have been handled? If the first woken proc performs the
actions then wakes people further down the queue it could work quite
well.

Well, there's too much work there to use the same approach I took
here: we can't very well hold onto the LWLock spinlock while flushing
WAL or waiting for synchronous replication. Fusing together some
parts of the commit sequence might be the right approach (I don't
know), but honestly my gut feeling is that the first thing we need to
do is go in the opposite direction and break up WALInsertLock into
multiple locks that allow better parallelization of WAL insertion. Of
course if someone finds a way to fuse the whole commit sequence
together in some way that improves performance, fantastic, but having
tried a lot of things before I came up with this approach, I'm a bit
reluctant to abandon it in favor of an approach that hasn't been coded
or tested yet. I think we should pursue this approach for now, and we
can always revise it later if someone comes up with something even
better. As a practical matter, the test results show that with these
patches, ProcArrayLock is NOT a bottleneck at 32 cores, which seems
like enough reason to be pretty happy with it, modulo implementation
details.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Alvaro Herrera

alvherre@commandprompt.com

about 14 years ago

In reply to: Robert Haas (#4)

Re: FlexLocks

Excerpts from Robert Haas's message of mar nov 15 17:16:31 -0300 2011:

On Tue, Nov 15, 2011 at 1:40 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Which is the same locking avoidance technique we already use for sync
rep and for the new group commit patch.

Yep...

I've been saying for some time that we should use the same technique
for ProcArray and clog also, so we only need to queue once rather than
queue three times at end of each transaction.

I'm not really enthused by the idea of completely rewriting lwlocks
for this. Seems like specialised code is likely to be best, as well as
having less collateral damage.

Well, the problem that I have with that is that we're going to end up
with a lot of specialized code, particularly around error recovery.
This code doesn't remove the need for ProcArrayLock to be taken in
exclusive mode, and I don't think there IS any easy way to remove the
need for that to happen sometimes. So we have to deal with the
possibility that an ERROR might occur while we hold the lock, which
means we have to release the lock and clean up our state. That means
every place that has a call to LWLockReleaseAll() will now also need
to cleanup ProperlyCleanUpProcArrayLockStuff(). And the next time we
need to add some kind of specialized lock, we'll need to do the same
thing again. It seems to me that that rapidly gets unmanageable, not
to mention *slow*. We need some kind of common infrastructure for
releasing locks, and this is an attempt to create such a thing. I'm
not opposed to doing it some other way, but I think doing each one as
a one-off isn't going to work out very well.

I agree. In fact, I would think that we should look into rewriting the
sync rep locking (and group commit) on top of flexlocks, not the other
way around. As Kevin says nearby it's likely that we could find some
way to rewrite the SLRU (clog and such) locking protocol using these new
things too.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Kevin Grittner

Kevin.Grittner@wicourts.gov

about 14 years ago

In reply to: Alvaro Herrera (#5)

Re: FlexLocks

Alvaro Herrera <alvherre@commandprompt.com> wrote:

As Kevin says nearby it's likely that we could find some way to
rewrite the SLRU (clog and such) locking protocol using these new
things too.

Yeah, I really meant all SLRU, not just clog. And having seen what
Robert has done here, I'm kinda glad I haven't gotten around to
trying to reduce LW lock contention yet, even though we're getting
dangerously far into the release cycle -- I think it can be done
much better with the, er, flexibility offered by the FlexLock patch.

-Kevin

Robert Haas

robertmhaas@gmail.com

about 14 years ago

In reply to: Kevin Grittner (#6)

Re: FlexLocks

On Tue, Nov 15, 2011 at 3:47 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Alvaro Herrera <alvherre@commandprompt.com> wrote:

As Kevin says nearby it's likely that we could find some way to
rewrite the SLRU (clog and such) locking protocol using these new
things too.

Yeah, I really meant all SLRU, not just clog. And having seen what
Robert has done here, I'm kinda glad I haven't gotten around to
trying to reduce LW lock contention yet, even though we're getting
dangerously far into the release cycle -- I think it can be done
much better with the, er, flexibility offered by the FlexLock patch.

I've had a thought that the SLRU machinery could benefit from having
the concept of a "pin", which it currently doesn't. I'm not certain
whether that thought is correct.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Dan Ports

drkp@csail.mit.edu

about 14 years ago

In reply to: Kevin Grittner (#2)

Re: FlexLocks

On Tue, Nov 15, 2011 at 10:55:49AM -0600, Kevin Grittner wrote:

And I would be
surprised if some creative thinking didn't yield a far better FL
scheme for SSI than we can manage with existing LW locks.

One place I could see it being useful is for
SerializableFinishedListLock, which protects the queue of committed
sxacts that can't yet be cleaned up. When committing a transaction, it
gets added to the list, and then scans the queue to find and clean up
any sxacts that are no longer needed. If there's contention, we don't
need multiple backends doing that scan; it's enough for one to complete
it on everybody's behalf.

I haven't thought it through, but it may also help with the other
contention bottleneck on that lock: that every transaction needs to add
itself to the cleanup list when it commits.

Mostly unrelatedly, the other thing that's looking like it would be really
useful would be some support for atomic integer operations. This would
be useful for some SSI things like writableSxactCount, and some things
elsewhere like the strong lock count in the regular lock manager.
I've been toying with the idea of creating an AtomicInteger type with a
few operations like increment-and-get, compare-and-set, swap, etc. This
would be implemented using the appropriate hardware operations on
platforms that support them (x86_64, perhaps others) and fall back on a
spinlock implementation on other platforms. I'll probably give it a try
and see what it looks like, but if anyone has any thoughts, let me know.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

Simon Riggs

simon@2ndQuadrant.com

about 14 years ago

In reply to: Alvaro Herrera (#5)

Re: FlexLocks

On Tue, Nov 15, 2011 at 8:33 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

I'm not really enthused by the idea of completely rewriting lwlocks
for this. Seems like specialised code is likely to be best, as well as
having less collateral damage.

Well, the problem that I have with that is that we're going to end up
with a lot of specialized code, particularly around error recovery.
This code doesn't remove the need for ProcArrayLock to be taken in
exclusive mode, and I don't think there IS any easy way to remove the
need for that to happen sometimes. So we have to deal with the
possibility that an ERROR might occur while we hold the lock, which
means we have to release the lock and clean up our state. That means
every place that has a call to LWLockReleaseAll() will now also need
to cleanup ProperlyCleanUpProcArrayLockStuff(). And the next time we
need to add some kind of specialized lock, we'll need to do the same
thing again. It seems to me that that rapidly gets unmanageable, not
to mention *slow*. We need some kind of common infrastructure for
releasing locks, and this is an attempt to create such a thing. I'm
not opposed to doing it some other way, but I think doing each one as
a one-off isn't going to work out very well.

I agree. In fact, I would think that we should look into rewriting the
sync rep locking (and group commit) on top of flexlocks, not the other
way around. As Kevin says nearby it's likely that we could find some
way to rewrite the SLRU (clog and such) locking protocol using these new
things too.

I see the commonality between ProcArray locking and Sync Rep/ Group
Commit locking. It's basically the same design, so although it wasn't
my first thought, I agree.

I did originally write that using spinlocks, but that was objected to.
Presumably the same objection would hold here also, but if it doesn't
that's good.

Mixing the above 3 things together is enough for me; I just don't see
the reason to do a global search and replace on the lwlock name in
order to do that. This is 2 patches at same time, 1 we clearly need, 1
I'm not sure about. Perhaps some more explanation about the flexlocks
structure and design will smooth that unease.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#10

Kevin Grittner

Kevin.Grittner@wicourts.gov

about 14 years ago

In reply to: Simon Riggs (#9)

Re: FlexLocks

Simon Riggs <simon@2ndQuadrant.com> wrote:

I just don't see the reason to do a global search and replace on
the lwlock name

I was going to review further before commenting on that, but since
it has now come up -- it seems odd that a source file which uses
only LW locks needs to change so much for the FlexLock
implementation. I'm not sure source code which uses the next layer
up from FlexLock should need to be aware of it quite so much. I
assume that this was done to avoid adding a layer to some code where
it could cause an unnecessary performance hit, but maybe it would be
worth just wrapping with a macro to isolate the levels where that's
all it takes.

For example, if these two macros were defined, predicate.c wouldn't
have needed any modifications, and I suspect that is true of many
other files (although possibly needing a few other macros):

#define LWLockId FlexLockId
#define LWLockHeldByMe(lock) FlexLockHeldByMe(lock)

Particularly with the function call it seems like it's a mistake to
assume that test will always be the same between LW locks and flex
locks. There may be a better way to do it than the above, but I
think a worthy goal would be to impose zero source code changes on
code which continues to use "traditional" lightweight locks.

-Kevin

#11

Robert Haas

robertmhaas@gmail.com

about 14 years ago

In reply to: Kevin Grittner (#10)

Re: FlexLocks

On Wed, Nov 16, 2011 at 10:26 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

For example, if these two macros were defined, predicate.c wouldn't
have needed any modifications, and I suspect that is true of many
other files (although possibly needing a few other macros):

#define LWLockId FlexLockId
#define LWLockHeldByMe(lock) FlexLockHeldByMe(lock)

Particularly with the function call it seems like it's a mistake to
assume that test will always be the same between LW locks and flex
locks. There may be a better way to do it than the above, but I
think a worthy goal would be to impose zero source code changes on
code which continues to use "traditional" lightweight locks.

Well, it would certainly be easy enough to add those macros, and I'm
not necessarily opposed to it, but I fear it could end up being a bit
confusing in the long run. If we adopt this infrastructure, then I
expect knowledge of different types of FlexLocks to gradually
propagate through the system. Now, you're always going to use
LWLockAcquire() and LWLockRelease() to acquire and release an LWLock,
but a FlexLockId isn't guaranteed to be an LWLockId - any given value
might also refer to a FlexLock of some other type. If we let everyone
continue to refer to those things as LWLockIds, then it seems like
only a matter of time before someone has a variable that's declared as
LWLockId but actually doesn't refer to an LWLock at all. I think it's
better to bite the bullet and do the renaming up front, rather than
having to think about it every time you modify some code that uses
LWLockId or LWLockHeldByMe and say to yourself, "oh, wait a minute, is
this really guaranteed to be an LWLock?"

For LWLockHeldByMe, a sensible compromise might be to add a function
that asserts that the FlexLockId passed as an argument is in fact
pointing to an LWLock, and then calls FlexLockHeldByMe() and returns
the result. That way you'd presumably noticed if you used the more
specific function when you needed the more general one (because,
hopefully, the regression tests would fail). But I'm not seeing any
obvious way of providing a similar degree of insulation against
abusing LWLockId.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12

Tom Lane

tgl@sss.pgh.pa.us

about 14 years ago

In reply to: Kevin Grittner (#10)

Re: FlexLocks

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Simon Riggs <simon@2ndQuadrant.com> wrote:

I just don't see the reason to do a global search and replace on
the lwlock name

I was going to review further before commenting on that, but since
it has now come up -- it seems odd that a source file which uses
only LW locks needs to change so much for the FlexLock
implementation.

Yeah, -1 on wideranging source changes for me too. There is no reason
that the current LWLock API need change. (I'm not saying that it has
to be same ABI though --- macro wrappers would be fine.)

regards, tom lane

#13

Tom Lane

tgl@sss.pgh.pa.us

about 14 years ago

In reply to: Robert Haas (#11)

Re: FlexLocks

Robert Haas <robertmhaas@gmail.com> writes:

Well, it would certainly be easy enough to add those macros, and I'm
not necessarily opposed to it, but I fear it could end up being a bit
confusing in the long run. If we adopt this infrastructure, then I
expect knowledge of different types of FlexLocks to gradually
propagate through the system. Now, you're always going to use
LWLockAcquire() and LWLockRelease() to acquire and release an LWLock,
but a FlexLockId isn't guaranteed to be an LWLockId - any given value
might also refer to a FlexLock of some other type. If we let everyone
continue to refer to those things as LWLockIds, then it seems like
only a matter of time before someone has a variable that's declared as
LWLockId but actually doesn't refer to an LWLock at all. I think it's
better to bite the bullet and do the renaming up front, rather than
having to think about it every time you modify some code that uses
LWLockId or LWLockHeldByMe and say to yourself, "oh, wait a minute, is
this really guaranteed to be an LWLock?"

In that case, I think you've chosen an unfortunate naming convention
and should rethink it. There is not any benefit to be gained from a
global search and replace here, and as somebody who spends quite enough
time dealing with cross-branch coding differences already, I'm going to
put my foot down about introducing a useless one.

Perhaps it would be better to think of this as "they're all lightweight
locks, but some have different locking policies". Or "we're taking a
different type of lock on this particular lock" --- that would match up
rather better with the way we think about heavyweight locks.

regards, tom lane

#14

Robert Haas

robertmhaas@gmail.com

about 14 years ago

In reply to: Tom Lane (#13)

Re: FlexLocks

On Wed, Nov 16, 2011 at 10:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Well, it would certainly be easy enough to add those macros, and I'm
not necessarily opposed to it, but I fear it could end up being a bit
confusing in the long run. If we adopt this infrastructure, then I
expect knowledge of different types of FlexLocks to gradually
propagate through the system. Now, you're always going to use
LWLockAcquire() and LWLockRelease() to acquire and release an LWLock,
but a FlexLockId isn't guaranteed to be an LWLockId - any given value
might also refer to a FlexLock of some other type. If we let everyone
continue to refer to those things as LWLockIds, then it seems like
only a matter of time before someone has a variable that's declared as
LWLockId but actually doesn't refer to an LWLock at all. I think it's
better to bite the bullet and do the renaming up front, rather than
having to think about it every time you modify some code that uses
LWLockId or LWLockHeldByMe and say to yourself, "oh, wait a minute, is
this really guaranteed to be an LWLock?"

In that case, I think you've chosen an unfortunate naming convention
and should rethink it. There is not any benefit to be gained from a
global search and replace here, and as somebody who spends quite enough
time dealing with cross-branch coding differences already, I'm going to
put my foot down about introducing a useless one.

Perhaps it would be better to think of this as "they're all lightweight
locks, but some have different locking policies". Or "we're taking a
different type of lock on this particular lock" --- that would match up
rather better with the way we think about heavyweight locks.

I struggled a lot with how to name this, and I'm not going to pretend
that what I came up with is necessarily ideal. But the basic idea
here is that all FlexLocks share the following properties in common:

- they are identified by a FlexLockId
- they are released by FlexLockReleaseAll
- they make use of the lwlock-related fields (renamed in the patch) in
PGPROC for sleep and wakeup handling
- they have a type indicator, a mutex, a retry flag, and a wait queue

But the following things are different per-type:

- acquire, conditional acquire (if any), and release functions
- available lock modes
- additional data fields that are part of the lock

Now, it seemed to me that if I was going to split the LWLock facililty
into two layers, either the upper layer could be LWLocks, or the lower
layer could be LWLocks, but they couldn't both be LWLocks. Since we
use LWLockAcquire() and LWLockRelease() all over the place but only
make reference to LWLockId in comparatively few places, it seemed to
me to be by far the less invasive renaming to make the upper layer be
LWLocks and the lower layer be something else.

Now maybe there is some better way to do this, but at the moment, I'm
not seeing it. If we call them all LWLocks, but only some of them
support LWLockAcquire(), then that's going to be pretty weird. The
situation is not really analagous to heavy-weight locks, where every
lock supports every lock mode, but in practice only table locks make
use of them all. In this particular case, we do not want to clutter
up the vanilla LWLock implementation with a series of special cases
that are only useful for a minority of locks in the system. That will
cause them to stop being lightweight, which misses the point; and it
will be ugly as hell, because the exact frammishes needed will
doubtless vary from one lock to another, and having just one lock type
that supports every single one of those frammishes is certainly a bad
idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15

Greg Stark

stark@mit.edu

about 14 years ago

In reply to: Robert Haas (#11)

Re: FlexLocks

On Wed, Nov 16, 2011 at 3:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Now, you're always going to use
LWLockAcquire() and LWLockRelease() to acquire and release an LWLock,
but a FlexLockId isn't guaranteed to be an LWLockId - any given value
might also refer to a FlexLock of some other type. If we let everyone
continue to refer to those things as LWLockIds, then it seems like
only a matter of time before someone has a variable that's declared as
LWLockId but actually doesn't refer to an LWLock at all. I think it's
better to bite the bullet and do the renaming up front, rather than
having to think about it every time you modify some code that uses
LWLockId or LWLockHeldByMe and say to yourself, "oh, wait a minute, is
this really guaranteed to be an LWLock?"

But that's an advantage to having a distinct API layer for LWLock
instead of having callers directly call FlexLock methods. The LWLock
macros can AssertMacro that the lockid they were passed are actually
LWLocks and not some other type of lock. That would require assigning
FlexLockIds that are recognizably LWLocks but that's not implausible
is it?

--
greg

#16

Kevin Grittner

Kevin.Grittner@wicourts.gov

about 14 years ago

In reply to: Robert Haas (#14)

Re: FlexLocks

Robert Haas <robertmhaas@gmail.com> wrote:

Now maybe there is some better way to do this, but at the moment,
I'm not seeing it. If we call them all LWLocks, but only some of
them support LWLockAcquire(), then that's going to be pretty
weird.

Is there any way to typedef our way out of it, such that a LWLock
*is a* FlexLock, but a FlexLock isn't a LWLock? If we could do
that, you couldn't use just a plain old FlexLock in LWLockAcquire(),
but you could do the cleanups, etc., that you want.

-Kevin

#17

Robert Haas

robertmhaas@gmail.com

about 14 years ago

In reply to: Greg Stark (#15)

Re: FlexLocks

On Wed, Nov 16, 2011 at 11:14 AM, Greg Stark <stark@mit.edu> wrote:

On Wed, Nov 16, 2011 at 3:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Now, you're always going to use
LWLockAcquire() and LWLockRelease() to acquire and release an LWLock,
but a FlexLockId isn't guaranteed to be an LWLockId - any given value
might also refer to a FlexLock of some other type. If we let everyone
continue to refer to those things as LWLockIds, then it seems like
only a matter of time before someone has a variable that's declared as
LWLockId but actually doesn't refer to an LWLock at all. I think it's
better to bite the bullet and do the renaming up front, rather than
having to think about it every time you modify some code that uses
LWLockId or LWLockHeldByMe and say to yourself, "oh, wait a minute, is
this really guaranteed to be an LWLock?"

But that's an advantage to having a distinct API layer for LWLock
instead of having callers directly call FlexLock methods. The LWLock
macros can AssertMacro that the lockid they were passed are actually
LWLocks and not some other type of lock. That would require assigning
FlexLockIds that are recognizably LWLocks but that's not implausible
is it?

Well, that works for the most part. You still need a few generic
functions, like FlexLockReleaseAll(), which releases all FlexLocks of
all types, not just those of some particular type. And it doesn't
solve the problem with FlexLockId, which can potentially refer to a
FlexLock of any type, not just a LWLock.

I think we might be getting slightly more excited about this problem
than it actually deserves. Excluding lwlock.{c,h}, the new files
added by this patch, and the documentation changes, this patch adds
103 lines and removes 101. We can uncontroversially reduce each
numbers by 14 by adding a separate LWLockHeldByMe() function that does
the same thing as FlexLockHeldByMe() but also asserts the lock type.
That would leave us adding 89 lines of code and removing 87.

If we (against my better judgement) take the position that we must
continue to use LWLockId rather than FlexLockId as the type name in
any place that only treats with LWLocks we could reduce each of those
numbers by an additional 34, giving new totals of 55 and 53 lines of
changes outside the lwlock/flexlock code itself rather than 89 and 87.
I humbly submit that this is not really enough to get excited about.
We've make far more sweeping notational changes than that more than
once - even, I think, with some regularity.

This may seem invasive because it's touching LWLocks, and we use those
everywhere, but in practice the code footprint is very small because
typical usage is just LWLockAcquire(BlahLock) and then
LWLockRelease(BlahLock). And I'm not proposing to change that usage
in any way; avoiding any change in that area was, in fact, one of my
main design goals.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18

Robert Haas

robertmhaas@gmail.com

about 14 years ago

In reply to: Kevin Grittner (#16)

Re: FlexLocks

On Wed, Nov 16, 2011 at 11:17 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

Now maybe there is some better way to do this, but at the moment,
I'm not seeing it. If we call them all LWLocks, but only some of
them support LWLockAcquire(), then that's going to be pretty
weird.

Is there any way to typedef our way out of it, such that a LWLock
*is a* FlexLock, but a FlexLock isn't a LWLock? If we could do
that, you couldn't use just a plain old FlexLock in LWLockAcquire(),
but you could do the cleanups, etc., that you want.

Well, if we just say:

typedef FlexLockId LWLockId;

...that's about equivalent to the #define from the compiler's point of
view. We could alternatively change one or the other of them to be a
struct with one member, but I think the cure might be worse than the
disease. By my count, we are talking about saving perhaps as many as
34 lines of code changes here, and that's only if complicating the
type handling doesn't require any changes to places that are untouched
at present, which I suspect it would.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19

Kevin Grittner

Kevin.Grittner@wicourts.gov

about 14 years ago

In reply to: Robert Haas (#18)

Re: FlexLocks

Robert Haas <robertmhaas@gmail.com> wrote:

Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

Is there any way to typedef our way out of it [?]

Well, if we just say:

typedef FlexLockId LWLockId;

...that's about equivalent to the #define from the compiler's
point of view.

Bummer -- I was hoping there was some equivalent to "subclassing"
that I just didn't know about. :-(

We could alternatively change one or the other of them to be a
struct with one member, but I think the cure might be worse than
the disease. By my count, we are talking about saving perhaps as
many as 34 lines of code changes here, and that's only if
complicating the type handling doesn't require any changes to
places that are untouched at present, which I suspect it would.

So I stepped through all the changes of this type, and I notice that
most of them are in areas where we've talked about likely benefits
of creating new FlexLock variants instead of staying with LWLocks;
if any of that is done (as seems likely), it further reduces the
impact from 34 lines. If we take care of LWLockHeldByMe() as you
describe, I'll concede the FlexLockId changes.

-Kevin

#20

Robert Haas

robertmhaas@gmail.com

about 14 years ago

In reply to: Kevin Grittner (#19)

2 attachment(s)

Re: FlexLocks

On Wed, Nov 16, 2011 at 12:25 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

We could alternatively change one or the other of them to be a
struct with one member, but I think the cure might be worse than
the disease. By my count, we are talking about saving perhaps as
many as 34 lines of code changes here, and that's only if
complicating the type handling doesn't require any changes to
places that are untouched at present, which I suspect it would.

So I stepped through all the changes of this type, and I notice that
most of them are in areas where we've talked about likely benefits
of creating new FlexLock variants instead of staying with LWLocks;
if any of that is done (as seems likely), it further reduces the
impact from 34 lines. If we take care of LWLockHeldByMe() as you
describe, I'll concede the FlexLockId changes.

Updated patches attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

flexlock-v2.patchapplication/octet-stream; name=flexlock-v2.patchDownload

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 8dc3054..51b24d0 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -105,7 +105,7 @@ typedef struct pgssEntry
  */
 typedef struct pgssSharedState
 {
-	LWLockId	lock;			/* protects hashtable search/modification */
+	FlexLockId	lock;			/* protects hashtable search/modification */
 	int			query_size;		/* max query length in bytes */
 } pgssSharedState;
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d1e628f..8517b36 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6199,14 +6199,14 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
      </varlistentry>
 
      <varlistentry>
-      <term><varname>trace_lwlocks</varname> (<type>boolean</type>)</term>
+      <term><varname>trace_flexlocks</varname> (<type>boolean</type>)</term>
       <indexterm>
-       <primary><varname>trace_lwlocks</> configuration parameter</primary>
+       <primary><varname>trace_flexlocks</> configuration parameter</primary>
       </indexterm>
       <listitem>
        <para>
-        If on, emit information about lightweight lock usage.  Lightweight
-        locks are intended primarily to provide mutual exclusion of access
+        If on, emit information about FlexLock usage.  FlexLocks
+        are intended primarily to provide mutual exclusion of access
         to shared-memory data structures.
        </para>
        <para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b9dc1d2..98ed0d3 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1724,49 +1724,49 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
       or kilobytes of memory used for an internal sort.</entry>
     </row>
     <row>
-     <entry>lwlock-acquire</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock has been acquired.
-      arg0 is the LWLock's ID.
-      arg1 is the requested lock mode, either exclusive or shared.</entry>
+     <entry>flexlock-acquire</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock has been acquired.
+      arg0 is the FlexLock's ID.
+      arg1 is the requested lock mode.</entry>
     </row>
     <row>
-     <entry>lwlock-release</entry>
-     <entry>(LWLockId)</entry>
-     <entry>Probe that fires when an LWLock has been released (but note
+     <entry>flexlock-release</entry>
+     <entry>(FlexLockId)</entry>
+     <entry>Probe that fires when a FlexLock has been released (but note
       that any released waiters have not yet been awakened).
-      arg0 is the LWLock's ID.</entry>
+      arg0 is the FlexLock's ID.</entry>
     </row>
     <row>
-     <entry>lwlock-wait-start</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock was not immediately available and
+     <entry>flexlock-wait-start</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock was not immediately available and
       a server process has begun to wait for the lock to become available.
-      arg0 is the LWLock's ID.
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
-     <entry>lwlock-wait-done</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
+     <entry>flexlock-wait-done</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
      <entry>Probe that fires when a server process has been released from its
-      wait for an LWLock (it does not actually have the lock yet).
-      arg0 is the LWLock's ID.
+      wait for an FlexLock (it does not actually have the lock yet).
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
-     <entry>lwlock-condacquire</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock was successfully acquired when the
-      caller specified no waiting.
-      arg0 is the LWLock's ID.
+     <entry>flexlock-condacquire</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock was successfully acquired when
+      the caller specified no waiting.
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
-     <entry>lwlock-condacquire-fail</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock was not successfully acquired when
-      the caller specified no waiting.
-      arg0 is the LWLock's ID.
+     <entry>flexlock-condacquire-fail</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock was not successfully acquired
+      when the caller specified no waiting.
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
@@ -1813,11 +1813,11 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
      <entry>unsigned int</entry>
     </row>
     <row>
-     <entry>LWLockId</entry>
+     <entry>FlexLockId</entry>
      <entry>int</entry>
     </row>
     <row>
-     <entry>LWLockMode</entry>
+     <entry>FlexLockMode</entry>
      <entry>int</entry>
     </row>
     <row>
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index f7caa34..09d5862 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -151,7 +151,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(bool));		/* page_dirty[] */
 	sz += MAXALIGN(nslots * sizeof(int));		/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));		/* page_lru_count[] */
-	sz += MAXALIGN(nslots * sizeof(LWLockId));	/* buffer_locks[] */
+	sz += MAXALIGN(nslots * sizeof(FlexLockId));		/* buffer_locks[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -161,7 +161,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLockId ctllock, const char *subdir)
+			  FlexLockId ctllock, const char *subdir)
 {
 	SlruShared	shared;
 	bool		found;
@@ -202,8 +202,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		offset += MAXALIGN(nslots * sizeof(int));
 		shared->page_lru_count = (int *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(int));
-		shared->buffer_locks = (LWLockId *) (ptr + offset);
-		offset += MAXALIGN(nslots * sizeof(LWLockId));
+		shared->buffer_locks = (FlexLockId *) (ptr + offset);
+		offset += MAXALIGN(nslots * sizeof(FlexLockId));
 
 		if (nlsns > 0)
 		{
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 477982d..d5d1ee9 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -318,9 +318,9 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 	gxact->proc.roleId = owner;
 	gxact->proc.inCommit = false;
 	gxact->proc.vacuumFlags = 0;
-	gxact->proc.lwWaiting = false;
-	gxact->proc.lwExclusive = false;
-	gxact->proc.lwWaitLink = NULL;
+	gxact->proc.flWaitResult = 0;
+	gxact->proc.flWaitMode = 0;
+	gxact->proc.flWaitLink = NULL;
 	gxact->proc.waitLock = NULL;
 	gxact->proc.waitProcLock = NULL;
 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index c151d3b..19b708c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2248,7 +2248,7 @@ AbortTransaction(void)
 	 * Releasing LW locks is critical since we might try to grab them again
 	 * while cleaning up!
 	 */
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 
 	/* Clean up buffer I/O and buffer context locks, too */
 	AbortBufferIO();
@@ -4138,7 +4138,7 @@ AbortSubTransaction(void)
 	 * FIXME This may be incorrect --- Are there some locks we should keep?
 	 * Buffer locks, for example?  I don't think so but I'm not sure.
 	 */
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 
 	AbortBufferIO();
 	UnlockBuffers();
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 6bf2421..9ceee91 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -562,13 +562,13 @@ bootstrap_signals(void)
  * Begin shutdown of an auxiliary process.	This is approximately the equivalent
  * of ShutdownPostgres() in postinit.c.  We can't run transactions in an
  * auxiliary process, so most of the work of AbortTransaction() is not needed,
- * but we do need to make sure we've released any LWLocks we are holding.
+ * but we do need to make sure we've released any flex locks we are holding.
  * (This is only critical during an error exit.)
  */
 static void
 ShutdownAuxiliaryProcess(int code, Datum arg)
 {
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index cacedab..f33f573 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -176,9 +176,10 @@ BackgroundWriterMain(void)
 		/*
 		 * These operations are really just a minimal subset of
 		 * AbortTransaction().	We don't have very many resources to worry
-		 * about in bgwriter, but we do have LWLocks, buffers, and temp files.
+		 * about in bgwriter, but we do have flex locks, buffers, and temp
+		 * files.
 		 */
-		LWLockReleaseAll();
+		FlexLockReleaseAll();
 		AbortBufferIO();
 		UnlockBuffers();
 		/* buffer pins are released here: */
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e9ae1e8..49f07a7 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -281,9 +281,10 @@ CheckpointerMain(void)
 		/*
 		 * These operations are really just a minimal subset of
 		 * AbortTransaction().	We don't have very many resources to worry
-		 * about in checkpointer, but we do have LWLocks, buffers, and temp files.
+		 * about in checkpointer, but we do have flex locks, buffers, and temp
+		 * files.
 		 */
-		LWLockReleaseAll();
+		FlexLockReleaseAll();
 		AbortBufferIO();
 		UnlockBuffers();
 		/* buffer pins are released here: */
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6758083..14b4368 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -109,6 +109,7 @@
 #include "postmaster/syslogger.h"
 #include "replication/walsender.h"
 #include "storage/fd.h"
+#include "storage/flexlock_internals.h"
 #include "storage/ipc.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
@@ -404,8 +405,6 @@ typedef struct
 typedef int InheritableSocket;
 #endif
 
-typedef struct LWLock LWLock;	/* ugly kluge */
-
 /*
  * Structure contains all variables passed to exec:ed backends
  */
@@ -426,7 +425,7 @@ typedef struct
 	slock_t    *ShmemLock;
 	VariableCache ShmemVariableCache;
 	Backend    *ShmemBackendArray;
-	LWLock	   *LWLockArray;
+	FlexLock   *FlexLockArray;
 	slock_t    *ProcStructLock;
 	PROC_HDR   *ProcGlobal;
 	PGPROC	   *AuxiliaryProcs;
@@ -4675,7 +4674,6 @@ MaxLivePostmasterChildren(void)
  * functions
  */
 extern slock_t *ShmemLock;
-extern LWLock *LWLockArray;
 extern slock_t *ProcStructLock;
 extern PGPROC *AuxiliaryProcs;
 extern PMSignalData *PMSignalState;
@@ -4720,7 +4718,7 @@ save_backend_variables(BackendParameters *param, Port *port,
 	param->ShmemVariableCache = ShmemVariableCache;
 	param->ShmemBackendArray = ShmemBackendArray;
 
-	param->LWLockArray = LWLockArray;
+	param->FlexLockArray = FlexLockArray;
 	param->ProcStructLock = ProcStructLock;
 	param->ProcGlobal = ProcGlobal;
 	param->AuxiliaryProcs = AuxiliaryProcs;
@@ -4943,7 +4941,7 @@ restore_backend_variables(BackendParameters *param, Port *port)
 	ShmemVariableCache = param->ShmemVariableCache;
 	ShmemBackendArray = param->ShmemBackendArray;
 
-	LWLockArray = param->LWLockArray;
+	FlexLockArray = param->FlexLockArray;
 	ProcStructLock = param->ProcStructLock;
 	ProcGlobal = param->ProcGlobal;
 	AuxiliaryProcs = param->AuxiliaryProcs;
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 157728e..587443d 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -167,9 +167,9 @@ WalWriterMain(void)
 		/*
 		 * These operations are really just a minimal subset of
 		 * AbortTransaction().	We don't have very many resources to worry
-		 * about in walwriter, but we do have LWLocks, and perhaps buffers?
+		 * about in walwriter, but we do have flex locks, and perhaps buffers?
 		 */
-		LWLockReleaseAll();
+		FlexLockReleaseAll();
 		AbortBufferIO();
 		UnlockBuffers();
 		/* buffer pins are released here: */
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e59af33..07356ec 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -141,7 +141,7 @@ PrefetchBuffer(Relation reln, ForkNumber forkNum, BlockNumber blockNum)
 	{
 		BufferTag	newTag;		/* identity of requested block */
 		uint32		newHash;	/* hash value for newTag */
-		LWLockId	newPartitionLock;	/* buffer partition lock for it */
+		FlexLockId	newPartitionLock;	/* buffer partition lock for it */
 		int			buf_id;
 
 		/* create a tag so we can lookup the buffer */
@@ -512,10 +512,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 {
 	BufferTag	newTag;			/* identity of requested block */
 	uint32		newHash;		/* hash value for newTag */
-	LWLockId	newPartitionLock;		/* buffer partition lock for it */
+	FlexLockId	newPartitionLock;		/* buffer partition lock for it */
 	BufferTag	oldTag;			/* previous identity of selected buffer */
 	uint32		oldHash;		/* hash value for oldTag */
-	LWLockId	oldPartitionLock;		/* buffer partition lock for it */
+	FlexLockId	oldPartitionLock;		/* buffer partition lock for it */
 	BufFlags	oldFlags;
 	int			buf_id;
 	volatile BufferDesc *buf;
@@ -855,7 +855,7 @@ InvalidateBuffer(volatile BufferDesc *buf)
 {
 	BufferTag	oldTag;
 	uint32		oldHash;		/* hash value for oldTag */
-	LWLockId	oldPartitionLock;		/* buffer partition lock for it */
+	FlexLockId	oldPartitionLock;		/* buffer partition lock for it */
 	BufFlags	oldFlags;
 
 	/* Save the original buffer tag before dropping the spinlock */
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 56c0bd8..02ee8d8 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -113,7 +113,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, SUBTRANSShmemSize());
 		size = add_size(size, TwoPhaseShmemSize());
 		size = add_size(size, MultiXactShmemSize());
-		size = add_size(size, LWLockShmemSize());
+		size = add_size(size, FlexLockShmemSize());
 		size = add_size(size, ProcArrayShmemSize());
 		size = add_size(size, BackendStatusShmemSize());
 		size = add_size(size, SInvalShmemSize());
@@ -179,7 +179,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	 * needed for InitShmemIndex.
 	 */
 	if (!IsUnderPostmaster)
-		CreateLWLocks();
+		CreateFlexLocks();
 
 	/*
 	 * Set up shmem.c index hashtable
diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile
index e12a854..3730e51 100644
--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/storage/lmgr
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o predicate.o
+OBJS = flexlock.o lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o \
+	predicate.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/storage/lmgr/flexlock.c b/src/backend/storage/lmgr/flexlock.c
new file mode 100644
index 0000000..7f657b3
--- /dev/null
+++ b/src/backend/storage/lmgr/flexlock.c
@@ -0,0 +1,353 @@
+/*-------------------------------------------------------------------------
+ *
+ * flexlock.c
+ *	  Low-level routines for managing flex locks.
+ *
+ * Flex locks are intended primarily to provide mutual exclusion of access
+ * to shared-memory data structures.  Most, but not all, flex locks are
+ * lightweight locks (LWLocks).  This file contains support routines that
+ * are used for all types of flex locks, including lwlocks.  User-level
+ * locking should be done with the full lock manager --- which depends on
+ * LWLocks to protect its shared state.
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/lmgr/flexlock.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "access/clog.h"
+#include "access/multixact.h"
+#include "access/subtrans.h"
+#include "commands/async.h"
+#include "storage/flexlock_internals.h"
+#include "storage/lwlock.h"
+#include "storage/predicate.h"
+#include "storage/proc.h"
+#include "storage/spin.h"
+#include "utils/elog.h"
+
+/*
+ * We use this structure to keep track of flex locks held, for release
+ * during error recovery.  The maximum size could be determined at runtime
+ * if necessary, but it seems unlikely that more than a few locks could
+ * ever be held simultaneously.
+ */
+#define MAX_SIMUL_FLEXLOCKS	100
+
+int	num_held_flexlocks = 0;
+FlexLockId held_flexlocks[MAX_SIMUL_FLEXLOCKS];
+
+static int	lock_addin_request = 0;
+static bool lock_addin_request_allowed = true;
+
+#ifdef LOCK_DEBUG
+bool		Trace_flexlocks = false;
+#endif
+
+/*
+ * This points to the array of FlexLocks in shared memory.  Backends inherit
+ * the pointer by fork from the postmaster (except in the EXEC_BACKEND case,
+ * where we have special measures to pass it down).
+ */
+FlexLockPadded *FlexLockArray = NULL;
+
+/* We use the ShmemLock spinlock to protect LWLockAssign */
+extern slock_t *ShmemLock;
+
+static void FlexLockInit(FlexLock *flex, char locktype);
+
+/*
+ * Compute number of FlexLocks to allocate.
+ */
+int
+NumFlexLocks(void)
+{
+	int			numLocks;
+
+	/*
+	 * Possibly this logic should be spread out among the affected modules,
+	 * the same way that shmem space estimation is done.  But for now, there
+	 * are few enough users of FlexLocks that we can get away with just keeping
+	 * the knowledge here.
+	 */
+
+	/* Predefined FlexLocks */
+	numLocks = (int) NumFixedFlexLocks;
+
+	/* bufmgr.c needs two for each shared buffer */
+	numLocks += 2 * NBuffers;
+
+	/* proc.c needs one for each backend or auxiliary process */
+	numLocks += MaxBackends + NUM_AUXILIARY_PROCS;
+
+	/* clog.c needs one per CLOG buffer */
+	numLocks += NUM_CLOG_BUFFERS;
+
+	/* subtrans.c needs one per SubTrans buffer */
+	numLocks += NUM_SUBTRANS_BUFFERS;
+
+	/* multixact.c needs two SLRU areas */
+	numLocks += NUM_MXACTOFFSET_BUFFERS + NUM_MXACTMEMBER_BUFFERS;
+
+	/* async.c needs one per Async buffer */
+	numLocks += NUM_ASYNC_BUFFERS;
+
+	/* predicate.c needs one per old serializable xid buffer */
+	numLocks += NUM_OLDSERXID_BUFFERS;
+
+	/*
+	 * Add any requested by loadable modules; for backwards-compatibility
+	 * reasons, allocate at least NUM_USER_DEFINED_FLEXLOCKS of them even if
+	 * there are no explicit requests.
+	 */
+	lock_addin_request_allowed = false;
+	numLocks += Max(lock_addin_request, NUM_USER_DEFINED_FLEXLOCKS);
+
+	return numLocks;
+}
+
+
+/*
+ * RequestAddinFlexLocks
+ *		Request that extra FlexLocks be allocated for use by
+ *		a loadable module.
+ *
+ * This is only useful if called from the _PG_init hook of a library that
+ * is loaded into the postmaster via shared_preload_libraries.	Once
+ * shared memory has been allocated, calls will be ignored.  (We could
+ * raise an error, but it seems better to make it a no-op, so that
+ * libraries containing such calls can be reloaded if needed.)
+ */
+void
+RequestAddinFlexLocks(int n)
+{
+	if (IsUnderPostmaster || !lock_addin_request_allowed)
+		return;					/* too late */
+	lock_addin_request += n;
+}
+
+
+/*
+ * Compute shmem space needed for FlexLocks.
+ */
+Size
+FlexLockShmemSize(void)
+{
+	Size		size;
+	int			numLocks = NumFlexLocks();
+
+	/* Space for the FlexLock array. */
+	size = mul_size(numLocks, FLEX_LOCK_BYTES);
+
+	/* Space for dynamic allocation counter, plus room for alignment. */
+	size = add_size(size, 2 * sizeof(int) + FLEX_LOCK_BYTES);
+
+	return size;
+}
+
+/*
+ * Allocate shmem space for FlexLocks and initialize the locks.
+ */
+void
+CreateFlexLocks(void)
+{
+	int			numLocks = NumFlexLocks();
+	Size		spaceLocks = FlexLockShmemSize();
+	FlexLockPadded *lock;
+	int		   *FlexLockCounter;
+	char	   *ptr;
+	int			id;
+
+	/* Allocate and zero space */
+	ptr = (char *) ShmemAlloc(spaceLocks);
+	memset(ptr, 0, spaceLocks);
+
+	/* Leave room for dynamic allocation counter */
+	ptr += 2 * sizeof(int);
+
+	/* Ensure desired alignment of FlexLock array */
+	ptr += FLEX_LOCK_BYTES - ((uintptr_t) ptr) % FLEX_LOCK_BYTES;
+
+	FlexLockArray = (FlexLockPadded *) ptr;
+
+	/* All of the "fixed" FlexLocks are LWLocks. */
+	for (id = 0, lock = FlexLockArray; id < NumFixedFlexLocks; id++, lock++)
+		FlexLockInit(&lock->flex, FLEXLOCK_TYPE_LWLOCK);
+
+	/*
+	 * Initialize the dynamic-allocation counter, which is stored just before
+	 * the first FlexLock.
+	 */
+	FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+	FlexLockCounter[0] = (int) NumFixedFlexLocks;
+	FlexLockCounter[1] = numLocks;
+}
+
+/*
+ * FlexLockAssign - assign a dynamically-allocated FlexLock number
+ *
+ * We interlock this using the same spinlock that is used to protect
+ * ShmemAlloc().  Interlocking is not really necessary during postmaster
+ * startup, but it is needed if any user-defined code tries to allocate
+ * LWLocks after startup.
+ */
+FlexLockId
+FlexLockAssign(char locktype)
+{
+	FlexLockId	result;
+
+	/* use volatile pointer to prevent code rearrangement */
+	volatile int *FlexLockCounter;
+
+	FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+	SpinLockAcquire(ShmemLock);
+	if (FlexLockCounter[0] >= FlexLockCounter[1])
+	{
+		SpinLockRelease(ShmemLock);
+		elog(ERROR, "no more FlexLockIds available");
+	}
+	result = (FlexLockId) (FlexLockCounter[0]++);
+	SpinLockRelease(ShmemLock);
+
+	FlexLockInit(&FlexLockArray[result].flex, locktype);
+
+	return result;
+}
+
+/*
+ * Initialize a FlexLock.
+ */
+static void
+FlexLockInit(FlexLock *flex, char locktype)
+{
+	SpinLockInit(&flex->mutex);
+	flex->releaseOK = true;
+	flex->locktype = locktype;
+	/*
+	 * We might need to think a little harder about what should happen here
+	 * if some future type of FlexLock requires more initialization than this.
+	 * For now, this will suffice.
+	 */
+}
+
+/*
+ * Remove lock from list of locks held.  Usually, but not always, it will
+ * be the latest-acquired lock; so search array backwards.
+ */
+void
+FlexLockRemember(FlexLockId id)
+{
+	if (num_held_flexlocks >= MAX_SIMUL_FLEXLOCKS)
+		elog(PANIC, "too many FlexLocks taken");
+	held_flexlocks[num_held_flexlocks++] = id;
+}
+
+/*
+ * Remove lock from list of locks held.  Usually, but not always, it will
+ * be the latest-acquired lock; so search array backwards.
+ */
+void
+FlexLockForget(FlexLockId id)
+{
+	int			i;
+
+	for (i = num_held_flexlocks; --i >= 0;)
+	{
+		if (id == held_flexlocks[i])
+			break;
+	}
+	if (i < 0)
+		elog(ERROR, "lock %d is not held", (int) id);
+	num_held_flexlocks--;
+	for (; i < num_held_flexlocks; i++)
+		held_flexlocks[i] = held_flexlocks[i + 1];
+}
+
+/*
+ * FlexLockWait - wait until awakened
+ *
+ * Since we share the process wait semaphore with the regular lock manager
+ * and ProcWaitForSignal, and we may need to acquire a FlexLock while one of
+ * those is pending, it is possible that we get awakened for a reason other
+ * than being signaled by a FlexLock release.  If so, loop back and wait again.
+ *
+ * Returns the number of "extra" waits absorbed so that, once we've gotten the
+ * FlexLock, we can re-increment the sema by the number of additional signals
+ * received, so that the lock manager or signal manager will see the received
+ * signal when it next waits.
+ */
+int
+FlexLockWait(FlexLockId id, int mode)
+{
+	int		extraWaits = 0;
+
+	FlexLockDebug("LWLockAcquire", id, "waiting");
+	TRACE_POSTGRESQL_FLEXLOCK_WAIT_START(id, mode);
+
+	for (;;)
+   	{
+		/* "false" means cannot accept cancel/die interrupt here. */
+		PGSemaphoreLock(&MyProc->sem, false);
+		/*
+		 * FLEXTODO: I think we should return this, instead of ignoring it.
+		 * Any non-zero value means "wake up".
+		 */
+		if (MyProc->flWaitResult)
+			break;
+		extraWaits++;
+   	}
+
+	TRACE_POSTGRESQL_FLEXLOCK_WAIT_DONE(id, mode);
+	FlexLockDebug("LWLockAcquire", id, "awakened");
+
+	return extraWaits;
+}
+
+/*
+ * FlexLockReleaseAll - release all currently-held locks
+ *
+ * Used to clean up after ereport(ERROR). An important difference between this
+ * function and retail LWLockRelease calls is that InterruptHoldoffCount is
+ * unchanged by this operation.  This is necessary since InterruptHoldoffCount
+ * has been set to an appropriate level earlier in error recovery. We could
+ * decrement it below zero if we allow it to drop for each released lock!
+ */
+void
+FlexLockReleaseAll(void)
+{
+	while (num_held_flexlocks > 0)
+	{
+		HOLD_INTERRUPTS();		/* match the upcoming RESUME_INTERRUPTS */
+
+		/*
+		 * FLEXTODO: When we have multiple types of flex locks, this will
+		 * need to call the appropriate release function for each lock type.
+		 */
+		LWLockRelease(held_flexlocks[num_held_flexlocks - 1]);
+	}
+}
+
+/*
+ * FlexLockHeldByMe - test whether my process currently holds a lock
+ *
+ * This is meant as debug support only.  We do not consider the lock mode.
+ */
+bool
+FlexLockHeldByMe(FlexLockId id)
+{
+	int			i;
+
+	for (i = 0; i < num_held_flexlocks; i++)
+	{
+		if (held_flexlocks[i] == id)
+			return true;
+	}
+	return false;
+}
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 905502f..adc5fd9 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -591,7 +591,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	bool		found;
 	ResourceOwner owner;
 	uint32		hashcode;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	int			status;
 	bool		log_lock = false;
 
@@ -1546,7 +1546,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	LOCALLOCK  *locallock;
 	LOCK	   *lock;
 	PROCLOCK   *proclock;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	bool		wakeupNeeded;
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
@@ -1912,7 +1912,7 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	 */
 	for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
 	{
-		LWLockId	partitionLock = FirstLockMgrLock + partition;
+		FlexLockId	partitionLock = FirstLockMgrLock + partition;
 		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
 
 		proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
@@ -2197,7 +2197,7 @@ static bool
 FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag,
 					  uint32 hashcode)
 {
-	LWLockId		partitionLock = LockHashPartitionLock(hashcode);
+	FlexLockId		partitionLock = LockHashPartitionLock(hashcode);
 	Oid				relid = locktag->locktag_field2;
 	uint32			i;
 
@@ -2281,7 +2281,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	LockMethod		lockMethodTable = LockMethods[DEFAULT_LOCKMETHOD];
 	LOCKTAG		   *locktag = &locallock->tag.lock;
 	PROCLOCK	   *proclock = NULL;
-	LWLockId		partitionLock = LockHashPartitionLock(locallock->hashcode);
+	FlexLockId		partitionLock = LockHashPartitionLock(locallock->hashcode);
 	Oid				relid = locktag->locktag_field2;
 	uint32			f;
 
@@ -2382,7 +2382,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
 	SHM_QUEUE  *procLocks;
 	PROCLOCK   *proclock;
 	uint32		hashcode;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	int			count = 0;
 	int			fast_count = 0;
 
@@ -2593,7 +2593,7 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 	PROCLOCKTAG proclocktag;
 	uint32		hashcode;
 	uint32		proclock_hashcode;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	bool		wakeupNeeded;
 
 	hashcode = LockTagHashCode(locktag);
@@ -2827,7 +2827,7 @@ PostPrepare_Locks(TransactionId xid)
 	 */
 	for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
 	{
-		LWLockId	partitionLock = FirstLockMgrLock + partition;
+		FlexLockId	partitionLock = FirstLockMgrLock + partition;
 		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
 
 		proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
@@ -3342,7 +3342,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 	uint32		hashcode;
 	uint32		proclock_hashcode;
 	int			partition;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	LockMethod	lockMethodTable;
 
 	Assert(len == sizeof(TwoPhaseLockRecord));
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 079eb29..ce6c931 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -21,74 +21,23 @@
  */
 #include "postgres.h"
 
-#include "access/clog.h"
-#include "access/multixact.h"
-#include "access/subtrans.h"
-#include "commands/async.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "storage/flexlock_internals.h"
 #include "storage/ipc.h"
-#include "storage/predicate.h"
 #include "storage/proc.h"
 #include "storage/spin.h"
 
-
-/* We use the ShmemLock spinlock to protect LWLockAssign */
-extern slock_t *ShmemLock;
-
-
 typedef struct LWLock
 {
-	slock_t		mutex;			/* Protects LWLock and queue of PGPROCs */
-	bool		releaseOK;		/* T if ok to release waiters */
+	FlexLock	flex;			/* common FlexLock infrastructure */
 	char		exclusive;		/* # of exclusive holders (0 or 1) */
 	int			shared;			/* # of shared holders (0..MaxBackends) */
-	PGPROC	   *head;			/* head of list of waiting PGPROCs */
-	PGPROC	   *tail;			/* tail of list of waiting PGPROCs */
-	/* tail is undefined when head is NULL */
 } LWLock;
 
-/*
- * All the LWLock structs are allocated as an array in shared memory.
- * (LWLockIds are indexes into the array.)	We force the array stride to
- * be a power of 2, which saves a few cycles in indexing, but more
- * importantly also ensures that individual LWLocks don't cross cache line
- * boundaries.	This reduces cache contention problems, especially on AMD
- * Opterons.  (Of course, we have to also ensure that the array start
- * address is suitably aligned.)
- *
- * LWLock is between 16 and 32 bytes on all known platforms, so these two
- * cases are sufficient.
- */
-#define LWLOCK_PADDED_SIZE	(sizeof(LWLock) <= 16 ? 16 : 32)
-
-typedef union LWLockPadded
-{
-	LWLock		lock;
-	char		pad[LWLOCK_PADDED_SIZE];
-} LWLockPadded;
-
-/*
- * This points to the array of LWLocks in shared memory.  Backends inherit
- * the pointer by fork from the postmaster (except in the EXEC_BACKEND case,
- * where we have special measures to pass it down).
- */
-NON_EXEC_STATIC LWLockPadded *LWLockArray = NULL;
-
-
-/*
- * We use this structure to keep track of locked LWLocks for release
- * during error recovery.  The maximum size could be determined at runtime
- * if necessary, but it seems unlikely that more than a few locks could
- * ever be held simultaneously.
- */
-#define MAX_SIMUL_LWLOCKS	100
-
-static int	num_held_lwlocks = 0;
-static LWLockId held_lwlocks[MAX_SIMUL_LWLOCKS];
-
-static int	lock_addin_request = 0;
-static bool lock_addin_request_allowed = true;
+#define	LWLockPointer(lockid) \
+	(AssertMacro(FlexLockArray[lockid].flex.locktype == FLEXLOCK_TYPE_LWLOCK), \
+	 (volatile LWLock *) &FlexLockArray[lockid])
 
 #ifdef LWLOCK_STATS
 static int	counts_for_pid = 0;
@@ -98,27 +47,17 @@ static int *block_counts;
 #endif
 
 #ifdef LOCK_DEBUG
-bool		Trace_lwlocks = false;
-
 inline static void
-PRINT_LWDEBUG(const char *where, LWLockId lockid, const volatile LWLock *lock)
+PRINT_LWDEBUG(const char *where, FlexLockId lockid, const volatile LWLock *lock)
 {
-	if (Trace_lwlocks)
+	if (Trace_flexlocks)
 		elog(LOG, "%s(%d): excl %d shared %d head %p rOK %d",
 			 where, (int) lockid,
-			 (int) lock->exclusive, lock->shared, lock->head,
-			 (int) lock->releaseOK);
-}
-
-inline static void
-LOG_LWDEBUG(const char *where, LWLockId lockid, const char *msg)
-{
-	if (Trace_lwlocks)
-		elog(LOG, "%s(%d): %s", where, (int) lockid, msg);
+			 (int) lock->exclusive, lock->shared, lock->flex.head,
+			 (int) lock->flex.releaseOK);
 }
 #else							/* not LOCK_DEBUG */
 #define PRINT_LWDEBUG(a,b,c)
-#define LOG_LWDEBUG(a,b,c)
 #endif   /* LOCK_DEBUG */
 
 #ifdef LWLOCK_STATS
@@ -127,8 +66,8 @@ static void
 print_lwlock_stats(int code, Datum arg)
 {
 	int			i;
-	int		   *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	int			numLocks = LWLockCounter[1];
+	int		   *FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+	int			numLocks = FlexLockCounter[1];
 
 	/* Grab an LWLock to keep different backends from mixing reports */
 	LWLockAcquire(0, LW_EXCLUSIVE);
@@ -145,173 +84,15 @@ print_lwlock_stats(int code, Datum arg)
 }
 #endif   /* LWLOCK_STATS */
 
-
 /*
- * Compute number of LWLocks to allocate.
+ * LWLockAssign - initialize a new lwlock and return its ID
  */
-int
-NumLWLocks(void)
-{
-	int			numLocks;
-
-	/*
-	 * Possibly this logic should be spread out among the affected modules,
-	 * the same way that shmem space estimation is done.  But for now, there
-	 * are few enough users of LWLocks that we can get away with just keeping
-	 * the knowledge here.
-	 */
-
-	/* Predefined LWLocks */
-	numLocks = (int) NumFixedLWLocks;
-
-	/* bufmgr.c needs two for each shared buffer */
-	numLocks += 2 * NBuffers;
-
-	/* proc.c needs one for each backend or auxiliary process */
-	numLocks += MaxBackends + NUM_AUXILIARY_PROCS;
-
-	/* clog.c needs one per CLOG buffer */
-	numLocks += NUM_CLOG_BUFFERS;
-
-	/* subtrans.c needs one per SubTrans buffer */
-	numLocks += NUM_SUBTRANS_BUFFERS;
-
-	/* multixact.c needs two SLRU areas */
-	numLocks += NUM_MXACTOFFSET_BUFFERS + NUM_MXACTMEMBER_BUFFERS;
-
-	/* async.c needs one per Async buffer */
-	numLocks += NUM_ASYNC_BUFFERS;
-
-	/* predicate.c needs one per old serializable xid buffer */
-	numLocks += NUM_OLDSERXID_BUFFERS;
-
-	/*
-	 * Add any requested by loadable modules; for backwards-compatibility
-	 * reasons, allocate at least NUM_USER_DEFINED_LWLOCKS of them even if
-	 * there are no explicit requests.
-	 */
-	lock_addin_request_allowed = false;
-	numLocks += Max(lock_addin_request, NUM_USER_DEFINED_LWLOCKS);
-
-	return numLocks;
-}
-
-
-/*
- * RequestAddinLWLocks
- *		Request that extra LWLocks be allocated for use by
- *		a loadable module.
- *
- * This is only useful if called from the _PG_init hook of a library that
- * is loaded into the postmaster via shared_preload_libraries.	Once
- * shared memory has been allocated, calls will be ignored.  (We could
- * raise an error, but it seems better to make it a no-op, so that
- * libraries containing such calls can be reloaded if needed.)
- */
-void
-RequestAddinLWLocks(int n)
-{
-	if (IsUnderPostmaster || !lock_addin_request_allowed)
-		return;					/* too late */
-	lock_addin_request += n;
-}
-
-
-/*
- * Compute shmem space needed for LWLocks.
- */
-Size
-LWLockShmemSize(void)
-{
-	Size		size;
-	int			numLocks = NumLWLocks();
-
-	/* Space for the LWLock array. */
-	size = mul_size(numLocks, sizeof(LWLockPadded));
-
-	/* Space for dynamic allocation counter, plus room for alignment. */
-	size = add_size(size, 2 * sizeof(int) + LWLOCK_PADDED_SIZE);
-
-	return size;
-}
-
-
-/*
- * Allocate shmem space for LWLocks and initialize the locks.
- */
-void
-CreateLWLocks(void)
-{
-	int			numLocks = NumLWLocks();
-	Size		spaceLocks = LWLockShmemSize();
-	LWLockPadded *lock;
-	int		   *LWLockCounter;
-	char	   *ptr;
-	int			id;
-
-	/* Allocate space */
-	ptr = (char *) ShmemAlloc(spaceLocks);
-
-	/* Leave room for dynamic allocation counter */
-	ptr += 2 * sizeof(int);
-
-	/* Ensure desired alignment of LWLock array */
-	ptr += LWLOCK_PADDED_SIZE - ((uintptr_t) ptr) % LWLOCK_PADDED_SIZE;
-
-	LWLockArray = (LWLockPadded *) ptr;
-
-	/*
-	 * Initialize all LWLocks to "unlocked" state
-	 */
-	for (id = 0, lock = LWLockArray; id < numLocks; id++, lock++)
-	{
-		SpinLockInit(&lock->lock.mutex);
-		lock->lock.releaseOK = true;
-		lock->lock.exclusive = 0;
-		lock->lock.shared = 0;
-		lock->lock.head = NULL;
-		lock->lock.tail = NULL;
-	}
-
-	/*
-	 * Initialize the dynamic-allocation counter, which is stored just before
-	 * the first LWLock.
-	 */
-	LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	LWLockCounter[0] = (int) NumFixedLWLocks;
-	LWLockCounter[1] = numLocks;
-}
-
-
-/*
- * LWLockAssign - assign a dynamically-allocated LWLock number
- *
- * We interlock this using the same spinlock that is used to protect
- * ShmemAlloc().  Interlocking is not really necessary during postmaster
- * startup, but it is needed if any user-defined code tries to allocate
- * LWLocks after startup.
- */
-LWLockId
+FlexLockId
 LWLockAssign(void)
 {
-	LWLockId	result;
-
-	/* use volatile pointer to prevent code rearrangement */
-	volatile int *LWLockCounter;
-
-	LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	SpinLockAcquire(ShmemLock);
-	if (LWLockCounter[0] >= LWLockCounter[1])
-	{
-		SpinLockRelease(ShmemLock);
-		elog(ERROR, "no more LWLockIds available");
-	}
-	result = (LWLockId) (LWLockCounter[0]++);
-	SpinLockRelease(ShmemLock);
-	return result;
+	return FlexLockAssign(FLEXLOCK_TYPE_LWLOCK);
 }
 
-
 /*
  * LWLockAcquire - acquire a lightweight lock in the specified mode
  *
@@ -320,9 +101,9 @@ LWLockAssign(void)
  * Side effect: cancel/die interrupts are held off until lock release.
  */
 void
-LWLockAcquire(LWLockId lockid, LWLockMode mode)
+LWLockAcquire(FlexLockId lockid, LWLockMode mode)
 {
-	volatile LWLock *lock = &(LWLockArray[lockid].lock);
+	volatile LWLock *lock = LWLockPointer(lockid);
 	PGPROC	   *proc = MyProc;
 	bool		retry = false;
 	int			extraWaits = 0;
@@ -333,8 +114,8 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 	/* Set up local count state first time through in a given process */
 	if (counts_for_pid != MyProcPid)
 	{
-		int		   *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-		int			numLocks = LWLockCounter[1];
+		int		   *FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+		int			numLocks = FlexLockCounter[1];
 
 		sh_acquire_counts = calloc(numLocks, sizeof(int));
 		ex_acquire_counts = calloc(numLocks, sizeof(int));
@@ -356,10 +137,6 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 	 */
 	Assert(!(proc == NULL && IsUnderPostmaster));
 
-	/* Ensure we will have room to remember the lock */
-	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
-		elog(ERROR, "too many LWLocks taken");
-
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -388,11 +165,11 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 		bool		mustwait;
 
 		/* Acquire mutex.  Time spent holding mutex should be short! */
-		SpinLockAcquire(&lock->mutex);
+		SpinLockAcquire(&lock->flex.mutex);
 
 		/* If retrying, allow LWLockRelease to release waiters again */
 		if (retry)
-			lock->releaseOK = true;
+			lock->flex.releaseOK = true;
 
 		/* If I can get the lock, do so quickly. */
 		if (mode == LW_EXCLUSIVE)
@@ -419,72 +196,30 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 		if (!mustwait)
 			break;				/* got the lock */
 
-		/*
-		 * Add myself to wait queue.
-		 *
-		 * If we don't have a PGPROC structure, there's no way to wait. This
-		 * should never occur, since MyProc should only be null during shared
-		 * memory initialization.
-		 */
-		if (proc == NULL)
-			elog(PANIC, "cannot wait without a PGPROC structure");
-
-		proc->lwWaiting = true;
-		proc->lwExclusive = (mode == LW_EXCLUSIVE);
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		/* Add myself to wait queue. */
+		FlexLockJoinWaitQueue(lock, (int) mode);
 
 		/* Can release the mutex now */
-		SpinLockRelease(&lock->mutex);
-
-		/*
-		 * Wait until awakened.
-		 *
-		 * Since we share the process wait semaphore with the regular lock
-		 * manager and ProcWaitForSignal, and we may need to acquire an LWLock
-		 * while one of those is pending, it is possible that we get awakened
-		 * for a reason other than being signaled by LWLockRelease. If so,
-		 * loop back and wait again.  Once we've gotten the LWLock,
-		 * re-increment the sema by the number of additional signals received,
-		 * so that the lock manager or signal manager will see the received
-		 * signal when it next waits.
-		 */
-		LOG_LWDEBUG("LWLockAcquire", lockid, "waiting");
+		SpinLockRelease(&lock->flex.mutex);
+
+		/* Wait until awakened. */
+		extraWaits += FlexLockWait(lockid, mode);
 
 #ifdef LWLOCK_STATS
 		block_counts[lockid]++;
 #endif
 
-		TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);
-
-		for (;;)
-		{
-			/* "false" means cannot accept cancel/die interrupt here. */
-			PGSemaphoreLock(&proc->sem, false);
-			if (!proc->lwWaiting)
-				break;
-			extraWaits++;
-		}
-
-		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode);
-
-		LOG_LWDEBUG("LWLockAcquire", lockid, "awakened");
-
 		/* Now loop back and try to acquire lock again. */
 		retry = true;
 	}
 
 	/* We are done updating shared state of the lock itself. */
-	SpinLockRelease(&lock->mutex);
+	SpinLockRelease(&lock->flex.mutex);
 
-	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(lockid, mode);
+	TRACE_POSTGRESQL_FLEXLOCK_ACQUIRE(lockid, mode);
 
 	/* Add lock to list of locks held by this backend */
-	held_lwlocks[num_held_lwlocks++] = lockid;
+	FlexLockRemember(lockid);
 
 	/*
 	 * Fix the process wait semaphore's count for any absorbed wakeups.
@@ -501,17 +236,13 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
  * If successful, cancel/die interrupts are held off until lock release.
  */
 bool
-LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
+LWLockConditionalAcquire(FlexLockId lockid, LWLockMode mode)
 {
-	volatile LWLock *lock = &(LWLockArray[lockid].lock);
+	volatile LWLock *lock = LWLockPointer(lockid);
 	bool		mustwait;
 
 	PRINT_LWDEBUG("LWLockConditionalAcquire", lockid, lock);
 
-	/* Ensure we will have room to remember the lock */
-	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
-		elog(ERROR, "too many LWLocks taken");
-
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -520,7 +251,7 @@ LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
 	HOLD_INTERRUPTS();
 
 	/* Acquire mutex.  Time spent holding mutex should be short! */
-	SpinLockAcquire(&lock->mutex);
+	SpinLockAcquire(&lock->flex.mutex);
 
 	/* If I can get the lock, do so quickly. */
 	if (mode == LW_EXCLUSIVE)
@@ -545,20 +276,20 @@ LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
 	}
 
 	/* We are done updating shared state of the lock itself. */
-	SpinLockRelease(&lock->mutex);
+	SpinLockRelease(&lock->flex.mutex);
 
 	if (mustwait)
 	{
 		/* Failed to get lock, so release interrupt holdoff */
 		RESUME_INTERRUPTS();
-		LOG_LWDEBUG("LWLockConditionalAcquire", lockid, "failed");
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(lockid, mode);
+		FlexLockDebug("LWLockConditionalAcquire", lockid, "failed");
+		TRACE_POSTGRESQL_FLEXLOCK_CONDACQUIRE_FAIL(lockid, mode);
 	}
 	else
 	{
 		/* Add lock to list of locks held by this backend */
-		held_lwlocks[num_held_lwlocks++] = lockid;
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(lockid, mode);
+		FlexLockRemember(lockid);
+		TRACE_POSTGRESQL_FLEXLOCK_CONDACQUIRE(lockid, mode);
 	}
 
 	return !mustwait;
@@ -568,32 +299,18 @@ LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
  * LWLockRelease - release a previously acquired lock
  */
 void
-LWLockRelease(LWLockId lockid)
+LWLockRelease(FlexLockId lockid)
 {
-	volatile LWLock *lock = &(LWLockArray[lockid].lock);
+	volatile LWLock *lock = LWLockPointer(lockid);
 	PGPROC	   *head;
 	PGPROC	   *proc;
-	int			i;
 
 	PRINT_LWDEBUG("LWLockRelease", lockid, lock);
 
-	/*
-	 * Remove lock from list of locks held.  Usually, but not always, it will
-	 * be the latest-acquired lock; so search array backwards.
-	 */
-	for (i = num_held_lwlocks; --i >= 0;)
-	{
-		if (lockid == held_lwlocks[i])
-			break;
-	}
-	if (i < 0)
-		elog(ERROR, "lock %d is not held", (int) lockid);
-	num_held_lwlocks--;
-	for (; i < num_held_lwlocks; i++)
-		held_lwlocks[i] = held_lwlocks[i + 1];
+	FlexLockForget(lockid);
 
 	/* Acquire mutex.  Time spent holding mutex should be short! */
-	SpinLockAcquire(&lock->mutex);
+	SpinLockAcquire(&lock->flex.mutex);
 
 	/* Release my hold on lock */
 	if (lock->exclusive > 0)
@@ -610,10 +327,10 @@ LWLockRelease(LWLockId lockid)
 	 * if someone has already awakened waiters that haven't yet acquired the
 	 * lock.
 	 */
-	head = lock->head;
+	head = lock->flex.head;
 	if (head != NULL)
 	{
-		if (lock->exclusive == 0 && lock->shared == 0 && lock->releaseOK)
+		if (lock->exclusive == 0 && lock->shared == 0 && lock->flex.releaseOK)
 		{
 			/*
 			 * Remove the to-be-awakened PGPROCs from the queue.  If the front
@@ -621,17 +338,17 @@ LWLockRelease(LWLockId lockid)
 			 * as many waiters as want shared access.
 			 */
 			proc = head;
-			if (!proc->lwExclusive)
+			if (proc->flWaitMode != LW_EXCLUSIVE)
 			{
-				while (proc->lwWaitLink != NULL &&
-					   !proc->lwWaitLink->lwExclusive)
-					proc = proc->lwWaitLink;
+				while (proc->flWaitLink != NULL &&
+					   proc->flWaitLink->flWaitMode != LW_EXCLUSIVE)
+					proc = proc->flWaitLink;
 			}
 			/* proc is now the last PGPROC to be released */
-			lock->head = proc->lwWaitLink;
-			proc->lwWaitLink = NULL;
+			lock->flex.head = proc->flWaitLink;
+			proc->flWaitLink = NULL;
 			/* prevent additional wakeups until retryer gets to run */
-			lock->releaseOK = false;
+			lock->flex.releaseOK = false;
 		}
 		else
 		{
@@ -641,20 +358,20 @@ LWLockRelease(LWLockId lockid)
 	}
 
 	/* We are done updating shared state of the lock itself. */
-	SpinLockRelease(&lock->mutex);
+	SpinLockRelease(&lock->flex.mutex);
 
-	TRACE_POSTGRESQL_LWLOCK_RELEASE(lockid);
+	TRACE_POSTGRESQL_FLEXLOCK_RELEASE(lockid);
 
 	/*
 	 * Awaken any waiters I removed from the queue.
 	 */
 	while (head != NULL)
 	{
-		LOG_LWDEBUG("LWLockRelease", lockid, "release waiter");
+		FlexLockDebug("LWLockRelease", lockid, "release waiter");
 		proc = head;
-		head = proc->lwWaitLink;
-		proc->lwWaitLink = NULL;
-		proc->lwWaiting = false;
+		head = proc->flWaitLink;
+		proc->flWaitLink = NULL;
+		proc->flWaitResult = 1;		/* any non-zero value will do */
 		PGSemaphoreUnlock(&proc->sem);
 	}
 
@@ -664,43 +381,17 @@ LWLockRelease(LWLockId lockid)
 	RESUME_INTERRUPTS();
 }
 
-
-/*
- * LWLockReleaseAll - release all currently-held locks
- *
- * Used to clean up after ereport(ERROR). An important difference between this
- * function and retail LWLockRelease calls is that InterruptHoldoffCount is
- * unchanged by this operation.  This is necessary since InterruptHoldoffCount
- * has been set to an appropriate level earlier in error recovery. We could
- * decrement it below zero if we allow it to drop for each released lock!
- */
-void
-LWLockReleaseAll(void)
-{
-	while (num_held_lwlocks > 0)
-	{
-		HOLD_INTERRUPTS();		/* match the upcoming RESUME_INTERRUPTS */
-
-		LWLockRelease(held_lwlocks[num_held_lwlocks - 1]);
-	}
-}
-
-
 /*
  * LWLockHeldByMe - test whether my process currently holds a lock
  *
- * This is meant as debug support only.  We do not distinguish whether the
- * lock is held shared or exclusive.
+ * The following convenience routine might not be worthwhile but for the fact
+ * that we've had a function by this name since long before FlexLocks existed.
+ * Callers who want to check whether an arbitrary FlexLock (that may or may not
+ * be an LWLock) is held can use FlexLockHeldByMe directly.
  */
 bool
-LWLockHeldByMe(LWLockId lockid)
+LWLockHeldByMe(FlexLockId lockid)
 {
-	int			i;
-
-	for (i = 0; i < num_held_lwlocks; i++)
-	{
-		if (held_lwlocks[i] == lockid)
-			return true;
-	}
-	return false;
+	AssertMacro(FlexLockArray[lockid].flex.locktype == FLEXLOCK_TYPE_LWLOCK);
+	return FlexLockHeldByMe(lockid);
 }
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 345f6f5..15978a4 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -239,7 +239,7 @@
 #define PredicateLockHashPartition(hashcode) \
 	((hashcode) % NUM_PREDICATELOCK_PARTITIONS)
 #define PredicateLockHashPartitionLock(hashcode) \
-	((LWLockId) (FirstPredicateLockMgrLock + PredicateLockHashPartition(hashcode)))
+	((FlexLockId) (FirstPredicateLockMgrLock + PredicateLockHashPartition(hashcode)))
 
 #define NPREDICATELOCKTARGETENTS() \
 	mul_size(max_predicate_locks_per_xact, add_size(MaxBackends, max_prepared_xacts))
@@ -1840,7 +1840,7 @@ PageIsPredicateLocked(Relation relation, BlockNumber blkno)
 {
 	PREDICATELOCKTARGETTAG targettag;
 	uint32		targettaghash;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	PREDICATELOCKTARGET *target;
 
 	SET_PREDICATELOCKTARGETTAG_PAGE(targettag,
@@ -2073,7 +2073,7 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 		if (TargetTagIsCoveredBy(oldtargettag, *newtargettag))
 		{
 			uint32		oldtargettaghash;
-			LWLockId	partitionLock;
+			FlexLockId	partitionLock;
 			PREDICATELOCK *rmpredlock;
 
 			oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
@@ -2285,7 +2285,7 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	PREDICATELOCKTARGET *target;
 	PREDICATELOCKTAG locktag;
 	PREDICATELOCK *lock;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	bool		found;
 
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
@@ -2586,10 +2586,10 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 								  bool removeOld)
 {
 	uint32		oldtargettaghash;
-	LWLockId	oldpartitionLock;
+	FlexLockId	oldpartitionLock;
 	PREDICATELOCKTARGET *oldtarget;
 	uint32		newtargettaghash;
-	LWLockId	newpartitionLock;
+	FlexLockId	newpartitionLock;
 	bool		found;
 	bool		outOfShmem = false;
 
@@ -3578,7 +3578,7 @@ ClearOldPredicateLocks(void)
 			PREDICATELOCKTARGET *target;
 			PREDICATELOCKTARGETTAG targettag;
 			uint32		targettaghash;
-			LWLockId	partitionLock;
+			FlexLockId	partitionLock;
 
 			tag = predlock->tag;
 			target = tag.myTarget;
@@ -3656,7 +3656,7 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 		PREDICATELOCKTARGET *target;
 		PREDICATELOCKTARGETTAG targettag;
 		uint32		targettaghash;
-		LWLockId	partitionLock;
+		FlexLockId	partitionLock;
 
 		nextpredlock = (PREDICATELOCK *)
 			SHMQueueNext(&(sxact->predicateLocks),
@@ -4034,7 +4034,7 @@ static void
 CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 {
 	uint32		targettaghash;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	PREDICATELOCKTARGET *target;
 	PREDICATELOCK *predlock;
 	PREDICATELOCK *mypredlock = NULL;
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index eda3a98..57da345 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -325,9 +325,9 @@ InitProcess(void)
 	/* NB -- autovac launcher intentionally does not set IS_AUTOVACUUM */
 	if (IsAutoVacuumWorkerProcess())
 		MyProc->vacuumFlags |= PROC_IS_AUTOVACUUM;
-	MyProc->lwWaiting = false;
-	MyProc->lwExclusive = false;
-	MyProc->lwWaitLink = NULL;
+	MyProc->flWaitResult = 0;
+	MyProc->flWaitMode = 0;
+	MyProc->flWaitLink = NULL;
 	MyProc->waitLock = NULL;
 	MyProc->waitProcLock = NULL;
 #ifdef USE_ASSERT_CHECKING
@@ -479,9 +479,9 @@ InitAuxiliaryProcess(void)
 	MyProc->roleId = InvalidOid;
 	MyProc->inCommit = false;
 	MyProc->vacuumFlags = 0;
-	MyProc->lwWaiting = false;
-	MyProc->lwExclusive = false;
-	MyProc->lwWaitLink = NULL;
+	MyProc->flWaitMode = 0;
+	MyProc->flWaitResult = 0;
+	MyProc->flWaitLink = NULL;
 	MyProc->waitLock = NULL;
 	MyProc->waitProcLock = NULL;
 #ifdef USE_ASSERT_CHECKING
@@ -607,7 +607,7 @@ IsWaitingForLock(void)
 void
 LockWaitCancel(void)
 {
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 
 	/* Nothing to do if we weren't waiting for a lock */
 	if (lockAwaited == NULL)
@@ -718,11 +718,11 @@ ProcKill(int code, Datum arg)
 #endif
 
 	/*
-	 * Release any LW locks I am holding.  There really shouldn't be any, but
-	 * it's cheap to check again before we cut the knees off the LWLock
+	 * Release any felx locks I am holding.  There really shouldn't be any, but
+	 * it's cheap to check again before we cut the knees off the flex lock
 	 * facility by releasing our PGPROC ...
 	 */
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 
 	/* Release ownership of the process's latch, too */
 	DisownLatch(&MyProc->procLatch);
@@ -779,8 +779,8 @@ AuxiliaryProcKill(int code, Datum arg)
 
 	Assert(MyProc == auxproc);
 
-	/* Release any LW locks I am holding (see notes above) */
-	LWLockReleaseAll();
+	/* Release any flex locks I am holding (see notes above) */
+	FlexLockReleaseAll();
 
 	/* Release ownership of the process's latch, too */
 	DisownLatch(&MyProc->procLatch);
@@ -865,7 +865,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	LOCK	   *lock = locallock->lock;
 	PROCLOCK   *proclock = locallock->proclock;
 	uint32		hashcode = locallock->hashcode;
-	LWLockId	partitionLock = LockHashPartitionLock(hashcode);
+	FlexLockId	partitionLock = LockHashPartitionLock(hashcode);
 	PROC_QUEUE *waitQueue = &(lock->waitProcs);
 	LOCKMASK	myHeldLocks = MyProc->heldLocks;
 	bool		early_deadlock = false;
diff --git a/src/backend/utils/misc/check_guc b/src/backend/utils/misc/check_guc
index 293fb03..1a19e36 100755
--- a/src/backend/utils/misc/check_guc
+++ b/src/backend/utils/misc/check_guc
@@ -19,7 +19,7 @@
 INTENTIONALLY_NOT_INCLUDED="autocommit debug_deadlocks \
 is_superuser lc_collate lc_ctype lc_messages lc_monetary lc_numeric lc_time \
 pre_auth_delay role seed server_encoding server_version server_version_int \
-session_authorization trace_lock_oidmin trace_lock_table trace_locks trace_lwlocks \
+session_authorization trace_lock_oidmin trace_lock_table trace_locks trace_flexlocks \
 trace_notify trace_userlocks transaction_isolation transaction_read_only \
 zero_damaged_pages"
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index da7b6d4..52de233 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -59,6 +59,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/flexlock_internals.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
 #include "storage/predicate.h"
@@ -1071,12 +1072,12 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 	{
-		{"trace_lwlocks", PGC_SUSET, DEVELOPER_OPTIONS,
+		{"trace_flexlocks", PGC_SUSET, DEVELOPER_OPTIONS,
 			gettext_noop("No description available."),
 			NULL,
 			GUC_NOT_IN_SAMPLE
 		},
-		&Trace_lwlocks,
+		&Trace_flexlocks,
 		false,
 		NULL, NULL, NULL
 	},
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 71c5ab0..5b9cfe6 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -15,8 +15,8 @@
  * in probe definitions, as they cause compilation errors on Mac OS X 10.5.
  */
 #define LocalTransactionId unsigned int
-#define LWLockId int
-#define LWLockMode int
+#define FlexLockId int
+#define FlexLockMode int
 #define LOCKMODE int
 #define BlockNumber unsigned int
 #define Oid unsigned int
@@ -29,12 +29,12 @@ provider postgresql {
 	probe transaction__commit(LocalTransactionId);
 	probe transaction__abort(LocalTransactionId);
 
-	probe lwlock__acquire(LWLockId, LWLockMode);
-	probe lwlock__release(LWLockId);
-	probe lwlock__wait__start(LWLockId, LWLockMode);
-	probe lwlock__wait__done(LWLockId, LWLockMode);
-	probe lwlock__condacquire(LWLockId, LWLockMode);
-	probe lwlock__condacquire__fail(LWLockId, LWLockMode);
+	probe flexlock__acquire(FlexLockId, FlexLockMode);
+	probe flexlock__release(FlexLockId);
+	probe flexlock__wait__start(FlexLockId, FlexLockMode);
+	probe flexlock__wait__done(FlexLockId, FlexLockMode);
+	probe flexlock__condacquire(FlexLockId, FlexLockMode);
+	probe flexlock__condacquire__fail(FlexLockId, FlexLockMode);
 
 	probe lock__wait__start(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
 	probe lock__wait__done(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index e48743f..680a87f 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -55,7 +55,7 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLockId	ControlLock;
+	FlexLockId	ControlLock;
 
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
@@ -69,7 +69,7 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int		   *page_number;
 	int		   *page_lru_count;
-	LWLockId   *buffer_locks;
+	FlexLockId *buffer_locks;
 
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
@@ -136,7 +136,7 @@ typedef SlruCtlData *SlruCtl;
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLockId ctllock, const char *subdir);
+			  FlexLockId ctllock, const char *subdir);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int pageno);
 extern int SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 				  TransactionId xid);
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 6c8e312..d3b74db 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -49,9 +49,9 @@
 #define SEQ_MINVALUE	(-SEQ_MAXVALUE)
 
 /*
- * Number of spare LWLocks to allocate for user-defined add-on code.
+ * Number of spare FlexLocks to allocate for user-defined add-on code.
  */
-#define NUM_USER_DEFINED_LWLOCKS	4
+#define NUM_USER_DEFINED_FLEXLOCKS	4
 
 /*
  * Define this if you want to allow the lo_import and lo_export SQL
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index b7d4ea5..ac7f665 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -103,7 +103,7 @@ typedef struct buftag
 #define BufTableHashPartition(hashcode) \
 	((hashcode) % NUM_BUFFER_PARTITIONS)
 #define BufMappingPartitionLock(hashcode) \
-	((LWLockId) (FirstBufMappingLock + BufTableHashPartition(hashcode)))
+	((FlexLockId) (FirstBufMappingLock + BufTableHashPartition(hashcode)))
 
 /*
  *	BufferDesc -- shared descriptor/state data for a single shared buffer.
@@ -143,8 +143,8 @@ typedef struct sbufdesc
 	int			buf_id;			/* buffer's index number (from 0) */
 	int			freeNext;		/* link in freelist chain */
 
-	LWLockId	io_in_progress_lock;	/* to wait for I/O to complete */
-	LWLockId	content_lock;	/* to lock access to buffer contents */
+	FlexLockId	io_in_progress_lock;	/* to wait for I/O to complete */
+	FlexLockId	content_lock;	/* to lock access to buffer contents */
 } BufferDesc;
 
 #define BufferDescriptorGetBuffer(bdesc) ((bdesc)->buf_id + 1)
diff --git a/src/include/storage/flexlock.h b/src/include/storage/flexlock.h
new file mode 100644
index 0000000..612c21a
--- /dev/null
+++ b/src/include/storage/flexlock.h
@@ -0,0 +1,102 @@
+/*-------------------------------------------------------------------------
+ *
+ * flexlock.h
+ *	  Flex lock manager
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/flexlock.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FLEXLOCK_H
+#define FLEXLOCK_H
+
+/*
+ * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
+ * here, but we need them to set up enum FlexLockId correctly, and having
+ * this file include lock.h or bufmgr.h would be backwards.
+ */
+
+/* Number of partitions of the shared buffer mapping hashtable */
+#define NUM_BUFFER_PARTITIONS  16
+
+/* Number of partitions the shared lock tables are divided into */
+#define LOG2_NUM_LOCK_PARTITIONS  4
+#define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)
+
+/* Number of partitions the shared predicate lock tables are divided into */
+#define LOG2_NUM_PREDICATELOCK_PARTITIONS  4
+#define NUM_PREDICATELOCK_PARTITIONS  (1 << LOG2_NUM_PREDICATELOCK_PARTITIONS)
+
+/*
+ * We have a number of predefined FlexLocks, plus a bunch of locks that are
+ * dynamically assigned (e.g., for shared buffers).  The FlexLock structures
+ * live in shared memory (since they contain shared data) and are identified
+ * by values of this enumerated type.  We abuse the notion of an enum somewhat
+ * by allowing values not listed in the enum declaration to be assigned.
+ * The extra value MaxDynamicFlexLock is there to keep the compiler from
+ * deciding that the enum can be represented as char or short ...
+ *
+ * If you remove a lock, please replace it with a placeholder. This retains
+ * the lock numbering, which is helpful for DTrace and other external
+ * debugging scripts.
+ */
+typedef enum FlexLockId
+{
+	BufFreelistLock,
+	ShmemIndexLock,
+	OidGenLock,
+	XidGenLock,
+	ProcArrayLock,
+	SInvalReadLock,
+	SInvalWriteLock,
+	WALInsertLock,
+	WALWriteLock,
+	ControlFileLock,
+	CheckpointLock,
+	CLogControlLock,
+	SubtransControlLock,
+	MultiXactGenLock,
+	MultiXactOffsetControlLock,
+	MultiXactMemberControlLock,
+	RelCacheInitLock,
+	BgWriterCommLock,
+	TwoPhaseStateLock,
+	TablespaceCreateLock,
+	BtreeVacuumLock,
+	AddinShmemInitLock,
+	AutovacuumLock,
+	AutovacuumScheduleLock,
+	SyncScanLock,
+	RelationMappingLock,
+	AsyncCtlLock,
+	AsyncQueueLock,
+	SerializableXactHashLock,
+	SerializableFinishedListLock,
+	SerializablePredicateLockListLock,
+	OldSerXidLock,
+	SyncRepLock,
+	/* Individual lock IDs end here */
+	FirstBufMappingLock,
+	FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
+	FirstPredicateLockMgrLock = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
+
+	/* must be last except for MaxDynamicFlexLock: */
+	NumFixedFlexLocks = FirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS,
+
+	MaxDynamicFlexLock = 1000000000
+} FlexLockId;
+
+/* Shared memory setup. */
+extern int	NumFlexLocks(void);
+extern Size FlexLockShmemSize(void);
+extern void RequestAddinFlexLocks(int n);
+extern void CreateFlexLocks(void);
+
+/* Error recovery and debugging support functions. */
+extern void FlexLockReleaseAll(void);
+extern bool FlexLockHeldByMe(FlexLockId id);
+
+#endif   /* FLEXLOCK_H */
diff --git a/src/include/storage/flexlock_internals.h b/src/include/storage/flexlock_internals.h
new file mode 100644
index 0000000..5f78da7
--- /dev/null
+++ b/src/include/storage/flexlock_internals.h
@@ -0,0 +1,88 @@
+/*-------------------------------------------------------------------------
+ *
+ * flexlock_internals.h
+ *	  Flex lock internals.  Only files which implement a FlexLock
+ *    type should need to include this.  Merging this with flexlock.h
+ *    creates a circular header dependency, but even if it didn't, this
+ *    is cleaner.
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/flexlock_internals.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FLEXLOCK_INTERNALS_H
+#define FLEXLOCK_INTERNALS_H
+
+#include "pg_trace.h"
+#include "storage/flexlock.h"
+#include "storage/proc.h"
+#include "storage/s_lock.h"
+
+/*
+ * Individual FlexLock implementations each get this many bytes to store
+ * its state; of course, a given implementation could also allocate additional
+ * shmem elsewhere, but we provide this many bytes within the array.  The
+ * header fields common to all FlexLock types are included in this number.
+ * A power of two should probably be chosen, to avoid alignment issues and
+ * cache line splitting.  It might be useful to increase this on systems where
+ * a cache line is more than 64 bytes in size.
+ */
+#define FLEX_LOCK_BYTES		64
+
+typedef struct FlexLock
+{
+	char		locktype;		/* see FLEXLOCK_TYPE_* constants */
+	slock_t		mutex;			/* Protects FlexLock state and wait queues */
+	bool		releaseOK;		/* T if ok to release waiters */
+	PGPROC	   *head;			/* head of list of waiting PGPROCs */
+	PGPROC	   *tail;			/* tail of list of waiting PGPROCs */
+	/* tail is undefined when head is NULL */
+} FlexLock;
+
+#define FLEXLOCK_TYPE_LWLOCK			'l'
+
+typedef union FlexLockPadded
+{
+	FlexLock	flex;
+	char		pad[FLEX_LOCK_BYTES];
+} FlexLockPadded;
+
+extern FlexLockPadded *FlexLockArray;
+
+extern FlexLockId FlexLockAssign(char locktype);
+extern void FlexLockRemember(FlexLockId id);
+extern void FlexLockForget(FlexLockId id);
+extern int FlexLockWait(FlexLockId id, int mode);
+
+/*
+ * We must join the wait queue while holding the spinlock, so we define this
+ * as a macro, for speed.
+ */
+#define FlexLockJoinWaitQueue(lock, mode) \
+	do { \
+		Assert(MyProc != NULL); \
+		MyProc->flWaitResult = 0; \
+		MyProc->flWaitMode = mode; \
+		MyProc->flWaitLink = NULL; \
+		if (lock->flex.head == NULL) \
+			lock->flex.head = MyProc; \
+		else \
+			lock->flex.tail->flWaitLink = MyProc; \
+		lock->flex.tail = MyProc; \
+	} while (0)
+
+#ifdef LOCK_DEBUG
+extern bool	Trace_flexlocks;
+#define FlexLockDebug(where, id, msg) \
+	do { \
+		if (Trace_flexlocks) \
+			elog(LOG, "%s(%d): %s", where, (int) id, msg); \
+	} while (0)
+#else
+#define FlexLockDebug(where, id, msg)
+#endif
+
+#endif   /* FLEXLOCK_H */
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index e106ad5..ba87db2 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -471,7 +471,7 @@ typedef enum
 #define LockHashPartition(hashcode) \
 	((hashcode) % NUM_LOCK_PARTITIONS)
 #define LockHashPartitionLock(hashcode) \
-	((LWLockId) (FirstLockMgrLock + LockHashPartition(hashcode)))
+	((FlexLockId) (FirstLockMgrLock + LockHashPartition(hashcode)))
 
 
 /*
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 438a48d..f68cddc 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -14,82 +14,7 @@
 #ifndef LWLOCK_H
 #define LWLOCK_H
 
-/*
- * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
- * here, but we need them to set up enum LWLockId correctly, and having
- * this file include lock.h or bufmgr.h would be backwards.
- */
-
-/* Number of partitions of the shared buffer mapping hashtable */
-#define NUM_BUFFER_PARTITIONS  16
-
-/* Number of partitions the shared lock tables are divided into */
-#define LOG2_NUM_LOCK_PARTITIONS  4
-#define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)
-
-/* Number of partitions the shared predicate lock tables are divided into */
-#define LOG2_NUM_PREDICATELOCK_PARTITIONS  4
-#define NUM_PREDICATELOCK_PARTITIONS  (1 << LOG2_NUM_PREDICATELOCK_PARTITIONS)
-
-/*
- * We have a number of predefined LWLocks, plus a bunch of LWLocks that are
- * dynamically assigned (e.g., for shared buffers).  The LWLock structures
- * live in shared memory (since they contain shared data) and are identified
- * by values of this enumerated type.  We abuse the notion of an enum somewhat
- * by allowing values not listed in the enum declaration to be assigned.
- * The extra value MaxDynamicLWLock is there to keep the compiler from
- * deciding that the enum can be represented as char or short ...
- *
- * If you remove a lock, please replace it with a placeholder. This retains
- * the lock numbering, which is helpful for DTrace and other external
- * debugging scripts.
- */
-typedef enum LWLockId
-{
-	BufFreelistLock,
-	ShmemIndexLock,
-	OidGenLock,
-	XidGenLock,
-	ProcArrayLock,
-	SInvalReadLock,
-	SInvalWriteLock,
-	WALInsertLock,
-	WALWriteLock,
-	ControlFileLock,
-	CheckpointLock,
-	CLogControlLock,
-	SubtransControlLock,
-	MultiXactGenLock,
-	MultiXactOffsetControlLock,
-	MultiXactMemberControlLock,
-	RelCacheInitLock,
-	BgWriterCommLock,
-	TwoPhaseStateLock,
-	TablespaceCreateLock,
-	BtreeVacuumLock,
-	AddinShmemInitLock,
-	AutovacuumLock,
-	AutovacuumScheduleLock,
-	SyncScanLock,
-	RelationMappingLock,
-	AsyncCtlLock,
-	AsyncQueueLock,
-	SerializableXactHashLock,
-	SerializableFinishedListLock,
-	SerializablePredicateLockListLock,
-	OldSerXidLock,
-	SyncRepLock,
-	/* Individual lock IDs end here */
-	FirstBufMappingLock,
-	FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
-	FirstPredicateLockMgrLock = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
-
-	/* must be last except for MaxDynamicLWLock: */
-	NumFixedLWLocks = FirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS,
-
-	MaxDynamicLWLock = 1000000000
-} LWLockId;
-
+#include "storage/flexlock.h"
 
 typedef enum LWLockMode
 {
@@ -97,22 +22,10 @@ typedef enum LWLockMode
 	LW_SHARED
 } LWLockMode;
 
-
-#ifdef LOCK_DEBUG
-extern bool Trace_lwlocks;
-#endif
-
-extern LWLockId LWLockAssign(void);
-extern void LWLockAcquire(LWLockId lockid, LWLockMode mode);
-extern bool LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode);
-extern void LWLockRelease(LWLockId lockid);
-extern void LWLockReleaseAll(void);
-extern bool LWLockHeldByMe(LWLockId lockid);
-
-extern int	NumLWLocks(void);
-extern Size LWLockShmemSize(void);
-extern void CreateLWLocks(void);
-
-extern void RequestAddinLWLocks(int n);
+extern FlexLockId LWLockAssign(void);
+extern void LWLockAcquire(FlexLockId lockid, LWLockMode mode);
+extern bool LWLockConditionalAcquire(FlexLockId lockid, LWLockMode mode);
+extern void LWLockRelease(FlexLockId lockid);
+extern bool LWLockHeldByMe(FlexLockId lockid);
 
 #endif   /* LWLOCK_H */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 6e798b1..7e8630d 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -114,10 +114,10 @@ struct PGPROC
 	 */
 	bool		recoveryConflictPending;
 
-	/* Info about LWLock the process is currently waiting for, if any. */
-	bool		lwWaiting;		/* true if waiting for an LW lock */
-	bool		lwExclusive;	/* true if waiting for exclusive access */
-	struct PGPROC *lwWaitLink;	/* next waiter for same LW lock */
+	/* Info about FlexLock the process is currently waiting for, if any. */
+	int			flWaitResult;	/* result of wait, or 0 if still waiting */
+	int			flWaitMode;		/* lock mode sought */
+	struct PGPROC *flWaitLink;	/* next waiter for same FlexLock */
 
 	/* Info about lock the process is currently waiting for, if any. */
 	/* waitLock and waitProcLock are NULL if not currently waiting. */
@@ -147,7 +147,7 @@ struct PGPROC
 	struct XidCache subxids;	/* cache for subtransaction XIDs */
 
 	/* Per-backend LWLock.  Protects fields below. */
-	LWLockId	backendLock;	/* protects the fields below */
+	FlexLockId	backendLock;	/* protects the fields below */
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	uint64		fpLockBits;		/* lock modes held for each fast-path slot */

procarraylock-v1.patchapplication/octet-stream; name=procarraylock-v1.patchDownload

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 32985a4..d6bba6f 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -40,6 +40,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "utils/acl.h"
 #include "utils/attoptcache.h"
 #include "utils/datum.h"
@@ -222,9 +223,9 @@ analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
 	/*
 	 * OK, let's do it.  First let other backends know I'm in ANALYZE.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	MyProc->vacuumFlags |= PROC_IN_ANALYZE;
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/*
 	 * Do the normal non-recursive ANALYZE.
@@ -249,9 +250,9 @@ analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
 	 * Reset my PGPROC flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	MyProc->vacuumFlags &= ~PROC_IN_ANALYZE;
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index f42504c..823dab9 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -39,6 +39,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
@@ -892,11 +893,11 @@ vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool do_toast, bool for_wraparound)
 		 * MyProc->xid/xmin, else OldestXmin might appear to go backwards,
 		 * which is probably Not Good.
 		 */
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		ProcArrayLockAcquire(PAL_EXCLUSIVE);
 		MyProc->vacuumFlags |= PROC_IN_VACUUM;
 		if (for_wraparound)
 			MyProc->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 	}
 
 	/*
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 1a48485..39c5080 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -52,6 +52,7 @@
 #include "access/twophase.h"
 #include "miscadmin.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/snapmgr.h"
@@ -254,7 +255,7 @@ ProcArrayAdd(PGPROC *proc)
 {
 	ProcArrayStruct *arrayP = procArray;
 
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	if (arrayP->numProcs >= arrayP->maxProcs)
 	{
@@ -263,7 +264,7 @@ ProcArrayAdd(PGPROC *proc)
 		 * fixed supply of PGPROC structs too, and so we should have failed
 		 * earlier.)
 		 */
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 		ereport(FATAL,
 				(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
 				 errmsg("sorry, too many clients already")));
@@ -272,7 +273,7 @@ ProcArrayAdd(PGPROC *proc)
 	arrayP->procs[arrayP->numProcs] = proc;
 	arrayP->numProcs++;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -297,7 +298,7 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
 		DisplayXidCache();
 #endif
 
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	if (TransactionIdIsValid(latestXid))
 	{
@@ -321,13 +322,13 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
 			arrayP->procs[index] = arrayP->procs[arrayP->numProcs - 1];
 			arrayP->procs[arrayP->numProcs - 1] = NULL; /* for debugging */
 			arrayP->numProcs--;
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 			return;
 		}
 	}
 
 	/* Ooops */
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	elog(LOG, "failed to find proc %p in ProcArray", proc);
 }
@@ -351,54 +352,15 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 {
 	if (TransactionIdIsValid(latestXid))
 	{
-		/*
-		 * We must lock ProcArrayLock while clearing proc->xid, so that we do
-		 * not exit the set of "running" transactions while someone else is
-		 * taking a snapshot.  See discussion in
-		 * src/backend/access/transam/README.
-		 */
-		Assert(TransactionIdIsValid(proc->xid));
-
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
-
-		proc->xid = InvalidTransactionId;
-		proc->lxid = InvalidLocalTransactionId;
-		proc->xmin = InvalidTransactionId;
-		/* must be cleared with xid/xmin: */
-		proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
-		proc->inCommit = false; /* be sure this is cleared in abort */
-		proc->recoveryConflictPending = false;
-
-		/* Clear the subtransaction-XID cache too while holding the lock */
-		proc->subxids.nxids = 0;
-		proc->subxids.overflowed = false;
-
-		/* Also advance global latestCompletedXid while holding the lock */
-		if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
-								  latestXid))
-			ShmemVariableCache->latestCompletedXid = latestXid;
-
-		LWLockRelease(ProcArrayLock);
+		Assert(proc == MyProc);
+		ProcArrayLockClearTransaction(latestXid);		
 	}
 	else
-	{
-		/*
-		 * If we have no XID, we don't need to lock, since we won't affect
-		 * anyone else's calculation of a snapshot.  We might change their
-		 * estimate of global xmin, but that's OK.
-		 */
-		Assert(!TransactionIdIsValid(proc->xid));
-
-		proc->lxid = InvalidLocalTransactionId;
 		proc->xmin = InvalidTransactionId;
-		/* must be cleared with xid/xmin: */
-		proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
-		proc->inCommit = false; /* be sure this is cleared in abort */
-		proc->recoveryConflictPending = false;
 
-		Assert(proc->subxids.nxids == 0);
-		Assert(proc->subxids.overflowed == false);
-	}
+	proc->lxid = InvalidLocalTransactionId;
+	proc->inCommit = false; /* be sure this is cleared in abort */
+	proc->recoveryConflictPending = false;
 }
 
 
@@ -528,7 +490,7 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 	/*
 	 * Nobody else is running yet, but take locks anyhow
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	/*
 	 * KnownAssignedXids is sorted so we cannot just add the xids, we have to
@@ -635,7 +597,7 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 	Assert(TransactionIdIsNormal(ShmemVariableCache->latestCompletedXid));
 	Assert(TransactionIdIsValid(ShmemVariableCache->nextXid));
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	KnownAssignedXidsDisplay(trace_recovery(DEBUG3));
 	if (standbyState == STANDBY_SNAPSHOT_READY)
@@ -690,7 +652,7 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
 	/*
 	 * Uses same locking as transaction commit
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	/*
 	 * Remove subxids from known-assigned-xacts.
@@ -703,7 +665,7 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
 	if (TransactionIdPrecedes(procArray->lastOverflowedXid, max_xid))
 		procArray->lastOverflowedXid = max_xid;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -795,7 +757,7 @@ TransactionIdIsInProgress(TransactionId xid)
 					 errmsg("out of memory")));
 	}
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	/*
 	 * Now that we have the lock, we can check latestCompletedXid; if the
@@ -803,7 +765,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	 */
 	if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid, xid))
 	{
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 		xc_by_latest_xid_inc();
 		return true;
 	}
@@ -829,7 +791,7 @@ TransactionIdIsInProgress(TransactionId xid)
 		 */
 		if (TransactionIdEquals(pxid, xid))
 		{
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 			xc_by_main_xid_inc();
 			return true;
 		}
@@ -851,7 +813,7 @@ TransactionIdIsInProgress(TransactionId xid)
 
 			if (TransactionIdEquals(cxid, xid))
 			{
-				LWLockRelease(ProcArrayLock);
+				ProcArrayLockRelease();
 				xc_by_child_xid_inc();
 				return true;
 			}
@@ -879,7 +841,7 @@ TransactionIdIsInProgress(TransactionId xid)
 
 		if (KnownAssignedXidExists(xid))
 		{
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 			xc_by_known_assigned_inc();
 			return true;
 		}
@@ -895,7 +857,7 @@ TransactionIdIsInProgress(TransactionId xid)
 			nxids = KnownAssignedXidsGet(xids, xid);
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/*
 	 * If none of the relevant caches overflowed, we know the Xid is not
@@ -961,7 +923,7 @@ TransactionIdIsActive(TransactionId xid)
 	if (TransactionIdPrecedes(xid, RecentXmin))
 		return false;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (i = 0; i < arrayP->numProcs; i++)
 	{
@@ -983,7 +945,7 @@ TransactionIdIsActive(TransactionId xid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1046,7 +1008,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 	/* Cannot look for individual databases during recovery */
 	Assert(allDbs || !RecoveryInProgress());
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	/*
 	 * We initialize the MIN() calculation with latestCompletedXid + 1. This
@@ -1099,7 +1061,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 		 */
 		TransactionId kaxmin = KnownAssignedXidsGetOldestXmin();
 
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 
 		if (TransactionIdIsNormal(kaxmin) &&
 			TransactionIdPrecedes(kaxmin, result))
@@ -1110,7 +1072,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 		/*
 		 * No other information needed, so release the lock immediately.
 		 */
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 
 		/*
 		 * Compute the cutoff XID by subtracting vacuum_defer_cleanup_age,
@@ -1239,7 +1201,7 @@ GetSnapshotData(Snapshot snapshot)
 	 * It is sufficient to get shared lock on ProcArrayLock, even if we are
 	 * going to set MyProc->xmin.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	/* xmax is always latestCompletedXid + 1 */
 	xmax = ShmemVariableCache->latestCompletedXid;
@@ -1375,7 +1337,7 @@ GetSnapshotData(Snapshot snapshot)
 	if (!TransactionIdIsValid(MyProc->xmin))
 		MyProc->xmin = TransactionXmin = xmin;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/*
 	 * Update globalxmin to include actual process xids.  This is a slightly
@@ -1432,7 +1394,7 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
 		return false;
 
 	/* Get lock so source xact can't end while we're doing this */
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1476,7 +1438,7 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
 		break;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1550,7 +1512,7 @@ GetRunningTransactionData(void)
 	 * Ensure that no xids enter or leave the procarray while we obtain
 	 * snapshot.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 	LWLockAcquire(XidGenLock, LW_SHARED);
 
 	latestCompletedXid = ShmemVariableCache->latestCompletedXid;
@@ -1611,7 +1573,7 @@ GetRunningTransactionData(void)
 	CurrentRunningXacts->latestCompletedXid = latestCompletedXid;
 
 	/* We don't release XidGenLock here, the caller is responsible for that */
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	Assert(TransactionIdIsValid(CurrentRunningXacts->nextXid));
 	Assert(TransactionIdIsValid(CurrentRunningXacts->oldestRunningXid));
@@ -1644,7 +1606,7 @@ GetOldestActiveTransactionId(void)
 
 	Assert(!RecoveryInProgress());
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	oldestRunningXid = ShmemVariableCache->nextXid;
 
@@ -1672,7 +1634,7 @@ GetOldestActiveTransactionId(void)
 		 */
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return oldestRunningXid;
 }
@@ -1705,7 +1667,7 @@ GetTransactionsInCommit(TransactionId **xids_p)
 	xids = (TransactionId *) palloc(arrayP->maxProcs * sizeof(TransactionId));
 	nxids = 0;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1718,7 +1680,7 @@ GetTransactionsInCommit(TransactionId **xids_p)
 			xids[nxids++] = pxid;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	*xids_p = xids;
 	return nxids;
@@ -1740,7 +1702,7 @@ HaveTransactionsInCommit(TransactionId *xids, int nxids)
 	ProcArrayStruct *arrayP = procArray;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1766,7 +1728,7 @@ HaveTransactionsInCommit(TransactionId *xids, int nxids)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1788,7 +1750,7 @@ BackendPidGetProc(int pid)
 	if (pid == 0)				/* never match dummy PGPROCs */
 		return NULL;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1801,7 +1763,7 @@ BackendPidGetProc(int pid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1829,7 +1791,7 @@ BackendXidGetPid(TransactionId xid)
 	if (xid == InvalidTransactionId)	/* never match invalid xid */
 		return 0;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1842,7 +1804,7 @@ BackendXidGetPid(TransactionId xid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1897,7 +1859,7 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
 	vxids = (VirtualTransactionId *)
 		palloc(sizeof(VirtualTransactionId) * arrayP->maxProcs);
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1933,7 +1895,7 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	*nvxids = count;
 	return vxids;
@@ -1992,7 +1954,7 @@ GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
 					 errmsg("out of memory")));
 	}
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2025,7 +1987,7 @@ GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/* add the terminator */
 	vxids[count].backendId = InvalidBackendId;
@@ -2046,7 +2008,7 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
 	int			index;
 	pid_t		pid = 0;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2072,7 +2034,7 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return pid;
 }
@@ -2146,7 +2108,7 @@ CountDBBackends(Oid databaseid)
 	int			count = 0;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2159,7 +2121,7 @@ CountDBBackends(Oid databaseid)
 			count++;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return count;
 }
@@ -2175,7 +2137,7 @@ CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending)
 	pid_t		pid = 0;
 
 	/* tell all backends to die */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2200,7 +2162,7 @@ CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -2213,7 +2175,7 @@ CountUserBackends(Oid roleid)
 	int			count = 0;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2225,7 +2187,7 @@ CountUserBackends(Oid roleid)
 			count++;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return count;
 }
@@ -2273,7 +2235,7 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 
 		*nbackends = *nprepared = 0;
 
-		LWLockAcquire(ProcArrayLock, LW_SHARED);
+		ProcArrayLockAcquire(PAL_SHARED);
 
 		for (index = 0; index < arrayP->numProcs; index++)
 		{
@@ -2297,7 +2259,7 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 			}
 		}
 
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 
 		if (!found)
 			return false;		/* no conflicting backends, so done */
@@ -2350,7 +2312,7 @@ XidCacheRemoveRunningXids(TransactionId xid,
 	 * to abort subtransactions, but pending closer analysis we'd best be
 	 * conservative.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	/*
 	 * Under normal circumstances xid and xids[] will be in increasing order,
@@ -2398,7 +2360,7 @@ XidCacheRemoveRunningXids(TransactionId xid,
 							  latestXid))
 		ShmemVariableCache->latestCompletedXid = latestXid;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 #ifdef XIDCACHE_DEBUG
@@ -2565,7 +2527,7 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 	/*
 	 * Uses same locking as transaction commit
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	KnownAssignedXidsRemoveTree(xid, nsubxids, subxids);
 
@@ -2574,7 +2536,7 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 							  max_xid))
 		ShmemVariableCache->latestCompletedXid = max_xid;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -2584,9 +2546,9 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 void
 ExpireAllKnownAssignedTransactionIds(void)
 {
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	KnownAssignedXidsRemovePreceding(InvalidTransactionId);
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -2596,9 +2558,9 @@ ExpireAllKnownAssignedTransactionIds(void)
 void
 ExpireOldKnownAssignedTransactionIds(TransactionId xid)
 {
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	KnownAssignedXidsRemovePreceding(xid);
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 
@@ -2820,7 +2782,7 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 	{
 		/* must hold lock to compress */
 		if (!exclusive_lock)
-			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+			ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 		KnownAssignedXidsCompress(true);
 
@@ -2828,7 +2790,7 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 		/* note: we no longer care about the tail pointer */
 
 		if (!exclusive_lock)
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 
 		/*
 		 * If it still won't fit then we're out of memory
diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile
index 3730e51..27eaa97 100644
--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = flexlock.o lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o \
-	predicate.o
+	procarraylock.o predicate.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/storage/lmgr/flexlock.c b/src/backend/storage/lmgr/flexlock.c
index 7f657b3..c88bd24 100644
--- a/src/backend/storage/lmgr/flexlock.c
+++ b/src/backend/storage/lmgr/flexlock.c
@@ -30,6 +30,7 @@
 #include "storage/lwlock.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
+#include "storage/procarraylock.h"
 #include "storage/spin.h"
 #include "utils/elog.h"
 
@@ -177,9 +178,14 @@ CreateFlexLocks(void)
 
 	FlexLockArray = (FlexLockPadded *) ptr;
 
-	/* All of the "fixed" FlexLocks are LWLocks. */
+	/* All of the "fixed" FlexLocks are LWLocks - except ProcArrayLock. */
 	for (id = 0, lock = FlexLockArray; id < NumFixedFlexLocks; id++, lock++)
-		FlexLockInit(&lock->flex, FLEXLOCK_TYPE_LWLOCK);
+	{
+		if (id == ProcArrayLock)
+			FlexLockInit(&lock->flex, FLEXLOCK_TYPE_PROCARRAYLOCK);
+		else
+			FlexLockInit(&lock->flex, FLEXLOCK_TYPE_LWLOCK);
+	}
 
 	/*
 	 * Initialize the dynamic-allocation counter, which is stored just before
@@ -324,13 +330,20 @@ FlexLockReleaseAll(void)
 {
 	while (num_held_flexlocks > 0)
 	{
+		FlexLockId	id;
+		FlexLock   *flex;
+
 		HOLD_INTERRUPTS();		/* match the upcoming RESUME_INTERRUPTS */
 
-		/*
-		 * FLEXTODO: When we have multiple types of flex locks, this will
-		 * need to call the appropriate release function for each lock type.
-		 */
-		LWLockRelease(held_flexlocks[num_held_flexlocks - 1]);
+		id = held_flexlocks[num_held_flexlocks - 1];
+		flex = &FlexLockArray[id].flex;
+		if (flex->locktype == FLEXLOCK_TYPE_LWLOCK)
+			LWLockRelease(id);
+		else
+		{
+			Assert(id == ProcArrayLock);
+			ProcArrayLockRelease();
+		}
 	}
 }
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 57da345..510a4c2 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -45,6 +45,7 @@
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "storage/procsignal.h"
 #include "storage/spin.h"
 #include "utils/timestamp.h"
@@ -1046,7 +1047,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		{
 			PGPROC	   *autovac = GetBlockingAutoVacuumPgproc();
 
-			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+			ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 			/*
 			 * Only do it if the worker is not working to protect against Xid
@@ -1062,7 +1063,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 					 pid);
 
 				/* don't hold the lock across the kill() syscall */
-				LWLockRelease(ProcArrayLock);
+				ProcArrayLockRelease();
 
 				/* send the autovacuum worker Back to Old Kent Road */
 				if (kill(pid, SIGINT) < 0)
@@ -1074,7 +1075,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 				}
 			}
 			else
-				LWLockRelease(ProcArrayLock);
+				ProcArrayLockRelease();
 
 			/* prevent signal from being resent more than once */
 			allow_autovacuum_cancel = false;
diff --git a/src/backend/storage/lmgr/procarraylock.c b/src/backend/storage/lmgr/procarraylock.c
new file mode 100644
index 0000000..6838ed6
--- /dev/null
+++ b/src/backend/storage/lmgr/procarraylock.c
@@ -0,0 +1,341 @@
+/*-------------------------------------------------------------------------
+ *
+ * procarraylock.c
+ *	  Lock management for the ProcArray
+ *
+ * Because the ProcArray data structure is highly trafficked, it is
+ * critical that mutual exclusion for ProcArray options be as efficient
+ * as possible.  A particular problem is transaction end (commit or abort)
+ * which cannot be done in parallel with snapshot acquisition.  We
+ * therefore include some special hacks to deal with this case efficiently.
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/lmgr/procarraylock.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "pg_trace.h"
+#include "access/transam.h"
+#include "storage/flexlock_internals.h"
+#include "storage/ipc.h"
+#include "storage/procarraylock.h"
+#include "storage/proc.h"
+#include "storage/spin.h"
+
+typedef struct ProcArrayLockStruct
+{
+	FlexLock	flex;			/* common FlexLock infrastructure */
+	char		exclusive;		/* # of exclusive holders (0 or 1) */
+	int			shared;			/* # of shared holders (0..MaxBackends) */
+	PGPROC	   *ending;			/* transactions wishing to clear state */
+	TransactionId	latest_ending_xid;	/* latest ending XID */
+} ProcArrayLockStruct;
+
+/* There is only one ProcArrayLock. */
+#define	ProcArrayLockPointer() \
+	(AssertMacro(FlexLockArray[ProcArrayLock].flex.locktype == \
+		FLEXLOCK_TYPE_PROCARRAYLOCK), \
+	 (volatile ProcArrayLockStruct *) &FlexLockArray[ProcArrayLock])
+
+/*
+ * ProcArrayLockAcquire - acquire a lightweight lock in the specified mode
+ *
+ * If the lock is not available, sleep until it is.
+ *
+ * Side effect: cancel/die interrupts are held off until lock release.
+ */
+void
+ProcArrayLockAcquire(ProcArrayLockMode mode)
+{
+	volatile ProcArrayLockStruct *lock = ProcArrayLockPointer();
+	PGPROC	   *proc = MyProc;
+	bool		retry = false;
+	int			extraWaits = 0;
+
+	/*
+	 * We can't wait if we haven't got a PGPROC.  This should only occur
+	 * during bootstrap or shared memory initialization.  Put an Assert here
+	 * to catch unsafe coding practices.
+	 */
+	Assert(!(proc == NULL && IsUnderPostmaster));
+
+	/*
+	 * Lock out cancel/die interrupts until we exit the code section protected
+	 * by the ProcArrayLock.  This ensures that interrupts will not interfere
+     * with manipulations of data structures in shared memory.
+	 */
+	HOLD_INTERRUPTS();
+
+	/*
+	 * Loop here to try to acquire lock after each time we are signaled by
+	 * ProcArrayLockRelease.  See comments in LWLockAcquire for an explanation
+	 * of why do we not attempt to hand off the lock directly.
+	 */
+	for (;;)
+	{
+		bool		mustwait;
+
+		/* Acquire mutex.  Time spent holding mutex should be short! */
+		SpinLockAcquire(&lock->flex.mutex);
+
+		/* If retrying, allow LWLockRelease to release waiters again */
+		if (retry)
+			lock->flex.releaseOK = true;
+
+		/* If I can get the lock, do so quickly. */
+		if (mode == PAL_EXCLUSIVE)
+		{
+			if (lock->exclusive == 0 && lock->shared == 0)
+			{
+				lock->exclusive++;
+				mustwait = false;
+			}
+			else
+				mustwait = true;
+		}
+		else
+		{
+			if (lock->exclusive == 0)
+			{
+				lock->shared++;
+				mustwait = false;
+			}
+			else
+				mustwait = true;
+		}
+
+		if (!mustwait)
+			break;				/* got the lock */
+
+		/* Add myself to wait queue. */
+		FlexLockJoinWaitQueue(lock, (int) mode);
+
+		/* Can release the mutex now */
+		SpinLockRelease(&lock->flex.mutex);
+
+		/* Wait until awakened. */
+		extraWaits += FlexLockWait(ProcArrayLock, mode);
+
+		/* Now loop back and try to acquire lock again. */
+		retry = true;
+	}
+
+	/* We are done updating shared state of the lock itself. */
+	SpinLockRelease(&lock->flex.mutex);
+
+	TRACE_POSTGRESQL_FLEXLOCK_ACQUIRE(lockid, mode);
+
+	/* Add lock to list of locks held by this backend */
+	FlexLockRemember(ProcArrayLock);
+
+	/*
+	 * Fix the process wait semaphore's count for any absorbed wakeups.
+	 */
+	while (extraWaits-- > 0)
+		PGSemaphoreUnlock(&proc->sem);
+}
+
+/*
+ * ProcArrayLockClearTransaction - safely clear transaction details
+ *
+ * This can't be done while ProcArrayLock is held, but it's so fast that
+ * we can afford to do it while holding the spinlock, rather than acquiring
+ * and releasing the lock.
+ */
+void
+ProcArrayLockClearTransaction(TransactionId latestXid)
+{
+	volatile ProcArrayLockStruct *lock = ProcArrayLockPointer();
+	PGPROC	   *proc = MyProc;
+	int			extraWaits = 0;
+	bool		mustwait;
+
+	HOLD_INTERRUPTS();
+
+	/* Acquire mutex.  Time spent holding mutex should be short! */
+	SpinLockAcquire(&lock->flex.mutex);
+
+	if (lock->exclusive == 0 && lock->shared == 0)
+	{
+		{
+			volatile PGPROC *vproc = proc;
+			/* If there are no lockers, clar the critical PGPROC fields. */
+			vproc->xid = InvalidTransactionId;
+	        vproc->xmin = InvalidTransactionId;
+	        /* must be cleared with xid/xmin: */
+	        vproc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
+			vproc->subxids.nxids = 0;
+			vproc->subxids.overflowed = false;
+		}
+		mustwait = false;
+
+        /* Also advance global latestCompletedXid while holding the lock */
+        if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
+                                  latestXid))
+            ShmemVariableCache->latestCompletedXid = latestXid;
+	}
+	else
+	{
+		/* Rats, must wait. */
+		proc->flWaitLink = lock->ending;
+		lock->ending = proc;
+		if (!TransactionIdIsValid(lock->latest_ending_xid) ||
+				TransactionIdPrecedes(lock->latest_ending_xid, latestXid)) 
+			lock->latest_ending_xid = latestXid;
+		mustwait = true;
+	}
+
+	/* Can release the mutex now */
+	SpinLockRelease(&lock->flex.mutex);
+
+	/*
+	 * If we were not able to perfom the operation immediately, we must wait.
+	 * But we need not retry after being awoken, because the last lock holder
+	 * to release the lock will do the work first, on our behalf.
+	 */
+	if (mustwait)
+	{
+		extraWaits += FlexLockWait(ProcArrayLock, 2);
+		while (extraWaits-- > 0)
+			PGSemaphoreUnlock(&proc->sem);
+	}
+
+	RESUME_INTERRUPTS();
+}
+
+/*
+ * ProcArrayLockRelease - release a previously acquired lock
+ */
+void
+ProcArrayLockRelease(void)
+{
+	volatile ProcArrayLockStruct *lock = ProcArrayLockPointer();
+	PGPROC	   *head;
+	PGPROC	   *ending = NULL;
+	PGPROC	   *proc;
+
+	FlexLockForget(ProcArrayLock);
+
+	/* Acquire mutex.  Time spent holding mutex should be short! */
+	SpinLockAcquire(&lock->flex.mutex);
+
+	/* Release my hold on lock */
+	if (lock->exclusive > 0)
+		lock->exclusive--;
+	else
+	{
+		Assert(lock->shared > 0);
+		lock->shared--;
+	}
+
+	/*
+	 * If the lock is now free, but there are some transactions trying to
+	 * end, we must clear the critical PGPROC fields for them, and save a
+	 * list of them so we can wake them up.
+	 */
+	if (lock->exclusive == 0 && lock->shared == 0 && lock->ending != NULL)
+	{
+		volatile PGPROC *vproc;
+
+		ending = lock->ending;
+		vproc = ending;
+
+		while (vproc != NULL)
+		{
+        	vproc->xid = InvalidTransactionId;
+	        vproc->xmin = InvalidTransactionId;
+	        /* must be cleared with xid/xmin: */
+	        vproc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
+			vproc->subxids.nxids = 0;
+			vproc->subxids.overflowed = false;
+			vproc = vproc->flWaitLink;
+		}
+
+		/* Also advance global latestCompletedXid */
+		if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
+								  lock->latest_ending_xid))
+			ShmemVariableCache->latestCompletedXid = lock->latest_ending_xid;
+
+		/* Reset lock state. */
+		lock->ending = NULL;
+		lock->latest_ending_xid = InvalidTransactionId;
+	}
+
+	/*
+	 * See if I need to awaken any waiters.  If I released a non-last shared
+	 * hold, there cannot be anything to do.  Also, do not awaken any waiters
+	 * if someone has already awakened waiters that haven't yet acquired the
+	 * lock.
+	 */
+	head = lock->flex.head;
+	if (head != NULL)
+	{
+		if (lock->exclusive == 0 && lock->shared == 0 && lock->flex.releaseOK)
+		{
+			/*
+			 * Remove the to-be-awakened PGPROCs from the queue.  If the front
+			 * waiter wants exclusive lock, awaken him only. Otherwise awaken
+			 * as many waiters as want shared access.
+			 */
+			proc = head;
+			if (proc->flWaitMode != LW_EXCLUSIVE)
+			{
+				while (proc->flWaitLink != NULL &&
+					   proc->flWaitLink->flWaitMode != LW_EXCLUSIVE)
+					proc = proc->flWaitLink;
+			}
+			/* proc is now the last PGPROC to be released */
+			lock->flex.head = proc->flWaitLink;
+			proc->flWaitLink = NULL;
+			/* prevent additional wakeups until retryer gets to run */
+			lock->flex.releaseOK = false;
+		}
+		else
+		{
+			/* lock is still held, can't awaken anything */
+			head = NULL;
+		}
+	}
+
+	/* We are done updating shared state of the lock itself. */
+	SpinLockRelease(&lock->flex.mutex);
+
+	TRACE_POSTGRESQL_FLEXLOCK_RELEASE(lockid);
+
+	/*
+	 * Awaken any waiters I removed from the queue.
+	 */
+	while (head != NULL)
+	{
+		FlexLockDebug("LWLockRelease", lockid, "release waiter");
+		proc = head;
+		head = proc->flWaitLink;
+		proc->flWaitLink = NULL;
+		proc->flWaitResult = 1;		/* any non-zero value will do */
+		PGSemaphoreUnlock(&proc->sem);
+	}
+
+	/*
+	 * Also awaken any processes whose critical PGPROC fields I cleared
+	 */
+	while (ending != NULL)
+	{
+		FlexLockDebug("LWLockRelease", lockid, "release ending");
+		proc = ending;
+		ending = proc->flWaitLink;
+		proc->flWaitLink = NULL;
+		proc->flWaitResult = 1;		/* any non-zero value will do */
+		PGSemaphoreUnlock(&proc->sem);
+	}
+
+	/*
+	 * Now okay to allow cancel/die interrupts.
+	 */
+	RESUME_INTERRUPTS();
+}
diff --git a/src/include/storage/flexlock_internals.h b/src/include/storage/flexlock_internals.h
index 5f78da7..d1bca45 100644
--- a/src/include/storage/flexlock_internals.h
+++ b/src/include/storage/flexlock_internals.h
@@ -43,6 +43,7 @@ typedef struct FlexLock
 } FlexLock;
 
 #define FLEXLOCK_TYPE_LWLOCK			'l'
+#define FLEXLOCK_TYPE_PROCARRAYLOCK		'p'
 
 typedef union FlexLockPadded
 {
diff --git a/src/include/storage/procarraylock.h b/src/include/storage/procarraylock.h
new file mode 100644
index 0000000..678ca6f
--- /dev/null
+++ b/src/include/storage/procarraylock.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * procarraylock.h
+ *	  Lock management for the ProcArray
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/lwlock.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PROCARRAYLOCK_H
+#define PROCARRAYLOCK_H
+
+#include "storage/flexlock.h"
+
+typedef enum ProcArrayLockMode
+{
+	PAL_EXCLUSIVE,
+	PAL_SHARED
+} ProcArrayLockMode;
+
+extern void ProcArrayLockAcquire(ProcArrayLockMode mode);
+extern void ProcArrayLockClearTransaction(TransactionId latestXid);
+extern void ProcArrayLockRelease(void);
+
+#endif   /* PROCARRAYLOCK_H */

#21

Pavan Deolasee

pavan.deolasee@gmail.com

about 14 years ago

In reply to: Robert Haas (#1)

Re: FlexLocks

On Tue, Nov 15, 2011 at 7:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:

The lower layer I called "FlexLocks",
and it's designed to allow a variety of locking implementations to be
built on top of it and reuse as much of the basic infrastructure as I
could figure out how to make reusable without hurting performance too
much. LWLocks become the anchor client of the FlexLock system; in
essence, most of flexlock.c is code that was removed from lwlock.c.
The second patch, procarraylock.c, uses that infrastructure to define
a new type of FlexLock specifically for ProcArrayLock. It basically
works like a regular LWLock, except that it has a special operation to
optimize ProcArrayEndTransaction(). In the uncontended case, instead
of acquiring and releasing the lock, it just grabs the lock, observes
that there is no contention, clears the critical PGPROC fields (which
isn't noticeably slower than updating the state of the lock would be)
and releases the spin lock.

(Robert, we already discussed this a bit privately, so apologies for
duplicating this here)

Another idea is to have some sort of shared work queue mechanism which
might turn out to be more manageable and extendable. What I am
thinking about is having a {Request, Response} kind of structure per
backend in shared memory. An obvious place to hold them is in PGPROC
for every backend. We the have a new API like LWLockExecute(lock,
mode, ReqRes). The caller first initializes the ReqRes structure with
the work it needs get done and then calls LWLockExecute with that.
IOW, the code flow would look like this:

<Initialize the Req/Res structure with request type and input data>
LWLockExecute(lock, mode, ReqRes)
<Consume Response and proceed further>

If the lock is available in the desired mode, LWLockExecute() will
internally finish the work and return immediately. If the lock is
contended, the process would sleep. When current holder of the lock
finishes its work and calls LWLockRelease() to release the lock, it
would not only find the processes to wake up, but would also go
through their pending work items and complete them before waking them
up. The Response area will be populated with the result.

I think this general mechanism will be useful for many users of
LWLock, especially those who do very trivial updates/reads from the
shared area, but still need synchronization. One example that Robert
has already found helping a lot if ProcArrayEndTransaction. Also, even
though both shared and exclusive waiters can use this mechanism, it
may make more sense to the exclusive waiters because of the
exclusivity. For sake of simplicity, we can choose to force a
semantics that when LWLockExecute returns, the work is guaranteed to
be done, either by self or some other backend. That will keep the code
simpler for users of this new API.

Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

#22

Robert Haas

robertmhaas@gmail.com

about 14 years ago

In reply to: Pavan Deolasee (#21)

Re: FlexLocks

On Wed, Nov 16, 2011 at 11:16 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:

On Tue, Nov 15, 2011 at 7:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:

The lower layer I called "FlexLocks",
and it's designed to allow a variety of locking implementations to be
built on top of it and reuse as much of the basic infrastructure as I
could figure out how to make reusable without hurting performance too
much. LWLocks become the anchor client of the FlexLock system; in
essence, most of flexlock.c is code that was removed from lwlock.c.
The second patch, procarraylock.c, uses that infrastructure to define
a new type of FlexLock specifically for ProcArrayLock. It basically
works like a regular LWLock, except that it has a special operation to
optimize ProcArrayEndTransaction(). In the uncontended case, instead
of acquiring and releasing the lock, it just grabs the lock, observes
that there is no contention, clears the critical PGPROC fields (which
isn't noticeably slower than updating the state of the lock would be)
and releases the spin lock.

(Robert, we already discussed this a bit privately, so apologies for
duplicating this here)

Another idea is to have some sort of shared work queue mechanism which
might turn out to be more manageable and extendable. What I am
thinking about is having a {Request, Response} kind of structure per
backend in shared memory. An obvious place to hold them is in PGPROC
for every backend. We the have a new API like LWLockExecute(lock,
mode, ReqRes). The caller first initializes the ReqRes structure with
the work it needs get done and then calls LWLockExecute with that.
IOW, the code flow would look like this:

<Initialize the Req/Res structure with request type and input data>
LWLockExecute(lock, mode, ReqRes)
<Consume Response and proceed further>

If the lock is available in the desired mode, LWLockExecute() will
internally finish the work and return immediately. If the lock is
contended, the process would sleep. When current holder of the lock
finishes its work and calls LWLockRelease() to release the lock, it
would not only find the processes to wake up, but would also go
through their pending work items and complete them before waking them
up. The Response area will be populated with the result.

I think this general mechanism will be useful for many users of
LWLock, especially those who do very trivial updates/reads from the
shared area, but still need synchronization. One example that Robert
has already found helping a lot if ProcArrayEndTransaction. Also, even
though both shared and exclusive waiters can use this mechanism, it
may make more sense to the exclusive waiters because of the
exclusivity. For sake of simplicity, we can choose to force a
semantics that when LWLockExecute returns, the work is guaranteed to
be done, either by self or some other backend. That will keep the code
simpler for users of this new API.

I am not convinced that that's a better API. I mean, consider
something like this:

/*
* OK, let's do it. First let other backends know I'm in ANALYZE.
*/
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);

I'm not sure exactly how you'd proposed to rewrite that, but I think
it's almost guaranteed to be more than three lines of code. Also, you
can't assume that the "work" can be done equally well by any backend.
In this case it could, because the PGPROC structures are all in shared
memory, but that won't work for something like GetSnapshotData(),
which needs to copy a nontrivial amount of data into backend-local
memory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#23

Pavan Deolasee

pavan.deolasee@gmail.com

about 14 years ago

In reply to: Robert Haas (#22)

Re: FlexLocks

On Thu, Nov 17, 2011 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I am not convinced that that's a better API. I mean, consider
something like this:

/*
* OK, let's do it. First let other backends know I'm in ANALYZE.
*/
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);

I'm not sure exactly how you'd proposed to rewrite that, but I think
it's almost guaranteed to be more than three lines of code.

I would guess the ReqRes will look something like this where
ReqResRequest/Response would probably be union of all various requests
and responses, one for each type of request:

struct ReqRes {
ReqResRequestType reqtype;
ReqResRequest req;
ReqResResponse res;
}

The code above can be rewritten as:

reqRes.reqtype = RR_PROC_SET_VACUUMFLAGS;
reqRes.req.set_vacuumflags.flags = PROC_IN_ANALYZE;
LWLockExecute(ProcArrayLock, LW_EXCLUSIVE, &reqRes);

I mean, I agree it doesn't look exactly elegant and the number of
requests types and their handling may go up a lot, but we need to do
this only for those heavily contended locks. Other callers can
continue with the current code style. But with this general
infrastructure, there will be still be a way to do this.

Also, you
can't assume that the "work" can be done equally well by any backend.
In this case it could, because the PGPROC structures are all in shared
memory, but that won't work for something like GetSnapshotData(),
which needs to copy a nontrivial amount of data into backend-local
memory.

Yeah, I am not suggesting we should do (even though I think it should
be possible with appropriate input/output data) this everywhere. But
places where this can done, like end-transaction stuff, the
infrastructure might be quite useful.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

#24

Pavan Deolasee

pavan.deolasee@gmail.com

about 14 years ago

In reply to: Pavan Deolasee (#23)

1 attachment(s)

Re: FlexLocks

On Thu, Nov 17, 2011 at 10:19 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:

On Thu, Nov 17, 2011 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I am not convinced that that's a better API. I mean, consider
something like this:

/*
* OK, let's do it. First let other backends know I'm in ANALYZE.
*/
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);

I'm not sure exactly how you'd proposed to rewrite that, but I think
it's almost guaranteed to be more than three lines of code.

I would guess the ReqRes will look something like this where
ReqResRequest/Response would probably be union of all various requests
and responses, one for each type of request:

struct ReqRes {
ReqResRequestType reqtype;
ReqResRequest req;
ReqResResponse res;
}

The code above can be rewritten as:

reqRes.reqtype = RR_PROC_SET_VACUUMFLAGS;
reqRes.req.set_vacuumflags.flags = PROC_IN_ANALYZE;
LWLockExecute(ProcArrayLock, LW_EXCLUSIVE, &reqRes);

My apologies for hijacking the thread, but the work seems quite
related, so I thought I should post here instead of starting a new
thread.

Here is a WIP patch based on the idea of having a shared Q. A process
trying to access the shared memory protected by a LWLock, sets up the
task in its PGPROC and calls a new API LWLockExecute(). If the LWLock
is available, the task is performed immediately and the function
returns. Otherwise, the process queues up itself on the lock. When the
last shared lock holder or the exclusive lock holder call
LWLockRelease(), it scans through such pending tasks, executes them
via a callback mechanism and wakes all those processes along with any
other normal waiter(s) waiting on LWLockAcquire().

I have only coded for ProcArrayEndTransaction, but it should fairly
easy to extend the usage at some more places, especially those which
does some simple modifications to the protected area. I don't propose
to use the technique for every user of LWLock, but there can be some
obvious candidates, including this one that Robert found out.

I see 35-40% improvement for 32-80 clients on a 5 minutes pgbench -N
run with scale factor of 100 and permanent tables. This is on a
32-core HP IA box.

There are few things that need some deliberations. The pending tasks
are right now executed while holding the mutex (spinlock). This is
good and bad for obvious reasons. We can possibly change that so that
the work is done without holding the spinlock or leave to the caller
to choose the behavior. Doing it without holding the spinlock will
make the technique interesting for many more callers. We can also
rework the task execution so that pending similar requests from
multiple callers can be combined and executed with a single callback,
if the caller knows its safe to do so. I haven't thought through the
API/callback changes to support that, but its definitely possible and
could be quite useful in many cases. For example, status of many
transactions can be checked with a single lookup of the ProcArray. Or
WAL inserts from multiple processes can be combined and written at
once.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Attachments:

Shared-Q-v5.patchtext/x-patch; charset=US-ASCII; name=Shared-Q-v5.patchDownload

commit 24f8e349d085e646cb918c552cc8ead7d38f7013
Author: Pavan Deolasee <pavan@ubuntu.(none)>
Date:   Fri Nov 18 15:49:54 2011 +0530

    Implement a shared work Q mechanism. A process can queue its work for later
    execution if the protecting lock is currently not available. Backend which
    releases the last lock will finish the work and wake up the waiting process.

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 1a48485..59d2958 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -157,6 +157,7 @@ static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray,
 							   TransactionId xmax);
 static TransactionId KnownAssignedXidsGetOldestXmin(void);
 static void KnownAssignedXidsDisplay(int trace_level);
+static bool ProcArrayEndTransactionWQ(WorkQueueData *wqdata);
 
 /*
  * Report shared-memory space needed by CreateSharedProcArray.
@@ -331,8 +332,6 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
 
 	elog(LOG, "failed to find proc %p in ProcArray", proc);
 }
-
-
 /*
  * ProcArrayEndTransaction -- mark a transaction as no longer running
  *
@@ -352,33 +351,24 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 	if (TransactionIdIsValid(latestXid))
 	{
 		/*
-		 * We must lock ProcArrayLock while clearing proc->xid, so that we do
-		 * not exit the set of "running" transactions while someone else is
-		 * taking a snapshot.  See discussion in
-		 * src/backend/access/transam/README.
+		 * Use the shared work queue mechanism to get the work done. If the
+		 * ProcArrayLock is available, it will done immediately, otherwise it
+		 * will be queued up and some other backend (the one who releases the
+		 * lock last) will do it for us and wake us up
+		 *
+		 * Use the shared area in PGPROC to communicate with other backends. We
+		 * can write to the area without holding any lock becuase its
+		 * read/written by other backends only when we are sleeping in the
+		 * queue and only one backend can access it at any time
 		 */
-		Assert(TransactionIdIsValid(proc->xid));
-
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
-
-		proc->xid = InvalidTransactionId;
-		proc->lxid = InvalidLocalTransactionId;
-		proc->xmin = InvalidTransactionId;
-		/* must be cleared with xid/xmin: */
-		proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
-		proc->inCommit = false; /* be sure this is cleared in abort */
-		proc->recoveryConflictPending = false;
-
-		/* Clear the subtransaction-XID cache too while holding the lock */
-		proc->subxids.nxids = 0;
-		proc->subxids.overflowed = false;
+		WorkQueueData *wqdata = &proc->wqdata;
 
-		/* Also advance global latestCompletedXid while holding the lock */
-		if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
-								  latestXid))
-			ShmemVariableCache->latestCompletedXid = latestXid;
+		wqdata->wq_reqtype = WQ_END_TRANSACTION;
+		wqdata->wq_reqin.wqin_end_xact.proc = proc;
+		wqdata->wq_reqin.wqin_end_xact.latestXid = latestXid;
+		wqdata->wq_exec = ProcArrayEndTransactionWQ;
 
-		LWLockRelease(ProcArrayLock);
+		LWLockExecute(ProcArrayLock, LW_EXCLUSIVE);
 	}
 	else
 	{
@@ -401,6 +391,38 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 	}
 }
 
+/*
+ * Do the real work for ProcArrayEndTransaction. We are called while holding
+ * the mutex, so be quick and fast
+ */
+static bool
+ProcArrayEndTransactionWQ(WorkQueueData *wqdata)
+{
+	volatile PGPROC *proc = (PGPROC *) wqdata->wq_reqin.wqin_end_xact.proc;
+	TransactionId latestXid = wqdata->wq_reqin.wqin_end_xact.latestXid;
+
+	Assert(TransactionIdIsValid(proc->xid));
+
+	proc->xid = InvalidTransactionId;
+	proc->lxid = InvalidLocalTransactionId;
+	proc->xmin = InvalidTransactionId;
+	/* must be cleared with xid/xmin: */
+	proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
+	proc->inCommit = false; /* be sure this is cleared in abort */
+	proc->recoveryConflictPending = false;
+
+	/* Clear the subtransaction-XID cache too while holding the lock */
+	proc->subxids.nxids = 0;
+	proc->subxids.overflowed = false;
+
+	/* Also advance global latestCompletedXid while holding the lock */
+	if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
+				latestXid))
+		ShmemVariableCache->latestCompletedXid = latestXid;
+
+	return true;
+}
+
 
 /*
  * ProcArrayClearTransaction -- clear the transaction fields
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 079eb29..ab2899e 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -43,6 +43,8 @@ typedef struct LWLock
 	bool		releaseOK;		/* T if ok to release waiters */
 	char		exclusive;		/* # of exclusive holders (0 or 1) */
 	int			shared;			/* # of shared holders (0..MaxBackends) */
+	PGPROC		*sq_head;		/* head of list of pending requests */
+	PGPROC		*sq_tail;		/* tail of list of pending request */
 	PGPROC	   *head;			/* head of list of waiting PGPROCs */
 	PGPROC	   *tail;			/* tail of list of waiting PGPROCs */
 	/* tail is undefined when head is NULL */
@@ -269,6 +271,8 @@ CreateLWLocks(void)
 		lock->lock.releaseOK = true;
 		lock->lock.exclusive = 0;
 		lock->lock.shared = 0;
+		lock->lock.sq_head = NULL;
+		lock->lock.sq_tail = NULL;
 		lock->lock.head = NULL;
 		lock->lock.tail = NULL;
 	}
@@ -571,7 +575,7 @@ void
 LWLockRelease(LWLockId lockid)
 {
 	volatile LWLock *lock = &(LWLockArray[lockid].lock);
-	PGPROC	   *head;
+	PGPROC	   *head, *wakeupq_head, *wakeupq_tail;
 	PGPROC	   *proc;
 	int			i;
 
@@ -610,21 +614,48 @@ LWLockRelease(LWLockId lockid)
 	 * if someone has already awakened waiters that haven't yet acquired the
 	 * lock.
 	 */
-	head = lock->head;
-	if (head != NULL)
+	if (lock->exclusive == 0 && lock->shared == 0 && lock->releaseOK)
 	{
-		if (lock->exclusive == 0 && lock->shared == 0 && lock->releaseOK)
+		/*
+		 * First process any pending requests and add those processes to the
+		 * list of to-be-awaken processes.
+		 */
+		head = lock->sq_head;
+		while (head != NULL)
+		{
+			proc = head;
+
+			Assert(proc->wqdata.wq_reqpending);
+			
+			/* Finish the work */
+			proc->wqdata.wq_exec(&proc->wqdata);
+			proc->wqdata.wq_reqpending = false;
+
+			head = proc->lwWaitLink;
+		}
+		
+		/*
+		 * We must wake up all those processes which are waiting for their
+		 * pending requests to be processed
+		 */
+		wakeupq_head = lock->sq_head;
+		wakeupq_tail = lock->sq_tail;
+
+		lock->sq_head = lock->sq_tail = NULL;
+
+		/*
+		 * Remove the to-be-awakened PGPROCs from the queue.  If the front
+		 * waiter wants exclusive lock, awaken him only. Otherwise awaken
+		 * as many waiters as want shared access.
+		 */
+		head = lock->head;
+		if (head != NULL)
 		{
-			/*
-			 * Remove the to-be-awakened PGPROCs from the queue.  If the front
-			 * waiter wants exclusive lock, awaken him only. Otherwise awaken
-			 * as many waiters as want shared access.
-			 */
 			proc = head;
 			if (!proc->lwExclusive)
 			{
 				while (proc->lwWaitLink != NULL &&
-					   !proc->lwWaitLink->lwExclusive)
+						!proc->lwWaitLink->lwExclusive)
 					proc = proc->lwWaitLink;
 			}
 			/* proc is now the last PGPROC to be released */
@@ -633,11 +664,19 @@ LWLockRelease(LWLockId lockid)
 			/* prevent additional wakeups until retryer gets to run */
 			lock->releaseOK = false;
 		}
+
+		/*
+		 * Add any other processes to be woken up to the list
+		 */
+		if (wakeupq_head)
+			wakeupq_tail->lwWaitLink = head;
 		else
-		{
-			/* lock is still held, can't awaken anything */
-			head = NULL;
-		}
+			wakeupq_head = head;
+	}
+	else
+	{
+		/* lock is still held, can't awaken anything */
+		wakeupq_head = NULL;
 	}
 
 	/* We are done updating shared state of the lock itself. */
@@ -648,6 +687,7 @@ LWLockRelease(LWLockId lockid)
 	/*
 	 * Awaken any waiters I removed from the queue.
 	 */
+	head = wakeupq_head;
 	while (head != NULL)
 	{
 		LOG_LWDEBUG("LWLockRelease", lockid, "release waiter");
@@ -685,6 +725,180 @@ LWLockReleaseAll(void)
 	}
 }
 
+/*
+ * Acquire the LWLock and execute the given task. If the lock is not available,
+ * we queue the task which will be executed when the lock becomes available.
+ * Even though, both exclusive and shared tasks may be performed, its useful to
+ * use this interface for tasks that need exclusive access to the lock. The
+ * task must be very short and must not perform any IO since it may be done
+ * while holding the spinlock. Also, any ereports must be avoided while
+ * executing the task.
+ *
+ * The caller can assume that the task is finished successfully when the
+ * function returns. If there is any out information, that will be returned
+ * in the MyProc->wqdata shared area.
+ */ 
+void
+LWLockExecute(LWLockId lockid, LWLockMode mode)
+{
+	volatile LWLock *lock = &(LWLockArray[lockid].lock);
+	PGPROC	   *proc = MyProc;
+	WorkQueueData	*wqdata = &proc->wqdata;
+	int			extraWaits = 0;
+	bool		mustwait;
+
+	PRINT_LWDEBUG("LWLockExecute", lockid, lock);
+
+#ifdef LWLOCK_STATS
+	/* Set up local count state first time through in a given process */
+	if (counts_for_pid != MyProcPid)
+	{
+		int		   *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
+		int			numLocks = LWLockCounter[1];
+
+		sh_acquire_counts = calloc(numLocks, sizeof(int));
+		ex_acquire_counts = calloc(numLocks, sizeof(int));
+		block_counts = calloc(numLocks, sizeof(int));
+		counts_for_pid = MyProcPid;
+		on_shmem_exit(print_lwlock_stats, 0);
+	}
+	/* Count lock acquisition attempts */
+	if (mode == LW_EXCLUSIVE)
+		ex_acquire_counts[lockid]++;
+	else
+		sh_acquire_counts[lockid]++;
+#endif   /* LWLOCK_STATS */
+
+	/*
+	 * We can't wait if we haven't got a PGPROC.  This should only occur
+	 * during bootstrap or shared memory initialization.  Put an Assert here
+	 * to catch unsafe coding practices.
+	 */
+	Assert(!(proc == NULL && IsUnderPostmaster));
+
+	/*
+	 * Lock out cancel/die interrupts until we exit the code section protected
+	 * by the LWLock.  This ensures that interrupts will not interfere with
+	 * manipulations of data structures in shared memory.
+	 */
+	HOLD_INTERRUPTS();
+
+	/* Acquire mutex.  Time spent holding mutex should be short! */
+	SpinLockAcquire(&lock->mutex);
+
+	/* If I can get the lock, execute the task and we are done */
+	if (mode == LW_EXCLUSIVE)
+	{
+		if (lock->exclusive == 0 && lock->shared == 0)
+			mustwait = false;
+		else
+			mustwait = true;
+	}
+	else
+	{
+		if (lock->exclusive == 0)
+			mustwait = false;
+		else
+			mustwait = true;
+	}
+
+	if (!mustwait)
+	{
+		/*
+		 * No other process in critical section and none can get in until we
+		 * are holding the mutex
+		 *
+		 * Invoke callback for work execution. We are still holding the
+		 * mutex, so the callback must be short and quick. We rely that the
+		 * users of the function will be smart enough to know that
+		 *
+		 * XXX Alternatively, we can acquire the LWLock in the desired mode
+		 * (lock->shared/lock->exclusive must be incremented above), execute
+		 * the task and then release the lock. May be the caller can pass
+		 * additional information to choose the desired behavior. For now, just
+		 * do this while holding the mutex.
+		 */
+		wqdata->wq_exec(wqdata);
+		/* Can release the mutex now */
+		SpinLockRelease(&lock->mutex);
+	}
+	else
+	{
+		/*
+		 * Add myself to wait queue.
+		 *
+		 * If we don't have a PGPROC structure, there's no way to wait. This
+		 * should never occur, since MyProc should only be null during shared
+		 * memory initialization.
+		 */
+		if (proc == NULL)
+			elog(PANIC, "cannot wait without a PGPROC structure");
+
+		proc->lwWaiting = true;
+		proc->lwExclusive = (mode == LW_EXCLUSIVE);
+		proc->lwWaitLink = NULL;
+
+		/* Mark the request as pending. Its of not much use right now */
+		wqdata->wq_reqpending = true;
+
+		if (lock->sq_head == NULL)
+			lock->sq_head = proc;
+		else
+			lock->sq_tail->lwWaitLink = proc;
+		lock->sq_tail = proc;
+
+		/* XXX Should we set lock->releaseOK to true here ? */
+		/* lock->releaseOK = true; */
+
+		/* Can release the mutex now */
+		SpinLockRelease(&lock->mutex);
+
+		/*
+		 * Wait until awakened.
+		 *
+		 * Since we share the process wait semaphore with the regular lock
+		 * manager and ProcWaitForSignal, and we may need to acquire an LWLock
+		 * while one of those is pending, it is possible that we get awakened
+		 * for a reason other than being signaled by LWLockRelease. If so,
+		 * loop back and wait again.  Once we've gotten the LWLock,
+		 * re-increment the sema by the number of additional signals received,
+		 * so that the lock manager or signal manager will see the received
+		 * signal when it next waits.
+		 */
+		LOG_LWDEBUG("LWLockExecute", lockid, "waiting");
+
+#ifdef LWLOCK_STATS
+		block_counts[lockid]++;
+#endif
+
+		TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);
+
+		for (;;)
+		{
+			/* "false" means cannot accept cancel/die interrupt here. */
+			PGSemaphoreLock(&proc->sem, false);
+			if (!proc->lwWaiting)
+				break;
+			extraWaits++;
+		}
+
+		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode);
+
+		LOG_LWDEBUG("LWLockExecute", lockid, "awakened");
+
+		TRACE_POSTGRESQL_LWLOCK_ACQUIRE(lockid, mode);
+		/*
+		 * Fix the process wait semaphore's count for any absorbed wakeups.
+		 */
+		while (extraWaits-- > 0)
+			PGSemaphoreUnlock(&proc->sem);
+	}
+
+	/*
+	 * Now okay to allow cancel/die interrupts.
+	 */
+	RESUME_INTERRUPTS();
+}
 
 /*
  * LWLockHeldByMe - test whether my process currently holds a lock
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 438a48d..7c2d99a 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -14,6 +14,8 @@
 #ifndef LWLOCK_H
 #define LWLOCK_H
 
+#include "storage/wqueue.h"
+
 /*
  * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
  * here, but we need them to set up enum LWLockId correctly, and having
@@ -114,5 +116,6 @@ extern Size LWLockShmemSize(void);
 extern void CreateLWLocks(void);
 
 extern void RequestAddinLWLocks(int n);
+extern void LWLockExecute(LWLockId lockid, LWLockMode mode);
 
 #endif   /* LWLOCK_H */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 6e798b1..34874f2 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -154,6 +154,9 @@ struct PGPROC
 	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID lock */
+
+	/* Shared work queue */
+	WorkQueueData	wqdata;
 };
 
 /* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */
diff --git a/src/include/storage/wqueue.h b/src/include/storage/wqueue.h
new file mode 100644
index 0000000..7ee5cd3
--- /dev/null
+++ b/src/include/storage/wqueue.h
@@ -0,0 +1,44 @@
+/*
+ * wqueue.h
+ *
+ * 	Implement shared work queue for processing
+ *
+ * 	src/include/storage/wqueue.h
+ */
+
+#ifndef _WQUEUE_H
+#define _WQUEUE_H
+
+typedef enum WQRequestType
+{
+	WQ_NO_REQUEST = 0,
+	WQ_END_TRANSACTION
+} WQRequestType;
+
+typedef union WQRequestIn
+{
+	struct {
+		void			*proc;
+		TransactionId	latestXid;
+	} wqin_end_xact;
+} WQRequestIn;
+
+typedef union WQRequestOut
+{
+	bool	wqout_status;
+} WQRequestOut;
+
+struct WorkQueueData;
+
+typedef bool (*WQExecuteReq) (struct WorkQueueData *wqdata);
+
+typedef struct WorkQueueData
+{
+	WQRequestType	wq_reqtype;
+	bool			wq_reqpending;
+	WQRequestIn		wq_reqin;
+	WQRequestOut	wq_reqout;
+	WQExecuteReq	wq_exec;
+} WorkQueueData;
+
+#endif

#25

Robert Haas

robertmhaas@gmail.com

about 14 years ago

In reply to: Pavan Deolasee (#24)

Re: FlexLocks

On Fri, Nov 18, 2011 at 6:26 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:

My apologies for hijacking the thread, but the work seems quite
related, so I thought I should post here instead of starting a new
thread.

Here is a WIP patch based on the idea of having a shared Q. A process
trying to access the shared memory protected by a LWLock, sets up the
task in its PGPROC and calls a new API LWLockExecute(). If the LWLock
is available, the task is performed immediately and the function
returns. Otherwise, the process queues up itself on the lock. When the
last shared lock holder or the exclusive lock holder call
LWLockRelease(), it scans through such pending tasks, executes them
via a callback mechanism and wakes all those processes along with any
other normal waiter(s) waiting on LWLockAcquire().

I have only coded for ProcArrayEndTransaction, but it should fairly
easy to extend the usage at some more places, especially those which
does some simple modifications to the protected area. I don't propose
to use the technique for every user of LWLock, but there can be some
obvious candidates, including this one that Robert found out.

I see 35-40% improvement for 32-80 clients on a 5 minutes pgbench -N
run with scale factor of 100 and permanent tables. This is on a
32-core HP IA box.

There are few things that need some deliberations. The pending tasks
are right now executed while holding the mutex (spinlock). This is
good and bad for obvious reasons. We can possibly change that so that
the work is done without holding the spinlock or leave to the caller
to choose the behavior. Doing it without holding the spinlock will
make the technique interesting for many more callers. We can also
rework the task execution so that pending similar requests from
multiple callers can be combined and executed with a single callback,
if the caller knows its safe to do so. I haven't thought through the
API/callback changes to support that, but its definitely possible and
could be quite useful in many cases. For example, status of many
transactions can be checked with a single lookup of the ProcArray. Or
WAL inserts from multiple processes can be combined and written at
once.

So the upside and downside of this approach is that it modifies the
existing LWLock implementation rather than allowing multiple lock
implementations to exist side-by-side. That means every LWLock in the
system has access to this functionality, which might be convenient if
there turn out to be many uses for this technique. The bad news is
that everyone pays the cost of checking the work queue in
LWLockRelease(). It also means that you can't, for example, create a
custom lock with different lock modes (e.g. S, SX, X, as I proposed
upthread).

I am pretty dubious that there are going to be very many cases where
we can get away with holding the spinlock while doing the work. For
example, WAL flush is a clear example of where we can optimize away
spinlock acquisitions - if we communicate to people we wake up that
their LSN is already flushed, they needn't reacquire the lock to
check. But we certainly can't hold a spinlock across a WAL flush.
The nice thing about the FlexLock approach is that it permits
fine-grained control over these types of policies: one lock type can
switch to exclusive mode, do the work, and then reacquire the spinlock
to hand off the baton; another can do the work while holding the
spinlock; and still a third can forget about work queues altogether
but introduce additional lock modes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#26

Pavan Deolasee

pavan.deolasee@gmail.com

about 14 years ago

In reply to: Robert Haas (#25)

Re: FlexLocks

On Fri, Nov 18, 2011 at 10:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:

So the upside and downside of this approach is that it modifies the
existing LWLock implementation rather than allowing multiple lock
implementations to exist side-by-side. That means every LWLock in the
system has access to this functionality, which might be convenient if
there turn out to be many uses for this technique.

Right.

The bad news is
that everyone pays the cost of checking the work queue in
LWLockRelease().

I hope that would be minimal (may be just one instruction) for those
who don't want to use the facility.

It also means that you can't, for example, create a
custom lock with different lock modes (e.g. S, SX, X, as I proposed
upthread).

Thats a valid point.

I am pretty dubious that there are going to be very many cases where
we can get away with holding the spinlock while doing the work. For
example, WAL flush is a clear example of where we can optimize away
spinlock acquisitions - if we communicate to people we wake up that
their LSN is already flushed, they needn't reacquire the lock to
check. But we certainly can't hold a spinlock across a WAL flush.

I think thats mostly solvable as said upthread. We can and should
improve this mechanism so that the work is carried out with the
necessary LWLock instead of the spinlock. That would let other
processes to queue up for the lock while the tasks are being executed.
Or if the tasks only need shared lock, then other normal shared
requesters can go ahead and acquire the lock.

When I get some time, I would like to see if this can be extended to
have shared snapshots so that multiple callers of GetSnapshotData()
get the same snapshot, computed only once by scanning the proc array,
instead of having each process compute its own snapshot which remains
the same unless some transaction ends in between.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

#27

Bruce Momjian

bruce@momjian.us

about 14 years ago

In reply to: Robert Haas (#20)

Re: FlexLocks

Robert Haas wrote:

On Wed, Nov 16, 2011 at 12:25 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

We could alternatively change one or the other of them to be a
struct with one member, but I think the cure might be worse than
the disease. ?By my count, we are talking about saving perhaps as
many as 34 lines of code changes here, and that's only if
complicating the type handling doesn't require any changes to
places that are untouched at present, which I suspect it would.

So I stepped through all the changes of this type, and I notice that
most of them are in areas where we've talked about likely benefits
of creating new FlexLock variants instead of staying with LWLocks;
if any of that is done (as seems likely), it further reduces the
impact from 34 lines. ?If we take care of LWLockHeldByMe() as you
describe, I'll concede the FlexLockId changes.

Updated patches attached.

It would be helpful if the patch included some text about how flexilocks
are different from ordinary lwlocks.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#28

Kevin Grittner

Kevin.Grittner@wicourts.gov

about 14 years ago

In reply to: Robert Haas (#20)

Re: FlexLocks

Robert Haas <robertmhaas@gmail.com> wrote:

Updated patches attached.

I've gotten through several days of performance tests for this pair
of related patches, with results posted on a separate thread. I'll
link those in to the CF application shortly. To summarize the other
(fairly long) thread on benchmarks, it seemed like there might be a
very slight slowdown at low concurrency, but it could be the random
alignment of code with and without the patch; it was a small enough
fraction of a percent to be negligible, in my opinion. At higher
concurrency levels the patch showed significant performance
improvements. Besides the overall improvement in the median tps
numbers of several percent, there was significant mitigation of the
"performance collapse" phenomenon, where some runs were much slower
than others. It seems a clear win in terms of performance.

I've gotten through code review of the flexlock-v2.patch, and have
decided to post on that before I go through the
procarraylock-v1.patch code.

Not surprisingly, this patch was in good form and applied cleanly.
There were doc changes, and I can't see where any changes to the
tests are required. I liked the structure, and only found a few
nit-picky things to point out:

I didn't see why num_held_flexlocks and held_flexlocks had the
static keyword removed from their declarations.

FlexLockRemember() seems to have a pasto for a comment. Maybe
change to something like: "Add lock to list of locks held by this
backend."

In procarraylock.c there is this:

/* If there are no lockers, clar the critical PGPROC fields. */

s/clar/clear/

I have to admit I don't have my head around the extraWaits issue, so
I can't personally vouch for that code, although I have no reason to
doubt it, either. Everything else was something that I at least
*think* I understand, and it looked good to me.

I'm not changing the status until I get through the other patch
file, which should be tomorrow.

-Kevin

#29

Kevin Grittner

Kevin.Grittner@wicourts.gov

about 14 years ago

In reply to: Kevin Grittner (#28)

Re: FlexLocks

"Kevin Grittner" wrote:

Robert Haas wrote:

Updated patches attached.

I have to admit I don't have my head around the extraWaits issue,
so I can't personally vouch for that code, although I have no
reason to doubt it, either. Everything else was something that I at
least *think* I understand, and it looked good to me.

I'm not changing the status until I get through the other patch
file, which should be tomorrow.

Most of the procarraylock-v1.patch file was pretty straightforward,
but I have a few concerns.

Why is it OK to drop these lines from the else condition in
ProcArrayEndTransaction()?:

/* must be cleared with xid/xmin: */
proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;

The extraWaits code still looks like black magic to me, so unless
someone can point me in the right direction to really understand
that, I can't address whether it's OK.

The need to modify flexlock_internals.h and flexlock.c seems to me to
indicate a lack of desirable modularity here. The lower level object
type shouldn't need to know about each and every implementation of a
higher level type which uses it, particularly not compiled in like
that. It would be really nice if each of the higher level types
"registered" with flexlock at runtime, so that the areas modified at
the flexlock level in this patch file could be generic. Among other
things, this could allow extensions to use specialized types, which
seems possibly useful. Does that (or some other technique to address
the concern) seem feasible?

-Kevin

Import Notes

Resolved by subject fallback

#30

Robert Haas

robertmhaas@gmail.com

about 14 years ago

In reply to: Kevin Grittner (#29)

2 attachment(s)

Re: FlexLocks

On Wed, Nov 23, 2011 at 7:18 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Why is it OK to drop these lines from the else condition in
ProcArrayEndTransaction()?:

/* must be cleared with xid/xmin: */
proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;

It's probably not. Oops.

I believe the attached patch versions address your comments regarding
the flexlock patch as well; it is also rebased over the PGXACT patch,
which has since been committed.

The extraWaits code still looks like black magic to me, so unless
someone can point me in the right direction to really understand
that, I can't address whether it's OK.

I don't think I've changed the behavior, so it should be fine. The
idea is that something like this can happen:

1. Backend #1 does some action which will eventually cause some other
process to send it a wakeup (like adding itself to the wait-queue for
a heavyweight lock).
2. Before actually going to sleep, backend #1 tries to acquire an
LWLock. The LWLock is not immediately available, so it sleeps on its
process semaphore.
3. Backend #2 sees the shared memory state created in step one and
decides sends a wakeup to backend #1 (for example, because the lock is
now available).
4. Backend #1 receives the wakeup. It duly reacquires the spinlock
protecting the LWLock, sees that the LWLock is not available, releases
the spinlock, and goes back to sleep.
5. Backend #3 now releases the LWLock that backend #1 is trying to
acquire and, as backend #1 is first in line, it sends backend #1 a
wakeup.
6. Backend #1 now wakes up again, reacquires the spinlock, gets the
lwlock, releases the spinlock, does some stuff, and releases the
lwlock.
7. Backend #1, having now finished what it needed to do while holding
the lwlock, is ready to go to sleep and wait for the event that it
queued up for back in step #1. However, the wakeup for that event
*has already arrived* and was consumed by the LWLock machinery. So
when backend #1 goes to sleep, it's waiting for a wakeup that will
never arrive, because it already did arrive, and got eaten.

The solution is the "extraWaits" thing; in step #6, we remember that
we received an extra, useless wakeup in step #4 that we threw away.
To make up for having thrown away a wakeup someone else sent us in
step #3, we send ourselves a wakeup in step #6. That way, when we go
to sleep in step #7, we wake up immediately, just as we should.

The need to modify flexlock_internals.h and flexlock.c seems to me to
indicate a lack of desirable modularity here. The lower level object
type shouldn't need to know about each and every implementation of a
higher level type which uses it, particularly not compiled in like
that. It would be really nice if each of the higher level types
"registered" with flexlock at runtime, so that the areas modified at
the flexlock level in this patch file could be generic. Among other
things, this could allow extensions to use specialized types, which
seems possibly useful. Does that (or some other technique to address
the concern) seem feasible?

Possibly; let me think about that. I haven't addressed that in this version.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

flexlock-v3.patchapplication/octet-stream; name=flexlock-v3.patchDownload

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 8dc3054..51b24d0 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -105,7 +105,7 @@ typedef struct pgssEntry
  */
 typedef struct pgssSharedState
 {
-	LWLockId	lock;			/* protects hashtable search/modification */
+	FlexLockId	lock;			/* protects hashtable search/modification */
 	int			query_size;		/* max query length in bytes */
 } pgssSharedState;
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d1e628f..8517b36 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6199,14 +6199,14 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
      </varlistentry>
 
      <varlistentry>
-      <term><varname>trace_lwlocks</varname> (<type>boolean</type>)</term>
+      <term><varname>trace_flexlocks</varname> (<type>boolean</type>)</term>
       <indexterm>
-       <primary><varname>trace_lwlocks</> configuration parameter</primary>
+       <primary><varname>trace_flexlocks</> configuration parameter</primary>
       </indexterm>
       <listitem>
        <para>
-        If on, emit information about lightweight lock usage.  Lightweight
-        locks are intended primarily to provide mutual exclusion of access
+        If on, emit information about FlexLock usage.  FlexLocks
+        are intended primarily to provide mutual exclusion of access
         to shared-memory data structures.
        </para>
        <para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b9dc1d2..98ed0d3 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1724,49 +1724,49 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
       or kilobytes of memory used for an internal sort.</entry>
     </row>
     <row>
-     <entry>lwlock-acquire</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock has been acquired.
-      arg0 is the LWLock's ID.
-      arg1 is the requested lock mode, either exclusive or shared.</entry>
+     <entry>flexlock-acquire</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock has been acquired.
+      arg0 is the FlexLock's ID.
+      arg1 is the requested lock mode.</entry>
     </row>
     <row>
-     <entry>lwlock-release</entry>
-     <entry>(LWLockId)</entry>
-     <entry>Probe that fires when an LWLock has been released (but note
+     <entry>flexlock-release</entry>
+     <entry>(FlexLockId)</entry>
+     <entry>Probe that fires when a FlexLock has been released (but note
       that any released waiters have not yet been awakened).
-      arg0 is the LWLock's ID.</entry>
+      arg0 is the FlexLock's ID.</entry>
     </row>
     <row>
-     <entry>lwlock-wait-start</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock was not immediately available and
+     <entry>flexlock-wait-start</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock was not immediately available and
       a server process has begun to wait for the lock to become available.
-      arg0 is the LWLock's ID.
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
-     <entry>lwlock-wait-done</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
+     <entry>flexlock-wait-done</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
      <entry>Probe that fires when a server process has been released from its
-      wait for an LWLock (it does not actually have the lock yet).
-      arg0 is the LWLock's ID.
+      wait for an FlexLock (it does not actually have the lock yet).
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
-     <entry>lwlock-condacquire</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock was successfully acquired when the
-      caller specified no waiting.
-      arg0 is the LWLock's ID.
+     <entry>flexlock-condacquire</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock was successfully acquired when
+      the caller specified no waiting.
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
-     <entry>lwlock-condacquire-fail</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock was not successfully acquired when
-      the caller specified no waiting.
-      arg0 is the LWLock's ID.
+     <entry>flexlock-condacquire-fail</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock was not successfully acquired
+      when the caller specified no waiting.
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
@@ -1813,11 +1813,11 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
      <entry>unsigned int</entry>
     </row>
     <row>
-     <entry>LWLockId</entry>
+     <entry>FlexLockId</entry>
      <entry>int</entry>
     </row>
     <row>
-     <entry>LWLockMode</entry>
+     <entry>FlexLockMode</entry>
      <entry>int</entry>
     </row>
     <row>
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index f7caa34..09d5862 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -151,7 +151,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(bool));		/* page_dirty[] */
 	sz += MAXALIGN(nslots * sizeof(int));		/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));		/* page_lru_count[] */
-	sz += MAXALIGN(nslots * sizeof(LWLockId));	/* buffer_locks[] */
+	sz += MAXALIGN(nslots * sizeof(FlexLockId));		/* buffer_locks[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -161,7 +161,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLockId ctllock, const char *subdir)
+			  FlexLockId ctllock, const char *subdir)
 {
 	SlruShared	shared;
 	bool		found;
@@ -202,8 +202,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		offset += MAXALIGN(nslots * sizeof(int));
 		shared->page_lru_count = (int *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(int));
-		shared->buffer_locks = (LWLockId *) (ptr + offset);
-		offset += MAXALIGN(nslots * sizeof(LWLockId));
+		shared->buffer_locks = (FlexLockId *) (ptr + offset);
+		offset += MAXALIGN(nslots * sizeof(FlexLockId));
 
 		if (nlsns > 0)
 		{
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d2fecb1..943929b 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -326,9 +326,9 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 	proc->backendId = InvalidBackendId;
 	proc->databaseId = databaseid;
 	proc->roleId = owner;
-	proc->lwWaiting = false;
-	proc->lwExclusive = false;
-	proc->lwWaitLink = NULL;
+	proc->flWaitResult = 0;
+	proc->flWaitMode = 0;
+	proc->flWaitLink = NULL;
 	proc->waitLock = NULL;
 	proc->waitProcLock = NULL;
 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index c383011..0da2ae5 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2248,7 +2248,7 @@ AbortTransaction(void)
 	 * Releasing LW locks is critical since we might try to grab them again
 	 * while cleaning up!
 	 */
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 
 	/* Clean up buffer I/O and buffer context locks, too */
 	AbortBufferIO();
@@ -4138,7 +4138,7 @@ AbortSubTransaction(void)
 	 * FIXME This may be incorrect --- Are there some locks we should keep?
 	 * Buffer locks, for example?  I don't think so but I'm not sure.
 	 */
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 
 	AbortBufferIO();
 	UnlockBuffers();
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 6bf2421..9ceee91 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -562,13 +562,13 @@ bootstrap_signals(void)
  * Begin shutdown of an auxiliary process.	This is approximately the equivalent
  * of ShutdownPostgres() in postinit.c.  We can't run transactions in an
  * auxiliary process, so most of the work of AbortTransaction() is not needed,
- * but we do need to make sure we've released any LWLocks we are holding.
+ * but we do need to make sure we've released any flex locks we are holding.
  * (This is only critical during an error exit.)
  */
 static void
 ShutdownAuxiliaryProcess(int code, Datum arg)
 {
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index cacedab..f33f573 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -176,9 +176,10 @@ BackgroundWriterMain(void)
 		/*
 		 * These operations are really just a minimal subset of
 		 * AbortTransaction().	We don't have very many resources to worry
-		 * about in bgwriter, but we do have LWLocks, buffers, and temp files.
+		 * about in bgwriter, but we do have flex locks, buffers, and temp
+		 * files.
 		 */
-		LWLockReleaseAll();
+		FlexLockReleaseAll();
 		AbortBufferIO();
 		UnlockBuffers();
 		/* buffer pins are released here: */
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e9ae1e8..49f07a7 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -281,9 +281,10 @@ CheckpointerMain(void)
 		/*
 		 * These operations are really just a minimal subset of
 		 * AbortTransaction().	We don't have very many resources to worry
-		 * about in checkpointer, but we do have LWLocks, buffers, and temp files.
+		 * about in checkpointer, but we do have flex locks, buffers, and temp
+		 * files.
 		 */
-		LWLockReleaseAll();
+		FlexLockReleaseAll();
 		AbortBufferIO();
 		UnlockBuffers();
 		/* buffer pins are released here: */
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 963189d..59d18eb 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -109,6 +109,7 @@
 #include "postmaster/syslogger.h"
 #include "replication/walsender.h"
 #include "storage/fd.h"
+#include "storage/flexlock_internals.h"
 #include "storage/ipc.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
@@ -404,8 +405,6 @@ typedef struct
 typedef int InheritableSocket;
 #endif
 
-typedef struct LWLock LWLock;	/* ugly kluge */
-
 /*
  * Structure contains all variables passed to exec:ed backends
  */
@@ -426,7 +425,7 @@ typedef struct
 	slock_t    *ShmemLock;
 	VariableCache ShmemVariableCache;
 	Backend    *ShmemBackendArray;
-	LWLock	   *LWLockArray;
+	FlexLock   *FlexLockArray;
 	slock_t    *ProcStructLock;
 	PROC_HDR   *ProcGlobal;
 	PGPROC	   *AuxiliaryProcs;
@@ -4676,7 +4675,6 @@ MaxLivePostmasterChildren(void)
  * functions
  */
 extern slock_t *ShmemLock;
-extern LWLock *LWLockArray;
 extern slock_t *ProcStructLock;
 extern PGPROC *AuxiliaryProcs;
 extern PMSignalData *PMSignalState;
@@ -4721,7 +4719,7 @@ save_backend_variables(BackendParameters *param, Port *port,
 	param->ShmemVariableCache = ShmemVariableCache;
 	param->ShmemBackendArray = ShmemBackendArray;
 
-	param->LWLockArray = LWLockArray;
+	param->FlexLockArray = FlexLockArray;
 	param->ProcStructLock = ProcStructLock;
 	param->ProcGlobal = ProcGlobal;
 	param->AuxiliaryProcs = AuxiliaryProcs;
@@ -4945,7 +4943,7 @@ restore_backend_variables(BackendParameters *param, Port *port)
 	ShmemVariableCache = param->ShmemVariableCache;
 	ShmemBackendArray = param->ShmemBackendArray;
 
-	LWLockArray = param->LWLockArray;
+	FlexLockArray = param->FlexLockArray;
 	ProcStructLock = param->ProcStructLock;
 	ProcGlobal = param->ProcGlobal;
 	AuxiliaryProcs = param->AuxiliaryProcs;
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 157728e..587443d 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -167,9 +167,9 @@ WalWriterMain(void)
 		/*
 		 * These operations are really just a minimal subset of
 		 * AbortTransaction().	We don't have very many resources to worry
-		 * about in walwriter, but we do have LWLocks, and perhaps buffers?
+		 * about in walwriter, but we do have flex locks, and perhaps buffers?
 		 */
-		LWLockReleaseAll();
+		FlexLockReleaseAll();
 		AbortBufferIO();
 		UnlockBuffers();
 		/* buffer pins are released here: */
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 71fe8c6..4c4959c 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -141,7 +141,7 @@ PrefetchBuffer(Relation reln, ForkNumber forkNum, BlockNumber blockNum)
 	{
 		BufferTag	newTag;		/* identity of requested block */
 		uint32		newHash;	/* hash value for newTag */
-		LWLockId	newPartitionLock;	/* buffer partition lock for it */
+		FlexLockId	newPartitionLock;	/* buffer partition lock for it */
 		int			buf_id;
 
 		/* create a tag so we can lookup the buffer */
@@ -514,10 +514,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 {
 	BufferTag	newTag;			/* identity of requested block */
 	uint32		newHash;		/* hash value for newTag */
-	LWLockId	newPartitionLock;		/* buffer partition lock for it */
+	FlexLockId	newPartitionLock;		/* buffer partition lock for it */
 	BufferTag	oldTag;			/* previous identity of selected buffer */
 	uint32		oldHash;		/* hash value for oldTag */
-	LWLockId	oldPartitionLock;		/* buffer partition lock for it */
+	FlexLockId	oldPartitionLock;		/* buffer partition lock for it */
 	BufFlags	oldFlags;
 	int			buf_id;
 	volatile BufferDesc *buf;
@@ -857,7 +857,7 @@ InvalidateBuffer(volatile BufferDesc *buf)
 {
 	BufferTag	oldTag;
 	uint32		oldHash;		/* hash value for oldTag */
-	LWLockId	oldPartitionLock;		/* buffer partition lock for it */
+	FlexLockId	oldPartitionLock;		/* buffer partition lock for it */
 	BufFlags	oldFlags;
 
 	/* Save the original buffer tag before dropping the spinlock */
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index bb8b832..a2c570a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -113,7 +113,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, SUBTRANSShmemSize());
 		size = add_size(size, TwoPhaseShmemSize());
 		size = add_size(size, MultiXactShmemSize());
-		size = add_size(size, LWLockShmemSize());
+		size = add_size(size, FlexLockShmemSize());
 		size = add_size(size, ProcArrayShmemSize());
 		size = add_size(size, BackendStatusShmemSize());
 		size = add_size(size, SInvalShmemSize());
@@ -179,7 +179,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	 * needed for InitShmemIndex.
 	 */
 	if (!IsUnderPostmaster)
-		CreateLWLocks();
+		CreateFlexLocks();
 
 	/*
 	 * Set up shmem.c index hashtable
diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile
index e12a854..3730e51 100644
--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/storage/lmgr
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o predicate.o
+OBJS = flexlock.o lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o \
+	predicate.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/storage/lmgr/flexlock.c b/src/backend/storage/lmgr/flexlock.c
new file mode 100644
index 0000000..f517589
--- /dev/null
+++ b/src/backend/storage/lmgr/flexlock.c
@@ -0,0 +1,352 @@
+/*-------------------------------------------------------------------------
+ *
+ * flexlock.c
+ *	  Low-level routines for managing flex locks.
+ *
+ * Flex locks are intended primarily to provide mutual exclusion of access
+ * to shared-memory data structures.  Most, but not all, flex locks are
+ * lightweight locks (LWLocks).  This file contains support routines that
+ * are used for all types of flex locks, including lwlocks.  User-level
+ * locking should be done with the full lock manager --- which depends on
+ * LWLocks to protect its shared state.
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/lmgr/flexlock.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "access/clog.h"
+#include "access/multixact.h"
+#include "access/subtrans.h"
+#include "commands/async.h"
+#include "storage/flexlock_internals.h"
+#include "storage/lwlock.h"
+#include "storage/predicate.h"
+#include "storage/proc.h"
+#include "storage/spin.h"
+#include "utils/elog.h"
+
+/*
+ * We use this structure to keep track of flex locks held, for release
+ * during error recovery.  The maximum size could be determined at runtime
+ * if necessary, but it seems unlikely that more than a few locks could
+ * ever be held simultaneously.
+ */
+#define MAX_SIMUL_FLEXLOCKS	100
+
+static int	num_held_flexlocks = 0;
+static FlexLockId held_flexlocks[MAX_SIMUL_FLEXLOCKS];
+
+static int	lock_addin_request = 0;
+static bool lock_addin_request_allowed = true;
+
+#ifdef LOCK_DEBUG
+bool		Trace_flexlocks = false;
+#endif
+
+/*
+ * This points to the array of FlexLocks in shared memory.  Backends inherit
+ * the pointer by fork from the postmaster (except in the EXEC_BACKEND case,
+ * where we have special measures to pass it down).
+ */
+FlexLockPadded *FlexLockArray = NULL;
+
+/* We use the ShmemLock spinlock to protect LWLockAssign */
+extern slock_t *ShmemLock;
+
+static void FlexLockInit(FlexLock *flex, char locktype);
+
+/*
+ * Compute number of FlexLocks to allocate.
+ */
+int
+NumFlexLocks(void)
+{
+	int			numLocks;
+
+	/*
+	 * Possibly this logic should be spread out among the affected modules,
+	 * the same way that shmem space estimation is done.  But for now, there
+	 * are few enough users of FlexLocks that we can get away with just keeping
+	 * the knowledge here.
+	 */
+
+	/* Predefined FlexLocks */
+	numLocks = (int) NumFixedFlexLocks;
+
+	/* bufmgr.c needs two for each shared buffer */
+	numLocks += 2 * NBuffers;
+
+	/* proc.c needs one for each backend or auxiliary process */
+	numLocks += MaxBackends + NUM_AUXILIARY_PROCS;
+
+	/* clog.c needs one per CLOG buffer */
+	numLocks += NUM_CLOG_BUFFERS;
+
+	/* subtrans.c needs one per SubTrans buffer */
+	numLocks += NUM_SUBTRANS_BUFFERS;
+
+	/* multixact.c needs two SLRU areas */
+	numLocks += NUM_MXACTOFFSET_BUFFERS + NUM_MXACTMEMBER_BUFFERS;
+
+	/* async.c needs one per Async buffer */
+	numLocks += NUM_ASYNC_BUFFERS;
+
+	/* predicate.c needs one per old serializable xid buffer */
+	numLocks += NUM_OLDSERXID_BUFFERS;
+
+	/*
+	 * Add any requested by loadable modules; for backwards-compatibility
+	 * reasons, allocate at least NUM_USER_DEFINED_FLEXLOCKS of them even if
+	 * there are no explicit requests.
+	 */
+	lock_addin_request_allowed = false;
+	numLocks += Max(lock_addin_request, NUM_USER_DEFINED_FLEXLOCKS);
+
+	return numLocks;
+}
+
+
+/*
+ * RequestAddinFlexLocks
+ *		Request that extra FlexLocks be allocated for use by
+ *		a loadable module.
+ *
+ * This is only useful if called from the _PG_init hook of a library that
+ * is loaded into the postmaster via shared_preload_libraries.	Once
+ * shared memory has been allocated, calls will be ignored.  (We could
+ * raise an error, but it seems better to make it a no-op, so that
+ * libraries containing such calls can be reloaded if needed.)
+ */
+void
+RequestAddinFlexLocks(int n)
+{
+	if (IsUnderPostmaster || !lock_addin_request_allowed)
+		return;					/* too late */
+	lock_addin_request += n;
+}
+
+
+/*
+ * Compute shmem space needed for FlexLocks.
+ */
+Size
+FlexLockShmemSize(void)
+{
+	Size		size;
+	int			numLocks = NumFlexLocks();
+
+	/* Space for the FlexLock array. */
+	size = mul_size(numLocks, FLEX_LOCK_BYTES);
+
+	/* Space for dynamic allocation counter, plus room for alignment. */
+	size = add_size(size, 2 * sizeof(int) + FLEX_LOCK_BYTES);
+
+	return size;
+}
+
+/*
+ * Allocate shmem space for FlexLocks and initialize the locks.
+ */
+void
+CreateFlexLocks(void)
+{
+	int			numLocks = NumFlexLocks();
+	Size		spaceLocks = FlexLockShmemSize();
+	FlexLockPadded *lock;
+	int		   *FlexLockCounter;
+	char	   *ptr;
+	int			id;
+
+	/* Allocate and zero space */
+	ptr = (char *) ShmemAlloc(spaceLocks);
+	memset(ptr, 0, spaceLocks);
+
+	/* Leave room for dynamic allocation counter */
+	ptr += 2 * sizeof(int);
+
+	/* Ensure desired alignment of FlexLock array */
+	ptr += FLEX_LOCK_BYTES - ((uintptr_t) ptr) % FLEX_LOCK_BYTES;
+
+	FlexLockArray = (FlexLockPadded *) ptr;
+
+	/* All of the "fixed" FlexLocks are LWLocks. */
+	for (id = 0, lock = FlexLockArray; id < NumFixedFlexLocks; id++, lock++)
+		FlexLockInit(&lock->flex, FLEXLOCK_TYPE_LWLOCK);
+
+	/*
+	 * Initialize the dynamic-allocation counter, which is stored just before
+	 * the first FlexLock.
+	 */
+	FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+	FlexLockCounter[0] = (int) NumFixedFlexLocks;
+	FlexLockCounter[1] = numLocks;
+}
+
+/*
+ * FlexLockAssign - assign a dynamically-allocated FlexLock number
+ *
+ * We interlock this using the same spinlock that is used to protect
+ * ShmemAlloc().  Interlocking is not really necessary during postmaster
+ * startup, but it is needed if any user-defined code tries to allocate
+ * LWLocks after startup.
+ */
+FlexLockId
+FlexLockAssign(char locktype)
+{
+	FlexLockId	result;
+
+	/* use volatile pointer to prevent code rearrangement */
+	volatile int *FlexLockCounter;
+
+	FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+	SpinLockAcquire(ShmemLock);
+	if (FlexLockCounter[0] >= FlexLockCounter[1])
+	{
+		SpinLockRelease(ShmemLock);
+		elog(ERROR, "no more FlexLockIds available");
+	}
+	result = (FlexLockId) (FlexLockCounter[0]++);
+	SpinLockRelease(ShmemLock);
+
+	FlexLockInit(&FlexLockArray[result].flex, locktype);
+
+	return result;
+}
+
+/*
+ * Initialize a FlexLock.
+ */
+static void
+FlexLockInit(FlexLock *flex, char locktype)
+{
+	SpinLockInit(&flex->mutex);
+	flex->releaseOK = true;
+	flex->locktype = locktype;
+	/*
+	 * We might need to think a little harder about what should happen here
+	 * if some future type of FlexLock requires more initialization than this.
+	 * For now, this will suffice.
+	 */
+}
+
+/*
+ * Add lock to list of locks held by this backend.
+ */
+void
+FlexLockRemember(FlexLockId id)
+{
+	if (num_held_flexlocks >= MAX_SIMUL_FLEXLOCKS)
+		elog(PANIC, "too many FlexLocks taken");
+	held_flexlocks[num_held_flexlocks++] = id;
+}
+
+/*
+ * Remove lock from list of locks held.  Usually, but not always, it will
+ * be the latest-acquired lock; so search array backwards.
+ */
+void
+FlexLockForget(FlexLockId id)
+{
+	int			i;
+
+	for (i = num_held_flexlocks; --i >= 0;)
+	{
+		if (id == held_flexlocks[i])
+			break;
+	}
+	if (i < 0)
+		elog(ERROR, "lock %d is not held", (int) id);
+	num_held_flexlocks--;
+	for (; i < num_held_flexlocks; i++)
+		held_flexlocks[i] = held_flexlocks[i + 1];
+}
+
+/*
+ * FlexLockWait - wait until awakened
+ *
+ * Since we share the process wait semaphore with the regular lock manager
+ * and ProcWaitForSignal, and we may need to acquire a FlexLock while one of
+ * those is pending, it is possible that we get awakened for a reason other
+ * than being signaled by a FlexLock release.  If so, loop back and wait again.
+ *
+ * Returns the number of "extra" waits absorbed so that, once we've gotten the
+ * FlexLock, we can re-increment the sema by the number of additional signals
+ * received, so that the lock manager or signal manager will see the received
+ * signal when it next waits.
+ */
+int
+FlexLockWait(FlexLockId id, int mode)
+{
+	int		extraWaits = 0;
+
+	FlexLockDebug("LWLockAcquire", id, "waiting");
+	TRACE_POSTGRESQL_FLEXLOCK_WAIT_START(id, mode);
+
+	for (;;)
+   	{
+		/* "false" means cannot accept cancel/die interrupt here. */
+		PGSemaphoreLock(&MyProc->sem, false);
+		/*
+		 * FLEXTODO: I think we should return this, instead of ignoring it.
+		 * Any non-zero value means "wake up".
+		 */
+		if (MyProc->flWaitResult)
+			break;
+		extraWaits++;
+   	}
+
+	TRACE_POSTGRESQL_FLEXLOCK_WAIT_DONE(id, mode);
+	FlexLockDebug("LWLockAcquire", id, "awakened");
+
+	return extraWaits;
+}
+
+/*
+ * FlexLockReleaseAll - release all currently-held locks
+ *
+ * Used to clean up after ereport(ERROR). An important difference between this
+ * function and retail LWLockRelease calls is that InterruptHoldoffCount is
+ * unchanged by this operation.  This is necessary since InterruptHoldoffCount
+ * has been set to an appropriate level earlier in error recovery. We could
+ * decrement it below zero if we allow it to drop for each released lock!
+ */
+void
+FlexLockReleaseAll(void)
+{
+	while (num_held_flexlocks > 0)
+	{
+		HOLD_INTERRUPTS();		/* match the upcoming RESUME_INTERRUPTS */
+
+		/*
+		 * FLEXTODO: When we have multiple types of flex locks, this will
+		 * need to call the appropriate release function for each lock type.
+		 */
+		LWLockRelease(held_flexlocks[num_held_flexlocks - 1]);
+	}
+}
+
+/*
+ * FlexLockHeldByMe - test whether my process currently holds a lock
+ *
+ * This is meant as debug support only.  We do not consider the lock mode.
+ */
+bool
+FlexLockHeldByMe(FlexLockId id)
+{
+	int			i;
+
+	for (i = 0; i < num_held_flexlocks; i++)
+	{
+		if (held_flexlocks[i] == id)
+			return true;
+	}
+	return false;
+}
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 3ba4671..f594983 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -591,7 +591,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	bool		found;
 	ResourceOwner owner;
 	uint32		hashcode;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	int			status;
 	bool		log_lock = false;
 
@@ -1546,7 +1546,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	LOCALLOCK  *locallock;
 	LOCK	   *lock;
 	PROCLOCK   *proclock;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	bool		wakeupNeeded;
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
@@ -1912,7 +1912,7 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	 */
 	for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
 	{
-		LWLockId	partitionLock = FirstLockMgrLock + partition;
+		FlexLockId	partitionLock = FirstLockMgrLock + partition;
 		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
 
 		proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
@@ -2197,7 +2197,7 @@ static bool
 FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag,
 					  uint32 hashcode)
 {
-	LWLockId		partitionLock = LockHashPartitionLock(hashcode);
+	FlexLockId		partitionLock = LockHashPartitionLock(hashcode);
 	Oid				relid = locktag->locktag_field2;
 	uint32			i;
 
@@ -2281,7 +2281,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	LockMethod		lockMethodTable = LockMethods[DEFAULT_LOCKMETHOD];
 	LOCKTAG		   *locktag = &locallock->tag.lock;
 	PROCLOCK	   *proclock = NULL;
-	LWLockId		partitionLock = LockHashPartitionLock(locallock->hashcode);
+	FlexLockId		partitionLock = LockHashPartitionLock(locallock->hashcode);
 	Oid				relid = locktag->locktag_field2;
 	uint32			f;
 
@@ -2382,7 +2382,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
 	SHM_QUEUE  *procLocks;
 	PROCLOCK   *proclock;
 	uint32		hashcode;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	int			count = 0;
 	int			fast_count = 0;
 
@@ -2593,7 +2593,7 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 	PROCLOCKTAG proclocktag;
 	uint32		hashcode;
 	uint32		proclock_hashcode;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	bool		wakeupNeeded;
 
 	hashcode = LockTagHashCode(locktag);
@@ -2827,7 +2827,7 @@ PostPrepare_Locks(TransactionId xid)
 	 */
 	for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
 	{
-		LWLockId	partitionLock = FirstLockMgrLock + partition;
+		FlexLockId	partitionLock = FirstLockMgrLock + partition;
 		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
 
 		proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
@@ -3343,7 +3343,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 	uint32		hashcode;
 	uint32		proclock_hashcode;
 	int			partition;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	LockMethod	lockMethodTable;
 
 	Assert(len == sizeof(TwoPhaseLockRecord));
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 079eb29..ce6c931 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -21,74 +21,23 @@
  */
 #include "postgres.h"
 
-#include "access/clog.h"
-#include "access/multixact.h"
-#include "access/subtrans.h"
-#include "commands/async.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "storage/flexlock_internals.h"
 #include "storage/ipc.h"
-#include "storage/predicate.h"
 #include "storage/proc.h"
 #include "storage/spin.h"
 
-
-/* We use the ShmemLock spinlock to protect LWLockAssign */
-extern slock_t *ShmemLock;
-
-
 typedef struct LWLock
 {
-	slock_t		mutex;			/* Protects LWLock and queue of PGPROCs */
-	bool		releaseOK;		/* T if ok to release waiters */
+	FlexLock	flex;			/* common FlexLock infrastructure */
 	char		exclusive;		/* # of exclusive holders (0 or 1) */
 	int			shared;			/* # of shared holders (0..MaxBackends) */
-	PGPROC	   *head;			/* head of list of waiting PGPROCs */
-	PGPROC	   *tail;			/* tail of list of waiting PGPROCs */
-	/* tail is undefined when head is NULL */
 } LWLock;
 
-/*
- * All the LWLock structs are allocated as an array in shared memory.
- * (LWLockIds are indexes into the array.)	We force the array stride to
- * be a power of 2, which saves a few cycles in indexing, but more
- * importantly also ensures that individual LWLocks don't cross cache line
- * boundaries.	This reduces cache contention problems, especially on AMD
- * Opterons.  (Of course, we have to also ensure that the array start
- * address is suitably aligned.)
- *
- * LWLock is between 16 and 32 bytes on all known platforms, so these two
- * cases are sufficient.
- */
-#define LWLOCK_PADDED_SIZE	(sizeof(LWLock) <= 16 ? 16 : 32)
-
-typedef union LWLockPadded
-{
-	LWLock		lock;
-	char		pad[LWLOCK_PADDED_SIZE];
-} LWLockPadded;
-
-/*
- * This points to the array of LWLocks in shared memory.  Backends inherit
- * the pointer by fork from the postmaster (except in the EXEC_BACKEND case,
- * where we have special measures to pass it down).
- */
-NON_EXEC_STATIC LWLockPadded *LWLockArray = NULL;
-
-
-/*
- * We use this structure to keep track of locked LWLocks for release
- * during error recovery.  The maximum size could be determined at runtime
- * if necessary, but it seems unlikely that more than a few locks could
- * ever be held simultaneously.
- */
-#define MAX_SIMUL_LWLOCKS	100
-
-static int	num_held_lwlocks = 0;
-static LWLockId held_lwlocks[MAX_SIMUL_LWLOCKS];
-
-static int	lock_addin_request = 0;
-static bool lock_addin_request_allowed = true;
+#define	LWLockPointer(lockid) \
+	(AssertMacro(FlexLockArray[lockid].flex.locktype == FLEXLOCK_TYPE_LWLOCK), \
+	 (volatile LWLock *) &FlexLockArray[lockid])
 
 #ifdef LWLOCK_STATS
 static int	counts_for_pid = 0;
@@ -98,27 +47,17 @@ static int *block_counts;
 #endif
 
 #ifdef LOCK_DEBUG
-bool		Trace_lwlocks = false;
-
 inline static void
-PRINT_LWDEBUG(const char *where, LWLockId lockid, const volatile LWLock *lock)
+PRINT_LWDEBUG(const char *where, FlexLockId lockid, const volatile LWLock *lock)
 {
-	if (Trace_lwlocks)
+	if (Trace_flexlocks)
 		elog(LOG, "%s(%d): excl %d shared %d head %p rOK %d",
 			 where, (int) lockid,
-			 (int) lock->exclusive, lock->shared, lock->head,
-			 (int) lock->releaseOK);
-}
-
-inline static void
-LOG_LWDEBUG(const char *where, LWLockId lockid, const char *msg)
-{
-	if (Trace_lwlocks)
-		elog(LOG, "%s(%d): %s", where, (int) lockid, msg);
+			 (int) lock->exclusive, lock->shared, lock->flex.head,
+			 (int) lock->flex.releaseOK);
 }
 #else							/* not LOCK_DEBUG */
 #define PRINT_LWDEBUG(a,b,c)
-#define LOG_LWDEBUG(a,b,c)
 #endif   /* LOCK_DEBUG */
 
 #ifdef LWLOCK_STATS
@@ -127,8 +66,8 @@ static void
 print_lwlock_stats(int code, Datum arg)
 {
 	int			i;
-	int		   *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	int			numLocks = LWLockCounter[1];
+	int		   *FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+	int			numLocks = FlexLockCounter[1];
 
 	/* Grab an LWLock to keep different backends from mixing reports */
 	LWLockAcquire(0, LW_EXCLUSIVE);
@@ -145,173 +84,15 @@ print_lwlock_stats(int code, Datum arg)
 }
 #endif   /* LWLOCK_STATS */
 
-
 /*
- * Compute number of LWLocks to allocate.
+ * LWLockAssign - initialize a new lwlock and return its ID
  */
-int
-NumLWLocks(void)
-{
-	int			numLocks;
-
-	/*
-	 * Possibly this logic should be spread out among the affected modules,
-	 * the same way that shmem space estimation is done.  But for now, there
-	 * are few enough users of LWLocks that we can get away with just keeping
-	 * the knowledge here.
-	 */
-
-	/* Predefined LWLocks */
-	numLocks = (int) NumFixedLWLocks;
-
-	/* bufmgr.c needs two for each shared buffer */
-	numLocks += 2 * NBuffers;
-
-	/* proc.c needs one for each backend or auxiliary process */
-	numLocks += MaxBackends + NUM_AUXILIARY_PROCS;
-
-	/* clog.c needs one per CLOG buffer */
-	numLocks += NUM_CLOG_BUFFERS;
-
-	/* subtrans.c needs one per SubTrans buffer */
-	numLocks += NUM_SUBTRANS_BUFFERS;
-
-	/* multixact.c needs two SLRU areas */
-	numLocks += NUM_MXACTOFFSET_BUFFERS + NUM_MXACTMEMBER_BUFFERS;
-
-	/* async.c needs one per Async buffer */
-	numLocks += NUM_ASYNC_BUFFERS;
-
-	/* predicate.c needs one per old serializable xid buffer */
-	numLocks += NUM_OLDSERXID_BUFFERS;
-
-	/*
-	 * Add any requested by loadable modules; for backwards-compatibility
-	 * reasons, allocate at least NUM_USER_DEFINED_LWLOCKS of them even if
-	 * there are no explicit requests.
-	 */
-	lock_addin_request_allowed = false;
-	numLocks += Max(lock_addin_request, NUM_USER_DEFINED_LWLOCKS);
-
-	return numLocks;
-}
-
-
-/*
- * RequestAddinLWLocks
- *		Request that extra LWLocks be allocated for use by
- *		a loadable module.
- *
- * This is only useful if called from the _PG_init hook of a library that
- * is loaded into the postmaster via shared_preload_libraries.	Once
- * shared memory has been allocated, calls will be ignored.  (We could
- * raise an error, but it seems better to make it a no-op, so that
- * libraries containing such calls can be reloaded if needed.)
- */
-void
-RequestAddinLWLocks(int n)
-{
-	if (IsUnderPostmaster || !lock_addin_request_allowed)
-		return;					/* too late */
-	lock_addin_request += n;
-}
-
-
-/*
- * Compute shmem space needed for LWLocks.
- */
-Size
-LWLockShmemSize(void)
-{
-	Size		size;
-	int			numLocks = NumLWLocks();
-
-	/* Space for the LWLock array. */
-	size = mul_size(numLocks, sizeof(LWLockPadded));
-
-	/* Space for dynamic allocation counter, plus room for alignment. */
-	size = add_size(size, 2 * sizeof(int) + LWLOCK_PADDED_SIZE);
-
-	return size;
-}
-
-
-/*
- * Allocate shmem space for LWLocks and initialize the locks.
- */
-void
-CreateLWLocks(void)
-{
-	int			numLocks = NumLWLocks();
-	Size		spaceLocks = LWLockShmemSize();
-	LWLockPadded *lock;
-	int		   *LWLockCounter;
-	char	   *ptr;
-	int			id;
-
-	/* Allocate space */
-	ptr = (char *) ShmemAlloc(spaceLocks);
-
-	/* Leave room for dynamic allocation counter */
-	ptr += 2 * sizeof(int);
-
-	/* Ensure desired alignment of LWLock array */
-	ptr += LWLOCK_PADDED_SIZE - ((uintptr_t) ptr) % LWLOCK_PADDED_SIZE;
-
-	LWLockArray = (LWLockPadded *) ptr;
-
-	/*
-	 * Initialize all LWLocks to "unlocked" state
-	 */
-	for (id = 0, lock = LWLockArray; id < numLocks; id++, lock++)
-	{
-		SpinLockInit(&lock->lock.mutex);
-		lock->lock.releaseOK = true;
-		lock->lock.exclusive = 0;
-		lock->lock.shared = 0;
-		lock->lock.head = NULL;
-		lock->lock.tail = NULL;
-	}
-
-	/*
-	 * Initialize the dynamic-allocation counter, which is stored just before
-	 * the first LWLock.
-	 */
-	LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	LWLockCounter[0] = (int) NumFixedLWLocks;
-	LWLockCounter[1] = numLocks;
-}
-
-
-/*
- * LWLockAssign - assign a dynamically-allocated LWLock number
- *
- * We interlock this using the same spinlock that is used to protect
- * ShmemAlloc().  Interlocking is not really necessary during postmaster
- * startup, but it is needed if any user-defined code tries to allocate
- * LWLocks after startup.
- */
-LWLockId
+FlexLockId
 LWLockAssign(void)
 {
-	LWLockId	result;
-
-	/* use volatile pointer to prevent code rearrangement */
-	volatile int *LWLockCounter;
-
-	LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	SpinLockAcquire(ShmemLock);
-	if (LWLockCounter[0] >= LWLockCounter[1])
-	{
-		SpinLockRelease(ShmemLock);
-		elog(ERROR, "no more LWLockIds available");
-	}
-	result = (LWLockId) (LWLockCounter[0]++);
-	SpinLockRelease(ShmemLock);
-	return result;
+	return FlexLockAssign(FLEXLOCK_TYPE_LWLOCK);
 }
 
-
 /*
  * LWLockAcquire - acquire a lightweight lock in the specified mode
  *
@@ -320,9 +101,9 @@ LWLockAssign(void)
  * Side effect: cancel/die interrupts are held off until lock release.
  */
 void
-LWLockAcquire(LWLockId lockid, LWLockMode mode)
+LWLockAcquire(FlexLockId lockid, LWLockMode mode)
 {
-	volatile LWLock *lock = &(LWLockArray[lockid].lock);
+	volatile LWLock *lock = LWLockPointer(lockid);
 	PGPROC	   *proc = MyProc;
 	bool		retry = false;
 	int			extraWaits = 0;
@@ -333,8 +114,8 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 	/* Set up local count state first time through in a given process */
 	if (counts_for_pid != MyProcPid)
 	{
-		int		   *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-		int			numLocks = LWLockCounter[1];
+		int		   *FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+		int			numLocks = FlexLockCounter[1];
 
 		sh_acquire_counts = calloc(numLocks, sizeof(int));
 		ex_acquire_counts = calloc(numLocks, sizeof(int));
@@ -356,10 +137,6 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 	 */
 	Assert(!(proc == NULL && IsUnderPostmaster));
 
-	/* Ensure we will have room to remember the lock */
-	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
-		elog(ERROR, "too many LWLocks taken");
-
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -388,11 +165,11 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 		bool		mustwait;
 
 		/* Acquire mutex.  Time spent holding mutex should be short! */
-		SpinLockAcquire(&lock->mutex);
+		SpinLockAcquire(&lock->flex.mutex);
 
 		/* If retrying, allow LWLockRelease to release waiters again */
 		if (retry)
-			lock->releaseOK = true;
+			lock->flex.releaseOK = true;
 
 		/* If I can get the lock, do so quickly. */
 		if (mode == LW_EXCLUSIVE)
@@ -419,72 +196,30 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 		if (!mustwait)
 			break;				/* got the lock */
 
-		/*
-		 * Add myself to wait queue.
-		 *
-		 * If we don't have a PGPROC structure, there's no way to wait. This
-		 * should never occur, since MyProc should only be null during shared
-		 * memory initialization.
-		 */
-		if (proc == NULL)
-			elog(PANIC, "cannot wait without a PGPROC structure");
-
-		proc->lwWaiting = true;
-		proc->lwExclusive = (mode == LW_EXCLUSIVE);
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		/* Add myself to wait queue. */
+		FlexLockJoinWaitQueue(lock, (int) mode);
 
 		/* Can release the mutex now */
-		SpinLockRelease(&lock->mutex);
-
-		/*
-		 * Wait until awakened.
-		 *
-		 * Since we share the process wait semaphore with the regular lock
-		 * manager and ProcWaitForSignal, and we may need to acquire an LWLock
-		 * while one of those is pending, it is possible that we get awakened
-		 * for a reason other than being signaled by LWLockRelease. If so,
-		 * loop back and wait again.  Once we've gotten the LWLock,
-		 * re-increment the sema by the number of additional signals received,
-		 * so that the lock manager or signal manager will see the received
-		 * signal when it next waits.
-		 */
-		LOG_LWDEBUG("LWLockAcquire", lockid, "waiting");
+		SpinLockRelease(&lock->flex.mutex);
+
+		/* Wait until awakened. */
+		extraWaits += FlexLockWait(lockid, mode);
 
 #ifdef LWLOCK_STATS
 		block_counts[lockid]++;
 #endif
 
-		TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);
-
-		for (;;)
-		{
-			/* "false" means cannot accept cancel/die interrupt here. */
-			PGSemaphoreLock(&proc->sem, false);
-			if (!proc->lwWaiting)
-				break;
-			extraWaits++;
-		}
-
-		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode);
-
-		LOG_LWDEBUG("LWLockAcquire", lockid, "awakened");
-
 		/* Now loop back and try to acquire lock again. */
 		retry = true;
 	}
 
 	/* We are done updating shared state of the lock itself. */
-	SpinLockRelease(&lock->mutex);
+	SpinLockRelease(&lock->flex.mutex);
 
-	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(lockid, mode);
+	TRACE_POSTGRESQL_FLEXLOCK_ACQUIRE(lockid, mode);
 
 	/* Add lock to list of locks held by this backend */
-	held_lwlocks[num_held_lwlocks++] = lockid;
+	FlexLockRemember(lockid);
 
 	/*
 	 * Fix the process wait semaphore's count for any absorbed wakeups.
@@ -501,17 +236,13 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
  * If successful, cancel/die interrupts are held off until lock release.
  */
 bool
-LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
+LWLockConditionalAcquire(FlexLockId lockid, LWLockMode mode)
 {
-	volatile LWLock *lock = &(LWLockArray[lockid].lock);
+	volatile LWLock *lock = LWLockPointer(lockid);
 	bool		mustwait;
 
 	PRINT_LWDEBUG("LWLockConditionalAcquire", lockid, lock);
 
-	/* Ensure we will have room to remember the lock */
-	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
-		elog(ERROR, "too many LWLocks taken");
-
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -520,7 +251,7 @@ LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
 	HOLD_INTERRUPTS();
 
 	/* Acquire mutex.  Time spent holding mutex should be short! */
-	SpinLockAcquire(&lock->mutex);
+	SpinLockAcquire(&lock->flex.mutex);
 
 	/* If I can get the lock, do so quickly. */
 	if (mode == LW_EXCLUSIVE)
@@ -545,20 +276,20 @@ LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
 	}
 
 	/* We are done updating shared state of the lock itself. */
-	SpinLockRelease(&lock->mutex);
+	SpinLockRelease(&lock->flex.mutex);
 
 	if (mustwait)
 	{
 		/* Failed to get lock, so release interrupt holdoff */
 		RESUME_INTERRUPTS();
-		LOG_LWDEBUG("LWLockConditionalAcquire", lockid, "failed");
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(lockid, mode);
+		FlexLockDebug("LWLockConditionalAcquire", lockid, "failed");
+		TRACE_POSTGRESQL_FLEXLOCK_CONDACQUIRE_FAIL(lockid, mode);
 	}
 	else
 	{
 		/* Add lock to list of locks held by this backend */
-		held_lwlocks[num_held_lwlocks++] = lockid;
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(lockid, mode);
+		FlexLockRemember(lockid);
+		TRACE_POSTGRESQL_FLEXLOCK_CONDACQUIRE(lockid, mode);
 	}
 
 	return !mustwait;
@@ -568,32 +299,18 @@ LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
  * LWLockRelease - release a previously acquired lock
  */
 void
-LWLockRelease(LWLockId lockid)
+LWLockRelease(FlexLockId lockid)
 {
-	volatile LWLock *lock = &(LWLockArray[lockid].lock);
+	volatile LWLock *lock = LWLockPointer(lockid);
 	PGPROC	   *head;
 	PGPROC	   *proc;
-	int			i;
 
 	PRINT_LWDEBUG("LWLockRelease", lockid, lock);
 
-	/*
-	 * Remove lock from list of locks held.  Usually, but not always, it will
-	 * be the latest-acquired lock; so search array backwards.
-	 */
-	for (i = num_held_lwlocks; --i >= 0;)
-	{
-		if (lockid == held_lwlocks[i])
-			break;
-	}
-	if (i < 0)
-		elog(ERROR, "lock %d is not held", (int) lockid);
-	num_held_lwlocks--;
-	for (; i < num_held_lwlocks; i++)
-		held_lwlocks[i] = held_lwlocks[i + 1];
+	FlexLockForget(lockid);
 
 	/* Acquire mutex.  Time spent holding mutex should be short! */
-	SpinLockAcquire(&lock->mutex);
+	SpinLockAcquire(&lock->flex.mutex);
 
 	/* Release my hold on lock */
 	if (lock->exclusive > 0)
@@ -610,10 +327,10 @@ LWLockRelease(LWLockId lockid)
 	 * if someone has already awakened waiters that haven't yet acquired the
 	 * lock.
 	 */
-	head = lock->head;
+	head = lock->flex.head;
 	if (head != NULL)
 	{
-		if (lock->exclusive == 0 && lock->shared == 0 && lock->releaseOK)
+		if (lock->exclusive == 0 && lock->shared == 0 && lock->flex.releaseOK)
 		{
 			/*
 			 * Remove the to-be-awakened PGPROCs from the queue.  If the front
@@ -621,17 +338,17 @@ LWLockRelease(LWLockId lockid)
 			 * as many waiters as want shared access.
 			 */
 			proc = head;
-			if (!proc->lwExclusive)
+			if (proc->flWaitMode != LW_EXCLUSIVE)
 			{
-				while (proc->lwWaitLink != NULL &&
-					   !proc->lwWaitLink->lwExclusive)
-					proc = proc->lwWaitLink;
+				while (proc->flWaitLink != NULL &&
+					   proc->flWaitLink->flWaitMode != LW_EXCLUSIVE)
+					proc = proc->flWaitLink;
 			}
 			/* proc is now the last PGPROC to be released */
-			lock->head = proc->lwWaitLink;
-			proc->lwWaitLink = NULL;
+			lock->flex.head = proc->flWaitLink;
+			proc->flWaitLink = NULL;
 			/* prevent additional wakeups until retryer gets to run */
-			lock->releaseOK = false;
+			lock->flex.releaseOK = false;
 		}
 		else
 		{
@@ -641,20 +358,20 @@ LWLockRelease(LWLockId lockid)
 	}
 
 	/* We are done updating shared state of the lock itself. */
-	SpinLockRelease(&lock->mutex);
+	SpinLockRelease(&lock->flex.mutex);
 
-	TRACE_POSTGRESQL_LWLOCK_RELEASE(lockid);
+	TRACE_POSTGRESQL_FLEXLOCK_RELEASE(lockid);
 
 	/*
 	 * Awaken any waiters I removed from the queue.
 	 */
 	while (head != NULL)
 	{
-		LOG_LWDEBUG("LWLockRelease", lockid, "release waiter");
+		FlexLockDebug("LWLockRelease", lockid, "release waiter");
 		proc = head;
-		head = proc->lwWaitLink;
-		proc->lwWaitLink = NULL;
-		proc->lwWaiting = false;
+		head = proc->flWaitLink;
+		proc->flWaitLink = NULL;
+		proc->flWaitResult = 1;		/* any non-zero value will do */
 		PGSemaphoreUnlock(&proc->sem);
 	}
 
@@ -664,43 +381,17 @@ LWLockRelease(LWLockId lockid)
 	RESUME_INTERRUPTS();
 }
 
-
-/*
- * LWLockReleaseAll - release all currently-held locks
- *
- * Used to clean up after ereport(ERROR). An important difference between this
- * function and retail LWLockRelease calls is that InterruptHoldoffCount is
- * unchanged by this operation.  This is necessary since InterruptHoldoffCount
- * has been set to an appropriate level earlier in error recovery. We could
- * decrement it below zero if we allow it to drop for each released lock!
- */
-void
-LWLockReleaseAll(void)
-{
-	while (num_held_lwlocks > 0)
-	{
-		HOLD_INTERRUPTS();		/* match the upcoming RESUME_INTERRUPTS */
-
-		LWLockRelease(held_lwlocks[num_held_lwlocks - 1]);
-	}
-}
-
-
 /*
  * LWLockHeldByMe - test whether my process currently holds a lock
  *
- * This is meant as debug support only.  We do not distinguish whether the
- * lock is held shared or exclusive.
+ * The following convenience routine might not be worthwhile but for the fact
+ * that we've had a function by this name since long before FlexLocks existed.
+ * Callers who want to check whether an arbitrary FlexLock (that may or may not
+ * be an LWLock) is held can use FlexLockHeldByMe directly.
  */
 bool
-LWLockHeldByMe(LWLockId lockid)
+LWLockHeldByMe(FlexLockId lockid)
 {
-	int			i;
-
-	for (i = 0; i < num_held_lwlocks; i++)
-	{
-		if (held_lwlocks[i] == lockid)
-			return true;
-	}
-	return false;
+	AssertMacro(FlexLockArray[lockid].flex.locktype == FLEXLOCK_TYPE_LWLOCK);
+	return FlexLockHeldByMe(lockid);
 }
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 345f6f5..15978a4 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -239,7 +239,7 @@
 #define PredicateLockHashPartition(hashcode) \
 	((hashcode) % NUM_PREDICATELOCK_PARTITIONS)
 #define PredicateLockHashPartitionLock(hashcode) \
-	((LWLockId) (FirstPredicateLockMgrLock + PredicateLockHashPartition(hashcode)))
+	((FlexLockId) (FirstPredicateLockMgrLock + PredicateLockHashPartition(hashcode)))
 
 #define NPREDICATELOCKTARGETENTS() \
 	mul_size(max_predicate_locks_per_xact, add_size(MaxBackends, max_prepared_xacts))
@@ -1840,7 +1840,7 @@ PageIsPredicateLocked(Relation relation, BlockNumber blkno)
 {
 	PREDICATELOCKTARGETTAG targettag;
 	uint32		targettaghash;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	PREDICATELOCKTARGET *target;
 
 	SET_PREDICATELOCKTARGETTAG_PAGE(targettag,
@@ -2073,7 +2073,7 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 		if (TargetTagIsCoveredBy(oldtargettag, *newtargettag))
 		{
 			uint32		oldtargettaghash;
-			LWLockId	partitionLock;
+			FlexLockId	partitionLock;
 			PREDICATELOCK *rmpredlock;
 
 			oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
@@ -2285,7 +2285,7 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	PREDICATELOCKTARGET *target;
 	PREDICATELOCKTAG locktag;
 	PREDICATELOCK *lock;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	bool		found;
 
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
@@ -2586,10 +2586,10 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 								  bool removeOld)
 {
 	uint32		oldtargettaghash;
-	LWLockId	oldpartitionLock;
+	FlexLockId	oldpartitionLock;
 	PREDICATELOCKTARGET *oldtarget;
 	uint32		newtargettaghash;
-	LWLockId	newpartitionLock;
+	FlexLockId	newpartitionLock;
 	bool		found;
 	bool		outOfShmem = false;
 
@@ -3578,7 +3578,7 @@ ClearOldPredicateLocks(void)
 			PREDICATELOCKTARGET *target;
 			PREDICATELOCKTARGETTAG targettag;
 			uint32		targettaghash;
-			LWLockId	partitionLock;
+			FlexLockId	partitionLock;
 
 			tag = predlock->tag;
 			target = tag.myTarget;
@@ -3656,7 +3656,7 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 		PREDICATELOCKTARGET *target;
 		PREDICATELOCKTARGETTAG targettag;
 		uint32		targettaghash;
-		LWLockId	partitionLock;
+		FlexLockId	partitionLock;
 
 		nextpredlock = (PREDICATELOCK *)
 			SHMQueueNext(&(sxact->predicateLocks),
@@ -4034,7 +4034,7 @@ static void
 CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 {
 	uint32		targettaghash;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	PREDICATELOCKTARGET *target;
 	PREDICATELOCK *predlock;
 	PREDICATELOCK *mypredlock = NULL;
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index bcbc802..db01e9d 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -360,9 +360,9 @@ InitProcess(void)
 	/* NB -- autovac launcher intentionally does not set IS_AUTOVACUUM */
 	if (IsAutoVacuumWorkerProcess())
 		MyPgXact->vacuumFlags |= PROC_IS_AUTOVACUUM;
-	MyProc->lwWaiting = false;
-	MyProc->lwExclusive = false;
-	MyProc->lwWaitLink = NULL;
+	MyProc->flWaitResult = 0;
+	MyProc->flWaitMode = 0;
+	MyProc->flWaitLink = NULL;
 	MyProc->waitLock = NULL;
 	MyProc->waitProcLock = NULL;
 #ifdef USE_ASSERT_CHECKING
@@ -515,9 +515,9 @@ InitAuxiliaryProcess(void)
 	MyProc->roleId = InvalidOid;
 	MyPgXact->inCommit = false;
 	MyPgXact->vacuumFlags = 0;
-	MyProc->lwWaiting = false;
-	MyProc->lwExclusive = false;
-	MyProc->lwWaitLink = NULL;
+	MyProc->flWaitMode = 0;
+	MyProc->flWaitResult = 0;
+	MyProc->flWaitLink = NULL;
 	MyProc->waitLock = NULL;
 	MyProc->waitProcLock = NULL;
 #ifdef USE_ASSERT_CHECKING
@@ -643,7 +643,7 @@ IsWaitingForLock(void)
 void
 LockWaitCancel(void)
 {
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 
 	/* Nothing to do if we weren't waiting for a lock */
 	if (lockAwaited == NULL)
@@ -754,11 +754,11 @@ ProcKill(int code, Datum arg)
 #endif
 
 	/*
-	 * Release any LW locks I am holding.  There really shouldn't be any, but
-	 * it's cheap to check again before we cut the knees off the LWLock
+	 * Release any felx locks I am holding.  There really shouldn't be any, but
+	 * it's cheap to check again before we cut the knees off the flex lock
 	 * facility by releasing our PGPROC ...
 	 */
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 
 	/* Release ownership of the process's latch, too */
 	DisownLatch(&MyProc->procLatch);
@@ -815,8 +815,8 @@ AuxiliaryProcKill(int code, Datum arg)
 
 	Assert(MyProc == auxproc);
 
-	/* Release any LW locks I am holding (see notes above) */
-	LWLockReleaseAll();
+	/* Release any flex locks I am holding (see notes above) */
+	FlexLockReleaseAll();
 
 	/* Release ownership of the process's latch, too */
 	DisownLatch(&MyProc->procLatch);
@@ -901,7 +901,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	LOCK	   *lock = locallock->lock;
 	PROCLOCK   *proclock = locallock->proclock;
 	uint32		hashcode = locallock->hashcode;
-	LWLockId	partitionLock = LockHashPartitionLock(hashcode);
+	FlexLockId	partitionLock = LockHashPartitionLock(hashcode);
 	PROC_QUEUE *waitQueue = &(lock->waitProcs);
 	LOCKMASK	myHeldLocks = MyProc->heldLocks;
 	bool		early_deadlock = false;
diff --git a/src/backend/utils/misc/check_guc b/src/backend/utils/misc/check_guc
index 293fb03..1a19e36 100755
--- a/src/backend/utils/misc/check_guc
+++ b/src/backend/utils/misc/check_guc
@@ -19,7 +19,7 @@
 INTENTIONALLY_NOT_INCLUDED="autocommit debug_deadlocks \
 is_superuser lc_collate lc_ctype lc_messages lc_monetary lc_numeric lc_time \
 pre_auth_delay role seed server_encoding server_version server_version_int \
-session_authorization trace_lock_oidmin trace_lock_table trace_locks trace_lwlocks \
+session_authorization trace_lock_oidmin trace_lock_table trace_locks trace_flexlocks \
 trace_notify trace_userlocks transaction_isolation transaction_read_only \
 zero_damaged_pages"
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index da7b6d4..52de233 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -59,6 +59,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/flexlock_internals.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
 #include "storage/predicate.h"
@@ -1071,12 +1072,12 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 	{
-		{"trace_lwlocks", PGC_SUSET, DEVELOPER_OPTIONS,
+		{"trace_flexlocks", PGC_SUSET, DEVELOPER_OPTIONS,
 			gettext_noop("No description available."),
 			NULL,
 			GUC_NOT_IN_SAMPLE
 		},
-		&Trace_lwlocks,
+		&Trace_flexlocks,
 		false,
 		NULL, NULL, NULL
 	},
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 71c5ab0..5b9cfe6 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -15,8 +15,8 @@
  * in probe definitions, as they cause compilation errors on Mac OS X 10.5.
  */
 #define LocalTransactionId unsigned int
-#define LWLockId int
-#define LWLockMode int
+#define FlexLockId int
+#define FlexLockMode int
 #define LOCKMODE int
 #define BlockNumber unsigned int
 #define Oid unsigned int
@@ -29,12 +29,12 @@ provider postgresql {
 	probe transaction__commit(LocalTransactionId);
 	probe transaction__abort(LocalTransactionId);
 
-	probe lwlock__acquire(LWLockId, LWLockMode);
-	probe lwlock__release(LWLockId);
-	probe lwlock__wait__start(LWLockId, LWLockMode);
-	probe lwlock__wait__done(LWLockId, LWLockMode);
-	probe lwlock__condacquire(LWLockId, LWLockMode);
-	probe lwlock__condacquire__fail(LWLockId, LWLockMode);
+	probe flexlock__acquire(FlexLockId, FlexLockMode);
+	probe flexlock__release(FlexLockId);
+	probe flexlock__wait__start(FlexLockId, FlexLockMode);
+	probe flexlock__wait__done(FlexLockId, FlexLockMode);
+	probe flexlock__condacquire(FlexLockId, FlexLockMode);
+	probe flexlock__condacquire__fail(FlexLockId, FlexLockMode);
 
 	probe lock__wait__start(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
 	probe lock__wait__done(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index e48743f..680a87f 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -55,7 +55,7 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLockId	ControlLock;
+	FlexLockId	ControlLock;
 
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
@@ -69,7 +69,7 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int		   *page_number;
 	int		   *page_lru_count;
-	LWLockId   *buffer_locks;
+	FlexLockId *buffer_locks;
 
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
@@ -136,7 +136,7 @@ typedef SlruCtlData *SlruCtl;
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLockId ctllock, const char *subdir);
+			  FlexLockId ctllock, const char *subdir);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int pageno);
 extern int SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 				  TransactionId xid);
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 6c8e312..d3b74db 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -49,9 +49,9 @@
 #define SEQ_MINVALUE	(-SEQ_MAXVALUE)
 
 /*
- * Number of spare LWLocks to allocate for user-defined add-on code.
+ * Number of spare FlexLocks to allocate for user-defined add-on code.
  */
-#define NUM_USER_DEFINED_LWLOCKS	4
+#define NUM_USER_DEFINED_FLEXLOCKS	4
 
 /*
  * Define this if you want to allow the lo_import and lo_export SQL
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index b7d4ea5..ac7f665 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -103,7 +103,7 @@ typedef struct buftag
 #define BufTableHashPartition(hashcode) \
 	((hashcode) % NUM_BUFFER_PARTITIONS)
 #define BufMappingPartitionLock(hashcode) \
-	((LWLockId) (FirstBufMappingLock + BufTableHashPartition(hashcode)))
+	((FlexLockId) (FirstBufMappingLock + BufTableHashPartition(hashcode)))
 
 /*
  *	BufferDesc -- shared descriptor/state data for a single shared buffer.
@@ -143,8 +143,8 @@ typedef struct sbufdesc
 	int			buf_id;			/* buffer's index number (from 0) */
 	int			freeNext;		/* link in freelist chain */
 
-	LWLockId	io_in_progress_lock;	/* to wait for I/O to complete */
-	LWLockId	content_lock;	/* to lock access to buffer contents */
+	FlexLockId	io_in_progress_lock;	/* to wait for I/O to complete */
+	FlexLockId	content_lock;	/* to lock access to buffer contents */
 } BufferDesc;
 
 #define BufferDescriptorGetBuffer(bdesc) ((bdesc)->buf_id + 1)
diff --git a/src/include/storage/flexlock.h b/src/include/storage/flexlock.h
new file mode 100644
index 0000000..612c21a
--- /dev/null
+++ b/src/include/storage/flexlock.h
@@ -0,0 +1,102 @@
+/*-------------------------------------------------------------------------
+ *
+ * flexlock.h
+ *	  Flex lock manager
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/flexlock.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FLEXLOCK_H
+#define FLEXLOCK_H
+
+/*
+ * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
+ * here, but we need them to set up enum FlexLockId correctly, and having
+ * this file include lock.h or bufmgr.h would be backwards.
+ */
+
+/* Number of partitions of the shared buffer mapping hashtable */
+#define NUM_BUFFER_PARTITIONS  16
+
+/* Number of partitions the shared lock tables are divided into */
+#define LOG2_NUM_LOCK_PARTITIONS  4
+#define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)
+
+/* Number of partitions the shared predicate lock tables are divided into */
+#define LOG2_NUM_PREDICATELOCK_PARTITIONS  4
+#define NUM_PREDICATELOCK_PARTITIONS  (1 << LOG2_NUM_PREDICATELOCK_PARTITIONS)
+
+/*
+ * We have a number of predefined FlexLocks, plus a bunch of locks that are
+ * dynamically assigned (e.g., for shared buffers).  The FlexLock structures
+ * live in shared memory (since they contain shared data) and are identified
+ * by values of this enumerated type.  We abuse the notion of an enum somewhat
+ * by allowing values not listed in the enum declaration to be assigned.
+ * The extra value MaxDynamicFlexLock is there to keep the compiler from
+ * deciding that the enum can be represented as char or short ...
+ *
+ * If you remove a lock, please replace it with a placeholder. This retains
+ * the lock numbering, which is helpful for DTrace and other external
+ * debugging scripts.
+ */
+typedef enum FlexLockId
+{
+	BufFreelistLock,
+	ShmemIndexLock,
+	OidGenLock,
+	XidGenLock,
+	ProcArrayLock,
+	SInvalReadLock,
+	SInvalWriteLock,
+	WALInsertLock,
+	WALWriteLock,
+	ControlFileLock,
+	CheckpointLock,
+	CLogControlLock,
+	SubtransControlLock,
+	MultiXactGenLock,
+	MultiXactOffsetControlLock,
+	MultiXactMemberControlLock,
+	RelCacheInitLock,
+	BgWriterCommLock,
+	TwoPhaseStateLock,
+	TablespaceCreateLock,
+	BtreeVacuumLock,
+	AddinShmemInitLock,
+	AutovacuumLock,
+	AutovacuumScheduleLock,
+	SyncScanLock,
+	RelationMappingLock,
+	AsyncCtlLock,
+	AsyncQueueLock,
+	SerializableXactHashLock,
+	SerializableFinishedListLock,
+	SerializablePredicateLockListLock,
+	OldSerXidLock,
+	SyncRepLock,
+	/* Individual lock IDs end here */
+	FirstBufMappingLock,
+	FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
+	FirstPredicateLockMgrLock = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
+
+	/* must be last except for MaxDynamicFlexLock: */
+	NumFixedFlexLocks = FirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS,
+
+	MaxDynamicFlexLock = 1000000000
+} FlexLockId;
+
+/* Shared memory setup. */
+extern int	NumFlexLocks(void);
+extern Size FlexLockShmemSize(void);
+extern void RequestAddinFlexLocks(int n);
+extern void CreateFlexLocks(void);
+
+/* Error recovery and debugging support functions. */
+extern void FlexLockReleaseAll(void);
+extern bool FlexLockHeldByMe(FlexLockId id);
+
+#endif   /* FLEXLOCK_H */
diff --git a/src/include/storage/flexlock_internals.h b/src/include/storage/flexlock_internals.h
new file mode 100644
index 0000000..5f78da7
--- /dev/null
+++ b/src/include/storage/flexlock_internals.h
@@ -0,0 +1,88 @@
+/*-------------------------------------------------------------------------
+ *
+ * flexlock_internals.h
+ *	  Flex lock internals.  Only files which implement a FlexLock
+ *    type should need to include this.  Merging this with flexlock.h
+ *    creates a circular header dependency, but even if it didn't, this
+ *    is cleaner.
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/flexlock_internals.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FLEXLOCK_INTERNALS_H
+#define FLEXLOCK_INTERNALS_H
+
+#include "pg_trace.h"
+#include "storage/flexlock.h"
+#include "storage/proc.h"
+#include "storage/s_lock.h"
+
+/*
+ * Individual FlexLock implementations each get this many bytes to store
+ * its state; of course, a given implementation could also allocate additional
+ * shmem elsewhere, but we provide this many bytes within the array.  The
+ * header fields common to all FlexLock types are included in this number.
+ * A power of two should probably be chosen, to avoid alignment issues and
+ * cache line splitting.  It might be useful to increase this on systems where
+ * a cache line is more than 64 bytes in size.
+ */
+#define FLEX_LOCK_BYTES		64
+
+typedef struct FlexLock
+{
+	char		locktype;		/* see FLEXLOCK_TYPE_* constants */
+	slock_t		mutex;			/* Protects FlexLock state and wait queues */
+	bool		releaseOK;		/* T if ok to release waiters */
+	PGPROC	   *head;			/* head of list of waiting PGPROCs */
+	PGPROC	   *tail;			/* tail of list of waiting PGPROCs */
+	/* tail is undefined when head is NULL */
+} FlexLock;
+
+#define FLEXLOCK_TYPE_LWLOCK			'l'
+
+typedef union FlexLockPadded
+{
+	FlexLock	flex;
+	char		pad[FLEX_LOCK_BYTES];
+} FlexLockPadded;
+
+extern FlexLockPadded *FlexLockArray;
+
+extern FlexLockId FlexLockAssign(char locktype);
+extern void FlexLockRemember(FlexLockId id);
+extern void FlexLockForget(FlexLockId id);
+extern int FlexLockWait(FlexLockId id, int mode);
+
+/*
+ * We must join the wait queue while holding the spinlock, so we define this
+ * as a macro, for speed.
+ */
+#define FlexLockJoinWaitQueue(lock, mode) \
+	do { \
+		Assert(MyProc != NULL); \
+		MyProc->flWaitResult = 0; \
+		MyProc->flWaitMode = mode; \
+		MyProc->flWaitLink = NULL; \
+		if (lock->flex.head == NULL) \
+			lock->flex.head = MyProc; \
+		else \
+			lock->flex.tail->flWaitLink = MyProc; \
+		lock->flex.tail = MyProc; \
+	} while (0)
+
+#ifdef LOCK_DEBUG
+extern bool	Trace_flexlocks;
+#define FlexLockDebug(where, id, msg) \
+	do { \
+		if (Trace_flexlocks) \
+			elog(LOG, "%s(%d): %s", where, (int) id, msg); \
+	} while (0)
+#else
+#define FlexLockDebug(where, id, msg)
+#endif
+
+#endif   /* FLEXLOCK_H */
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index e106ad5..ba87db2 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -471,7 +471,7 @@ typedef enum
 #define LockHashPartition(hashcode) \
 	((hashcode) % NUM_LOCK_PARTITIONS)
 #define LockHashPartitionLock(hashcode) \
-	((LWLockId) (FirstLockMgrLock + LockHashPartition(hashcode)))
+	((FlexLockId) (FirstLockMgrLock + LockHashPartition(hashcode)))
 
 
 /*
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 438a48d..f68cddc 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -14,82 +14,7 @@
 #ifndef LWLOCK_H
 #define LWLOCK_H
 
-/*
- * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
- * here, but we need them to set up enum LWLockId correctly, and having
- * this file include lock.h or bufmgr.h would be backwards.
- */
-
-/* Number of partitions of the shared buffer mapping hashtable */
-#define NUM_BUFFER_PARTITIONS  16
-
-/* Number of partitions the shared lock tables are divided into */
-#define LOG2_NUM_LOCK_PARTITIONS  4
-#define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)
-
-/* Number of partitions the shared predicate lock tables are divided into */
-#define LOG2_NUM_PREDICATELOCK_PARTITIONS  4
-#define NUM_PREDICATELOCK_PARTITIONS  (1 << LOG2_NUM_PREDICATELOCK_PARTITIONS)
-
-/*
- * We have a number of predefined LWLocks, plus a bunch of LWLocks that are
- * dynamically assigned (e.g., for shared buffers).  The LWLock structures
- * live in shared memory (since they contain shared data) and are identified
- * by values of this enumerated type.  We abuse the notion of an enum somewhat
- * by allowing values not listed in the enum declaration to be assigned.
- * The extra value MaxDynamicLWLock is there to keep the compiler from
- * deciding that the enum can be represented as char or short ...
- *
- * If you remove a lock, please replace it with a placeholder. This retains
- * the lock numbering, which is helpful for DTrace and other external
- * debugging scripts.
- */
-typedef enum LWLockId
-{
-	BufFreelistLock,
-	ShmemIndexLock,
-	OidGenLock,
-	XidGenLock,
-	ProcArrayLock,
-	SInvalReadLock,
-	SInvalWriteLock,
-	WALInsertLock,
-	WALWriteLock,
-	ControlFileLock,
-	CheckpointLock,
-	CLogControlLock,
-	SubtransControlLock,
-	MultiXactGenLock,
-	MultiXactOffsetControlLock,
-	MultiXactMemberControlLock,
-	RelCacheInitLock,
-	BgWriterCommLock,
-	TwoPhaseStateLock,
-	TablespaceCreateLock,
-	BtreeVacuumLock,
-	AddinShmemInitLock,
-	AutovacuumLock,
-	AutovacuumScheduleLock,
-	SyncScanLock,
-	RelationMappingLock,
-	AsyncCtlLock,
-	AsyncQueueLock,
-	SerializableXactHashLock,
-	SerializableFinishedListLock,
-	SerializablePredicateLockListLock,
-	OldSerXidLock,
-	SyncRepLock,
-	/* Individual lock IDs end here */
-	FirstBufMappingLock,
-	FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
-	FirstPredicateLockMgrLock = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
-
-	/* must be last except for MaxDynamicLWLock: */
-	NumFixedLWLocks = FirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS,
-
-	MaxDynamicLWLock = 1000000000
-} LWLockId;
-
+#include "storage/flexlock.h"
 
 typedef enum LWLockMode
 {
@@ -97,22 +22,10 @@ typedef enum LWLockMode
 	LW_SHARED
 } LWLockMode;
 
-
-#ifdef LOCK_DEBUG
-extern bool Trace_lwlocks;
-#endif
-
-extern LWLockId LWLockAssign(void);
-extern void LWLockAcquire(LWLockId lockid, LWLockMode mode);
-extern bool LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode);
-extern void LWLockRelease(LWLockId lockid);
-extern void LWLockReleaseAll(void);
-extern bool LWLockHeldByMe(LWLockId lockid);
-
-extern int	NumLWLocks(void);
-extern Size LWLockShmemSize(void);
-extern void CreateLWLocks(void);
-
-extern void RequestAddinLWLocks(int n);
+extern FlexLockId LWLockAssign(void);
+extern void LWLockAcquire(FlexLockId lockid, LWLockMode mode);
+extern bool LWLockConditionalAcquire(FlexLockId lockid, LWLockMode mode);
+extern void LWLockRelease(FlexLockId lockid);
+extern bool LWLockHeldByMe(FlexLockId lockid);
 
 #endif   /* LWLOCK_H */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index c7cddc7..1f3a71d 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -99,10 +99,10 @@ struct PGPROC
 	 */
 	bool		recoveryConflictPending;
 
-	/* Info about LWLock the process is currently waiting for, if any. */
-	bool		lwWaiting;		/* true if waiting for an LW lock */
-	bool		lwExclusive;	/* true if waiting for exclusive access */
-	struct PGPROC *lwWaitLink;	/* next waiter for same LW lock */
+	/* Info about FlexLock the process is currently waiting for, if any. */
+	int			flWaitResult;	/* result of wait, or 0 if still waiting */
+	int			flWaitMode;		/* lock mode sought */
+	struct PGPROC *flWaitLink;	/* next waiter for same FlexLock */
 
 	/* Info about lock the process is currently waiting for, if any. */
 	/* waitLock and waitProcLock are NULL if not currently waiting. */
@@ -132,7 +132,7 @@ struct PGPROC
 	struct XidCache subxids;	/* cache for subtransaction XIDs */
 
 	/* Per-backend LWLock.  Protects fields below. */
-	LWLockId	backendLock;	/* protects the fields below */
+	FlexLockId	backendLock;	/* protects the fields below */
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	uint64		fpLockBits;		/* lock modes held for each fast-path slot */

procarraylock-v2.patchapplication/octet-stream; name=procarraylock-v2.patchDownload

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 3143246..2e972ec 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -40,6 +40,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "utils/acl.h"
 #include "utils/attoptcache.h"
 #include "utils/datum.h"
@@ -222,9 +223,9 @@ analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
 	/*
 	 * OK, let's do it.  First let other backends know I'm in ANALYZE.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/*
 	 * Do the normal non-recursive ANALYZE.
@@ -249,9 +250,9 @@ analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
 	 * Reset my PGPROC flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	MyPgXact->vacuumFlags &= ~PROC_IN_ANALYZE;
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e70dbed..09aa32b 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -39,6 +39,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
@@ -895,11 +896,11 @@ vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool do_toast, bool for_wraparound)
 		 * MyProc->xid/xmin, else OldestXmin might appear to go backwards,
 		 * which is probably Not Good.
 		 */
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		ProcArrayLockAcquire(PAL_EXCLUSIVE);
 		MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
 		if (for_wraparound)
 			MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 	}
 
 	/*
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 19ff524..d457e3f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -52,6 +52,7 @@
 #include "access/twophase.h"
 #include "miscadmin.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/snapmgr.h"
@@ -261,7 +262,7 @@ ProcArrayAdd(PGPROC *proc)
 	ProcArrayStruct *arrayP = procArray;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	if (arrayP->numProcs >= arrayP->maxProcs)
 	{
@@ -270,7 +271,7 @@ ProcArrayAdd(PGPROC *proc)
 		 * fixed supply of PGPROC structs too, and so we should have failed
 		 * earlier.)
 		 */
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 		ereport(FATAL,
 				(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
 				 errmsg("sorry, too many clients already")));
@@ -300,7 +301,7 @@ ProcArrayAdd(PGPROC *proc)
 	arrayP->pgprocnos[index] = proc->pgprocno;
 	arrayP->numProcs++;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -325,7 +326,7 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
 		DisplayXidCache();
 #endif
 
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	if (TransactionIdIsValid(latestXid))
 	{
@@ -351,13 +352,13 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
 					(arrayP->numProcs - index - 1) * sizeof (int));
 			arrayP->pgprocnos[arrayP->numProcs - 1] = -1; /* for debugging */
 			arrayP->numProcs--;
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 			return;
 		}
 	}
 
 	/* Ooops */
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	elog(LOG, "failed to find proc %p in ProcArray", proc);
 }
@@ -383,54 +384,19 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 
 	if (TransactionIdIsValid(latestXid))
 	{
-		/*
-		 * We must lock ProcArrayLock while clearing our advertised XID, so
-		 * that we do not exit the set of "running" transactions while someone
-		 * else is taking a snapshot.  See discussion in
-		 * src/backend/access/transam/README.
-		 */
-		Assert(TransactionIdIsValid(allPgXact[proc->pgprocno].xid));
-
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
-
-		pgxact->xid = InvalidTransactionId;
-		proc->lxid = InvalidLocalTransactionId;
-		pgxact->xmin = InvalidTransactionId;
-		/* must be cleared with xid/xmin: */
-		pgxact->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
-		pgxact->inCommit = false; /* be sure this is cleared in abort */
-		proc->recoveryConflictPending = false;
-
-		/* Clear the subtransaction-XID cache too while holding the lock */
-		pgxact->nxids = 0;
-		pgxact->overflowed = false;
-
-		/* Also advance global latestCompletedXid while holding the lock */
-		if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
-								  latestXid))
-			ShmemVariableCache->latestCompletedXid = latestXid;
-
-		LWLockRelease(ProcArrayLock);
+		Assert(proc == MyProc);
+		ProcArrayLockClearTransaction(latestXid);		
 	}
 	else
 	{
-		/*
-		 * If we have no XID, we don't need to lock, since we won't affect
-		 * anyone else's calculation of a snapshot.  We might change their
-		 * estimate of global xmin, but that's OK.
-		 */
-		Assert(!TransactionIdIsValid(allPgXact[proc->pgprocno].xid));
-
-		proc->lxid = InvalidLocalTransactionId;
 		pgxact->xmin = InvalidTransactionId;
 		/* must be cleared with xid/xmin: */
 		pgxact->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
-		pgxact->inCommit = false; /* be sure this is cleared in abort */
-		proc->recoveryConflictPending = false;
-
-		Assert(pgxact->nxids == 0);
-		Assert(pgxact->overflowed == false);
 	}
+
+	proc->lxid = InvalidLocalTransactionId;
+	pgxact->inCommit = false; /* be sure this is cleared in abort */
+	proc->recoveryConflictPending = false;
 }
 
 
@@ -562,7 +528,7 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 	/*
 	 * Nobody else is running yet, but take locks anyhow
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	/*
 	 * KnownAssignedXids is sorted so we cannot just add the xids, we have to
@@ -669,7 +635,7 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 	Assert(TransactionIdIsNormal(ShmemVariableCache->latestCompletedXid));
 	Assert(TransactionIdIsValid(ShmemVariableCache->nextXid));
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	KnownAssignedXidsDisplay(trace_recovery(DEBUG3));
 	if (standbyState == STANDBY_SNAPSHOT_READY)
@@ -724,7 +690,7 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
 	/*
 	 * Uses same locking as transaction commit
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	/*
 	 * Remove subxids from known-assigned-xacts.
@@ -737,7 +703,7 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
 	if (TransactionIdPrecedes(procArray->lastOverflowedXid, max_xid))
 		procArray->lastOverflowedXid = max_xid;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -829,7 +795,7 @@ TransactionIdIsInProgress(TransactionId xid)
 					 errmsg("out of memory")));
 	}
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	/*
 	 * Now that we have the lock, we can check latestCompletedXid; if the
@@ -837,7 +803,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	 */
 	if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid, xid))
 	{
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 		xc_by_latest_xid_inc();
 		return true;
 	}
@@ -865,7 +831,7 @@ TransactionIdIsInProgress(TransactionId xid)
 		 */
 		if (TransactionIdEquals(pxid, xid))
 		{
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 			xc_by_main_xid_inc();
 			return true;
 		}
@@ -887,7 +853,7 @@ TransactionIdIsInProgress(TransactionId xid)
 
 			if (TransactionIdEquals(cxid, xid))
 			{
-				LWLockRelease(ProcArrayLock);
+				ProcArrayLockRelease();
 				xc_by_child_xid_inc();
 				return true;
 			}
@@ -915,7 +881,7 @@ TransactionIdIsInProgress(TransactionId xid)
 
 		if (KnownAssignedXidExists(xid))
 		{
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 			xc_by_known_assigned_inc();
 			return true;
 		}
@@ -931,7 +897,7 @@ TransactionIdIsInProgress(TransactionId xid)
 			nxids = KnownAssignedXidsGet(xids, xid);
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/*
 	 * If none of the relevant caches overflowed, we know the Xid is not
@@ -997,7 +963,7 @@ TransactionIdIsActive(TransactionId xid)
 	if (TransactionIdPrecedes(xid, RecentXmin))
 		return false;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (i = 0; i < arrayP->numProcs; i++)
 	{
@@ -1022,7 +988,7 @@ TransactionIdIsActive(TransactionId xid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1085,7 +1051,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 	/* Cannot look for individual databases during recovery */
 	Assert(allDbs || !RecoveryInProgress());
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	/*
 	 * We initialize the MIN() calculation with latestCompletedXid + 1. This
@@ -1140,7 +1106,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 		 */
 		TransactionId kaxmin = KnownAssignedXidsGetOldestXmin();
 
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 
 		if (TransactionIdIsNormal(kaxmin) &&
 			TransactionIdPrecedes(kaxmin, result))
@@ -1151,7 +1117,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 		/*
 		 * No other information needed, so release the lock immediately.
 		 */
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 
 		/*
 		 * Compute the cutoff XID by subtracting vacuum_defer_cleanup_age,
@@ -1280,7 +1246,7 @@ GetSnapshotData(Snapshot snapshot)
 	 * It is sufficient to get shared lock on ProcArrayLock, even if we are
 	 * going to set MyProc->xmin.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	/* xmax is always latestCompletedXid + 1 */
 	xmax = ShmemVariableCache->latestCompletedXid;
@@ -1418,7 +1384,7 @@ GetSnapshotData(Snapshot snapshot)
 
 	if (!TransactionIdIsValid(MyPgXact->xmin))
 		MyPgXact->xmin = TransactionXmin = xmin;
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/*
 	 * Update globalxmin to include actual process xids.  This is a slightly
@@ -1475,7 +1441,7 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
 		return false;
 
 	/* Get lock so source xact can't end while we're doing this */
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1521,7 +1487,7 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
 		break;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1595,7 +1561,7 @@ GetRunningTransactionData(void)
 	 * Ensure that no xids enter or leave the procarray while we obtain
 	 * snapshot.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 	LWLockAcquire(XidGenLock, LW_SHARED);
 
 	latestCompletedXid = ShmemVariableCache->latestCompletedXid;
@@ -1658,7 +1624,7 @@ GetRunningTransactionData(void)
 	CurrentRunningXacts->latestCompletedXid = latestCompletedXid;
 
 	/* We don't release XidGenLock here, the caller is responsible for that */
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	Assert(TransactionIdIsValid(CurrentRunningXacts->nextXid));
 	Assert(TransactionIdIsValid(CurrentRunningXacts->oldestRunningXid));
@@ -1691,7 +1657,7 @@ GetOldestActiveTransactionId(void)
 
 	Assert(!RecoveryInProgress());
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	oldestRunningXid = ShmemVariableCache->nextXid;
 
@@ -1720,7 +1686,7 @@ GetOldestActiveTransactionId(void)
 		 */
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return oldestRunningXid;
 }
@@ -1753,7 +1719,7 @@ GetTransactionsInCommit(TransactionId **xids_p)
 	xids = (TransactionId *) palloc(arrayP->maxProcs * sizeof(TransactionId));
 	nxids = 0;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1768,7 +1734,7 @@ GetTransactionsInCommit(TransactionId **xids_p)
 			xids[nxids++] = pxid;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	*xids_p = xids;
 	return nxids;
@@ -1790,7 +1756,7 @@ HaveTransactionsInCommit(TransactionId *xids, int nxids)
 	ProcArrayStruct *arrayP = procArray;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1818,7 +1784,7 @@ HaveTransactionsInCommit(TransactionId *xids, int nxids)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1840,7 +1806,7 @@ BackendPidGetProc(int pid)
 	if (pid == 0)				/* never match dummy PGPROCs */
 		return NULL;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1853,7 +1819,7 @@ BackendPidGetProc(int pid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1881,7 +1847,7 @@ BackendXidGetPid(TransactionId xid)
 	if (xid == InvalidTransactionId)	/* never match invalid xid */
 		return 0;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1896,7 +1862,7 @@ BackendXidGetPid(TransactionId xid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1951,7 +1917,7 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
 	vxids = (VirtualTransactionId *)
 		palloc(sizeof(VirtualTransactionId) * arrayP->maxProcs);
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1989,7 +1955,7 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	*nvxids = count;
 	return vxids;
@@ -2048,7 +2014,7 @@ GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
 					 errmsg("out of memory")));
 	}
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2083,7 +2049,7 @@ GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/* add the terminator */
 	vxids[count].backendId = InvalidBackendId;
@@ -2104,7 +2070,7 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
 	int			index;
 	pid_t		pid = 0;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2131,7 +2097,7 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return pid;
 }
@@ -2207,7 +2173,7 @@ CountDBBackends(Oid databaseid)
 	int			count = 0;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2221,7 +2187,7 @@ CountDBBackends(Oid databaseid)
 			count++;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return count;
 }
@@ -2237,7 +2203,7 @@ CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending)
 	pid_t		pid = 0;
 
 	/* tell all backends to die */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2263,7 +2229,7 @@ CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -2276,7 +2242,7 @@ CountUserBackends(Oid roleid)
 	int			count = 0;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2289,7 +2255,7 @@ CountUserBackends(Oid roleid)
 			count++;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return count;
 }
@@ -2337,7 +2303,7 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 
 		*nbackends = *nprepared = 0;
 
-		LWLockAcquire(ProcArrayLock, LW_SHARED);
+		ProcArrayLockAcquire(PAL_SHARED);
 
 		for (index = 0; index < arrayP->numProcs; index++)
 		{
@@ -2363,7 +2329,7 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 			}
 		}
 
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 
 		if (!found)
 			return false;		/* no conflicting backends, so done */
@@ -2416,7 +2382,7 @@ XidCacheRemoveRunningXids(TransactionId xid,
 	 * to abort subtransactions, but pending closer analysis we'd best be
 	 * conservative.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	/*
 	 * Under normal circumstances xid and xids[] will be in increasing order,
@@ -2464,7 +2430,7 @@ XidCacheRemoveRunningXids(TransactionId xid,
 							  latestXid))
 		ShmemVariableCache->latestCompletedXid = latestXid;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 #ifdef XIDCACHE_DEBUG
@@ -2631,7 +2597,7 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 	/*
 	 * Uses same locking as transaction commit
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	KnownAssignedXidsRemoveTree(xid, nsubxids, subxids);
 
@@ -2640,7 +2606,7 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 							  max_xid))
 		ShmemVariableCache->latestCompletedXid = max_xid;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -2650,9 +2616,9 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 void
 ExpireAllKnownAssignedTransactionIds(void)
 {
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	KnownAssignedXidsRemovePreceding(InvalidTransactionId);
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -2662,9 +2628,9 @@ ExpireAllKnownAssignedTransactionIds(void)
 void
 ExpireOldKnownAssignedTransactionIds(TransactionId xid)
 {
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	KnownAssignedXidsRemovePreceding(xid);
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 
@@ -2886,7 +2852,7 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 	{
 		/* must hold lock to compress */
 		if (!exclusive_lock)
-			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+			ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 		KnownAssignedXidsCompress(true);
 
@@ -2894,7 +2860,7 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 		/* note: we no longer care about the tail pointer */
 
 		if (!exclusive_lock)
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 
 		/*
 		 * If it still won't fit then we're out of memory
diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile
index 3730e51..27eaa97 100644
--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = flexlock.o lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o \
-	predicate.o
+	procarraylock.o predicate.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/storage/lmgr/flexlock.c b/src/backend/storage/lmgr/flexlock.c
index f517589..f96437b 100644
--- a/src/backend/storage/lmgr/flexlock.c
+++ b/src/backend/storage/lmgr/flexlock.c
@@ -30,6 +30,7 @@
 #include "storage/lwlock.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
+#include "storage/procarraylock.h"
 #include "storage/spin.h"
 #include "utils/elog.h"
 
@@ -177,9 +178,14 @@ CreateFlexLocks(void)
 
 	FlexLockArray = (FlexLockPadded *) ptr;
 
-	/* All of the "fixed" FlexLocks are LWLocks. */
+	/* All of the "fixed" FlexLocks are LWLocks - except ProcArrayLock. */
 	for (id = 0, lock = FlexLockArray; id < NumFixedFlexLocks; id++, lock++)
-		FlexLockInit(&lock->flex, FLEXLOCK_TYPE_LWLOCK);
+	{
+		if (id == ProcArrayLock)
+			FlexLockInit(&lock->flex, FLEXLOCK_TYPE_PROCARRAYLOCK);
+		else
+			FlexLockInit(&lock->flex, FLEXLOCK_TYPE_LWLOCK);
+	}
 
 	/*
 	 * Initialize the dynamic-allocation counter, which is stored just before
@@ -323,13 +329,20 @@ FlexLockReleaseAll(void)
 {
 	while (num_held_flexlocks > 0)
 	{
+		FlexLockId	id;
+		FlexLock   *flex;
+
 		HOLD_INTERRUPTS();		/* match the upcoming RESUME_INTERRUPTS */
 
-		/*
-		 * FLEXTODO: When we have multiple types of flex locks, this will
-		 * need to call the appropriate release function for each lock type.
-		 */
-		LWLockRelease(held_flexlocks[num_held_flexlocks - 1]);
+		id = held_flexlocks[num_held_flexlocks - 1];
+		flex = &FlexLockArray[id].flex;
+		if (flex->locktype == FLEXLOCK_TYPE_LWLOCK)
+			LWLockRelease(id);
+		else
+		{
+			Assert(id == ProcArrayLock);
+			ProcArrayLockRelease();
+		}
 	}
 }
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index db01e9d..173b7cb 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -46,6 +46,7 @@
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "storage/procsignal.h"
 #include "storage/spin.h"
 #include "utils/timestamp.h"
@@ -1083,7 +1084,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 			PGPROC	   *autovac = GetBlockingAutoVacuumPgproc();
 			PGXACT	   *autovac_pgxact = &ProcGlobal->allPgXact[autovac->pgprocno];
 
-			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+			ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 			/*
 			 * Only do it if the worker is not working to protect against Xid
@@ -1099,7 +1100,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 					 pid);
 
 				/* don't hold the lock across the kill() syscall */
-				LWLockRelease(ProcArrayLock);
+				ProcArrayLockRelease();
 
 				/* send the autovacuum worker Back to Old Kent Road */
 				if (kill(pid, SIGINT) < 0)
@@ -1111,7 +1112,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 				}
 			}
 			else
-				LWLockRelease(ProcArrayLock);
+				ProcArrayLockRelease();
 
 			/* prevent signal from being resent more than once */
 			allow_autovacuum_cancel = false;
diff --git a/src/backend/storage/lmgr/procarraylock.c b/src/backend/storage/lmgr/procarraylock.c
new file mode 100644
index 0000000..7cd4b6b
--- /dev/null
+++ b/src/backend/storage/lmgr/procarraylock.c
@@ -0,0 +1,344 @@
+/*-------------------------------------------------------------------------
+ *
+ * procarraylock.c
+ *	  Lock management for the ProcArray
+ *
+ * Because the ProcArray data structure is highly trafficked, it is
+ * critical that mutual exclusion for ProcArray options be as efficient
+ * as possible.  A particular problem is transaction end (commit or abort)
+ * which cannot be done in parallel with snapshot acquisition.  We
+ * therefore include some special hacks to deal with this case efficiently.
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/lmgr/procarraylock.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "pg_trace.h"
+#include "access/transam.h"
+#include "storage/flexlock_internals.h"
+#include "storage/ipc.h"
+#include "storage/procarraylock.h"
+#include "storage/proc.h"
+#include "storage/spin.h"
+
+typedef struct ProcArrayLockStruct
+{
+	FlexLock	flex;			/* common FlexLock infrastructure */
+	char		exclusive;		/* # of exclusive holders (0 or 1) */
+	int			shared;			/* # of shared holders (0..MaxBackends) */
+	PGPROC	   *ending;			/* transactions wishing to clear state */
+	TransactionId	latest_ending_xid;	/* latest ending XID */
+} ProcArrayLockStruct;
+
+/* There is only one ProcArrayLock. */
+#define	ProcArrayLockPointer() \
+	(AssertMacro(FlexLockArray[ProcArrayLock].flex.locktype == \
+		FLEXLOCK_TYPE_PROCARRAYLOCK), \
+	 (volatile ProcArrayLockStruct *) &FlexLockArray[ProcArrayLock])
+
+/*
+ * ProcArrayLockAcquire - acquire a lightweight lock in the specified mode
+ *
+ * If the lock is not available, sleep until it is.
+ *
+ * Side effect: cancel/die interrupts are held off until lock release.
+ */
+void
+ProcArrayLockAcquire(ProcArrayLockMode mode)
+{
+	volatile ProcArrayLockStruct *lock = ProcArrayLockPointer();
+	PGPROC	   *proc = MyProc;
+	bool		retry = false;
+	int			extraWaits = 0;
+
+	/*
+	 * We can't wait if we haven't got a PGPROC.  This should only occur
+	 * during bootstrap or shared memory initialization.  Put an Assert here
+	 * to catch unsafe coding practices.
+	 */
+	Assert(!(proc == NULL && IsUnderPostmaster));
+
+	/*
+	 * Lock out cancel/die interrupts until we exit the code section protected
+	 * by the ProcArrayLock.  This ensures that interrupts will not interfere
+     * with manipulations of data structures in shared memory.
+	 */
+	HOLD_INTERRUPTS();
+
+	/*
+	 * Loop here to try to acquire lock after each time we are signaled by
+	 * ProcArrayLockRelease.  See comments in LWLockAcquire for an explanation
+	 * of why do we not attempt to hand off the lock directly.
+	 */
+	for (;;)
+	{
+		bool		mustwait;
+
+		/* Acquire mutex.  Time spent holding mutex should be short! */
+		SpinLockAcquire(&lock->flex.mutex);
+
+		/* If retrying, allow LWLockRelease to release waiters again */
+		if (retry)
+			lock->flex.releaseOK = true;
+
+		/* If I can get the lock, do so quickly. */
+		if (mode == PAL_EXCLUSIVE)
+		{
+			if (lock->exclusive == 0 && lock->shared == 0)
+			{
+				lock->exclusive++;
+				mustwait = false;
+			}
+			else
+				mustwait = true;
+		}
+		else
+		{
+			if (lock->exclusive == 0)
+			{
+				lock->shared++;
+				mustwait = false;
+			}
+			else
+				mustwait = true;
+		}
+
+		if (!mustwait)
+			break;				/* got the lock */
+
+		/* Add myself to wait queue. */
+		FlexLockJoinWaitQueue(lock, (int) mode);
+
+		/* Can release the mutex now */
+		SpinLockRelease(&lock->flex.mutex);
+
+		/* Wait until awakened. */
+		extraWaits += FlexLockWait(ProcArrayLock, mode);
+
+		/* Now loop back and try to acquire lock again. */
+		retry = true;
+	}
+
+	/* We are done updating shared state of the lock itself. */
+	SpinLockRelease(&lock->flex.mutex);
+
+	TRACE_POSTGRESQL_FLEXLOCK_ACQUIRE(lockid, mode);
+
+	/* Add lock to list of locks held by this backend */
+	FlexLockRemember(ProcArrayLock);
+
+	/*
+	 * Fix the process wait semaphore's count for any absorbed wakeups.
+	 */
+	while (extraWaits-- > 0)
+		PGSemaphoreUnlock(&proc->sem);
+}
+
+/*
+ * ProcArrayLockClearTransaction - safely clear transaction details
+ *
+ * This can't be done while ProcArrayLock is held, but it's so fast that
+ * we can afford to do it while holding the spinlock, rather than acquiring
+ * and releasing the lock.
+ */
+void
+ProcArrayLockClearTransaction(TransactionId latestXid)
+{
+	volatile ProcArrayLockStruct *lock = ProcArrayLockPointer();
+	PGPROC	   *proc = MyProc;
+	int			extraWaits = 0;
+	bool		mustwait;
+
+	HOLD_INTERRUPTS();
+
+	/* Acquire mutex.  Time spent holding mutex should be short! */
+	SpinLockAcquire(&lock->flex.mutex);
+
+	if (lock->exclusive == 0 && lock->shared == 0)
+	{
+		{
+			volatile PGPROC *vproc = proc;
+			volatile PGXACT *pgxact = &ProcGlobal->allPgXact[vproc->pgprocno];
+			/* If there are no lockers, clear the critical PGPROC fields. */
+			pgxact->xid = InvalidTransactionId;
+	        pgxact->xmin = InvalidTransactionId;
+	        /* must be cleared with xid/xmin: */
+	        pgxact->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
+			pgxact->nxids = 0;
+			pgxact->overflowed = false;
+		}
+		mustwait = false;
+
+        /* Also advance global latestCompletedXid while holding the lock */
+        if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
+                                  latestXid))
+            ShmemVariableCache->latestCompletedXid = latestXid;
+	}
+	else
+	{
+		/* Rats, must wait. */
+		proc->flWaitLink = lock->ending;
+		lock->ending = proc;
+		if (!TransactionIdIsValid(lock->latest_ending_xid) ||
+				TransactionIdPrecedes(lock->latest_ending_xid, latestXid)) 
+			lock->latest_ending_xid = latestXid;
+		mustwait = true;
+	}
+
+	/* Can release the mutex now */
+	SpinLockRelease(&lock->flex.mutex);
+
+	/*
+	 * If we were not able to perfom the operation immediately, we must wait.
+	 * But we need not retry after being awoken, because the last lock holder
+	 * to release the lock will do the work first, on our behalf.
+	 */
+	if (mustwait)
+	{
+		extraWaits += FlexLockWait(ProcArrayLock, 2);
+		while (extraWaits-- > 0)
+			PGSemaphoreUnlock(&proc->sem);
+	}
+
+	RESUME_INTERRUPTS();
+}
+
+/*
+ * ProcArrayLockRelease - release a previously acquired lock
+ */
+void
+ProcArrayLockRelease(void)
+{
+	volatile ProcArrayLockStruct *lock = ProcArrayLockPointer();
+	PGPROC	   *head;
+	PGPROC	   *ending = NULL;
+	PGPROC	   *proc;
+
+	FlexLockForget(ProcArrayLock);
+
+	/* Acquire mutex.  Time spent holding mutex should be short! */
+	SpinLockAcquire(&lock->flex.mutex);
+
+	/* Release my hold on lock */
+	if (lock->exclusive > 0)
+		lock->exclusive--;
+	else
+	{
+		Assert(lock->shared > 0);
+		lock->shared--;
+	}
+
+	/*
+	 * If the lock is now free, but there are some transactions trying to
+	 * end, we must clear the critical PGPROC fields for them, and save a
+	 * list of them so we can wake them up.
+	 */
+	if (lock->exclusive == 0 && lock->shared == 0 && lock->ending != NULL)
+	{
+		volatile PGPROC *vproc;
+
+		ending = lock->ending;
+		vproc = ending;
+
+		while (vproc != NULL)
+		{
+			volatile PGXACT *pgxact = &ProcGlobal->allPgXact[vproc->pgprocno];
+
+        	pgxact->xid = InvalidTransactionId;
+	        pgxact->xmin = InvalidTransactionId;
+	        /* must be cleared with xid/xmin: */
+	        pgxact->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
+			pgxact->nxids = 0;
+			pgxact->overflowed = false;
+			vproc = vproc->flWaitLink;
+		}
+
+		/* Also advance global latestCompletedXid */
+		if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
+								  lock->latest_ending_xid))
+			ShmemVariableCache->latestCompletedXid = lock->latest_ending_xid;
+
+		/* Reset lock state. */
+		lock->ending = NULL;
+		lock->latest_ending_xid = InvalidTransactionId;
+	}
+
+	/*
+	 * See if I need to awaken any waiters.  If I released a non-last shared
+	 * hold, there cannot be anything to do.  Also, do not awaken any waiters
+	 * if someone has already awakened waiters that haven't yet acquired the
+	 * lock.
+	 */
+	head = lock->flex.head;
+	if (head != NULL)
+	{
+		if (lock->exclusive == 0 && lock->shared == 0 && lock->flex.releaseOK)
+		{
+			/*
+			 * Remove the to-be-awakened PGPROCs from the queue.  If the front
+			 * waiter wants exclusive lock, awaken him only. Otherwise awaken
+			 * as many waiters as want shared access.
+			 */
+			proc = head;
+			if (proc->flWaitMode != LW_EXCLUSIVE)
+			{
+				while (proc->flWaitLink != NULL &&
+					   proc->flWaitLink->flWaitMode != LW_EXCLUSIVE)
+					proc = proc->flWaitLink;
+			}
+			/* proc is now the last PGPROC to be released */
+			lock->flex.head = proc->flWaitLink;
+			proc->flWaitLink = NULL;
+			/* prevent additional wakeups until retryer gets to run */
+			lock->flex.releaseOK = false;
+		}
+		else
+		{
+			/* lock is still held, can't awaken anything */
+			head = NULL;
+		}
+	}
+
+	/* We are done updating shared state of the lock itself. */
+	SpinLockRelease(&lock->flex.mutex);
+
+	TRACE_POSTGRESQL_FLEXLOCK_RELEASE(lockid);
+
+	/*
+	 * Awaken any waiters I removed from the queue.
+	 */
+	while (head != NULL)
+	{
+		FlexLockDebug("LWLockRelease", lockid, "release waiter");
+		proc = head;
+		head = proc->flWaitLink;
+		proc->flWaitLink = NULL;
+		proc->flWaitResult = 1;		/* any non-zero value will do */
+		PGSemaphoreUnlock(&proc->sem);
+	}
+
+	/*
+	 * Also awaken any processes whose critical PGPROC fields I cleared
+	 */
+	while (ending != NULL)
+	{
+		FlexLockDebug("LWLockRelease", lockid, "release ending");
+		proc = ending;
+		ending = proc->flWaitLink;
+		proc->flWaitLink = NULL;
+		proc->flWaitResult = 1;		/* any non-zero value will do */
+		PGSemaphoreUnlock(&proc->sem);
+	}
+
+	/*
+	 * Now okay to allow cancel/die interrupts.
+	 */
+	RESUME_INTERRUPTS();
+}
diff --git a/src/include/storage/flexlock_internals.h b/src/include/storage/flexlock_internals.h
index 5f78da7..d1bca45 100644
--- a/src/include/storage/flexlock_internals.h
+++ b/src/include/storage/flexlock_internals.h
@@ -43,6 +43,7 @@ typedef struct FlexLock
 } FlexLock;
 
 #define FLEXLOCK_TYPE_LWLOCK			'l'
+#define FLEXLOCK_TYPE_PROCARRAYLOCK		'p'
 
 typedef union FlexLockPadded
 {
diff --git a/src/include/storage/procarraylock.h b/src/include/storage/procarraylock.h
new file mode 100644
index 0000000..678ca6f
--- /dev/null
+++ b/src/include/storage/procarraylock.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * procarraylock.h
+ *	  Lock management for the ProcArray
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/lwlock.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PROCARRAYLOCK_H
+#define PROCARRAYLOCK_H
+
+#include "storage/flexlock.h"
+
+typedef enum ProcArrayLockMode
+{
+	PAL_EXCLUSIVE,
+	PAL_SHARED
+} ProcArrayLockMode;
+
+extern void ProcArrayLockAcquire(ProcArrayLockMode mode);
+extern void ProcArrayLockClearTransaction(TransactionId latestXid);
+extern void ProcArrayLockRelease(void);
+
+#endif   /* PROCARRAYLOCK_H */

#31

Kevin Grittner

Kevin.Grittner@wicourts.gov

about 14 years ago

In reply to: Robert Haas (#30)

Re: FlexLocks

Robert Haas <robertmhaas@gmail.com> wrote:

Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

Why is it OK to drop these lines from the else condition in
ProcArrayEndTransaction()?:

/* must be cleared with xid/xmin: */
proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;

It's probably not. Oops.

OK. I see that's back now.

I believe the attached patch versions address your comments
regarding the flexlock patch as well; it is also rebased over the
PGXACT patch, which has since been committed.

Applies cleanly again.

The extraWaits code still looks like black magic to me, so unless
someone can point me in the right direction to really understand
that, I can't address whether it's OK.

I don't think I've changed the behavior, so it should be fine.
The idea is that something like this can happen:

[explanation of the extraWaits behavior]

Thanks. I'll spend some time reviewing this part. There is some
rearrangement of related code, and this should arm me with enough of
a grasp to review that.

[gripes about modularity compromise and lack of pluggability]

let me think about that. I haven't addressed that in this
version.

OK. There are a few things I found in this pass which missed in the
last. One contrib module was missed, I found another typo in a
comment, and I think we can reduce the include files a bit. Rather
than describe it, I'm attaching a patch file over the top of your
patches with what I think might be a good idea. I don't think
there's anything here to merit a new round of benchmarking.

-Kevin

#32

Kevin Grittner

Kevin.Grittner@wicourts.gov

about 14 years ago

In reply to: Kevin Grittner (#31)

1 attachment(s)

Re: FlexLocks

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

OK. There are a few things I found in this pass which missed in the
last. One contrib module was missed, I found another typo in a
comment, and I think we can reduce the include files a bit. Rather
than describe it, I'm attaching a patch file over the top of your
patches with what I think might be a good idea.

This time with it really attached.

-Kevin

Attachments:

flexlock-v3a.patchtext/plain; name=flexlock-v3a.patchDownload

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 51b24d0..6167e36 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -260,7 +260,7 @@ _PG_init(void)
 	 * resources in pgss_shmem_startup().
 	 */
 	RequestAddinShmemSpace(pgss_memsize());
-	RequestAddinLWLocks(1);
+	RequestAddinFlexLocks(1);
 
 	/*
 	 * Install hooks.
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 59d18eb..a07a4c9 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -109,7 +109,6 @@
 #include "postmaster/syslogger.h"
 #include "replication/walsender.h"
 #include "storage/fd.h"
-#include "storage/flexlock_internals.h"
 #include "storage/ipc.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
diff --git a/src/backend/storage/lmgr/flexlock.c b/src/backend/storage/lmgr/flexlock.c
index f96437b..6145951 100644
--- a/src/backend/storage/lmgr/flexlock.c
+++ b/src/backend/storage/lmgr/flexlock.c
@@ -22,17 +22,16 @@
 #include "postgres.h"
 
 #include "miscadmin.h"
+#include "pg_trace.h"
 #include "access/clog.h"
 #include "access/multixact.h"
 #include "access/subtrans.h"
 #include "commands/async.h"
+#include "storage/flexlock.h"
 #include "storage/flexlock_internals.h"
-#include "storage/lwlock.h"
 #include "storage/predicate.h"
-#include "storage/proc.h"
 #include "storage/procarraylock.h"
 #include "storage/spin.h"
-#include "utils/elog.h"
 
 /*
  * We use this structure to keep track of flex locks held, for release
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 173b7cb..10ec83b 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -755,7 +755,7 @@ ProcKill(int code, Datum arg)
 #endif
 
 	/*
-	 * Release any felx locks I am holding.  There really shouldn't be any, but
+	 * Release any flex locks I am holding.  There really shouldn't be any, but
 	 * it's cheap to check again before we cut the knees off the flex lock
 	 * facility by releasing our PGPROC ...
 	 */
diff --git a/src/include/storage/flexlock_internals.h b/src/include/storage/flexlock_internals.h
index d1bca45..a5c5711 100644
--- a/src/include/storage/flexlock_internals.h
+++ b/src/include/storage/flexlock_internals.h
@@ -16,8 +16,6 @@
 #ifndef FLEXLOCK_INTERNALS_H
 #define FLEXLOCK_INTERNALS_H
 
-#include "pg_trace.h"
-#include "storage/flexlock.h"
 #include "storage/proc.h"
 #include "storage/s_lock.h"

#33

Robert Haas

robertmhaas@gmail.com

about 14 years ago

In reply to: Kevin Grittner (#32)

Re: FlexLocks

On Wed, Nov 30, 2011 at 7:01 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

OK. There are a few things I found in this pass which missed in the
last. One contrib module was missed, I found another typo in a
comment, and I think we can reduce the include files a bit. Rather
than describe it, I'm attaching a patch file over the top of your
patches with what I think might be a good idea.

This time with it really attached.

Thanks, I've merged those in.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#34

Kevin Grittner

Kevin.Grittner@wicourts.gov

about 14 years ago

In reply to: Kevin Grittner (#31)

1 attachment(s)

Re: FlexLocks

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

The extraWaits code still looks like black magic to me

[explanation of the extraWaits behavior]

Thanks. I'll spend some time reviewing this part. There is some
rearrangement of related code, and this should arm me with enough
of a grasp to review that.

I got through that without spotting any significant problems,
although I offer the attached micro-optimizations for your
consideration. (Applies over the top of your patches.)

As far as I'm concerned it looks great and "Ready for Committer"
except for the modularity/pluggability question. Perhaps that could
be done as a follow-on patch (if deemed a good idea)?

-Kevin

Attachments:

flexlock-v3b.patchtext/plain; name=flexlock-v3b.patchDownload

diff --git a/src/backend/storage/lmgr/procarraylock.c b/src/backend/storage/lmgr/procarraylock.c
index 7cd4b6b..13b51cb 100644
--- a/src/backend/storage/lmgr/procarraylock.c
+++ b/src/backend/storage/lmgr/procarraylock.c
@@ -153,9 +153,10 @@ ProcArrayLockClearTransaction(TransactionId latestXid)
 {
 	volatile ProcArrayLockStruct *lock = ProcArrayLockPointer();
 	PGPROC	   *proc = MyProc;
-	int			extraWaits = 0;
 	bool		mustwait;
 
+	Assert(TransactionIdIsValid(latestXid));
+
 	HOLD_INTERRUPTS();
 
 	/* Acquire mutex.  Time spent holding mutex should be short! */
@@ -186,8 +187,11 @@ ProcArrayLockClearTransaction(TransactionId latestXid)
 		/* Rats, must wait. */
 		proc->flWaitLink = lock->ending;
 		lock->ending = proc;
-		if (!TransactionIdIsValid(lock->latest_ending_xid) ||
-				TransactionIdPrecedes(lock->latest_ending_xid, latestXid)) 
+		/*
+		 * lock->latest_ending_xid may be invalid, but invalid transaction
+		 * IDs always precede valid ones.
+		 */
+		if (TransactionIdPrecedes(lock->latest_ending_xid, latestXid)) 
 			lock->latest_ending_xid = latestXid;
 		mustwait = true;
 	}
@@ -202,7 +206,9 @@ ProcArrayLockClearTransaction(TransactionId latestXid)
 	 */
 	if (mustwait)
 	{
-		extraWaits += FlexLockWait(ProcArrayLock, 2);
+		int			extraWaits;
+
+		extraWaits = FlexLockWait(ProcArrayLock, 2);
 		while (extraWaits-- > 0)
 			PGSemaphoreUnlock(&proc->sem);
 	}
@@ -247,7 +253,7 @@ ProcArrayLockRelease(void)
 		ending = lock->ending;
 		vproc = ending;
 
-		while (vproc != NULL)
+		do
 		{
 			volatile PGXACT *pgxact = &ProcGlobal->allPgXact[vproc->pgprocno];
 
@@ -258,7 +264,7 @@ ProcArrayLockRelease(void)
 			pgxact->nxids = 0;
 			pgxact->overflowed = false;
 			vproc = vproc->flWaitLink;
-		}
+		} while (vproc != NULL);
 
 		/* Also advance global latestCompletedXid */
 		if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,

#35

Robert Haas

robertmhaas@gmail.com

about 14 years ago

In reply to: Kevin Grittner (#34)

2 attachment(s)

Re: FlexLocks

On Fri, Dec 2, 2011 at 4:11 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

As far as I'm concerned it looks great and "Ready for Committer"
except for the modularity/pluggability question. Perhaps that could
be done as a follow-on patch (if deemed a good idea)?

I investigated the performance issues with the previous version of the
patch and found that turning some of the FlexLock support functions
into macros seems to help quite a bit, so I've done that in the
attached versions. I've also incorporated Kevin's incremental patch
from his previous version.

That having been said, I'm leaning away from applying any of this for
the time being. For one thing, Pavan's PGXACT stuff has greatly
eroded the benefit of this patch. I'm fairly optimistic about the
prospects of finding other good uses for the FlexLock machinery down
the road, but I don't feel like that's enough reason to apply it now.
Also, there are several other good ideas kicking around out there for
further reducing ProcArrayLock contention, some of which are
lower-impact than this and others of which would obsolete the entire
approach. So it seems like I should probably let the dust settle on
those things before deciding whether this even makes sense. In
particular, I'm starting to think that resolving the contention
between GetSnapshotData() and ProcArrayEndTransaction() is basically a
layup at this point, and I really want to go for a three-pointer,
namely also eliminating the spinlock contention between different
backends all trying to acquire ProcArrayLock in shared mode during
read-only operation.

So, I'm going to mark this Returned with Feedback for now and keep
working on the problem. Thanks for the review and positive comments.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

flexlock-v4.patchapplication/octet-stream; name=flexlock-v4.patchDownload

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 8dc3054..6167e36 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -105,7 +105,7 @@ typedef struct pgssEntry
  */
 typedef struct pgssSharedState
 {
-	LWLockId	lock;			/* protects hashtable search/modification */
+	FlexLockId	lock;			/* protects hashtable search/modification */
 	int			query_size;		/* max query length in bytes */
 } pgssSharedState;
 
@@ -260,7 +260,7 @@ _PG_init(void)
 	 * resources in pgss_shmem_startup().
 	 */
 	RequestAddinShmemSpace(pgss_memsize());
-	RequestAddinLWLocks(1);
+	RequestAddinFlexLocks(1);
 
 	/*
 	 * Install hooks.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d1e628f..8517b36 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6199,14 +6199,14 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
      </varlistentry>
 
      <varlistentry>
-      <term><varname>trace_lwlocks</varname> (<type>boolean</type>)</term>
+      <term><varname>trace_flexlocks</varname> (<type>boolean</type>)</term>
       <indexterm>
-       <primary><varname>trace_lwlocks</> configuration parameter</primary>
+       <primary><varname>trace_flexlocks</> configuration parameter</primary>
       </indexterm>
       <listitem>
        <para>
-        If on, emit information about lightweight lock usage.  Lightweight
-        locks are intended primarily to provide mutual exclusion of access
+        If on, emit information about FlexLock usage.  FlexLocks
+        are intended primarily to provide mutual exclusion of access
         to shared-memory data structures.
        </para>
        <para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d6056a2..f110253 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1733,49 +1733,49 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
       or kilobytes of memory used for an internal sort.</entry>
     </row>
     <row>
-     <entry>lwlock-acquire</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock has been acquired.
-      arg0 is the LWLock's ID.
-      arg1 is the requested lock mode, either exclusive or shared.</entry>
+     <entry>flexlock-acquire</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock has been acquired.
+      arg0 is the FlexLock's ID.
+      arg1 is the requested lock mode.</entry>
     </row>
     <row>
-     <entry>lwlock-release</entry>
-     <entry>(LWLockId)</entry>
-     <entry>Probe that fires when an LWLock has been released (but note
+     <entry>flexlock-release</entry>
+     <entry>(FlexLockId)</entry>
+     <entry>Probe that fires when a FlexLock has been released (but note
       that any released waiters have not yet been awakened).
-      arg0 is the LWLock's ID.</entry>
+      arg0 is the FlexLock's ID.</entry>
     </row>
     <row>
-     <entry>lwlock-wait-start</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock was not immediately available and
+     <entry>flexlock-wait-start</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock was not immediately available and
       a server process has begun to wait for the lock to become available.
-      arg0 is the LWLock's ID.
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
-     <entry>lwlock-wait-done</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
+     <entry>flexlock-wait-done</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
      <entry>Probe that fires when a server process has been released from its
-      wait for an LWLock (it does not actually have the lock yet).
-      arg0 is the LWLock's ID.
+      wait for an FlexLock (it does not actually have the lock yet).
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
-     <entry>lwlock-condacquire</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock was successfully acquired when the
-      caller specified no waiting.
-      arg0 is the LWLock's ID.
+     <entry>flexlock-condacquire</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock was successfully acquired when
+      the caller specified no waiting.
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
-     <entry>lwlock-condacquire-fail</entry>
-     <entry>(LWLockId, LWLockMode)</entry>
-     <entry>Probe that fires when an LWLock was not successfully acquired when
-      the caller specified no waiting.
-      arg0 is the LWLock's ID.
+     <entry>flexlock-condacquire-fail</entry>
+     <entry>(FlexLockId, FlexLockMode)</entry>
+     <entry>Probe that fires when an FlexLock was not successfully acquired
+      when the caller specified no waiting.
+      arg0 is the FlexLock's ID.
       arg1 is the requested lock mode, either exclusive or shared.</entry>
     </row>
     <row>
@@ -1822,11 +1822,11 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
      <entry>unsigned int</entry>
     </row>
     <row>
-     <entry>LWLockId</entry>
+     <entry>FlexLockId</entry>
      <entry>int</entry>
     </row>
     <row>
-     <entry>LWLockMode</entry>
+     <entry>FlexLockMode</entry>
      <entry>int</entry>
     </row>
     <row>
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index f7caa34..09d5862 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -151,7 +151,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(bool));		/* page_dirty[] */
 	sz += MAXALIGN(nslots * sizeof(int));		/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));		/* page_lru_count[] */
-	sz += MAXALIGN(nslots * sizeof(LWLockId));	/* buffer_locks[] */
+	sz += MAXALIGN(nslots * sizeof(FlexLockId));		/* buffer_locks[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -161,7 +161,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLockId ctllock, const char *subdir)
+			  FlexLockId ctllock, const char *subdir)
 {
 	SlruShared	shared;
 	bool		found;
@@ -202,8 +202,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		offset += MAXALIGN(nslots * sizeof(int));
 		shared->page_lru_count = (int *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(int));
-		shared->buffer_locks = (LWLockId *) (ptr + offset);
-		offset += MAXALIGN(nslots * sizeof(LWLockId));
+		shared->buffer_locks = (FlexLockId *) (ptr + offset);
+		offset += MAXALIGN(nslots * sizeof(FlexLockId));
 
 		if (nlsns > 0)
 		{
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d2fecb1..943929b 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -326,9 +326,9 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 	proc->backendId = InvalidBackendId;
 	proc->databaseId = databaseid;
 	proc->roleId = owner;
-	proc->lwWaiting = false;
-	proc->lwExclusive = false;
-	proc->lwWaitLink = NULL;
+	proc->flWaitResult = 0;
+	proc->flWaitMode = 0;
+	proc->flWaitLink = NULL;
 	proc->waitLock = NULL;
 	proc->waitProcLock = NULL;
 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index c383011..0da2ae5 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2248,7 +2248,7 @@ AbortTransaction(void)
 	 * Releasing LW locks is critical since we might try to grab them again
 	 * while cleaning up!
 	 */
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 
 	/* Clean up buffer I/O and buffer context locks, too */
 	AbortBufferIO();
@@ -4138,7 +4138,7 @@ AbortSubTransaction(void)
 	 * FIXME This may be incorrect --- Are there some locks we should keep?
 	 * Buffer locks, for example?  I don't think so but I'm not sure.
 	 */
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 
 	AbortBufferIO();
 	UnlockBuffers();
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 6bf2421..9ceee91 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -562,13 +562,13 @@ bootstrap_signals(void)
  * Begin shutdown of an auxiliary process.	This is approximately the equivalent
  * of ShutdownPostgres() in postinit.c.  We can't run transactions in an
  * auxiliary process, so most of the work of AbortTransaction() is not needed,
- * but we do need to make sure we've released any LWLocks we are holding.
+ * but we do need to make sure we've released any flex locks we are holding.
  * (This is only critical during an error exit.)
  */
 static void
 ShutdownAuxiliaryProcess(int code, Datum arg)
 {
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index cacedab..f33f573 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -176,9 +176,10 @@ BackgroundWriterMain(void)
 		/*
 		 * These operations are really just a minimal subset of
 		 * AbortTransaction().	We don't have very many resources to worry
-		 * about in bgwriter, but we do have LWLocks, buffers, and temp files.
+		 * about in bgwriter, but we do have flex locks, buffers, and temp
+		 * files.
 		 */
-		LWLockReleaseAll();
+		FlexLockReleaseAll();
 		AbortBufferIO();
 		UnlockBuffers();
 		/* buffer pins are released here: */
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e9ae1e8..49f07a7 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -281,9 +281,10 @@ CheckpointerMain(void)
 		/*
 		 * These operations are really just a minimal subset of
 		 * AbortTransaction().	We don't have very many resources to worry
-		 * about in checkpointer, but we do have LWLocks, buffers, and temp files.
+		 * about in checkpointer, but we do have flex locks, buffers, and temp
+		 * files.
 		 */
-		LWLockReleaseAll();
+		FlexLockReleaseAll();
 		AbortBufferIO();
 		UnlockBuffers();
 		/* buffer pins are released here: */
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 963189d..a07a4c9 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -404,8 +404,6 @@ typedef struct
 typedef int InheritableSocket;
 #endif
 
-typedef struct LWLock LWLock;	/* ugly kluge */
-
 /*
  * Structure contains all variables passed to exec:ed backends
  */
@@ -426,7 +424,7 @@ typedef struct
 	slock_t    *ShmemLock;
 	VariableCache ShmemVariableCache;
 	Backend    *ShmemBackendArray;
-	LWLock	   *LWLockArray;
+	FlexLock   *FlexLockArray;
 	slock_t    *ProcStructLock;
 	PROC_HDR   *ProcGlobal;
 	PGPROC	   *AuxiliaryProcs;
@@ -4676,7 +4674,6 @@ MaxLivePostmasterChildren(void)
  * functions
  */
 extern slock_t *ShmemLock;
-extern LWLock *LWLockArray;
 extern slock_t *ProcStructLock;
 extern PGPROC *AuxiliaryProcs;
 extern PMSignalData *PMSignalState;
@@ -4721,7 +4718,7 @@ save_backend_variables(BackendParameters *param, Port *port,
 	param->ShmemVariableCache = ShmemVariableCache;
 	param->ShmemBackendArray = ShmemBackendArray;
 
-	param->LWLockArray = LWLockArray;
+	param->FlexLockArray = FlexLockArray;
 	param->ProcStructLock = ProcStructLock;
 	param->ProcGlobal = ProcGlobal;
 	param->AuxiliaryProcs = AuxiliaryProcs;
@@ -4945,7 +4942,7 @@ restore_backend_variables(BackendParameters *param, Port *port)
 	ShmemVariableCache = param->ShmemVariableCache;
 	ShmemBackendArray = param->ShmemBackendArray;
 
-	LWLockArray = param->LWLockArray;
+	FlexLockArray = param->FlexLockArray;
 	ProcStructLock = param->ProcStructLock;
 	ProcGlobal = param->ProcGlobal;
 	AuxiliaryProcs = param->AuxiliaryProcs;
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 157728e..587443d 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -167,9 +167,9 @@ WalWriterMain(void)
 		/*
 		 * These operations are really just a minimal subset of
 		 * AbortTransaction().	We don't have very many resources to worry
-		 * about in walwriter, but we do have LWLocks, and perhaps buffers?
+		 * about in walwriter, but we do have flex locks, and perhaps buffers?
 		 */
-		LWLockReleaseAll();
+		FlexLockReleaseAll();
 		AbortBufferIO();
 		UnlockBuffers();
 		/* buffer pins are released here: */
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 71fe8c6..4c4959c 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -141,7 +141,7 @@ PrefetchBuffer(Relation reln, ForkNumber forkNum, BlockNumber blockNum)
 	{
 		BufferTag	newTag;		/* identity of requested block */
 		uint32		newHash;	/* hash value for newTag */
-		LWLockId	newPartitionLock;	/* buffer partition lock for it */
+		FlexLockId	newPartitionLock;	/* buffer partition lock for it */
 		int			buf_id;
 
 		/* create a tag so we can lookup the buffer */
@@ -514,10 +514,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 {
 	BufferTag	newTag;			/* identity of requested block */
 	uint32		newHash;		/* hash value for newTag */
-	LWLockId	newPartitionLock;		/* buffer partition lock for it */
+	FlexLockId	newPartitionLock;		/* buffer partition lock for it */
 	BufferTag	oldTag;			/* previous identity of selected buffer */
 	uint32		oldHash;		/* hash value for oldTag */
-	LWLockId	oldPartitionLock;		/* buffer partition lock for it */
+	FlexLockId	oldPartitionLock;		/* buffer partition lock for it */
 	BufFlags	oldFlags;
 	int			buf_id;
 	volatile BufferDesc *buf;
@@ -857,7 +857,7 @@ InvalidateBuffer(volatile BufferDesc *buf)
 {
 	BufferTag	oldTag;
 	uint32		oldHash;		/* hash value for oldTag */
-	LWLockId	oldPartitionLock;		/* buffer partition lock for it */
+	FlexLockId	oldPartitionLock;		/* buffer partition lock for it */
 	BufFlags	oldFlags;
 
 	/* Save the original buffer tag before dropping the spinlock */
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index bb8b832..a2c570a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -113,7 +113,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, SUBTRANSShmemSize());
 		size = add_size(size, TwoPhaseShmemSize());
 		size = add_size(size, MultiXactShmemSize());
-		size = add_size(size, LWLockShmemSize());
+		size = add_size(size, FlexLockShmemSize());
 		size = add_size(size, ProcArrayShmemSize());
 		size = add_size(size, BackendStatusShmemSize());
 		size = add_size(size, SInvalShmemSize());
@@ -179,7 +179,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	 * needed for InitShmemIndex.
 	 */
 	if (!IsUnderPostmaster)
-		CreateLWLocks();
+		CreateFlexLocks();
 
 	/*
 	 * Set up shmem.c index hashtable
diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile
index e12a854..3730e51 100644
--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/storage/lmgr
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o predicate.o
+OBJS = flexlock.o lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o \
+	predicate.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/storage/lmgr/flexlock.c b/src/backend/storage/lmgr/flexlock.c
new file mode 100644
index 0000000..cf0004b
--- /dev/null
+++ b/src/backend/storage/lmgr/flexlock.c
@@ -0,0 +1,271 @@
+/*-------------------------------------------------------------------------
+ *
+ * flexlock.c
+ *	  Low-level routines for managing flex locks.
+ *
+ * Flex locks are intended primarily to provide mutual exclusion of access
+ * to shared-memory data structures.  Most, but not all, flex locks are
+ * lightweight locks (LWLocks).  This file contains support routines that
+ * are used for all types of flex locks, including lwlocks.  User-level
+ * locking should be done with the full lock manager --- which depends on
+ * LWLocks to protect its shared state.
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/lmgr/flexlock.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "pg_trace.h"
+#include "access/clog.h"
+#include "access/multixact.h"
+#include "access/subtrans.h"
+#include "commands/async.h"
+#include "storage/flexlock.h"
+#include "storage/flexlock_internals.h"
+#include "storage/predicate.h"
+#include "storage/spin.h"
+
+int	num_held_flexlocks = 0;
+FlexLockId held_flexlocks[MAX_SIMUL_FLEXLOCKS];
+
+static int	lock_addin_request = 0;
+static bool lock_addin_request_allowed = true;
+
+#ifdef LOCK_DEBUG
+bool		Trace_flexlocks = false;
+#endif
+
+/*
+ * This points to the array of FlexLocks in shared memory.  Backends inherit
+ * the pointer by fork from the postmaster (except in the EXEC_BACKEND case,
+ * where we have special measures to pass it down).
+ */
+FlexLockPadded *FlexLockArray = NULL;
+
+/* We use the ShmemLock spinlock to protect LWLockAssign */
+extern slock_t *ShmemLock;
+
+static void FlexLockInit(FlexLock *flex, char locktype);
+
+/*
+ * Compute number of FlexLocks to allocate.
+ */
+int
+NumFlexLocks(void)
+{
+	int			numLocks;
+
+	/*
+	 * Possibly this logic should be spread out among the affected modules,
+	 * the same way that shmem space estimation is done.  But for now, there
+	 * are few enough users of FlexLocks that we can get away with just keeping
+	 * the knowledge here.
+	 */
+
+	/* Predefined FlexLocks */
+	numLocks = (int) NumFixedFlexLocks;
+
+	/* bufmgr.c needs two for each shared buffer */
+	numLocks += 2 * NBuffers;
+
+	/* proc.c needs one for each backend or auxiliary process */
+	numLocks += MaxBackends + NUM_AUXILIARY_PROCS;
+
+	/* clog.c needs one per CLOG buffer */
+	numLocks += NUM_CLOG_BUFFERS;
+
+	/* subtrans.c needs one per SubTrans buffer */
+	numLocks += NUM_SUBTRANS_BUFFERS;
+
+	/* multixact.c needs two SLRU areas */
+	numLocks += NUM_MXACTOFFSET_BUFFERS + NUM_MXACTMEMBER_BUFFERS;
+
+	/* async.c needs one per Async buffer */
+	numLocks += NUM_ASYNC_BUFFERS;
+
+	/* predicate.c needs one per old serializable xid buffer */
+	numLocks += NUM_OLDSERXID_BUFFERS;
+
+	/*
+	 * Add any requested by loadable modules; for backwards-compatibility
+	 * reasons, allocate at least NUM_USER_DEFINED_FLEXLOCKS of them even if
+	 * there are no explicit requests.
+	 */
+	lock_addin_request_allowed = false;
+	numLocks += Max(lock_addin_request, NUM_USER_DEFINED_FLEXLOCKS);
+
+	return numLocks;
+}
+
+
+/*
+ * RequestAddinFlexLocks
+ *		Request that extra FlexLocks be allocated for use by
+ *		a loadable module.
+ *
+ * This is only useful if called from the _PG_init hook of a library that
+ * is loaded into the postmaster via shared_preload_libraries.	Once
+ * shared memory has been allocated, calls will be ignored.  (We could
+ * raise an error, but it seems better to make it a no-op, so that
+ * libraries containing such calls can be reloaded if needed.)
+ */
+void
+RequestAddinFlexLocks(int n)
+{
+	if (IsUnderPostmaster || !lock_addin_request_allowed)
+		return;					/* too late */
+	lock_addin_request += n;
+}
+
+
+/*
+ * Compute shmem space needed for FlexLocks.
+ */
+Size
+FlexLockShmemSize(void)
+{
+	Size		size;
+	int			numLocks = NumFlexLocks();
+
+	/* Space for the FlexLock array. */
+	size = mul_size(numLocks, FLEX_LOCK_BYTES);
+
+	/* Space for dynamic allocation counter, plus room for alignment. */
+	size = add_size(size, 2 * sizeof(int) + FLEX_LOCK_BYTES);
+
+	return size;
+}
+
+/*
+ * Allocate shmem space for FlexLocks and initialize the locks.
+ */
+void
+CreateFlexLocks(void)
+{
+	int			numLocks = NumFlexLocks();
+	Size		spaceLocks = FlexLockShmemSize();
+	FlexLockPadded *lock;
+	int		   *FlexLockCounter;
+	char	   *ptr;
+	int			id;
+
+	/* Allocate and zero space */
+	ptr = (char *) ShmemAlloc(spaceLocks);
+	memset(ptr, 0, spaceLocks);
+
+	/* Leave room for dynamic allocation counter */
+	ptr += 2 * sizeof(int);
+
+	/* Ensure desired alignment of FlexLock array */
+	ptr += FLEX_LOCK_BYTES - ((uintptr_t) ptr) % FLEX_LOCK_BYTES;
+
+	FlexLockArray = (FlexLockPadded *) ptr;
+
+	/* All of the "fixed" FlexLocks are LWLocks. */
+	for (id = 0, lock = FlexLockArray; id < NumFixedFlexLocks; id++, lock++)
+		FlexLockInit(&lock->flex, FLEXLOCK_TYPE_LWLOCK);
+
+	/*
+	 * Initialize the dynamic-allocation counter, which is stored just before
+	 * the first FlexLock.
+	 */
+	FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+	FlexLockCounter[0] = (int) NumFixedFlexLocks;
+	FlexLockCounter[1] = numLocks;
+}
+
+/*
+ * FlexLockAssign - assign a dynamically-allocated FlexLock number
+ *
+ * We interlock this using the same spinlock that is used to protect
+ * ShmemAlloc().  Interlocking is not really necessary during postmaster
+ * startup, but it is needed if any user-defined code tries to allocate
+ * LWLocks after startup.
+ */
+FlexLockId
+FlexLockAssign(char locktype)
+{
+	FlexLockId	result;
+
+	/* use volatile pointer to prevent code rearrangement */
+	volatile int *FlexLockCounter;
+
+	FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+	SpinLockAcquire(ShmemLock);
+	if (FlexLockCounter[0] >= FlexLockCounter[1])
+	{
+		SpinLockRelease(ShmemLock);
+		elog(ERROR, "no more FlexLockIds available");
+	}
+	result = (FlexLockId) (FlexLockCounter[0]++);
+	SpinLockRelease(ShmemLock);
+
+	FlexLockInit(&FlexLockArray[result].flex, locktype);
+
+	return result;
+}
+
+/*
+ * Initialize a FlexLock.
+ */
+static void
+FlexLockInit(FlexLock *flex, char locktype)
+{
+	SpinLockInit(&flex->mutex);
+	flex->releaseOK = true;
+	flex->locktype = locktype;
+	/*
+	 * We might need to think a little harder about what should happen here
+	 * if some future type of FlexLock requires more initialization than this.
+	 * For now, this will suffice.
+	 */
+}
+
+/*
+ * FlexLockReleaseAll - release all currently-held locks
+ *
+ * Used to clean up after ereport(ERROR). An important difference between this
+ * function and retail LWLockRelease calls is that InterruptHoldoffCount is
+ * unchanged by this operation.  This is necessary since InterruptHoldoffCount
+ * has been set to an appropriate level earlier in error recovery. We could
+ * decrement it below zero if we allow it to drop for each released lock!
+ */
+void
+FlexLockReleaseAll(void)
+{
+	while (num_held_flexlocks > 0)
+	{
+		HOLD_INTERRUPTS();		/* match the upcoming RESUME_INTERRUPTS */
+
+		/*
+		 * FLEXTODO: When we have multiple types of flex locks, this will
+		 * need to call the appropriate release function for each lock type.
+		 */
+		LWLockRelease(held_flexlocks[num_held_flexlocks - 1]);
+	}
+}
+
+/*
+ * FlexLockHeldByMe - test whether my process currently holds a lock
+ *
+ * This is meant as debug support only.  We do not consider the lock mode.
+ */
+bool
+FlexLockHeldByMe(FlexLockId id)
+{
+	int			i;
+
+	for (i = 0; i < num_held_flexlocks; i++)
+	{
+		if (held_flexlocks[i] == id)
+			return true;
+	}
+	return false;
+}
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 3ba4671..f594983 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -591,7 +591,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	bool		found;
 	ResourceOwner owner;
 	uint32		hashcode;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	int			status;
 	bool		log_lock = false;
 
@@ -1546,7 +1546,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	LOCALLOCK  *locallock;
 	LOCK	   *lock;
 	PROCLOCK   *proclock;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	bool		wakeupNeeded;
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
@@ -1912,7 +1912,7 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	 */
 	for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
 	{
-		LWLockId	partitionLock = FirstLockMgrLock + partition;
+		FlexLockId	partitionLock = FirstLockMgrLock + partition;
 		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
 
 		proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
@@ -2197,7 +2197,7 @@ static bool
 FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag,
 					  uint32 hashcode)
 {
-	LWLockId		partitionLock = LockHashPartitionLock(hashcode);
+	FlexLockId		partitionLock = LockHashPartitionLock(hashcode);
 	Oid				relid = locktag->locktag_field2;
 	uint32			i;
 
@@ -2281,7 +2281,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	LockMethod		lockMethodTable = LockMethods[DEFAULT_LOCKMETHOD];
 	LOCKTAG		   *locktag = &locallock->tag.lock;
 	PROCLOCK	   *proclock = NULL;
-	LWLockId		partitionLock = LockHashPartitionLock(locallock->hashcode);
+	FlexLockId		partitionLock = LockHashPartitionLock(locallock->hashcode);
 	Oid				relid = locktag->locktag_field2;
 	uint32			f;
 
@@ -2382,7 +2382,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
 	SHM_QUEUE  *procLocks;
 	PROCLOCK   *proclock;
 	uint32		hashcode;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	int			count = 0;
 	int			fast_count = 0;
 
@@ -2593,7 +2593,7 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 	PROCLOCKTAG proclocktag;
 	uint32		hashcode;
 	uint32		proclock_hashcode;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	bool		wakeupNeeded;
 
 	hashcode = LockTagHashCode(locktag);
@@ -2827,7 +2827,7 @@ PostPrepare_Locks(TransactionId xid)
 	 */
 	for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
 	{
-		LWLockId	partitionLock = FirstLockMgrLock + partition;
+		FlexLockId	partitionLock = FirstLockMgrLock + partition;
 		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
 
 		proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
@@ -3343,7 +3343,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 	uint32		hashcode;
 	uint32		proclock_hashcode;
 	int			partition;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	LockMethod	lockMethodTable;
 
 	Assert(len == sizeof(TwoPhaseLockRecord));
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 079eb29..bdb5f6e 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -21,74 +21,23 @@
  */
 #include "postgres.h"
 
-#include "access/clog.h"
-#include "access/multixact.h"
-#include "access/subtrans.h"
-#include "commands/async.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "storage/flexlock_internals.h"
 #include "storage/ipc.h"
-#include "storage/predicate.h"
 #include "storage/proc.h"
 #include "storage/spin.h"
 
-
-/* We use the ShmemLock spinlock to protect LWLockAssign */
-extern slock_t *ShmemLock;
-
-
 typedef struct LWLock
 {
-	slock_t		mutex;			/* Protects LWLock and queue of PGPROCs */
-	bool		releaseOK;		/* T if ok to release waiters */
+	FlexLock	flex;			/* common FlexLock infrastructure */
 	char		exclusive;		/* # of exclusive holders (0 or 1) */
 	int			shared;			/* # of shared holders (0..MaxBackends) */
-	PGPROC	   *head;			/* head of list of waiting PGPROCs */
-	PGPROC	   *tail;			/* tail of list of waiting PGPROCs */
-	/* tail is undefined when head is NULL */
 } LWLock;
 
-/*
- * All the LWLock structs are allocated as an array in shared memory.
- * (LWLockIds are indexes into the array.)	We force the array stride to
- * be a power of 2, which saves a few cycles in indexing, but more
- * importantly also ensures that individual LWLocks don't cross cache line
- * boundaries.	This reduces cache contention problems, especially on AMD
- * Opterons.  (Of course, we have to also ensure that the array start
- * address is suitably aligned.)
- *
- * LWLock is between 16 and 32 bytes on all known platforms, so these two
- * cases are sufficient.
- */
-#define LWLOCK_PADDED_SIZE	(sizeof(LWLock) <= 16 ? 16 : 32)
-
-typedef union LWLockPadded
-{
-	LWLock		lock;
-	char		pad[LWLOCK_PADDED_SIZE];
-} LWLockPadded;
-
-/*
- * This points to the array of LWLocks in shared memory.  Backends inherit
- * the pointer by fork from the postmaster (except in the EXEC_BACKEND case,
- * where we have special measures to pass it down).
- */
-NON_EXEC_STATIC LWLockPadded *LWLockArray = NULL;
-
-
-/*
- * We use this structure to keep track of locked LWLocks for release
- * during error recovery.  The maximum size could be determined at runtime
- * if necessary, but it seems unlikely that more than a few locks could
- * ever be held simultaneously.
- */
-#define MAX_SIMUL_LWLOCKS	100
-
-static int	num_held_lwlocks = 0;
-static LWLockId held_lwlocks[MAX_SIMUL_LWLOCKS];
-
-static int	lock_addin_request = 0;
-static bool lock_addin_request_allowed = true;
+#define	LWLockPointer(lockid) \
+	(AssertMacro(FlexLockArray[lockid].flex.locktype == FLEXLOCK_TYPE_LWLOCK), \
+	 (volatile LWLock *) &FlexLockArray[lockid])
 
 #ifdef LWLOCK_STATS
 static int	counts_for_pid = 0;
@@ -98,27 +47,17 @@ static int *block_counts;
 #endif
 
 #ifdef LOCK_DEBUG
-bool		Trace_lwlocks = false;
-
 inline static void
-PRINT_LWDEBUG(const char *where, LWLockId lockid, const volatile LWLock *lock)
+PRINT_LWDEBUG(const char *where, FlexLockId lockid, const volatile LWLock *lock)
 {
-	if (Trace_lwlocks)
+	if (Trace_flexlocks)
 		elog(LOG, "%s(%d): excl %d shared %d head %p rOK %d",
 			 where, (int) lockid,
-			 (int) lock->exclusive, lock->shared, lock->head,
-			 (int) lock->releaseOK);
-}
-
-inline static void
-LOG_LWDEBUG(const char *where, LWLockId lockid, const char *msg)
-{
-	if (Trace_lwlocks)
-		elog(LOG, "%s(%d): %s", where, (int) lockid, msg);
+			 (int) lock->exclusive, lock->shared, lock->flex.head,
+			 (int) lock->flex.releaseOK);
 }
 #else							/* not LOCK_DEBUG */
 #define PRINT_LWDEBUG(a,b,c)
-#define LOG_LWDEBUG(a,b,c)
 #endif   /* LOCK_DEBUG */
 
 #ifdef LWLOCK_STATS
@@ -127,8 +66,8 @@ static void
 print_lwlock_stats(int code, Datum arg)
 {
 	int			i;
-	int		   *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	int			numLocks = LWLockCounter[1];
+	int		   *FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+	int			numLocks = FlexLockCounter[1];
 
 	/* Grab an LWLock to keep different backends from mixing reports */
 	LWLockAcquire(0, LW_EXCLUSIVE);
@@ -145,173 +84,15 @@ print_lwlock_stats(int code, Datum arg)
 }
 #endif   /* LWLOCK_STATS */
 
-
 /*
- * Compute number of LWLocks to allocate.
+ * LWLockAssign - initialize a new lwlock and return its ID
  */
-int
-NumLWLocks(void)
-{
-	int			numLocks;
-
-	/*
-	 * Possibly this logic should be spread out among the affected modules,
-	 * the same way that shmem space estimation is done.  But for now, there
-	 * are few enough users of LWLocks that we can get away with just keeping
-	 * the knowledge here.
-	 */
-
-	/* Predefined LWLocks */
-	numLocks = (int) NumFixedLWLocks;
-
-	/* bufmgr.c needs two for each shared buffer */
-	numLocks += 2 * NBuffers;
-
-	/* proc.c needs one for each backend or auxiliary process */
-	numLocks += MaxBackends + NUM_AUXILIARY_PROCS;
-
-	/* clog.c needs one per CLOG buffer */
-	numLocks += NUM_CLOG_BUFFERS;
-
-	/* subtrans.c needs one per SubTrans buffer */
-	numLocks += NUM_SUBTRANS_BUFFERS;
-
-	/* multixact.c needs two SLRU areas */
-	numLocks += NUM_MXACTOFFSET_BUFFERS + NUM_MXACTMEMBER_BUFFERS;
-
-	/* async.c needs one per Async buffer */
-	numLocks += NUM_ASYNC_BUFFERS;
-
-	/* predicate.c needs one per old serializable xid buffer */
-	numLocks += NUM_OLDSERXID_BUFFERS;
-
-	/*
-	 * Add any requested by loadable modules; for backwards-compatibility
-	 * reasons, allocate at least NUM_USER_DEFINED_LWLOCKS of them even if
-	 * there are no explicit requests.
-	 */
-	lock_addin_request_allowed = false;
-	numLocks += Max(lock_addin_request, NUM_USER_DEFINED_LWLOCKS);
-
-	return numLocks;
-}
-
-
-/*
- * RequestAddinLWLocks
- *		Request that extra LWLocks be allocated for use by
- *		a loadable module.
- *
- * This is only useful if called from the _PG_init hook of a library that
- * is loaded into the postmaster via shared_preload_libraries.	Once
- * shared memory has been allocated, calls will be ignored.  (We could
- * raise an error, but it seems better to make it a no-op, so that
- * libraries containing such calls can be reloaded if needed.)
- */
-void
-RequestAddinLWLocks(int n)
-{
-	if (IsUnderPostmaster || !lock_addin_request_allowed)
-		return;					/* too late */
-	lock_addin_request += n;
-}
-
-
-/*
- * Compute shmem space needed for LWLocks.
- */
-Size
-LWLockShmemSize(void)
-{
-	Size		size;
-	int			numLocks = NumLWLocks();
-
-	/* Space for the LWLock array. */
-	size = mul_size(numLocks, sizeof(LWLockPadded));
-
-	/* Space for dynamic allocation counter, plus room for alignment. */
-	size = add_size(size, 2 * sizeof(int) + LWLOCK_PADDED_SIZE);
-
-	return size;
-}
-
-
-/*
- * Allocate shmem space for LWLocks and initialize the locks.
- */
-void
-CreateLWLocks(void)
-{
-	int			numLocks = NumLWLocks();
-	Size		spaceLocks = LWLockShmemSize();
-	LWLockPadded *lock;
-	int		   *LWLockCounter;
-	char	   *ptr;
-	int			id;
-
-	/* Allocate space */
-	ptr = (char *) ShmemAlloc(spaceLocks);
-
-	/* Leave room for dynamic allocation counter */
-	ptr += 2 * sizeof(int);
-
-	/* Ensure desired alignment of LWLock array */
-	ptr += LWLOCK_PADDED_SIZE - ((uintptr_t) ptr) % LWLOCK_PADDED_SIZE;
-
-	LWLockArray = (LWLockPadded *) ptr;
-
-	/*
-	 * Initialize all LWLocks to "unlocked" state
-	 */
-	for (id = 0, lock = LWLockArray; id < numLocks; id++, lock++)
-	{
-		SpinLockInit(&lock->lock.mutex);
-		lock->lock.releaseOK = true;
-		lock->lock.exclusive = 0;
-		lock->lock.shared = 0;
-		lock->lock.head = NULL;
-		lock->lock.tail = NULL;
-	}
-
-	/*
-	 * Initialize the dynamic-allocation counter, which is stored just before
-	 * the first LWLock.
-	 */
-	LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	LWLockCounter[0] = (int) NumFixedLWLocks;
-	LWLockCounter[1] = numLocks;
-}
-
-
-/*
- * LWLockAssign - assign a dynamically-allocated LWLock number
- *
- * We interlock this using the same spinlock that is used to protect
- * ShmemAlloc().  Interlocking is not really necessary during postmaster
- * startup, but it is needed if any user-defined code tries to allocate
- * LWLocks after startup.
- */
-LWLockId
+FlexLockId
 LWLockAssign(void)
 {
-	LWLockId	result;
-
-	/* use volatile pointer to prevent code rearrangement */
-	volatile int *LWLockCounter;
-
-	LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	SpinLockAcquire(ShmemLock);
-	if (LWLockCounter[0] >= LWLockCounter[1])
-	{
-		SpinLockRelease(ShmemLock);
-		elog(ERROR, "no more LWLockIds available");
-	}
-	result = (LWLockId) (LWLockCounter[0]++);
-	SpinLockRelease(ShmemLock);
-	return result;
+	return FlexLockAssign(FLEXLOCK_TYPE_LWLOCK);
 }
 
-
 /*
  * LWLockAcquire - acquire a lightweight lock in the specified mode
  *
@@ -320,9 +101,9 @@ LWLockAssign(void)
  * Side effect: cancel/die interrupts are held off until lock release.
  */
 void
-LWLockAcquire(LWLockId lockid, LWLockMode mode)
+LWLockAcquire(FlexLockId lockid, LWLockMode mode)
 {
-	volatile LWLock *lock = &(LWLockArray[lockid].lock);
+	volatile LWLock *lock = LWLockPointer(lockid);
 	PGPROC	   *proc = MyProc;
 	bool		retry = false;
 	int			extraWaits = 0;
@@ -333,8 +114,8 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 	/* Set up local count state first time through in a given process */
 	if (counts_for_pid != MyProcPid)
 	{
-		int		   *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-		int			numLocks = LWLockCounter[1];
+		int		   *FlexLockCounter = (int *) ((char *) FlexLockArray - 2 * sizeof(int));
+		int			numLocks = FlexLockCounter[1];
 
 		sh_acquire_counts = calloc(numLocks, sizeof(int));
 		ex_acquire_counts = calloc(numLocks, sizeof(int));
@@ -356,10 +137,6 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 	 */
 	Assert(!(proc == NULL && IsUnderPostmaster));
 
-	/* Ensure we will have room to remember the lock */
-	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
-		elog(ERROR, "too many LWLocks taken");
-
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -388,11 +165,11 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 		bool		mustwait;
 
 		/* Acquire mutex.  Time spent holding mutex should be short! */
-		SpinLockAcquire(&lock->mutex);
+		SpinLockAcquire(&lock->flex.mutex);
 
 		/* If retrying, allow LWLockRelease to release waiters again */
 		if (retry)
-			lock->releaseOK = true;
+			lock->flex.releaseOK = true;
 
 		/* If I can get the lock, do so quickly. */
 		if (mode == LW_EXCLUSIVE)
@@ -419,72 +196,30 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 		if (!mustwait)
 			break;				/* got the lock */
 
-		/*
-		 * Add myself to wait queue.
-		 *
-		 * If we don't have a PGPROC structure, there's no way to wait. This
-		 * should never occur, since MyProc should only be null during shared
-		 * memory initialization.
-		 */
-		if (proc == NULL)
-			elog(PANIC, "cannot wait without a PGPROC structure");
-
-		proc->lwWaiting = true;
-		proc->lwExclusive = (mode == LW_EXCLUSIVE);
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		/* Add myself to wait queue. */
+		FlexLockJoinWaitQueue(lock, (int) mode);
 
 		/* Can release the mutex now */
-		SpinLockRelease(&lock->mutex);
-
-		/*
-		 * Wait until awakened.
-		 *
-		 * Since we share the process wait semaphore with the regular lock
-		 * manager and ProcWaitForSignal, and we may need to acquire an LWLock
-		 * while one of those is pending, it is possible that we get awakened
-		 * for a reason other than being signaled by LWLockRelease. If so,
-		 * loop back and wait again.  Once we've gotten the LWLock,
-		 * re-increment the sema by the number of additional signals received,
-		 * so that the lock manager or signal manager will see the received
-		 * signal when it next waits.
-		 */
-		LOG_LWDEBUG("LWLockAcquire", lockid, "waiting");
+		SpinLockRelease(&lock->flex.mutex);
+
+		/* Wait until awakened. */
+		FlexLockWait(lockid, mode, extraWaits);
 
 #ifdef LWLOCK_STATS
 		block_counts[lockid]++;
 #endif
 
-		TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);
-
-		for (;;)
-		{
-			/* "false" means cannot accept cancel/die interrupt here. */
-			PGSemaphoreLock(&proc->sem, false);
-			if (!proc->lwWaiting)
-				break;
-			extraWaits++;
-		}
-
-		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode);
-
-		LOG_LWDEBUG("LWLockAcquire", lockid, "awakened");
-
 		/* Now loop back and try to acquire lock again. */
 		retry = true;
 	}
 
 	/* We are done updating shared state of the lock itself. */
-	SpinLockRelease(&lock->mutex);
+	SpinLockRelease(&lock->flex.mutex);
 
-	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(lockid, mode);
+	TRACE_POSTGRESQL_FLEXLOCK_ACQUIRE(lockid, mode);
 
 	/* Add lock to list of locks held by this backend */
-	held_lwlocks[num_held_lwlocks++] = lockid;
+	FlexLockRemember(lockid);
 
 	/*
 	 * Fix the process wait semaphore's count for any absorbed wakeups.
@@ -501,17 +236,13 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
  * If successful, cancel/die interrupts are held off until lock release.
  */
 bool
-LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
+LWLockConditionalAcquire(FlexLockId lockid, LWLockMode mode)
 {
-	volatile LWLock *lock = &(LWLockArray[lockid].lock);
+	volatile LWLock *lock = LWLockPointer(lockid);
 	bool		mustwait;
 
 	PRINT_LWDEBUG("LWLockConditionalAcquire", lockid, lock);
 
-	/* Ensure we will have room to remember the lock */
-	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
-		elog(ERROR, "too many LWLocks taken");
-
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -520,7 +251,7 @@ LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
 	HOLD_INTERRUPTS();
 
 	/* Acquire mutex.  Time spent holding mutex should be short! */
-	SpinLockAcquire(&lock->mutex);
+	SpinLockAcquire(&lock->flex.mutex);
 
 	/* If I can get the lock, do so quickly. */
 	if (mode == LW_EXCLUSIVE)
@@ -545,20 +276,20 @@ LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
 	}
 
 	/* We are done updating shared state of the lock itself. */
-	SpinLockRelease(&lock->mutex);
+	SpinLockRelease(&lock->flex.mutex);
 
 	if (mustwait)
 	{
 		/* Failed to get lock, so release interrupt holdoff */
 		RESUME_INTERRUPTS();
-		LOG_LWDEBUG("LWLockConditionalAcquire", lockid, "failed");
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(lockid, mode);
+		FlexLockDebug("LWLockConditionalAcquire", lockid, "failed");
+		TRACE_POSTGRESQL_FLEXLOCK_CONDACQUIRE_FAIL(lockid, mode);
 	}
 	else
 	{
 		/* Add lock to list of locks held by this backend */
-		held_lwlocks[num_held_lwlocks++] = lockid;
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(lockid, mode);
+		FlexLockRemember(lockid);
+		TRACE_POSTGRESQL_FLEXLOCK_CONDACQUIRE(lockid, mode);
 	}
 
 	return !mustwait;
@@ -568,32 +299,18 @@ LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
  * LWLockRelease - release a previously acquired lock
  */
 void
-LWLockRelease(LWLockId lockid)
+LWLockRelease(FlexLockId lockid)
 {
-	volatile LWLock *lock = &(LWLockArray[lockid].lock);
+	volatile LWLock *lock = LWLockPointer(lockid);
 	PGPROC	   *head;
 	PGPROC	   *proc;
-	int			i;
 
 	PRINT_LWDEBUG("LWLockRelease", lockid, lock);
 
-	/*
-	 * Remove lock from list of locks held.  Usually, but not always, it will
-	 * be the latest-acquired lock; so search array backwards.
-	 */
-	for (i = num_held_lwlocks; --i >= 0;)
-	{
-		if (lockid == held_lwlocks[i])
-			break;
-	}
-	if (i < 0)
-		elog(ERROR, "lock %d is not held", (int) lockid);
-	num_held_lwlocks--;
-	for (; i < num_held_lwlocks; i++)
-		held_lwlocks[i] = held_lwlocks[i + 1];
+	FlexLockForget(lockid);
 
 	/* Acquire mutex.  Time spent holding mutex should be short! */
-	SpinLockAcquire(&lock->mutex);
+	SpinLockAcquire(&lock->flex.mutex);
 
 	/* Release my hold on lock */
 	if (lock->exclusive > 0)
@@ -610,10 +327,10 @@ LWLockRelease(LWLockId lockid)
 	 * if someone has already awakened waiters that haven't yet acquired the
 	 * lock.
 	 */
-	head = lock->head;
+	head = lock->flex.head;
 	if (head != NULL)
 	{
-		if (lock->exclusive == 0 && lock->shared == 0 && lock->releaseOK)
+		if (lock->exclusive == 0 && lock->shared == 0 && lock->flex.releaseOK)
 		{
 			/*
 			 * Remove the to-be-awakened PGPROCs from the queue.  If the front
@@ -621,17 +338,17 @@ LWLockRelease(LWLockId lockid)
 			 * as many waiters as want shared access.
 			 */
 			proc = head;
-			if (!proc->lwExclusive)
+			if (proc->flWaitMode != LW_EXCLUSIVE)
 			{
-				while (proc->lwWaitLink != NULL &&
-					   !proc->lwWaitLink->lwExclusive)
-					proc = proc->lwWaitLink;
+				while (proc->flWaitLink != NULL &&
+					   proc->flWaitLink->flWaitMode != LW_EXCLUSIVE)
+					proc = proc->flWaitLink;
 			}
 			/* proc is now the last PGPROC to be released */
-			lock->head = proc->lwWaitLink;
-			proc->lwWaitLink = NULL;
+			lock->flex.head = proc->flWaitLink;
+			proc->flWaitLink = NULL;
 			/* prevent additional wakeups until retryer gets to run */
-			lock->releaseOK = false;
+			lock->flex.releaseOK = false;
 		}
 		else
 		{
@@ -641,20 +358,20 @@ LWLockRelease(LWLockId lockid)
 	}
 
 	/* We are done updating shared state of the lock itself. */
-	SpinLockRelease(&lock->mutex);
+	SpinLockRelease(&lock->flex.mutex);
 
-	TRACE_POSTGRESQL_LWLOCK_RELEASE(lockid);
+	TRACE_POSTGRESQL_FLEXLOCK_RELEASE(lockid);
 
 	/*
 	 * Awaken any waiters I removed from the queue.
 	 */
 	while (head != NULL)
 	{
-		LOG_LWDEBUG("LWLockRelease", lockid, "release waiter");
+		FlexLockDebug("LWLockRelease", lockid, "release waiter");
 		proc = head;
-		head = proc->lwWaitLink;
-		proc->lwWaitLink = NULL;
-		proc->lwWaiting = false;
+		head = proc->flWaitLink;
+		proc->flWaitLink = NULL;
+		proc->flWaitResult = 1;		/* any non-zero value will do */
 		PGSemaphoreUnlock(&proc->sem);
 	}
 
@@ -664,43 +381,17 @@ LWLockRelease(LWLockId lockid)
 	RESUME_INTERRUPTS();
 }
 
-
-/*
- * LWLockReleaseAll - release all currently-held locks
- *
- * Used to clean up after ereport(ERROR). An important difference between this
- * function and retail LWLockRelease calls is that InterruptHoldoffCount is
- * unchanged by this operation.  This is necessary since InterruptHoldoffCount
- * has been set to an appropriate level earlier in error recovery. We could
- * decrement it below zero if we allow it to drop for each released lock!
- */
-void
-LWLockReleaseAll(void)
-{
-	while (num_held_lwlocks > 0)
-	{
-		HOLD_INTERRUPTS();		/* match the upcoming RESUME_INTERRUPTS */
-
-		LWLockRelease(held_lwlocks[num_held_lwlocks - 1]);
-	}
-}
-
-
 /*
  * LWLockHeldByMe - test whether my process currently holds a lock
  *
- * This is meant as debug support only.  We do not distinguish whether the
- * lock is held shared or exclusive.
+ * The following convenience routine might not be worthwhile but for the fact
+ * that we've had a function by this name since long before FlexLocks existed.
+ * Callers who want to check whether an arbitrary FlexLock (that may or may not
+ * be an LWLock) is held can use FlexLockHeldByMe directly.
  */
 bool
-LWLockHeldByMe(LWLockId lockid)
+LWLockHeldByMe(FlexLockId lockid)
 {
-	int			i;
-
-	for (i = 0; i < num_held_lwlocks; i++)
-	{
-		if (held_lwlocks[i] == lockid)
-			return true;
-	}
-	return false;
+	AssertMacro(FlexLockArray[lockid].flex.locktype == FLEXLOCK_TYPE_LWLOCK);
+	return FlexLockHeldByMe(lockid);
 }
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 345f6f5..15978a4 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -239,7 +239,7 @@
 #define PredicateLockHashPartition(hashcode) \
 	((hashcode) % NUM_PREDICATELOCK_PARTITIONS)
 #define PredicateLockHashPartitionLock(hashcode) \
-	((LWLockId) (FirstPredicateLockMgrLock + PredicateLockHashPartition(hashcode)))
+	((FlexLockId) (FirstPredicateLockMgrLock + PredicateLockHashPartition(hashcode)))
 
 #define NPREDICATELOCKTARGETENTS() \
 	mul_size(max_predicate_locks_per_xact, add_size(MaxBackends, max_prepared_xacts))
@@ -1840,7 +1840,7 @@ PageIsPredicateLocked(Relation relation, BlockNumber blkno)
 {
 	PREDICATELOCKTARGETTAG targettag;
 	uint32		targettaghash;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	PREDICATELOCKTARGET *target;
 
 	SET_PREDICATELOCKTARGETTAG_PAGE(targettag,
@@ -2073,7 +2073,7 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 		if (TargetTagIsCoveredBy(oldtargettag, *newtargettag))
 		{
 			uint32		oldtargettaghash;
-			LWLockId	partitionLock;
+			FlexLockId	partitionLock;
 			PREDICATELOCK *rmpredlock;
 
 			oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
@@ -2285,7 +2285,7 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 	PREDICATELOCKTARGET *target;
 	PREDICATELOCKTAG locktag;
 	PREDICATELOCK *lock;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	bool		found;
 
 	partitionLock = PredicateLockHashPartitionLock(targettaghash);
@@ -2586,10 +2586,10 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 								  bool removeOld)
 {
 	uint32		oldtargettaghash;
-	LWLockId	oldpartitionLock;
+	FlexLockId	oldpartitionLock;
 	PREDICATELOCKTARGET *oldtarget;
 	uint32		newtargettaghash;
-	LWLockId	newpartitionLock;
+	FlexLockId	newpartitionLock;
 	bool		found;
 	bool		outOfShmem = false;
 
@@ -3578,7 +3578,7 @@ ClearOldPredicateLocks(void)
 			PREDICATELOCKTARGET *target;
 			PREDICATELOCKTARGETTAG targettag;
 			uint32		targettaghash;
-			LWLockId	partitionLock;
+			FlexLockId	partitionLock;
 
 			tag = predlock->tag;
 			target = tag.myTarget;
@@ -3656,7 +3656,7 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 		PREDICATELOCKTARGET *target;
 		PREDICATELOCKTARGETTAG targettag;
 		uint32		targettaghash;
-		LWLockId	partitionLock;
+		FlexLockId	partitionLock;
 
 		nextpredlock = (PREDICATELOCK *)
 			SHMQueueNext(&(sxact->predicateLocks),
@@ -4034,7 +4034,7 @@ static void
 CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 {
 	uint32		targettaghash;
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 	PREDICATELOCKTARGET *target;
 	PREDICATELOCK *predlock;
 	PREDICATELOCK *mypredlock = NULL;
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index bcbc802..b402999 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -360,9 +360,9 @@ InitProcess(void)
 	/* NB -- autovac launcher intentionally does not set IS_AUTOVACUUM */
 	if (IsAutoVacuumWorkerProcess())
 		MyPgXact->vacuumFlags |= PROC_IS_AUTOVACUUM;
-	MyProc->lwWaiting = false;
-	MyProc->lwExclusive = false;
-	MyProc->lwWaitLink = NULL;
+	MyProc->flWaitResult = 0;
+	MyProc->flWaitMode = 0;
+	MyProc->flWaitLink = NULL;
 	MyProc->waitLock = NULL;
 	MyProc->waitProcLock = NULL;
 #ifdef USE_ASSERT_CHECKING
@@ -515,9 +515,9 @@ InitAuxiliaryProcess(void)
 	MyProc->roleId = InvalidOid;
 	MyPgXact->inCommit = false;
 	MyPgXact->vacuumFlags = 0;
-	MyProc->lwWaiting = false;
-	MyProc->lwExclusive = false;
-	MyProc->lwWaitLink = NULL;
+	MyProc->flWaitMode = 0;
+	MyProc->flWaitResult = 0;
+	MyProc->flWaitLink = NULL;
 	MyProc->waitLock = NULL;
 	MyProc->waitProcLock = NULL;
 #ifdef USE_ASSERT_CHECKING
@@ -643,7 +643,7 @@ IsWaitingForLock(void)
 void
 LockWaitCancel(void)
 {
-	LWLockId	partitionLock;
+	FlexLockId	partitionLock;
 
 	/* Nothing to do if we weren't waiting for a lock */
 	if (lockAwaited == NULL)
@@ -754,11 +754,11 @@ ProcKill(int code, Datum arg)
 #endif
 
 	/*
-	 * Release any LW locks I am holding.  There really shouldn't be any, but
-	 * it's cheap to check again before we cut the knees off the LWLock
+	 * Release any flex locks I am holding.  There really shouldn't be any, but
+	 * it's cheap to check again before we cut the knees off the flex lock
 	 * facility by releasing our PGPROC ...
 	 */
-	LWLockReleaseAll();
+	FlexLockReleaseAll();
 
 	/* Release ownership of the process's latch, too */
 	DisownLatch(&MyProc->procLatch);
@@ -815,8 +815,8 @@ AuxiliaryProcKill(int code, Datum arg)
 
 	Assert(MyProc == auxproc);
 
-	/* Release any LW locks I am holding (see notes above) */
-	LWLockReleaseAll();
+	/* Release any flex locks I am holding (see notes above) */
+	FlexLockReleaseAll();
 
 	/* Release ownership of the process's latch, too */
 	DisownLatch(&MyProc->procLatch);
@@ -901,7 +901,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	LOCK	   *lock = locallock->lock;
 	PROCLOCK   *proclock = locallock->proclock;
 	uint32		hashcode = locallock->hashcode;
-	LWLockId	partitionLock = LockHashPartitionLock(hashcode);
+	FlexLockId	partitionLock = LockHashPartitionLock(hashcode);
 	PROC_QUEUE *waitQueue = &(lock->waitProcs);
 	LOCKMASK	myHeldLocks = MyProc->heldLocks;
 	bool		early_deadlock = false;
diff --git a/src/backend/utils/misc/check_guc b/src/backend/utils/misc/check_guc
index 293fb03..1a19e36 100755
--- a/src/backend/utils/misc/check_guc
+++ b/src/backend/utils/misc/check_guc
@@ -19,7 +19,7 @@
 INTENTIONALLY_NOT_INCLUDED="autocommit debug_deadlocks \
 is_superuser lc_collate lc_ctype lc_messages lc_monetary lc_numeric lc_time \
 pre_auth_delay role seed server_encoding server_version server_version_int \
-session_authorization trace_lock_oidmin trace_lock_table trace_locks trace_lwlocks \
+session_authorization trace_lock_oidmin trace_lock_table trace_locks trace_flexlocks \
 trace_notify trace_userlocks transaction_isolation transaction_read_only \
 zero_damaged_pages"
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index da7b6d4..52de233 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -59,6 +59,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/flexlock_internals.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
 #include "storage/predicate.h"
@@ -1071,12 +1072,12 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 	{
-		{"trace_lwlocks", PGC_SUSET, DEVELOPER_OPTIONS,
+		{"trace_flexlocks", PGC_SUSET, DEVELOPER_OPTIONS,
 			gettext_noop("No description available."),
 			NULL,
 			GUC_NOT_IN_SAMPLE
 		},
-		&Trace_lwlocks,
+		&Trace_flexlocks,
 		false,
 		NULL, NULL, NULL
 	},
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 71c5ab0..5b9cfe6 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -15,8 +15,8 @@
  * in probe definitions, as they cause compilation errors on Mac OS X 10.5.
  */
 #define LocalTransactionId unsigned int
-#define LWLockId int
-#define LWLockMode int
+#define FlexLockId int
+#define FlexLockMode int
 #define LOCKMODE int
 #define BlockNumber unsigned int
 #define Oid unsigned int
@@ -29,12 +29,12 @@ provider postgresql {
 	probe transaction__commit(LocalTransactionId);
 	probe transaction__abort(LocalTransactionId);
 
-	probe lwlock__acquire(LWLockId, LWLockMode);
-	probe lwlock__release(LWLockId);
-	probe lwlock__wait__start(LWLockId, LWLockMode);
-	probe lwlock__wait__done(LWLockId, LWLockMode);
-	probe lwlock__condacquire(LWLockId, LWLockMode);
-	probe lwlock__condacquire__fail(LWLockId, LWLockMode);
+	probe flexlock__acquire(FlexLockId, FlexLockMode);
+	probe flexlock__release(FlexLockId);
+	probe flexlock__wait__start(FlexLockId, FlexLockMode);
+	probe flexlock__wait__done(FlexLockId, FlexLockMode);
+	probe flexlock__condacquire(FlexLockId, FlexLockMode);
+	probe flexlock__condacquire__fail(FlexLockId, FlexLockMode);
 
 	probe lock__wait__start(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
 	probe lock__wait__done(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index e48743f..680a87f 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -55,7 +55,7 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLockId	ControlLock;
+	FlexLockId	ControlLock;
 
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
@@ -69,7 +69,7 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int		   *page_number;
 	int		   *page_lru_count;
-	LWLockId   *buffer_locks;
+	FlexLockId *buffer_locks;
 
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
@@ -136,7 +136,7 @@ typedef SlruCtlData *SlruCtl;
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLockId ctllock, const char *subdir);
+			  FlexLockId ctllock, const char *subdir);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int pageno);
 extern int SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 				  TransactionId xid);
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 6c8e312..d3b74db 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -49,9 +49,9 @@
 #define SEQ_MINVALUE	(-SEQ_MAXVALUE)
 
 /*
- * Number of spare LWLocks to allocate for user-defined add-on code.
+ * Number of spare FlexLocks to allocate for user-defined add-on code.
  */
-#define NUM_USER_DEFINED_LWLOCKS	4
+#define NUM_USER_DEFINED_FLEXLOCKS	4
 
 /*
  * Define this if you want to allow the lo_import and lo_export SQL
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index b7d4ea5..ac7f665 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -103,7 +103,7 @@ typedef struct buftag
 #define BufTableHashPartition(hashcode) \
 	((hashcode) % NUM_BUFFER_PARTITIONS)
 #define BufMappingPartitionLock(hashcode) \
-	((LWLockId) (FirstBufMappingLock + BufTableHashPartition(hashcode)))
+	((FlexLockId) (FirstBufMappingLock + BufTableHashPartition(hashcode)))
 
 /*
  *	BufferDesc -- shared descriptor/state data for a single shared buffer.
@@ -143,8 +143,8 @@ typedef struct sbufdesc
 	int			buf_id;			/* buffer's index number (from 0) */
 	int			freeNext;		/* link in freelist chain */
 
-	LWLockId	io_in_progress_lock;	/* to wait for I/O to complete */
-	LWLockId	content_lock;	/* to lock access to buffer contents */
+	FlexLockId	io_in_progress_lock;	/* to wait for I/O to complete */
+	FlexLockId	content_lock;	/* to lock access to buffer contents */
 } BufferDesc;
 
 #define BufferDescriptorGetBuffer(bdesc) ((bdesc)->buf_id + 1)
diff --git a/src/include/storage/flexlock.h b/src/include/storage/flexlock.h
new file mode 100644
index 0000000..612c21a
--- /dev/null
+++ b/src/include/storage/flexlock.h
@@ -0,0 +1,102 @@
+/*-------------------------------------------------------------------------
+ *
+ * flexlock.h
+ *	  Flex lock manager
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/flexlock.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FLEXLOCK_H
+#define FLEXLOCK_H
+
+/*
+ * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
+ * here, but we need them to set up enum FlexLockId correctly, and having
+ * this file include lock.h or bufmgr.h would be backwards.
+ */
+
+/* Number of partitions of the shared buffer mapping hashtable */
+#define NUM_BUFFER_PARTITIONS  16
+
+/* Number of partitions the shared lock tables are divided into */
+#define LOG2_NUM_LOCK_PARTITIONS  4
+#define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)
+
+/* Number of partitions the shared predicate lock tables are divided into */
+#define LOG2_NUM_PREDICATELOCK_PARTITIONS  4
+#define NUM_PREDICATELOCK_PARTITIONS  (1 << LOG2_NUM_PREDICATELOCK_PARTITIONS)
+
+/*
+ * We have a number of predefined FlexLocks, plus a bunch of locks that are
+ * dynamically assigned (e.g., for shared buffers).  The FlexLock structures
+ * live in shared memory (since they contain shared data) and are identified
+ * by values of this enumerated type.  We abuse the notion of an enum somewhat
+ * by allowing values not listed in the enum declaration to be assigned.
+ * The extra value MaxDynamicFlexLock is there to keep the compiler from
+ * deciding that the enum can be represented as char or short ...
+ *
+ * If you remove a lock, please replace it with a placeholder. This retains
+ * the lock numbering, which is helpful for DTrace and other external
+ * debugging scripts.
+ */
+typedef enum FlexLockId
+{
+	BufFreelistLock,
+	ShmemIndexLock,
+	OidGenLock,
+	XidGenLock,
+	ProcArrayLock,
+	SInvalReadLock,
+	SInvalWriteLock,
+	WALInsertLock,
+	WALWriteLock,
+	ControlFileLock,
+	CheckpointLock,
+	CLogControlLock,
+	SubtransControlLock,
+	MultiXactGenLock,
+	MultiXactOffsetControlLock,
+	MultiXactMemberControlLock,
+	RelCacheInitLock,
+	BgWriterCommLock,
+	TwoPhaseStateLock,
+	TablespaceCreateLock,
+	BtreeVacuumLock,
+	AddinShmemInitLock,
+	AutovacuumLock,
+	AutovacuumScheduleLock,
+	SyncScanLock,
+	RelationMappingLock,
+	AsyncCtlLock,
+	AsyncQueueLock,
+	SerializableXactHashLock,
+	SerializableFinishedListLock,
+	SerializablePredicateLockListLock,
+	OldSerXidLock,
+	SyncRepLock,
+	/* Individual lock IDs end here */
+	FirstBufMappingLock,
+	FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
+	FirstPredicateLockMgrLock = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
+
+	/* must be last except for MaxDynamicFlexLock: */
+	NumFixedFlexLocks = FirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS,
+
+	MaxDynamicFlexLock = 1000000000
+} FlexLockId;
+
+/* Shared memory setup. */
+extern int	NumFlexLocks(void);
+extern Size FlexLockShmemSize(void);
+extern void RequestAddinFlexLocks(int n);
+extern void CreateFlexLocks(void);
+
+/* Error recovery and debugging support functions. */
+extern void FlexLockReleaseAll(void);
+extern bool FlexLockHeldByMe(FlexLockId id);
+
+#endif   /* FLEXLOCK_H */
diff --git a/src/include/storage/flexlock_internals.h b/src/include/storage/flexlock_internals.h
new file mode 100644
index 0000000..a5a6bde
--- /dev/null
+++ b/src/include/storage/flexlock_internals.h
@@ -0,0 +1,146 @@
+/*-------------------------------------------------------------------------
+ *
+ * flexlock_internals.h
+ *	  Flex lock internals.  Only files which implement a FlexLock
+ *    type should need to include this.  Merging this with flexlock.h
+ *    creates a circular header dependency, but even if it didn't, this
+ *    is cleaner.
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/flexlock_internals.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FLEXLOCK_INTERNALS_H
+#define FLEXLOCK_INTERNALS_H
+
+#include "storage/proc.h"
+#include "storage/s_lock.h"
+
+/*
+ * Individual FlexLock implementations each get this many bytes to store
+ * its state; of course, a given implementation could also allocate additional
+ * shmem elsewhere, but we provide this many bytes within the array.  The
+ * header fields common to all FlexLock types are included in this number.
+ * A power of two should probably be chosen, to avoid alignment issues and
+ * cache line splitting.  It might be useful to increase this on systems where
+ * a cache line is more than 64 bytes in size.
+ */
+#define FLEX_LOCK_BYTES		64
+
+typedef struct FlexLock
+{
+	char		locktype;		/* see FLEXLOCK_TYPE_* constants */
+	slock_t		mutex;			/* Protects FlexLock state and wait queues */
+	bool		releaseOK;		/* T if ok to release waiters */
+	PGPROC	   *head;			/* head of list of waiting PGPROCs */
+	PGPROC	   *tail;			/* tail of list of waiting PGPROCs */
+	/* tail is undefined when head is NULL */
+} FlexLock;
+
+#define FLEXLOCK_TYPE_LWLOCK			'l'
+
+typedef union FlexLockPadded
+{
+	FlexLock	flex;
+	char		pad[FLEX_LOCK_BYTES];
+} FlexLockPadded;
+
+extern FlexLockPadded *FlexLockArray;
+
+extern FlexLockId FlexLockAssign(char locktype);
+
+/*
+ * We use this structure to keep track of flex locks held, for release
+ * during error recovery.  The maximum size could be determined at runtime
+ * if necessary, but it seems unlikely that more than a few locks could
+ * ever be held simultaneously.
+ */
+#define MAX_SIMUL_FLEXLOCKS 100
+extern int num_held_flexlocks;
+extern FlexLockId held_flexlocks[MAX_SIMUL_FLEXLOCKS];
+
+/* We define the following operations as macros, for speed. */
+
+/* Remember that we've acquired a FlexLock. */
+#define FlexLockRemember(lock) \
+	do { \
+		if (num_held_flexlocks >= MAX_SIMUL_FLEXLOCKS) \
+			elog(PANIC, "too many FlexLocks taken"); \
+		held_flexlocks[num_held_flexlocks++] = lock; \
+	} while (0)
+
+/*
+ * Remove lock from list of locks held.  Usually, but not always, it will
+ * be the latest-acquired lock; so search array backwards.
+ */
+#define FlexLockForget(lock) \
+	do { \
+		int	i; \
+		for (i = num_held_flexlocks; --i >= 0;) \
+			if (lock == held_flexlocks[i]) \
+				break; \
+		if (i < 0) \
+			elog(ERROR, "lock %d is not held", (int) lock); \
+		num_held_flexlocks--; \
+		for (; i < num_held_flexlocks; i++) \
+			held_flexlocks[i] = held_flexlocks[i + 1]; \
+	} while (0)
+
+/*
+ * FlexLockWait - wait until awakened
+ *
+ * Since we share the process wait semaphore with the regular lock manager
+ * and ProcWaitForSignal, and we may need to acquire a FlexLock while one of
+ * those is pending, it is possible that we get awakened for a reason other
+ * than being signaled by a FlexLock release.  If so, loop back and wait again.
+ *
+ * Returns the number of "extra" waits absorbed so that, once we've gotten the
+ * FlexLock, we can re-increment the sema by the number of additional signals
+ * received, so that the lock manager or signal manager will see the received
+ * signal when it next waits.
+ */
+#define FlexLockWait(lock, mode, extraWaits) \
+	do { \
+		FlexLockDebug("FlexLockWait", lock, "waiting"); \
+		TRACE_POSTGRESQL_FLEXLOCK_WAIT_START(lock, mode); \
+		for (;;) \
+	   	{ \
+			/* "false" means cannot accept cancel/die interrupt here. */ \
+			PGSemaphoreLock(&MyProc->sem, false); \
+			/* any non-zero value means "wake up */ \
+			if (MyProc->flWaitResult) \
+				break; \
+			extraWaits++; \
+   		} \
+		TRACE_POSTGRESQL_FLEXLOCK_WAIT_DONE(lock, mode); \
+		FlexLockDebug("FlexLockWait", lock, "awakened"); \
+	} while (0)
+
+#define FlexLockJoinWaitQueue(lock, mode) \
+	do { \
+		Assert(MyProc != NULL); \
+		MyProc->flWaitResult = 0; \
+		MyProc->flWaitMode = mode; \
+		MyProc->flWaitLink = NULL; \
+		if (lock->flex.head == NULL) \
+			lock->flex.head = MyProc; \
+		else \
+			lock->flex.tail->flWaitLink = MyProc; \
+		lock->flex.tail = MyProc; \
+	} while (0)
+
+#ifdef LOCK_DEBUG
+extern bool	Trace_flexlocks;
+#define FlexLockDebug(where, lock, msg) \
+	do { \
+		if (Trace_flexlocks) \
+			elog(LOG, "%s(%d): %s", where, (int) lock, msg); \
+	} while (0)
+#else
+#define FlexLockDebug(where, lock, msg)
+#endif
+
+#endif   /* FLEXLOCK_H */
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index e106ad5..ba87db2 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -471,7 +471,7 @@ typedef enum
 #define LockHashPartition(hashcode) \
 	((hashcode) % NUM_LOCK_PARTITIONS)
 #define LockHashPartitionLock(hashcode) \
-	((LWLockId) (FirstLockMgrLock + LockHashPartition(hashcode)))
+	((FlexLockId) (FirstLockMgrLock + LockHashPartition(hashcode)))
 
 
 /*
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 438a48d..f68cddc 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -14,82 +14,7 @@
 #ifndef LWLOCK_H
 #define LWLOCK_H
 
-/*
- * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
- * here, but we need them to set up enum LWLockId correctly, and having
- * this file include lock.h or bufmgr.h would be backwards.
- */
-
-/* Number of partitions of the shared buffer mapping hashtable */
-#define NUM_BUFFER_PARTITIONS  16
-
-/* Number of partitions the shared lock tables are divided into */
-#define LOG2_NUM_LOCK_PARTITIONS  4
-#define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)
-
-/* Number of partitions the shared predicate lock tables are divided into */
-#define LOG2_NUM_PREDICATELOCK_PARTITIONS  4
-#define NUM_PREDICATELOCK_PARTITIONS  (1 << LOG2_NUM_PREDICATELOCK_PARTITIONS)
-
-/*
- * We have a number of predefined LWLocks, plus a bunch of LWLocks that are
- * dynamically assigned (e.g., for shared buffers).  The LWLock structures
- * live in shared memory (since they contain shared data) and are identified
- * by values of this enumerated type.  We abuse the notion of an enum somewhat
- * by allowing values not listed in the enum declaration to be assigned.
- * The extra value MaxDynamicLWLock is there to keep the compiler from
- * deciding that the enum can be represented as char or short ...
- *
- * If you remove a lock, please replace it with a placeholder. This retains
- * the lock numbering, which is helpful for DTrace and other external
- * debugging scripts.
- */
-typedef enum LWLockId
-{
-	BufFreelistLock,
-	ShmemIndexLock,
-	OidGenLock,
-	XidGenLock,
-	ProcArrayLock,
-	SInvalReadLock,
-	SInvalWriteLock,
-	WALInsertLock,
-	WALWriteLock,
-	ControlFileLock,
-	CheckpointLock,
-	CLogControlLock,
-	SubtransControlLock,
-	MultiXactGenLock,
-	MultiXactOffsetControlLock,
-	MultiXactMemberControlLock,
-	RelCacheInitLock,
-	BgWriterCommLock,
-	TwoPhaseStateLock,
-	TablespaceCreateLock,
-	BtreeVacuumLock,
-	AddinShmemInitLock,
-	AutovacuumLock,
-	AutovacuumScheduleLock,
-	SyncScanLock,
-	RelationMappingLock,
-	AsyncCtlLock,
-	AsyncQueueLock,
-	SerializableXactHashLock,
-	SerializableFinishedListLock,
-	SerializablePredicateLockListLock,
-	OldSerXidLock,
-	SyncRepLock,
-	/* Individual lock IDs end here */
-	FirstBufMappingLock,
-	FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
-	FirstPredicateLockMgrLock = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
-
-	/* must be last except for MaxDynamicLWLock: */
-	NumFixedLWLocks = FirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS,
-
-	MaxDynamicLWLock = 1000000000
-} LWLockId;
-
+#include "storage/flexlock.h"
 
 typedef enum LWLockMode
 {
@@ -97,22 +22,10 @@ typedef enum LWLockMode
 	LW_SHARED
 } LWLockMode;
 
-
-#ifdef LOCK_DEBUG
-extern bool Trace_lwlocks;
-#endif
-
-extern LWLockId LWLockAssign(void);
-extern void LWLockAcquire(LWLockId lockid, LWLockMode mode);
-extern bool LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode);
-extern void LWLockRelease(LWLockId lockid);
-extern void LWLockReleaseAll(void);
-extern bool LWLockHeldByMe(LWLockId lockid);
-
-extern int	NumLWLocks(void);
-extern Size LWLockShmemSize(void);
-extern void CreateLWLocks(void);
-
-extern void RequestAddinLWLocks(int n);
+extern FlexLockId LWLockAssign(void);
+extern void LWLockAcquire(FlexLockId lockid, LWLockMode mode);
+extern bool LWLockConditionalAcquire(FlexLockId lockid, LWLockMode mode);
+extern void LWLockRelease(FlexLockId lockid);
+extern bool LWLockHeldByMe(FlexLockId lockid);
 
 #endif   /* LWLOCK_H */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index c7cddc7..1f3a71d 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -99,10 +99,10 @@ struct PGPROC
 	 */
 	bool		recoveryConflictPending;
 
-	/* Info about LWLock the process is currently waiting for, if any. */
-	bool		lwWaiting;		/* true if waiting for an LW lock */
-	bool		lwExclusive;	/* true if waiting for exclusive access */
-	struct PGPROC *lwWaitLink;	/* next waiter for same LW lock */
+	/* Info about FlexLock the process is currently waiting for, if any. */
+	int			flWaitResult;	/* result of wait, or 0 if still waiting */
+	int			flWaitMode;		/* lock mode sought */
+	struct PGPROC *flWaitLink;	/* next waiter for same FlexLock */
 
 	/* Info about lock the process is currently waiting for, if any. */
 	/* waitLock and waitProcLock are NULL if not currently waiting. */
@@ -132,7 +132,7 @@ struct PGPROC
 	struct XidCache subxids;	/* cache for subtransaction XIDs */
 
 	/* Per-backend LWLock.  Protects fields below. */
-	LWLockId	backendLock;	/* protects the fields below */
+	FlexLockId	backendLock;	/* protects the fields below */
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	uint64		fpLockBits;		/* lock modes held for each fast-path slot */

procarraylock-v3.patchapplication/octet-stream; name=procarraylock-v3.patchDownload

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c3d3958..4d27c53 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -40,6 +40,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "utils/acl.h"
 #include "utils/attoptcache.h"
 #include "utils/datum.h"
@@ -222,9 +223,9 @@ analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
 	/*
 	 * OK, let's do it.  First let other backends know I'm in ANALYZE.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/*
 	 * Do the normal non-recursive ANALYZE.
@@ -249,9 +250,9 @@ analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
 	 * Reset my PGPROC flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	MyPgXact->vacuumFlags &= ~PROC_IN_ANALYZE;
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e70dbed..09aa32b 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -39,6 +39,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
@@ -895,11 +896,11 @@ vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool do_toast, bool for_wraparound)
 		 * MyProc->xid/xmin, else OldestXmin might appear to go backwards,
 		 * which is probably Not Good.
 		 */
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		ProcArrayLockAcquire(PAL_EXCLUSIVE);
 		MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
 		if (for_wraparound)
 			MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 	}
 
 	/*
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 19ff524..d457e3f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -52,6 +52,7 @@
 #include "access/twophase.h"
 #include "miscadmin.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/snapmgr.h"
@@ -261,7 +262,7 @@ ProcArrayAdd(PGPROC *proc)
 	ProcArrayStruct *arrayP = procArray;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	if (arrayP->numProcs >= arrayP->maxProcs)
 	{
@@ -270,7 +271,7 @@ ProcArrayAdd(PGPROC *proc)
 		 * fixed supply of PGPROC structs too, and so we should have failed
 		 * earlier.)
 		 */
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 		ereport(FATAL,
 				(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
 				 errmsg("sorry, too many clients already")));
@@ -300,7 +301,7 @@ ProcArrayAdd(PGPROC *proc)
 	arrayP->pgprocnos[index] = proc->pgprocno;
 	arrayP->numProcs++;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -325,7 +326,7 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
 		DisplayXidCache();
 #endif
 
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	if (TransactionIdIsValid(latestXid))
 	{
@@ -351,13 +352,13 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
 					(arrayP->numProcs - index - 1) * sizeof (int));
 			arrayP->pgprocnos[arrayP->numProcs - 1] = -1; /* for debugging */
 			arrayP->numProcs--;
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 			return;
 		}
 	}
 
 	/* Ooops */
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	elog(LOG, "failed to find proc %p in ProcArray", proc);
 }
@@ -383,54 +384,19 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 
 	if (TransactionIdIsValid(latestXid))
 	{
-		/*
-		 * We must lock ProcArrayLock while clearing our advertised XID, so
-		 * that we do not exit the set of "running" transactions while someone
-		 * else is taking a snapshot.  See discussion in
-		 * src/backend/access/transam/README.
-		 */
-		Assert(TransactionIdIsValid(allPgXact[proc->pgprocno].xid));
-
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
-
-		pgxact->xid = InvalidTransactionId;
-		proc->lxid = InvalidLocalTransactionId;
-		pgxact->xmin = InvalidTransactionId;
-		/* must be cleared with xid/xmin: */
-		pgxact->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
-		pgxact->inCommit = false; /* be sure this is cleared in abort */
-		proc->recoveryConflictPending = false;
-
-		/* Clear the subtransaction-XID cache too while holding the lock */
-		pgxact->nxids = 0;
-		pgxact->overflowed = false;
-
-		/* Also advance global latestCompletedXid while holding the lock */
-		if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
-								  latestXid))
-			ShmemVariableCache->latestCompletedXid = latestXid;
-
-		LWLockRelease(ProcArrayLock);
+		Assert(proc == MyProc);
+		ProcArrayLockClearTransaction(latestXid);		
 	}
 	else
 	{
-		/*
-		 * If we have no XID, we don't need to lock, since we won't affect
-		 * anyone else's calculation of a snapshot.  We might change their
-		 * estimate of global xmin, but that's OK.
-		 */
-		Assert(!TransactionIdIsValid(allPgXact[proc->pgprocno].xid));
-
-		proc->lxid = InvalidLocalTransactionId;
 		pgxact->xmin = InvalidTransactionId;
 		/* must be cleared with xid/xmin: */
 		pgxact->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
-		pgxact->inCommit = false; /* be sure this is cleared in abort */
-		proc->recoveryConflictPending = false;
-
-		Assert(pgxact->nxids == 0);
-		Assert(pgxact->overflowed == false);
 	}
+
+	proc->lxid = InvalidLocalTransactionId;
+	pgxact->inCommit = false; /* be sure this is cleared in abort */
+	proc->recoveryConflictPending = false;
 }
 
 
@@ -562,7 +528,7 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 	/*
 	 * Nobody else is running yet, but take locks anyhow
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	/*
 	 * KnownAssignedXids is sorted so we cannot just add the xids, we have to
@@ -669,7 +635,7 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 	Assert(TransactionIdIsNormal(ShmemVariableCache->latestCompletedXid));
 	Assert(TransactionIdIsValid(ShmemVariableCache->nextXid));
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	KnownAssignedXidsDisplay(trace_recovery(DEBUG3));
 	if (standbyState == STANDBY_SNAPSHOT_READY)
@@ -724,7 +690,7 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
 	/*
 	 * Uses same locking as transaction commit
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	/*
 	 * Remove subxids from known-assigned-xacts.
@@ -737,7 +703,7 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
 	if (TransactionIdPrecedes(procArray->lastOverflowedXid, max_xid))
 		procArray->lastOverflowedXid = max_xid;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -829,7 +795,7 @@ TransactionIdIsInProgress(TransactionId xid)
 					 errmsg("out of memory")));
 	}
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	/*
 	 * Now that we have the lock, we can check latestCompletedXid; if the
@@ -837,7 +803,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	 */
 	if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid, xid))
 	{
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 		xc_by_latest_xid_inc();
 		return true;
 	}
@@ -865,7 +831,7 @@ TransactionIdIsInProgress(TransactionId xid)
 		 */
 		if (TransactionIdEquals(pxid, xid))
 		{
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 			xc_by_main_xid_inc();
 			return true;
 		}
@@ -887,7 +853,7 @@ TransactionIdIsInProgress(TransactionId xid)
 
 			if (TransactionIdEquals(cxid, xid))
 			{
-				LWLockRelease(ProcArrayLock);
+				ProcArrayLockRelease();
 				xc_by_child_xid_inc();
 				return true;
 			}
@@ -915,7 +881,7 @@ TransactionIdIsInProgress(TransactionId xid)
 
 		if (KnownAssignedXidExists(xid))
 		{
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 			xc_by_known_assigned_inc();
 			return true;
 		}
@@ -931,7 +897,7 @@ TransactionIdIsInProgress(TransactionId xid)
 			nxids = KnownAssignedXidsGet(xids, xid);
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/*
 	 * If none of the relevant caches overflowed, we know the Xid is not
@@ -997,7 +963,7 @@ TransactionIdIsActive(TransactionId xid)
 	if (TransactionIdPrecedes(xid, RecentXmin))
 		return false;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (i = 0; i < arrayP->numProcs; i++)
 	{
@@ -1022,7 +988,7 @@ TransactionIdIsActive(TransactionId xid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1085,7 +1051,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 	/* Cannot look for individual databases during recovery */
 	Assert(allDbs || !RecoveryInProgress());
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	/*
 	 * We initialize the MIN() calculation with latestCompletedXid + 1. This
@@ -1140,7 +1106,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 		 */
 		TransactionId kaxmin = KnownAssignedXidsGetOldestXmin();
 
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 
 		if (TransactionIdIsNormal(kaxmin) &&
 			TransactionIdPrecedes(kaxmin, result))
@@ -1151,7 +1117,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 		/*
 		 * No other information needed, so release the lock immediately.
 		 */
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 
 		/*
 		 * Compute the cutoff XID by subtracting vacuum_defer_cleanup_age,
@@ -1280,7 +1246,7 @@ GetSnapshotData(Snapshot snapshot)
 	 * It is sufficient to get shared lock on ProcArrayLock, even if we are
 	 * going to set MyProc->xmin.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	/* xmax is always latestCompletedXid + 1 */
 	xmax = ShmemVariableCache->latestCompletedXid;
@@ -1418,7 +1384,7 @@ GetSnapshotData(Snapshot snapshot)
 
 	if (!TransactionIdIsValid(MyPgXact->xmin))
 		MyPgXact->xmin = TransactionXmin = xmin;
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/*
 	 * Update globalxmin to include actual process xids.  This is a slightly
@@ -1475,7 +1441,7 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
 		return false;
 
 	/* Get lock so source xact can't end while we're doing this */
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1521,7 +1487,7 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
 		break;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1595,7 +1561,7 @@ GetRunningTransactionData(void)
 	 * Ensure that no xids enter or leave the procarray while we obtain
 	 * snapshot.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 	LWLockAcquire(XidGenLock, LW_SHARED);
 
 	latestCompletedXid = ShmemVariableCache->latestCompletedXid;
@@ -1658,7 +1624,7 @@ GetRunningTransactionData(void)
 	CurrentRunningXacts->latestCompletedXid = latestCompletedXid;
 
 	/* We don't release XidGenLock here, the caller is responsible for that */
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	Assert(TransactionIdIsValid(CurrentRunningXacts->nextXid));
 	Assert(TransactionIdIsValid(CurrentRunningXacts->oldestRunningXid));
@@ -1691,7 +1657,7 @@ GetOldestActiveTransactionId(void)
 
 	Assert(!RecoveryInProgress());
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	oldestRunningXid = ShmemVariableCache->nextXid;
 
@@ -1720,7 +1686,7 @@ GetOldestActiveTransactionId(void)
 		 */
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return oldestRunningXid;
 }
@@ -1753,7 +1719,7 @@ GetTransactionsInCommit(TransactionId **xids_p)
 	xids = (TransactionId *) palloc(arrayP->maxProcs * sizeof(TransactionId));
 	nxids = 0;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1768,7 +1734,7 @@ GetTransactionsInCommit(TransactionId **xids_p)
 			xids[nxids++] = pxid;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	*xids_p = xids;
 	return nxids;
@@ -1790,7 +1756,7 @@ HaveTransactionsInCommit(TransactionId *xids, int nxids)
 	ProcArrayStruct *arrayP = procArray;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1818,7 +1784,7 @@ HaveTransactionsInCommit(TransactionId *xids, int nxids)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1840,7 +1806,7 @@ BackendPidGetProc(int pid)
 	if (pid == 0)				/* never match dummy PGPROCs */
 		return NULL;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1853,7 +1819,7 @@ BackendPidGetProc(int pid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1881,7 +1847,7 @@ BackendXidGetPid(TransactionId xid)
 	if (xid == InvalidTransactionId)	/* never match invalid xid */
 		return 0;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1896,7 +1862,7 @@ BackendXidGetPid(TransactionId xid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return result;
 }
@@ -1951,7 +1917,7 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
 	vxids = (VirtualTransactionId *)
 		palloc(sizeof(VirtualTransactionId) * arrayP->maxProcs);
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1989,7 +1955,7 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	*nvxids = count;
 	return vxids;
@@ -2048,7 +2014,7 @@ GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
 					 errmsg("out of memory")));
 	}
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2083,7 +2049,7 @@ GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	/* add the terminator */
 	vxids[count].backendId = InvalidBackendId;
@@ -2104,7 +2070,7 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
 	int			index;
 	pid_t		pid = 0;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2131,7 +2097,7 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return pid;
 }
@@ -2207,7 +2173,7 @@ CountDBBackends(Oid databaseid)
 	int			count = 0;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2221,7 +2187,7 @@ CountDBBackends(Oid databaseid)
 			count++;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return count;
 }
@@ -2237,7 +2203,7 @@ CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending)
 	pid_t		pid = 0;
 
 	/* tell all backends to die */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2263,7 +2229,7 @@ CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending)
 		}
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -2276,7 +2242,7 @@ CountUserBackends(Oid roleid)
 	int			count = 0;
 	int			index;
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	ProcArrayLockAcquire(PAL_SHARED);
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -2289,7 +2255,7 @@ CountUserBackends(Oid roleid)
 			count++;
 	}
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 
 	return count;
 }
@@ -2337,7 +2303,7 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 
 		*nbackends = *nprepared = 0;
 
-		LWLockAcquire(ProcArrayLock, LW_SHARED);
+		ProcArrayLockAcquire(PAL_SHARED);
 
 		for (index = 0; index < arrayP->numProcs; index++)
 		{
@@ -2363,7 +2329,7 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 			}
 		}
 
-		LWLockRelease(ProcArrayLock);
+		ProcArrayLockRelease();
 
 		if (!found)
 			return false;		/* no conflicting backends, so done */
@@ -2416,7 +2382,7 @@ XidCacheRemoveRunningXids(TransactionId xid,
 	 * to abort subtransactions, but pending closer analysis we'd best be
 	 * conservative.
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	/*
 	 * Under normal circumstances xid and xids[] will be in increasing order,
@@ -2464,7 +2430,7 @@ XidCacheRemoveRunningXids(TransactionId xid,
 							  latestXid))
 		ShmemVariableCache->latestCompletedXid = latestXid;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 #ifdef XIDCACHE_DEBUG
@@ -2631,7 +2597,7 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 	/*
 	 * Uses same locking as transaction commit
 	 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 	KnownAssignedXidsRemoveTree(xid, nsubxids, subxids);
 
@@ -2640,7 +2606,7 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 							  max_xid))
 		ShmemVariableCache->latestCompletedXid = max_xid;
 
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -2650,9 +2616,9 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 void
 ExpireAllKnownAssignedTransactionIds(void)
 {
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	KnownAssignedXidsRemovePreceding(InvalidTransactionId);
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 /*
@@ -2662,9 +2628,9 @@ ExpireAllKnownAssignedTransactionIds(void)
 void
 ExpireOldKnownAssignedTransactionIds(TransactionId xid)
 {
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ProcArrayLockAcquire(PAL_EXCLUSIVE);
 	KnownAssignedXidsRemovePreceding(xid);
-	LWLockRelease(ProcArrayLock);
+	ProcArrayLockRelease();
 }
 
 
@@ -2886,7 +2852,7 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 	{
 		/* must hold lock to compress */
 		if (!exclusive_lock)
-			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+			ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 		KnownAssignedXidsCompress(true);
 
@@ -2894,7 +2860,7 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 		/* note: we no longer care about the tail pointer */
 
 		if (!exclusive_lock)
-			LWLockRelease(ProcArrayLock);
+			ProcArrayLockRelease();
 
 		/*
 		 * If it still won't fit then we're out of memory
diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile
index 3730e51..27eaa97 100644
--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = flexlock.o lmgr.o lock.o proc.o deadlock.o lwlock.o spin.o s_lock.o \
-	predicate.o
+	procarraylock.o predicate.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/storage/lmgr/flexlock.c b/src/backend/storage/lmgr/flexlock.c
index cf0004b..434e9c7 100644
--- a/src/backend/storage/lmgr/flexlock.c
+++ b/src/backend/storage/lmgr/flexlock.c
@@ -30,6 +30,7 @@
 #include "storage/flexlock.h"
 #include "storage/flexlock_internals.h"
 #include "storage/predicate.h"
+#include "storage/procarraylock.h"
 #include "storage/spin.h"
 
 int	num_held_flexlocks = 0;
@@ -168,9 +169,14 @@ CreateFlexLocks(void)
 
 	FlexLockArray = (FlexLockPadded *) ptr;
 
-	/* All of the "fixed" FlexLocks are LWLocks. */
+	/* All of the "fixed" FlexLocks are LWLocks - except ProcArrayLock. */
 	for (id = 0, lock = FlexLockArray; id < NumFixedFlexLocks; id++, lock++)
-		FlexLockInit(&lock->flex, FLEXLOCK_TYPE_LWLOCK);
+	{
+		if (id == ProcArrayLock)
+			FlexLockInit(&lock->flex, FLEXLOCK_TYPE_PROCARRAYLOCK);
+		else
+			FlexLockInit(&lock->flex, FLEXLOCK_TYPE_LWLOCK);
+	}
 
 	/*
 	 * Initialize the dynamic-allocation counter, which is stored just before
@@ -242,13 +248,20 @@ FlexLockReleaseAll(void)
 {
 	while (num_held_flexlocks > 0)
 	{
+		FlexLockId	id;
+		FlexLock   *flex;
+
 		HOLD_INTERRUPTS();		/* match the upcoming RESUME_INTERRUPTS */
 
-		/*
-		 * FLEXTODO: When we have multiple types of flex locks, this will
-		 * need to call the appropriate release function for each lock type.
-		 */
-		LWLockRelease(held_flexlocks[num_held_flexlocks - 1]);
+		id = held_flexlocks[num_held_flexlocks - 1];
+		flex = &FlexLockArray[id].flex;
+		if (flex->locktype == FLEXLOCK_TYPE_LWLOCK)
+			LWLockRelease(id);
+		else
+		{
+			Assert(id == ProcArrayLock);
+			ProcArrayLockRelease();
+		}
 	}
 }
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index b402999..10ec83b 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -46,6 +46,7 @@
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "storage/procarraylock.h"
 #include "storage/procsignal.h"
 #include "storage/spin.h"
 #include "utils/timestamp.h"
@@ -1083,7 +1084,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 			PGPROC	   *autovac = GetBlockingAutoVacuumPgproc();
 			PGXACT	   *autovac_pgxact = &ProcGlobal->allPgXact[autovac->pgprocno];
 
-			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+			ProcArrayLockAcquire(PAL_EXCLUSIVE);
 
 			/*
 			 * Only do it if the worker is not working to protect against Xid
@@ -1099,7 +1100,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 					 pid);
 
 				/* don't hold the lock across the kill() syscall */
-				LWLockRelease(ProcArrayLock);
+				ProcArrayLockRelease();
 
 				/* send the autovacuum worker Back to Old Kent Road */
 				if (kill(pid, SIGINT) < 0)
@@ -1111,7 +1112,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 				}
 			}
 			else
-				LWLockRelease(ProcArrayLock);
+				ProcArrayLockRelease();
 
 			/* prevent signal from being resent more than once */
 			allow_autovacuum_cancel = false;
diff --git a/src/backend/storage/lmgr/procarraylock.c b/src/backend/storage/lmgr/procarraylock.c
new file mode 100644
index 0000000..e4fdd2d
--- /dev/null
+++ b/src/backend/storage/lmgr/procarraylock.c
@@ -0,0 +1,344 @@
+/*-------------------------------------------------------------------------
+ *
+ * procarraylock.c
+ *	  Lock management for the ProcArray
+ *
+ * Because the ProcArray data structure is highly trafficked, it is
+ * critical that mutual exclusion for ProcArray options be as efficient
+ * as possible.  A particular problem is transaction end (commit or abort)
+ * which cannot be done in parallel with snapshot acquisition.  We
+ * therefore include some special hacks to deal with this case efficiently.
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/lmgr/procarraylock.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "pg_trace.h"
+#include "access/transam.h"
+#include "storage/flexlock_internals.h"
+#include "storage/ipc.h"
+#include "storage/procarraylock.h"
+#include "storage/proc.h"
+#include "storage/spin.h"
+
+typedef struct ProcArrayLockStruct
+{
+	FlexLock	flex;			/* common FlexLock infrastructure */
+	char		exclusive;		/* # of exclusive holders (0 or 1) */
+	int			shared;			/* # of shared holders (0..MaxBackends) */
+	PGPROC	   *ending;			/* transactions wishing to clear state */
+	TransactionId	latest_ending_xid;	/* latest ending XID */
+} ProcArrayLockStruct;
+
+/* There is only one ProcArrayLock. */
+#define	ProcArrayLockPointer() \
+	(AssertMacro(FlexLockArray[ProcArrayLock].flex.locktype == \
+		FLEXLOCK_TYPE_PROCARRAYLOCK), \
+	 (volatile ProcArrayLockStruct *) &FlexLockArray[ProcArrayLock])
+
+/*
+ * ProcArrayLockAcquire - acquire a lightweight lock in the specified mode
+ *
+ * If the lock is not available, sleep until it is.
+ *
+ * Side effect: cancel/die interrupts are held off until lock release.
+ */
+void
+ProcArrayLockAcquire(ProcArrayLockMode mode)
+{
+	volatile ProcArrayLockStruct *lock = ProcArrayLockPointer();
+	PGPROC	   *proc = MyProc;
+	bool		retry = false;
+	int			extraWaits = 0;
+
+	/*
+	 * We can't wait if we haven't got a PGPROC.  This should only occur
+	 * during bootstrap or shared memory initialization.  Put an Assert here
+	 * to catch unsafe coding practices.
+	 */
+	Assert(!(proc == NULL && IsUnderPostmaster));
+
+	/*
+	 * Lock out cancel/die interrupts until we exit the code section protected
+	 * by the ProcArrayLock.  This ensures that interrupts will not interfere
+     * with manipulations of data structures in shared memory.
+	 */
+	HOLD_INTERRUPTS();
+
+	/*
+	 * Loop here to try to acquire lock after each time we are signaled by
+	 * ProcArrayLockRelease.  See comments in LWLockAcquire for an explanation
+	 * of why do we not attempt to hand off the lock directly.
+	 */
+	for (;;)
+	{
+		bool		mustwait;
+
+		/* Acquire mutex.  Time spent holding mutex should be short! */
+		SpinLockAcquire(&lock->flex.mutex);
+
+		/* If retrying, allow LWLockRelease to release waiters again */
+		if (retry)
+			lock->flex.releaseOK = true;
+
+		/* If I can get the lock, do so quickly. */
+		if (mode == PAL_EXCLUSIVE)
+		{
+			if (lock->exclusive == 0 && lock->shared == 0)
+			{
+				lock->exclusive++;
+				mustwait = false;
+			}
+			else
+				mustwait = true;
+		}
+		else
+		{
+			if (lock->exclusive == 0)
+			{
+				lock->shared++;
+				mustwait = false;
+			}
+			else
+				mustwait = true;
+		}
+
+		if (!mustwait)
+			break;				/* got the lock */
+
+		/* Add myself to wait queue. */
+		FlexLockJoinWaitQueue(lock, (int) mode);
+
+		/* Can release the mutex now */
+		SpinLockRelease(&lock->flex.mutex);
+
+		/* Wait until awakened. */
+		FlexLockWait(ProcArrayLock, mode, extraWaits);
+
+		/* Now loop back and try to acquire lock again. */
+		retry = true;
+	}
+
+	/* We are done updating shared state of the lock itself. */
+	SpinLockRelease(&lock->flex.mutex);
+
+	TRACE_POSTGRESQL_FLEXLOCK_ACQUIRE(lockid, mode);
+
+	/* Add lock to list of locks held by this backend */
+	FlexLockRemember(ProcArrayLock);
+
+	/*
+	 * Fix the process wait semaphore's count for any absorbed wakeups.
+	 */
+	while (extraWaits-- > 0)
+		PGSemaphoreUnlock(&proc->sem);
+}
+
+/*
+ * ProcArrayLockClearTransaction - safely clear transaction details
+ *
+ * This can't be done while ProcArrayLock is held, but it's so fast that
+ * we can afford to do it while holding the spinlock, rather than acquiring
+ * and releasing the lock.
+ */
+void
+ProcArrayLockClearTransaction(TransactionId latestXid)
+{
+	volatile ProcArrayLockStruct *lock = ProcArrayLockPointer();
+	PGPROC	   *proc = MyProc;
+	int			extraWaits = 0;
+	bool		mustwait;
+
+	HOLD_INTERRUPTS();
+
+	/* Acquire mutex.  Time spent holding mutex should be short! */
+	SpinLockAcquire(&lock->flex.mutex);
+
+	if (lock->exclusive == 0 && lock->shared == 0)
+	{
+		{
+			volatile PGPROC *vproc = proc;
+			volatile PGXACT *pgxact = &ProcGlobal->allPgXact[vproc->pgprocno];
+			/* If there are no lockers, clear the critical PGPROC fields. */
+			pgxact->xid = InvalidTransactionId;
+	        pgxact->xmin = InvalidTransactionId;
+	        /* must be cleared with xid/xmin: */
+	        pgxact->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
+			pgxact->nxids = 0;
+			pgxact->overflowed = false;
+		}
+		mustwait = false;
+
+        /* Also advance global latestCompletedXid while holding the lock */
+        if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
+                                  latestXid))
+            ShmemVariableCache->latestCompletedXid = latestXid;
+	}
+	else
+	{
+		/* Rats, must wait. */
+		proc->flWaitLink = lock->ending;
+		lock->ending = proc;
+		if (!TransactionIdIsValid(lock->latest_ending_xid) ||
+				TransactionIdPrecedes(lock->latest_ending_xid, latestXid)) 
+			lock->latest_ending_xid = latestXid;
+		mustwait = true;
+	}
+
+	/* Can release the mutex now */
+	SpinLockRelease(&lock->flex.mutex);
+
+	/*
+	 * If we were not able to perfom the operation immediately, we must wait.
+	 * But we need not retry after being awoken, because the last lock holder
+	 * to release the lock will do the work first, on our behalf.
+	 */
+	if (mustwait)
+	{
+		FlexLockWait(ProcArrayLock, 2, extraWaits);
+		while (extraWaits-- > 0)
+			PGSemaphoreUnlock(&proc->sem);
+	}
+
+	RESUME_INTERRUPTS();
+}
+
+/*
+ * ProcArrayLockRelease - release a previously acquired lock
+ */
+void
+ProcArrayLockRelease(void)
+{
+	volatile ProcArrayLockStruct *lock = ProcArrayLockPointer();
+	PGPROC	   *head;
+	PGPROC	   *ending = NULL;
+	PGPROC	   *proc;
+
+	FlexLockForget(ProcArrayLock);
+
+	/* Acquire mutex.  Time spent holding mutex should be short! */
+	SpinLockAcquire(&lock->flex.mutex);
+
+	/* Release my hold on lock */
+	if (lock->exclusive > 0)
+		lock->exclusive--;
+	else
+	{
+		Assert(lock->shared > 0);
+		lock->shared--;
+	}
+
+	/*
+	 * If the lock is now free, but there are some transactions trying to
+	 * end, we must clear the critical PGPROC fields for them, and save a
+	 * list of them so we can wake them up.
+	 */
+	if (lock->exclusive == 0 && lock->shared == 0 && lock->ending != NULL)
+	{
+		volatile PGPROC *vproc;
+
+		ending = lock->ending;
+		vproc = ending;
+
+		while (vproc != NULL)
+		{
+			volatile PGXACT *pgxact = &ProcGlobal->allPgXact[vproc->pgprocno];
+
+        	pgxact->xid = InvalidTransactionId;
+	        pgxact->xmin = InvalidTransactionId;
+	        /* must be cleared with xid/xmin: */
+	        pgxact->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
+			pgxact->nxids = 0;
+			pgxact->overflowed = false;
+			vproc = vproc->flWaitLink;
+		}
+
+		/* Also advance global latestCompletedXid */
+		if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
+								  lock->latest_ending_xid))
+			ShmemVariableCache->latestCompletedXid = lock->latest_ending_xid;
+
+		/* Reset lock state. */
+		lock->ending = NULL;
+		lock->latest_ending_xid = InvalidTransactionId;
+	}
+
+	/*
+	 * See if I need to awaken any waiters.  If I released a non-last shared
+	 * hold, there cannot be anything to do.  Also, do not awaken any waiters
+	 * if someone has already awakened waiters that haven't yet acquired the
+	 * lock.
+	 */
+	head = lock->flex.head;
+	if (head != NULL)
+	{
+		if (lock->exclusive == 0 && lock->shared == 0 && lock->flex.releaseOK)
+		{
+			/*
+			 * Remove the to-be-awakened PGPROCs from the queue.  If the front
+			 * waiter wants exclusive lock, awaken him only. Otherwise awaken
+			 * as many waiters as want shared access.
+			 */
+			proc = head;
+			if (proc->flWaitMode != LW_EXCLUSIVE)
+			{
+				while (proc->flWaitLink != NULL &&
+					   proc->flWaitLink->flWaitMode != LW_EXCLUSIVE)
+					proc = proc->flWaitLink;
+			}
+			/* proc is now the last PGPROC to be released */
+			lock->flex.head = proc->flWaitLink;
+			proc->flWaitLink = NULL;
+			/* prevent additional wakeups until retryer gets to run */
+			lock->flex.releaseOK = false;
+		}
+		else
+		{
+			/* lock is still held, can't awaken anything */
+			head = NULL;
+		}
+	}
+
+	/* We are done updating shared state of the lock itself. */
+	SpinLockRelease(&lock->flex.mutex);
+
+	TRACE_POSTGRESQL_FLEXLOCK_RELEASE(lockid);
+
+	/*
+	 * Awaken any waiters I removed from the queue.
+	 */
+	while (head != NULL)
+	{
+		FlexLockDebug("LWLockRelease", lockid, "release waiter");
+		proc = head;
+		head = proc->flWaitLink;
+		proc->flWaitLink = NULL;
+		proc->flWaitResult = 1;		/* any non-zero value will do */
+		PGSemaphoreUnlock(&proc->sem);
+	}
+
+	/*
+	 * Also awaken any processes whose critical PGPROC fields I cleared
+	 */
+	while (ending != NULL)
+	{
+		FlexLockDebug("LWLockRelease", lockid, "release ending");
+		proc = ending;
+		ending = proc->flWaitLink;
+		proc->flWaitLink = NULL;
+		proc->flWaitResult = 1;		/* any non-zero value will do */
+		PGSemaphoreUnlock(&proc->sem);
+	}
+
+	/*
+	 * Now okay to allow cancel/die interrupts.
+	 */
+	RESUME_INTERRUPTS();
+}
diff --git a/src/include/storage/flexlock_internals.h b/src/include/storage/flexlock_internals.h
index a5a6bde..5749f8b 100644
--- a/src/include/storage/flexlock_internals.h
+++ b/src/include/storage/flexlock_internals.h
@@ -41,6 +41,7 @@ typedef struct FlexLock
 } FlexLock;
 
 #define FLEXLOCK_TYPE_LWLOCK			'l'
+#define FLEXLOCK_TYPE_PROCARRAYLOCK		'p'
 
 typedef union FlexLockPadded
 {
diff --git a/src/include/storage/procarraylock.h b/src/include/storage/procarraylock.h
new file mode 100644
index 0000000..678ca6f
--- /dev/null
+++ b/src/include/storage/procarraylock.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * procarraylock.h
+ *	  Lock management for the ProcArray
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/lwlock.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PROCARRAYLOCK_H
+#define PROCARRAYLOCK_H
+
+#include "storage/flexlock.h"
+
+typedef enum ProcArrayLockMode
+{
+	PAL_EXCLUSIVE,
+	PAL_SHARED
+} ProcArrayLockMode;
+
+extern void ProcArrayLockAcquire(ProcArrayLockMode mode);
+extern void ProcArrayLockClearTransaction(TransactionId latestXid);
+extern void ProcArrayLockRelease(void);
+
+#endif   /* PROCARRAYLOCK_H */