Patch to improve performance of replay of AccessExclusiveLock

Started by David Rowleyalmost 9 years ago22 messages
#1David Rowley
david.rowley@2ndquadrant.com
1 attachment(s)

Hackers,

I've attached a small patch which aims to improve the performance of
AccessExclusiveLock when there are many AccessExclusiveLock stored.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

recovery_lock_speedup_v1.patchapplication/octet-stream; name=recovery_lock_speedup_v1.patchDownload
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 6259070..a7b0403 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -38,7 +38,20 @@ int			vacuum_defer_cleanup_age;
 int			max_standby_archive_delay = 30 * 1000;
 int			max_standby_streaming_delay = 30 * 1000;
 
-static List *RecoveryLockList;
+/*
+ * Number of buckets to split RecoveryLockTable into.
+ * This must be a power of two.
+ */
+#define RECOVERYLOCKTABLE_SIZE 1024
+#define RECOVERYLOCKTABLE_MASK (RECOVERYLOCKTABLE_SIZE - 1)
+
+/*
+ * RecoveryLockTable is a poor man's hash table that allows us to partition
+ * the stored locks. Which partition a lock is stored in is determined by the
+ * xid which the lock belongs to. Splitting into partitions in this way avoids
+ * having to look through all locks to find one specific to a given xid.
+ */
+static List **RecoveryLockTable;
 
 static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
 									   ProcSignalReason reason);
@@ -64,6 +77,10 @@ InitRecoveryTransactionEnvironment(void)
 {
 	VirtualTransactionId vxid;
 
+	/* Setup the recovery lock table */
+	RecoveryLockTable = (List **)
+						palloc0(RECOVERYLOCKTABLE_SIZE * sizeof(List *));
+
 	/*
 	 * Initialize shared invalidation management for Startup process, being
 	 * careful to register ourselves as a sendOnly process so we don't need to
@@ -109,6 +126,9 @@ ShutdownRecoveryTransactionEnvironment(void)
 
 	/* Cleanup our VirtualTransaction */
 	VirtualXactLockTableCleanup();
+
+	/* Wipe out the RecoveryLockTable */
+	pfree(RecoveryLockTable);
 }
 
 
@@ -607,6 +627,8 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid)
 {
 	xl_standby_lock *newlock;
 	LOCKTAG		locktag;
+	size_t		pidx;
+
 
 	/* Already processed? */
 	if (!TransactionIdIsValid(xid) ||
@@ -624,7 +646,8 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid)
 	newlock->xid = xid;
 	newlock->dbOid = dbOid;
 	newlock->relOid = relOid;
-	RecoveryLockList = lappend(RecoveryLockList, newlock);
+	pidx = xid & RECOVERYLOCKTABLE_MASK;
+	RecoveryLockTable[pidx] = lappend(RecoveryLockTable[pidx], newlock);
 
 	SET_LOCKTAG_RELATION(locktag, newlock->dbOid, newlock->relOid);
 
@@ -637,18 +660,26 @@ StandbyReleaseLocks(TransactionId xid)
 	ListCell   *cell,
 			   *prev,
 			   *next;
+	size_t		pidx;
+
+	if (!TransactionIdIsValid(xid))
+	{
+		StandbyReleaseAllLocks();
+		return;
+	}
 
 	/*
 	 * Release all matching locks and remove them from list
 	 */
+	pidx = xid & RECOVERYLOCKTABLE_MASK;
 	prev = NULL;
-	for (cell = list_head(RecoveryLockList); cell; cell = next)
+	for (cell = list_head(RecoveryLockTable[pidx]); cell; cell = next)
 	{
 		xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
 
 		next = lnext(cell);
 
-		if (!TransactionIdIsValid(xid) || lock->xid == xid)
+		if (lock->xid == xid)
 		{
 			LOCKTAG		locktag;
 
@@ -658,10 +689,10 @@ StandbyReleaseLocks(TransactionId xid)
 			SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
 			if (!LockRelease(&locktag, AccessExclusiveLock, true))
 				elog(LOG,
-					 "RecoveryLockList contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u",
+					 "RecoveryLockTable contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u",
 					 lock->xid, lock->dbOid, lock->relOid);
 
-			RecoveryLockList = list_delete_cell(RecoveryLockList, cell, prev);
+			RecoveryLockTable[pidx] = list_delete_cell(RecoveryLockTable[pidx], cell, prev);
 			pfree(lock);
 		}
 		else
@@ -671,7 +702,7 @@ StandbyReleaseLocks(TransactionId xid)
 
 /*
  * Release locks for a transaction tree, starting at xid down, from
- * RecoveryLockList.
+ * RecoveryLockTable.
  *
  * Called during WAL replay of COMMIT/ROLLBACK when in hot standby mode,
  * to remove any AccessExclusiveLocks requested by a transaction.
@@ -697,26 +728,31 @@ StandbyReleaseAllLocks(void)
 			   *prev,
 			   *next;
 	LOCKTAG		locktag;
+	size_t		pidx;
 
 	elog(trace_recovery(DEBUG2), "release all standby locks");
 
-	prev = NULL;
-	for (cell = list_head(RecoveryLockList); cell; cell = next)
+	for (pidx = 0; pidx < RECOVERYLOCKTABLE_SIZE; pidx++)
 	{
-		xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
+		prev = NULL;
+		for (cell = list_head(RecoveryLockTable[pidx]); cell; cell = next)
+		{
+			xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
 
-		next = lnext(cell);
+			next = lnext(cell);
 
-		elog(trace_recovery(DEBUG4),
-			 "releasing recovery lock: xid %u db %u rel %u",
-			 lock->xid, lock->dbOid, lock->relOid);
-		SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
-		if (!LockRelease(&locktag, AccessExclusiveLock, true))
-			elog(LOG,
-				 "RecoveryLockList contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u",
+			elog(trace_recovery(DEBUG4),
+				 "releasing recovery lock: xid %u db %u rel %u",
 				 lock->xid, lock->dbOid, lock->relOid);
-		RecoveryLockList = list_delete_cell(RecoveryLockList, cell, prev);
-		pfree(lock);
+			SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
+			if (!LockRelease(&locktag, AccessExclusiveLock, true))
+				elog(LOG,
+					 "RecoveryLockTable contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u",
+					 lock->xid, lock->dbOid, lock->relOid);
+			RecoveryLockTable[pidx] =
+				list_delete_cell(RecoveryLockTable[pidx], cell, prev);
+			pfree(lock);
+		}
 	}
 }
 
@@ -732,55 +768,59 @@ StandbyReleaseOldLocks(int nxids, TransactionId *xids)
 			   *prev,
 			   *next;
 	LOCKTAG		locktag;
+	int			pidx;
 
-	prev = NULL;
-	for (cell = list_head(RecoveryLockList); cell; cell = next)
+	for (pidx = 0; pidx < RECOVERYLOCKTABLE_SIZE; pidx++)
 	{
-		xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
-		bool		remove = false;
+		prev = NULL;
+		for (cell = list_head(RecoveryLockTable[pidx]); cell; cell = next)
+		{
+			xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
+			bool		remove = false;
 
-		next = lnext(cell);
+			next = lnext(cell);
 
-		Assert(TransactionIdIsValid(lock->xid));
+			Assert(TransactionIdIsValid(lock->xid));
 
-		if (StandbyTransactionIdIsPrepared(lock->xid))
-			remove = false;
-		else
-		{
-			int			i;
-			bool		found = false;
-
-			for (i = 0; i < nxids; i++)
+			if (StandbyTransactionIdIsPrepared(lock->xid))
+				remove = false;
+			else
 			{
-				if (lock->xid == xids[i])
+				int			i;
+				bool		found = false;
+
+				for (i = 0; i < nxids; i++)
 				{
-					found = true;
-					break;
+					if (lock->xid == xids[i])
+					{
+						found = true;
+						break;
+					}
 				}
-			}
 
-			/*
-			 * If its not a running transaction, remove it.
-			 */
-			if (!found)
-				remove = true;
-		}
+				/*
+				 * If its not a running transaction, remove it.
+				 */
+				if (!found)
+					remove = true;
+			}
 
-		if (remove)
-		{
-			elog(trace_recovery(DEBUG4),
-				 "releasing recovery lock: xid %u db %u rel %u",
-				 lock->xid, lock->dbOid, lock->relOid);
-			SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
-			if (!LockRelease(&locktag, AccessExclusiveLock, true))
-				elog(LOG,
-					 "RecoveryLockList contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u",
+			if (remove)
+			{
+				elog(trace_recovery(DEBUG4),
+					 "releasing recovery lock: xid %u db %u rel %u",
 					 lock->xid, lock->dbOid, lock->relOid);
-			RecoveryLockList = list_delete_cell(RecoveryLockList, cell, prev);
-			pfree(lock);
+				SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
+				if (!LockRelease(&locktag, AccessExclusiveLock, true))
+					elog(LOG,
+						 "RecoveryLockTable contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u",
+						 lock->xid, lock->dbOid, lock->relOid);
+				RecoveryLockTable[pidx] = list_delete_cell(RecoveryLockTable[pidx], cell, prev);
+				pfree(lock);
+			}
+			else
+				prev = cell;
 		}
-		else
-			prev = cell;
 	}
 }
 
#2Amit Kapila
amit.kapila16@gmail.com
In reply to: David Rowley (#1)
Re: Patch to improve performance of replay of AccessExclusiveLock

On Wed, Mar 1, 2017 at 5:32 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

Hackers,

I've attached a small patch which aims to improve the performance of
AccessExclusiveLock when there are many AccessExclusiveLock stored.

I could see that your idea is quite straightforward to improve
performance (basically, convert recovery lock list to recovery lock
hash table based on xid), however, it is not clear under what kind of
workload this will be helpful? Do you expect that many concurrent
sessions are trying to acquire different AEL locks?

Few comments on the patch:
1.
+/*
+ * Number of buckets to split RecoveryLockTable into.
+ * This must be a power of two.
+ */

+#define RECOVERYLOCKTABLE_SIZE 1024

On what basis did you decide to keep the lock table size as 1024, is
it just a guess, if so, then also adding a comment to indicate that
you think this is sufficient for current use cases seems better.
However, if it is based on some data, then it would be more
appropriate.

2.
@@ -607,6 +627,8 @@ StandbyAcquireAccessExclusiveLock(TransactionId
xid, Oid dbOid, Oid relOid)
{
xl_standby_lock *newlock;
LOCKTAG locktag;
+ size_t pidx;
+

Comment on top of this function needs some change, it still seems to
refer to RelationLockList.

* We keep a single dynamically expandible list of locks in local memory,
* RelationLockList, so we can keep track of the various entries made by
* the Startup process's virtual xid in the shared lock table.

3.
+ size_t pidx;
+
+ if (!TransactionIdIsValid(xid))
+ {
+ StandbyReleaseAllLocks();
+ return;
+ }

What is the need of this new code?

4.
In some cases, it might slow down the performance where you have to
traverse the complete hash table like StandbyReleaseOldLocks, however,
I think those cases will be less relative to other lock release calls
which are called at every commit, so probably this is okay.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Kapila (#2)
2 attachment(s)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 2 March 2017 at 16:06, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 1, 2017 at 5:32 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

Hackers,

I've attached a small patch which aims to improve the performance of
AccessExclusiveLock when there are many AccessExclusiveLock stored.

I could see that your idea is quite straightforward to improve
performance (basically, convert recovery lock list to recovery lock
hash table based on xid), however, it is not clear under what kind of
workload this will be helpful? Do you expect that many concurrent
sessions are trying to acquire different AEL locks?

Many thanks for looking over this.

The problem case is when many AccessExclusiveLocks are in the list,
each transaction which commits must look through the entire list for
locks which belong to it. This can be a problem when a small number of
transactions create large amount of temp tables, perhaps some batch
processing job. Other transactions, even ones which don't create any
temp tables must look through the list to release any locks belonging
to them. All we need to do to recreate this is have a bunch of
transactions creating temp tables, then try and execute some very
small and fast transactions at the same time.

It's possible to reproduce the problem by running the attached
temptable_bench.sql with:

pgbench -f temptable_bench.sql -j 10 -c 10 -T 60 -n postgres

while concurrently running

pgbench -f updatecounter.sql -T 60 -n postgres

after having done:

CREATE TABLE t1 (value int not null);
insert into t1 values(1);

The hot physical standby lag grows significantly during the run. I saw
up to 500MB on a short 1 minute run and perf record shows that
StandbyReleaseLocks() is consuming 90% of CPU time.

With the patch StandbyReleaseLocks() is down at about 2% CPU.

Few comments on the patch:
1.
+/*
+ * Number of buckets to split RecoveryLockTable into.
+ * This must be a power of two.
+ */

+#define RECOVERYLOCKTABLE_SIZE 1024

On what basis did you decide to keep the lock table size as 1024, is
it just a guess, if so, then also adding a comment to indicate that
you think this is sufficient for current use cases seems better.
However, if it is based on some data, then it would be more
appropriate.

That may need tweaking. Likely it could be smaller if we had some sort
of bloom filter to mark if the transaction had obtained any AEL locks,
that way it could skip. Initially I really didn't want to make the
patch too complex. I had thought that a fairly large hash table would
fix the problem well enough, as quite possibly most buckets would be
empty and non empty buckets have short lists.

2.
@@ -607,6 +627,8 @@ StandbyAcquireAccessExclusiveLock(TransactionId
xid, Oid dbOid, Oid relOid)
{
xl_standby_lock *newlock;
LOCKTAG locktag;
+ size_t pidx;
+

Comment on top of this function needs some change, it still seems to
refer to RelationLockList.

Will fix that.

* We keep a single dynamically expandible list of locks in local memory,
* RelationLockList, so we can keep track of the various entries made by
* the Startup process's virtual xid in the shared lock table.

3.
+ size_t pidx;
+
+ if (!TransactionIdIsValid(xid))
+ {
+ StandbyReleaseAllLocks();
+ return;
+ }

What is the need of this new code?

I was not much of a fan of the old way that the function coped with an
invalid xid. The test was being performed inside of the for loop. My
new for loop was not really compatible with that test, so I moved
handling of invalid xids to outside the loop. I think doing that with
today's code would be an improvement as it would mean less work inside
the slow loop.

4.
In some cases, it might slow down the performance where you have to
traverse the complete hash table like StandbyReleaseOldLocks, however,
I think those cases will be less relative to other lock release calls
which are called at every commit, so probably this is okay.

Yes, that may be true. Perhaps making the table smaller would help
bring that back to what it was, or perhaps the patch needs rethought
to use bloom filters to help identify if we need to release any locks.

Opinions on that would be welcome. Meanwhile I'll go off and play with
bloom filters to see if I can knock some percentage points of the perf
report for StandbyReleaseLocks().

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

temptable_bench.sql.bz2application/x-bzip2; name=temptable_bench.sql.bz2Download
updatecounter.sqlapplication/octet-stream; name=updatecounter.sqlDownload
#4Simon Riggs
simon@2ndquadrant.com
In reply to: David Rowley (#3)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 7 March 2017 at 10:01, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 2 March 2017 at 16:06, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 1, 2017 at 5:32 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

Hackers,

I've attached a small patch which aims to improve the performance of
AccessExclusiveLock when there are many AccessExclusiveLock stored.

Looks useful.

That may need tweaking. Likely it could be smaller if we had some sort
of bloom filter to mark if the transaction had obtained any AEL locks,
that way it could skip. Initially I really didn't want to make the
patch too complex. I had thought that a fairly large hash table would
fix the problem well enough, as quite possibly most buckets would be
empty and non empty buckets have short lists.

ISTM that we should mark each COMMIT if it has an AEL, so we can avoid
any overhead in the common case.

So an additional sub-record for the commit/abort wal record, via
include/access/xact.h

4.
In some cases, it might slow down the performance where you have to
traverse the complete hash table like StandbyReleaseOldLocks, however,
I think those cases will be less relative to other lock release calls
which are called at every commit, so probably this is okay.

Yes, that may be true. Perhaps making the table smaller would help
bring that back to what it was, or perhaps the patch needs rethought
to use bloom filters to help identify if we need to release any locks.

Opinions on that would be welcome. Meanwhile I'll go off and play with
bloom filters to see if I can knock some percentage points of the perf
report for StandbyReleaseLocks().

Didn't look at the code closely, but if the common case COMMITs don't
scan the list that would be most of the problem solved. If we did need
a hash table, we should just use a normal hash table?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5David Rowley
david.rowley@2ndquadrant.com
In reply to: David Rowley (#3)
1 attachment(s)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 7 March 2017 at 17:31, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 2 March 2017 at 16:06, Amit Kapila <amit.kapila16@gmail.com> wrote:

Few comments on the patch:
1.
+/*
+ * Number of buckets to split RecoveryLockTable into.
+ * This must be a power of two.
+ */

+#define RECOVERYLOCKTABLE_SIZE 1024

On what basis did you decide to keep the lock table size as 1024, is
it just a guess, if so, then also adding a comment to indicate that
you think this is sufficient for current use cases seems better.
However, if it is based on some data, then it would be more
appropriate.

I've performed some benchmarking using the test case from my previous
email and performed a perf record -g on the startup process during
each test.

I used various different sized hash tables, in the range of 32 to 8192
elements. The results are below:

@32 + 28.55% 25.20% postgres postgres [.] StandbyReleaseLocks
@64 + 16.07% 14.65% postgres postgres [.] StandbyReleaseLocks
@128 + 9.15% 8.07% postgres postgres [.] StandbyReleaseLocks
@256 + 5.80% 5.00% postgres postgres [.] StandbyReleaseLocks
@512 2.99% 2.63% postgres postgres [.] StandbyReleaseLocks
@1024 1.74% 1.58% postgres postgres [.] StandbyReleaseLocks
@2048 1.26% 1.14% postgres postgres [.] StandbyReleaseLocks
@4096 0.89% 0.83% postgres postgres [.] StandbyReleaseLocks
@8192 0.99% 0.93% postgres postgres [.] StandbyReleaseLocks

Seems to be diminishing returns after 1024 elements. So perhaps 512 or
1024 are in fact the best size. Of course, I'm only performing this on
my i7 laptop, these numbers may vary on larger hardware with more
transactions running.

2.
@@ -607,6 +627,8 @@ StandbyAcquireAccessExclusiveLock(TransactionId
xid, Oid dbOid, Oid relOid)
{
xl_standby_lock *newlock;
LOCKTAG locktag;
+ size_t pidx;
+

Comment on top of this function needs some change, it still seems to
refer to RelationLockList.

Will fix that.

Fixed in the attached.

4.
In some cases, it might slow down the performance where you have to
traverse the complete hash table like StandbyReleaseOldLocks, however,
I think those cases will be less relative to other lock release calls
which are called at every commit, so probably this is okay.

Yes, that may be true. Perhaps making the table smaller would help
bring that back to what it was, or perhaps the patch needs rethought
to use bloom filters to help identify if we need to release any locks.

Opinions on that would be welcome. Meanwhile I'll go off and play with
bloom filters to see if I can knock some percentage points of the perf
report for StandbyReleaseLocks().

So I quickly realised that bloom filters are impossible here as
there's no way to unmark the bit when locks are released as we cannot
know if the bit is used to mark another transaction's locks. I ended
up giving up on that idea and instead I added a RecoveryLockCount
variable that keeps track on the number of locks. This may also help
to speed up some cases where a transaction has many sub transactions
and there's no AEL locks held, as we can simply fast path out in
StandbyReleaseLockTree() without looping over each sub-xid and trying
to release locks which don't exist.

I've attached an updated patch

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

recovery_lock_v2.patchapplication/octet-stream; name=recovery_lock_v2.patchDownload
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 6259070..5f85851 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -38,7 +38,29 @@ int			vacuum_defer_cleanup_age;
 int			max_standby_archive_delay = 30 * 1000;
 int			max_standby_streaming_delay = 30 * 1000;
 
-static List *RecoveryLockList;
+/*
+ * RecoveryLockTable is a poor man's hash table that allows us to partition
+ * the stored locks. Which partition a lock is stored in is determined by the
+ * xid which the lock belongs to. The hash function is very simplistic and
+ * merely performs a binary AND against the final 0-based partition number.
+ * Splitting into partitions in this way avoids having to look through all
+ * locks to find one specific to a given xid.
+ */
+static List **RecoveryLockTable;
+
+/*
+ * Number of items in RecoveryLockTable. We can make us of this so we can
+ * fast-path out from any RecoveryLockTable lookups when there's no locks held
+ */
+static uint64 RecoveryLockCount;
+
+/*
+ * Number of buckets to split RecoveryLockTable into.
+ * This must be a power of two.
+ */
+#define RECOVERYLOCKTABLE_SIZE 1024
+#define RECOVERYLOCKTABLE_MASK (RECOVERYLOCKTABLE_SIZE - 1)
+
 
 static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
 									   ProcSignalReason reason);
@@ -64,6 +86,12 @@ InitRecoveryTransactionEnvironment(void)
 {
 	VirtualTransactionId vxid;
 
+	/* Setup the recovery lock table */
+	RecoveryLockTable = (List **)
+						palloc0(RECOVERYLOCKTABLE_SIZE * sizeof(List *));
+
+	RecoveryLockCount = 0;
+
 	/*
 	 * Initialize shared invalidation management for Startup process, being
 	 * careful to register ourselves as a sendOnly process so we don't need to
@@ -109,6 +137,9 @@ ShutdownRecoveryTransactionEnvironment(void)
 
 	/* Cleanup our VirtualTransaction */
 	VirtualXactLockTableCleanup();
+
+	/* Free memory used by the RecoveryLockTable */
+	pfree(RecoveryLockTable);
 }
 
 
@@ -586,9 +617,10 @@ StandbyLockTimeoutHandler(void)
  * We only keep track of AccessExclusiveLocks, which are only ever held by
  * one transaction on one relation.
  *
- * We keep a single dynamically expandible list of locks in local memory,
- * RelationLockList, so we can keep track of the various entries made by
- * the Startup process's virtual xid in the shared lock table.
+ * We maintain a simple hash table to store locks. This allows quick lookups
+ * to find locks which belong to a given xid and allows us to keep track of
+ * the various entries made by the Startup process's virtual xid in the shared
+ * lock table.
  *
  * We record the lock against the top-level xid, rather than individual
  * subtransaction xids. This means AccessExclusiveLocks held by aborted
@@ -607,6 +639,8 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid)
 {
 	xl_standby_lock *newlock;
 	LOCKTAG		locktag;
+	size_t		xidhash;
+
 
 	/* Already processed? */
 	if (!TransactionIdIsValid(xid) ||
@@ -624,7 +658,11 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid)
 	newlock->xid = xid;
 	newlock->dbOid = dbOid;
 	newlock->relOid = relOid;
-	RecoveryLockList = lappend(RecoveryLockList, newlock);
+
+	/* add the newlock to the hash table */
+	xidhash = xid & RECOVERYLOCKTABLE_MASK;
+	RecoveryLockTable[xidhash] = lappend(RecoveryLockTable[xidhash], newlock);
+	RecoveryLockCount++;
 
 	SET_LOCKTAG_RELATION(locktag, newlock->dbOid, newlock->relOid);
 
@@ -637,18 +675,31 @@ StandbyReleaseLocks(TransactionId xid)
 	ListCell   *cell,
 			   *prev,
 			   *next;
+	size_t		xidhash;
+
+	/* fast path out if there's no locks */
+	if (RecoveryLockCount == 0)
+		return;
+
+	if (!TransactionIdIsValid(xid))
+	{
+		StandbyReleaseAllLocks();
+		return;
+	}
 
 	/*
-	 * Release all matching locks and remove them from list
+	 * Release all matching locks and remove them from recover lock table
 	 */
+
+	xidhash = xid & RECOVERYLOCKTABLE_MASK;
 	prev = NULL;
-	for (cell = list_head(RecoveryLockList); cell; cell = next)
+	for (cell = list_head(RecoveryLockTable[xidhash]); cell; cell = next)
 	{
 		xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
 
 		next = lnext(cell);
 
-		if (!TransactionIdIsValid(xid) || lock->xid == xid)
+		if (lock->xid == xid)
 		{
 			LOCKTAG		locktag;
 
@@ -658,10 +709,12 @@ StandbyReleaseLocks(TransactionId xid)
 			SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
 			if (!LockRelease(&locktag, AccessExclusiveLock, true))
 				elog(LOG,
-					 "RecoveryLockList contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u",
+					 "RecoveryLockTable contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u",
 					 lock->xid, lock->dbOid, lock->relOid);
 
-			RecoveryLockList = list_delete_cell(RecoveryLockList, cell, prev);
+			RecoveryLockTable[xidhash] =
+					list_delete_cell(RecoveryLockTable[xidhash], cell, prev);
+			RecoveryLockCount--;
 			pfree(lock);
 		}
 		else
@@ -671,7 +724,7 @@ StandbyReleaseLocks(TransactionId xid)
 
 /*
  * Release locks for a transaction tree, starting at xid down, from
- * RecoveryLockList.
+ * RecoveryLockTable.
  *
  * Called during WAL replay of COMMIT/ROLLBACK when in hot standby mode,
  * to remove any AccessExclusiveLocks requested by a transaction.
@@ -681,6 +734,10 @@ StandbyReleaseLockTree(TransactionId xid, int nsubxids, TransactionId *subxids)
 {
 	int			i;
 
+	/* we needn't bother trying if there's no locks */
+	if (RecoveryLockCount == 0)
+		return;
+
 	StandbyReleaseLocks(xid);
 
 	for (i = 0; i < nsubxids; i++)
@@ -697,27 +754,37 @@ StandbyReleaseAllLocks(void)
 			   *prev,
 			   *next;
 	LOCKTAG		locktag;
+	size_t		i;
 
 	elog(trace_recovery(DEBUG2), "release all standby locks");
 
-	prev = NULL;
-	for (cell = list_head(RecoveryLockList); cell; cell = next)
+	if (RecoveryLockCount == 0)
+		return;
+
+	for (i = 0; i < RECOVERYLOCKTABLE_SIZE; i++)
 	{
-		xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
+		prev = NULL;
+		for (cell = list_head(RecoveryLockTable[i]); cell; cell = next)
+		{
+			xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
 
-		next = lnext(cell);
+			next = lnext(cell);
 
-		elog(trace_recovery(DEBUG4),
-			 "releasing recovery lock: xid %u db %u rel %u",
-			 lock->xid, lock->dbOid, lock->relOid);
-		SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
-		if (!LockRelease(&locktag, AccessExclusiveLock, true))
-			elog(LOG,
-				 "RecoveryLockList contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u",
+			elog(trace_recovery(DEBUG4),
+				 "releasing recovery lock: xid %u db %u rel %u",
 				 lock->xid, lock->dbOid, lock->relOid);
-		RecoveryLockList = list_delete_cell(RecoveryLockList, cell, prev);
-		pfree(lock);
+			SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
+			if (!LockRelease(&locktag, AccessExclusiveLock, true))
+				elog(LOG,
+					 "RecoveryLockTable contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u",
+					 lock->xid, lock->dbOid, lock->relOid);
+			RecoveryLockTable[i] =
+						list_delete_cell(RecoveryLockTable[i], cell, prev);
+			RecoveryLockCount--;
+			pfree(lock);
+		}
 	}
+	Assert(RecoveryLockCount == 0);
 }
 
 /*
@@ -732,55 +799,64 @@ StandbyReleaseOldLocks(int nxids, TransactionId *xids)
 			   *prev,
 			   *next;
 	LOCKTAG		locktag;
+	size_t		i;
 
-	prev = NULL;
-	for (cell = list_head(RecoveryLockList); cell; cell = next)
-	{
-		xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
-		bool		remove = false;
+	if (RecoveryLockCount == 0)
+		return;
 
-		next = lnext(cell);
+	for (i = 0; i < RECOVERYLOCKTABLE_SIZE; i++)
+	{
+		prev = NULL;
+		for (cell = list_head(RecoveryLockTable[i]); cell; cell = next)
+		{
+			xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
+			bool		remove = false;
 
-		Assert(TransactionIdIsValid(lock->xid));
+			next = lnext(cell);
 
-		if (StandbyTransactionIdIsPrepared(lock->xid))
-			remove = false;
-		else
-		{
-			int			i;
-			bool		found = false;
+			Assert(TransactionIdIsValid(lock->xid));
 
-			for (i = 0; i < nxids; i++)
+			if (StandbyTransactionIdIsPrepared(lock->xid))
+				remove = false;
+			else
 			{
-				if (lock->xid == xids[i])
+				int			i;
+				bool		found = false;
+
+				for (i = 0; i < nxids; i++)
 				{
-					found = true;
-					break;
+					if (lock->xid == xids[i])
+					{
+						found = true;
+						break;
+					}
 				}
-			}
 
-			/*
-			 * If its not a running transaction, remove it.
-			 */
-			if (!found)
-				remove = true;
-		}
+				/*
+				 * If its not a running transaction, remove it.
+				 */
+				if (!found)
+					remove = true;
+			}
 
-		if (remove)
-		{
-			elog(trace_recovery(DEBUG4),
-				 "releasing recovery lock: xid %u db %u rel %u",
-				 lock->xid, lock->dbOid, lock->relOid);
-			SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
-			if (!LockRelease(&locktag, AccessExclusiveLock, true))
-				elog(LOG,
-					 "RecoveryLockList contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u",
+			if (remove)
+			{
+				elog(trace_recovery(DEBUG4),
+					 "releasing recovery lock: xid %u db %u rel %u",
 					 lock->xid, lock->dbOid, lock->relOid);
-			RecoveryLockList = list_delete_cell(RecoveryLockList, cell, prev);
-			pfree(lock);
+				SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
+				if (!LockRelease(&locktag, AccessExclusiveLock, true))
+					elog(LOG,
+						 "RecoveryLockTable contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u",
+						 lock->xid, lock->dbOid, lock->relOid);
+				RecoveryLockTable[i] =
+						list_delete_cell(RecoveryLockTable[i], cell, prev);
+				RecoveryLockCount--;
+				pfree(lock);
+			}
+			else
+				prev = cell;
 		}
-		else
-			prev = cell;
 	}
 }
 
#6David Rowley
david.rowley@2ndquadrant.com
In reply to: Simon Riggs (#4)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 7 March 2017 at 23:20, Simon Riggs <simon@2ndquadrant.com> wrote:

On 7 March 2017 at 10:01, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 2 March 2017 at 16:06, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 1, 2017 at 5:32 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

Hackers,

I've attached a small patch which aims to improve the performance of
AccessExclusiveLock when there are many AccessExclusiveLock stored.

Looks useful.

Thanks.

That may need tweaking. Likely it could be smaller if we had some sort
of bloom filter to mark if the transaction had obtained any AEL locks,
that way it could skip. Initially I really didn't want to make the
patch too complex. I had thought that a fairly large hash table would
fix the problem well enough, as quite possibly most buckets would be
empty and non empty buckets have short lists.

ISTM that we should mark each COMMIT if it has an AEL, so we can avoid
any overhead in the common case.

So an additional sub-record for the commit/abort wal record, via
include/access/xact.h

That would be ideal if we could do that, but doing that for so many
possible transaction IDs seems impractical. I had previously thought I
could do this in a lossy way with bloom filters for the hash table,
but I'd failed to realise at that time that I'd be unable to unmark
the bloom filter bit when releasing the locks, as that bit may be used
by other xid.

4.
In some cases, it might slow down the performance where you have to
traverse the complete hash table like StandbyReleaseOldLocks, however,
I think those cases will be less relative to other lock release calls
which are called at every commit, so probably this is okay.

Yes, that may be true. Perhaps making the table smaller would help
bring that back to what it was, or perhaps the patch needs rethought
to use bloom filters to help identify if we need to release any locks.

Opinions on that would be welcome. Meanwhile I'll go off and play with
bloom filters to see if I can knock some percentage points of the perf
report for StandbyReleaseLocks().

Didn't look at the code closely, but if the common case COMMITs don't
scan the list that would be most of the problem solved. If we did need
a hash table, we should just use a normal hash table?

I've tried to address that to some degree and added code which skips
the lookups when there's no AEL locks held at all. There's a
possibility for speeding up replay of COMMIT of a transaction with
many sub transactions where no AEL locks are held at all. See
StandbyReleaseLockTree().

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Simon Riggs
simon@2ndquadrant.com
In reply to: David Rowley (#6)
1 attachment(s)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 7 March 2017 at 19:22, David Rowley <david.rowley@2ndquadrant.com> wrote:

That may need tweaking. Likely it could be smaller if we had some sort
of bloom filter to mark if the transaction had obtained any AEL locks,
that way it could skip. Initially I really didn't want to make the
patch too complex. I had thought that a fairly large hash table would
fix the problem well enough, as quite possibly most buckets would be
empty and non empty buckets have short lists.

ISTM that we should mark each COMMIT if it has an AEL, so we can avoid
any overhead in the common case.

So an additional sub-record for the commit/abort wal record, via
include/access/xact.h

That would be ideal if we could do that, but doing that for so many
possible transaction IDs seems impractical.

Don't understand this. I'm talking about setting a flag on
commit/abort WAL records, like the attached.

We just need to track info so we can set the flag at EOXact and we're done.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

partial.patchapplication/octet-stream; name=partial.patchDownload
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 82f9a3c..bf055a2 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5427,7 +5427,8 @@ xact_redo_commit(xl_xact_parsed_commit *parsed,
 		 * via their top-level xid only, so no need to provide subxact list,
 		 * which will save time when replaying commits.
 		 */
-		StandbyReleaseLockTree(xid, 0, NULL);
+		if (parsed->xinfo & XACT_XINFO_HAS_AE_LOCKS)
+			StandbyReleaseLockTree(xid, 0, NULL);
 	}
 
 	if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
@@ -5563,7 +5564,8 @@ xact_redo_abort(xl_xact_parsed_abort *parsed, TransactionId xid)
 		/*
 		 * Release locks, if any. There are no invalidations to send.
 		 */
-		StandbyReleaseLockTree(xid, parsed->nsubxacts, parsed->subxacts);
+		if (parsed->xinfo & XACT_XINFO_HAS_AE_LOCKS)
+			StandbyReleaseLockTree(xid, parsed->nsubxacts, parsed->subxacts);
 	}
 
 	/* Make sure files supposed to be dropped are dropped */
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index e7d1191..d8ae939 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -137,6 +137,7 @@ typedef void (*SubXactCallback) (SubXactEvent event, SubTransactionId mySubid,
 #define XACT_XINFO_HAS_INVALS			(1U << 3)
 #define XACT_XINFO_HAS_TWOPHASE			(1U << 4)
 #define XACT_XINFO_HAS_ORIGIN			(1U << 5)
+#define XACT_XINFO_HAS_AE_LOCKS			(1U << 6)
 
 /*
  * Also stored in xinfo, these indicating a variety of additional actions that
#8Andres Freund
andres@anarazel.de
In reply to: David Rowley (#5)
Re: Patch to improve performance of replay of AccessExclusiveLock

Hi,

On 2017-03-08 00:15:05 +1300, David Rowley wrote:

-static List *RecoveryLockList;
+/*
+ * RecoveryLockTable is a poor man's hash table that allows us to partition
+ * the stored locks. Which partition a lock is stored in is determined by the
+ * xid which the lock belongs to. The hash function is very simplistic and
+ * merely performs a binary AND against the final 0-based partition number.
+ * Splitting into partitions in this way avoids having to look through all
+ * locks to find one specific to a given xid.
+ */
+static List **RecoveryLockTable;

Why are we open coding this? That strikes me as a bad plan...

Regards,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9David Rowley
david.rowley@2ndquadrant.com
In reply to: Simon Riggs (#7)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 8 March 2017 at 01:13, Simon Riggs <simon@2ndquadrant.com> wrote:

Don't understand this. I'm talking about setting a flag on
commit/abort WAL records, like the attached.

There's nothing setting a flag in the attached. I only see the
checking of the flag.

We just need to track info so we can set the flag at EOXact and we're done.

Do you have an idea where to store that information? I'd assume we'd
want to store something during LogAccessExclusiveLocks() and check
that in XactLogCommitRecord() and XactLogAbortRecord(). I don't see
anything existing which might help with that today. Do you think I
should introduce some new global variable for that? Or do you propose
calling GetRunningTransactionLocks() again while generating the
Commit/Abort record?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Simon Riggs
simon@2ndquadrant.com
In reply to: David Rowley (#9)
1 attachment(s)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 8 March 2017 at 10:02, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 8 March 2017 at 01:13, Simon Riggs <simon@2ndquadrant.com> wrote:

Don't understand this. I'm talking about setting a flag on
commit/abort WAL records, like the attached.

There's nothing setting a flag in the attached. I only see the
checking of the flag.

Yeh, it wasn't a full patch, just a way marker to the summit for you.

We just need to track info so we can set the flag at EOXact and we're done.

Do you have an idea where to store that information? I'd assume we'd
want to store something during LogAccessExclusiveLocks() and check
that in XactLogCommitRecord() and XactLogAbortRecord(). I don't see
anything existing which might help with that today. Do you think I
should introduce some new global variable for that? Or do you propose
calling GetRunningTransactionLocks() again while generating the
Commit/Abort record?

Seemed easier to write it than explain further. Please see what you think.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

hs_skip_overhead_AEL_on_standby.v1.patchapplication/octet-stream; name=hs_skip_overhead_AEL_on_standby.v1.patchDownload
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 0a8edb9..1727c56 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -2056,11 +2056,15 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	/* See notes in RecordTransactionCommit */
 	MyPgXact->delayChkpt = true;
 
-	/* Emit the XLOG commit record */
+	/*
+	 * Emit the XLOG commit record. Note that we mark 2PC commits as potentially
+	 * having AccessExclusiveLocks since we don't know whether or not they do.
+	 */
 	recptr = XactLogCommitRecord(committs,
 								 nchildren, children, nrels, rels,
 								 ninvalmsgs, invalmsgs,
 								 initfileinval, false,
+								 true,
 								 xid);
 
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 82f9a3c..b45e919 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -115,6 +115,11 @@ TransactionId *ParallelCurrentXids;
  */
 bool		MyXactAccessedTempRel = false;
 
+/*
+ * MyXactAcquiredAccessExclusiveLock is set when we log an
+ * AccessExclusiveLock. (Global so we can set from standby.c)
+ */
+bool		MyXactAcquiredAccessExclusiveLock = false;
 
 /*
  *	transaction states - transaction state from server perspective
@@ -1231,6 +1236,7 @@ RecordTransactionCommit(void)
 							nchildren, children, nrels, rels,
 							nmsgs, invalMessages,
 							RelcacheInitFileInval, forceSyncCommit,
+							MyXactAcquiredAccessExclusiveLock,
 							InvalidTransactionId /* plain commit */ );
 
 		if (replorigin)
@@ -1846,6 +1852,7 @@ StartTransaction(void)
 	XactIsoLevel = DefaultXactIsoLevel;
 	forceSyncCommit = false;
 	MyXactAccessedTempRel = false;
+	MyXactAcquiredAccessExclusiveLock = false;
 
 	/*
 	 * reinitialize within-transaction counters
@@ -5100,7 +5107,8 @@ xactGetCommittedChildren(TransactionId **ptr)
  * Log the commit record for a plain or twophase transaction commit.
  *
  * A 2pc commit will be emitted when twophase_xid is valid, a plain one
- * otherwise.
+ * otherwise. 2pc commits set acquiredAccessExclusiveLock, which is a
+ * false positive but that's OK since it was just for performance.
  */
 XLogRecPtr
 XactLogCommitRecord(TimestampTz commit_time,
@@ -5108,6 +5116,7 @@ XactLogCommitRecord(TimestampTz commit_time,
 					int nrels, RelFileNode *rels,
 					int nmsgs, SharedInvalidationMessage *msgs,
 					bool relcacheInval, bool forceSync,
+					bool acquiredAccessExclusiveLock,
 					TransactionId twophase_xid)
 {
 	xl_xact_commit xlrec;
@@ -5139,6 +5148,8 @@ XactLogCommitRecord(TimestampTz commit_time,
 		xl_xinfo.xinfo |= XACT_COMPLETION_UPDATE_RELCACHE_FILE;
 	if (forceSyncCommit)
 		xl_xinfo.xinfo |= XACT_COMPLETION_FORCE_SYNC_COMMIT;
+	if (acquiredAccessExclusiveLock)
+		xl_xinfo.xinfo |= XACT_XINFO_HAS_AE_LOCKS;
 
 	/*
 	 * Check if the caller would like to ask standbys for immediate feedback
@@ -5427,7 +5438,8 @@ xact_redo_commit(xl_xact_parsed_commit *parsed,
 		 * via their top-level xid only, so no need to provide subxact list,
 		 * which will save time when replaying commits.
 		 */
-		StandbyReleaseLockTree(xid, 0, NULL);
+		if (parsed->xinfo & XACT_XINFO_HAS_AE_LOCKS)
+			StandbyReleaseLockTree(xid, 0, NULL);
 	}
 
 	if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
@@ -5563,7 +5575,8 @@ xact_redo_abort(xl_xact_parsed_abort *parsed, TransactionId xid)
 		/*
 		 * Release locks, if any. There are no invalidations to send.
 		 */
-		StandbyReleaseLockTree(xid, parsed->nsubxacts, parsed->subxacts);
+		if (parsed->xinfo & XACT_XINFO_HAS_AE_LOCKS)
+			StandbyReleaseLockTree(xid, parsed->nsubxacts, parsed->subxacts);
 	}
 
 	/* Make sure files supposed to be dropped are dropped */
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 6259070..0ec81eb 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -1063,6 +1063,7 @@ LogAccessExclusiveLock(Oid dbOid, Oid relOid)
 	xlrec.relOid = relOid;
 
 	LogAccessExclusiveLocks(1, &xlrec);
+	MyXactAcquiredAccessExclusiveLock = true;
 }
 
 /*
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index e7d1191..3c9348c 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -73,6 +73,7 @@ extern int	synchronous_commit;
 
 /* Kluge for 2PC support */
 extern bool MyXactAccessedTempRel;
+extern bool MyXactAcquiredAccessExclusiveLock;
 
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
@@ -137,6 +138,7 @@ typedef void (*SubXactCallback) (SubXactEvent event, SubTransactionId mySubid,
 #define XACT_XINFO_HAS_INVALS			(1U << 3)
 #define XACT_XINFO_HAS_TWOPHASE			(1U << 4)
 #define XACT_XINFO_HAS_ORIGIN			(1U << 5)
+#define XACT_XINFO_HAS_AE_LOCKS			(1U << 6)
 
 /*
  * Also stored in xinfo, these indicating a variety of additional actions that
@@ -364,6 +366,7 @@ extern XLogRecPtr XactLogCommitRecord(TimestampTz commit_time,
 					int nrels, RelFileNode *rels,
 					int nmsgs, SharedInvalidationMessage *msgs,
 					bool relcacheInval, bool forceSync,
+					bool acquiredAccessExclusiveLock,
 					TransactionId twophase_xid);
 
 extern XLogRecPtr XactLogAbortRecord(TimestampTz abort_time,
#11David Rowley
david.rowley@2ndquadrant.com
In reply to: Simon Riggs (#10)
1 attachment(s)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 16 March 2017 at 13:31, Simon Riggs <simon@2ndquadrant.com> wrote:

On 8 March 2017 at 10:02, David Rowley <david.rowley@2ndquadrant.com>
wrote:

On 8 March 2017 at 01:13, Simon Riggs <simon@2ndquadrant.com> wrote:

Don't understand this. I'm talking about setting a flag on
commit/abort WAL records, like the attached.

There's nothing setting a flag in the attached. I only see the
checking of the flag.

Yeh, it wasn't a full patch, just a way marker to the summit for you.

We just need to track info so we can set the flag at EOXact and we're

done.

Do you have an idea where to store that information? I'd assume we'd
want to store something during LogAccessExclusiveLocks() and check
that in XactLogCommitRecord() and XactLogAbortRecord(). I don't see
anything existing which might help with that today. Do you think I
should introduce some new global variable for that? Or do you propose
calling GetRunningTransactionLocks() again while generating the
Commit/Abort record?

Seemed easier to write it than explain further. Please see what you think.

Thanks. I had been looking for some struct to store the flag in. I'd not
considered just adding a new global variable.

I was in fact just working on this again earlier, and I had come up with
the attached.

I was just trying to verify if it was going to do the correct thing in
regards to subtransactions and I got a little confused at the comments at
the top of StandbyAcquireAccessExclusiveLock():

* We record the lock against the top-level xid, rather than individual
* subtransaction xids. This means AccessExclusiveLocks held by aborted
* subtransactions are not released as early as possible on standbys.

if this is true, and it seems to be per LogAccessExclusiveLock(),
does StandbyReleaseLockTree() need to release the locks for sub
transactions too?

This code:

for (i = 0; i < nsubxids; i++)
StandbyReleaseLocks(subxids[i]);

FWIW I tested the attached and saw that with my test cases I showed earlier
that StandbyReleaseLockTree() was down to 0.74% in the perf report. Likely
the gain will be even better if I ran a few more CPUs on small simple
transactions which are not taking Access Exclusive locks, so I agree this
is a better way forward than what I had in my original patch. Your patch
will have the same effect, so much better than the 1.74% I saw in my perf
report for the 1024 bucket hash table.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

mark_xact_ael.patchapplication/octet-stream; name=mark_xact_ael.patchDownload
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 82f9a3c..4f6551d 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -186,6 +186,7 @@ typedef struct TransactionStateData
 	bool		prevXactReadOnly;		/* entry-time xact r/o state */
 	bool		startedInRecovery;		/* did we start in recovery? */
 	bool		didLogXid;		/* has xid been included in WAL record? */
+	bool		didLogAELock;	/* did we log any access exclusive locks? */
 	int			parallelModeLevel;		/* Enter/ExitParallelMode counter */
 	struct TransactionStateData *parent;		/* back link to parent */
 } TransactionStateData;
@@ -217,6 +218,7 @@ static TransactionStateData TopTransactionStateData = {
 	false,						/* entry-time xact r/o state */
 	false,						/* startedInRecovery */
 	false,						/* didLogXid */
+	false,						/* didLogAELock */
 	0,							/* parallelMode */
 	NULL						/* link to parent state block */
 };
@@ -447,6 +449,17 @@ MarkCurrentTransactionIdLoggedIfAny(void)
 		CurrentTransactionState->didLogXid = true;
 }
 
+/*
+ * MarkCurrentTransactionIDLoggedAELocks
+ *
+ * Remember if the current xid logged any Access Exclusive Locks
+ */
+void
+MarkCurrentTransactionIDLoggedAELocks(void)
+{
+	if (TransactionIdIsValid(CurrentTransactionState->transactionId))
+		CurrentTransactionState->didLogAELock = true;
+}
 
 /*
  *	GetStableLatestTransactionId
@@ -5191,6 +5204,9 @@ XactLogCommitRecord(TimestampTz commit_time,
 		xl_origin.origin_timestamp = replorigin_session_origin_timestamp;
 	}
 
+	if (CurrentTransactionState->didLogAELock)
+		xl_xinfo.xinfo |= XACT_XINFO_HAS_AE_LOCKS;
+
 	if (xl_xinfo.xinfo != 0)
 		info |= XLOG_XACT_HAS_INFO;
 
@@ -5325,6 +5341,9 @@ XactLogAbortRecord(TimestampTz abort_time,
 	if (xl_xinfo.xinfo & XACT_XINFO_HAS_TWOPHASE)
 		XLogRegisterData((char *) (&xl_twophase), sizeof(xl_xact_twophase));
 
+	if (CurrentTransactionState->didLogAELock)
+		xl_xinfo.xinfo |= XACT_XINFO_HAS_AE_LOCKS;
+
 	return XLogInsert(RM_XACT_ID, info);
 }
 
@@ -5427,7 +5446,8 @@ xact_redo_commit(xl_xact_parsed_commit *parsed,
 		 * via their top-level xid only, so no need to provide subxact list,
 		 * which will save time when replaying commits.
 		 */
-		StandbyReleaseLockTree(xid, 0, NULL);
+		if (parsed->xinfo & XACT_XINFO_HAS_AE_LOCKS)
+			StandbyReleaseLockTree(xid, 0, NULL);
 	}
 
 	if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
@@ -5563,7 +5583,8 @@ xact_redo_abort(xl_xact_parsed_abort *parsed, TransactionId xid)
 		/*
 		 * Release locks, if any. There are no invalidations to send.
 		 */
-		StandbyReleaseLockTree(xid, parsed->nsubxacts, parsed->subxacts);
+		if (parsed->xinfo & XACT_XINFO_HAS_AE_LOCKS)
+			StandbyReleaseLockTree(xid, parsed->nsubxacts, parsed->subxacts);
 	}
 
 	/* Make sure files supposed to be dropped are dropped */
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 6259070..d7bd084 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -1040,6 +1040,7 @@ LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks)
 	XLogRegisterData((char *) &xlrec, offsetof(xl_standby_locks, locks));
 	XLogRegisterData((char *) locks, nlocks * sizeof(xl_standby_lock));
 	XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT);
+	MarkCurrentTransactionIDLoggedAELocks();
 
 	(void) XLogInsert(RM_STANDBY_ID, XLOG_STANDBY_LOCK);
 }
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index e7d1191..0bb0f87 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -137,6 +137,7 @@ typedef void (*SubXactCallback) (SubXactEvent event, SubTransactionId mySubid,
 #define XACT_XINFO_HAS_INVALS			(1U << 3)
 #define XACT_XINFO_HAS_TWOPHASE			(1U << 4)
 #define XACT_XINFO_HAS_ORIGIN			(1U << 5)
+#define XACT_XINFO_HAS_AE_LOCKS			(1U << 6)
 
 /*
  * Also stored in xinfo, these indicating a variety of additional actions that
@@ -316,6 +317,7 @@ extern TransactionId GetCurrentTransactionIdIfAny(void);
 extern TransactionId GetStableLatestTransactionId(void);
 extern SubTransactionId GetCurrentSubTransactionId(void);
 extern void MarkCurrentTransactionIdLoggedIfAny(void);
+extern void MarkCurrentTransactionIDLoggedAELocks(void);
 extern bool SubTransactionIsActive(SubTransactionId subxid);
 extern CommandId GetCurrentCommandId(bool used);
 extern TimestampTz GetCurrentTransactionStartTimestamp(void);
#12Amit Kapila
amit.kapila16@gmail.com
In reply to: David Rowley (#11)
Re: Patch to improve performance of replay of AccessExclusiveLock

On Thu, Mar 16, 2017 at 7:22 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

On 16 March 2017 at 13:31, Simon Riggs <simon@2ndquadrant.com> wrote:

On 8 March 2017 at 10:02, David Rowley <david.rowley@2ndquadrant.com>
wrote:

On 8 March 2017 at 01:13, Simon Riggs <simon@2ndquadrant.com> wrote:

Don't understand this. I'm talking about setting a flag on
commit/abort WAL records, like the attached.

There's nothing setting a flag in the attached. I only see the
checking of the flag.

Yeh, it wasn't a full patch, just a way marker to the summit for you.

We just need to track info so we can set the flag at EOXact and we're
done.

Do you have an idea where to store that information? I'd assume we'd
want to store something during LogAccessExclusiveLocks() and check
that in XactLogCommitRecord() and XactLogAbortRecord(). I don't see
anything existing which might help with that today. Do you think I
should introduce some new global variable for that? Or do you propose
calling GetRunningTransactionLocks() again while generating the
Commit/Abort record?

Seemed easier to write it than explain further. Please see what you think.

Thanks. I had been looking for some struct to store the flag in. I'd not
considered just adding a new global variable.

I was in fact just working on this again earlier, and I had come up with the
attached.

I think this will not work if you take AEL in one of the
subtransaction and then commit the transaction. The reason is that
you are using CurrentTransactionState to remember AEL locks and during
commit (at the time of XactLogCommitRecord) only top level
CurrentTransactionState will be available which is not same as
subtransactions CurrentTransactionState. You can try with a simple
test like below:

Create table t1(c1 int);
Begin;
insert into t1 values(1);
savepoint s1;
insert into t1 values(2);
savepoint s2;
insert into t1 values(3);
Lock t1 in Access Exclusive mode;
commit;

Now, we might want to store this info in top level transaction state
to make such cases work, but I am not sure if it is better than what
Simon has proposed in his approach. Although, I have not evaluated in
detail, but it seems Simon's patch looks better way to solve this
problem (In his patch, there is an explicit handling of two-phase
commits which seems more robust).

Few other comments on patch:
1.
@@ -5325,6 +5341,9 @@ XactLogAbortRecord(TimestampTz abort_time,
if (xl_xinfo.xinfo &
XACT_XINFO_HAS_TWOPHASE)
XLogRegisterData((char *) (&xl_twophase), sizeof
(xl_xact_twophase));

+ if (CurrentTransactionState->didLogAELock)
+ xl_xinfo.xinfo |=
XACT_XINFO_HAS_AE_LOCKS;

You need to set xl_info before the below line in XactLogAbortRecord()
to get it logged properly.
if (xl_xinfo.xinfo != 0)
info |= XLOG_XACT_HAS_INFO;

2.
Explicitly Initialize didLogAELock in StartTransaction as we do for didLogXid.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Simon Riggs (#10)
Re: Patch to improve performance of replay of AccessExclusiveLock

On Thu, Mar 16, 2017 at 6:01 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 8 March 2017 at 10:02, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 8 March 2017 at 01:13, Simon Riggs <simon@2ndquadrant.com> wrote:

Don't understand this. I'm talking about setting a flag on
commit/abort WAL records, like the attached.

There's nothing setting a flag in the attached. I only see the
checking of the flag.

Yeh, it wasn't a full patch, just a way marker to the summit for you.

We just need to track info so we can set the flag at EOXact and we're done.

Do you have an idea where to store that information? I'd assume we'd
want to store something during LogAccessExclusiveLocks() and check
that in XactLogCommitRecord() and XactLogAbortRecord(). I don't see
anything existing which might help with that today. Do you think I
should introduce some new global variable for that? Or do you propose
calling GetRunningTransactionLocks() again while generating the
Commit/Abort record?

Seemed easier to write it than explain further. Please see what you think.

@@ -5563,7 +5575,8 @@ xact_redo_abort(xl_xact_parsed_abort *parsed,
TransactionId xid)
  /*
  * Release locks, if any. There are no invalidations to send.
  */
- StandbyReleaseLockTree(xid, parsed->nsubxacts, parsed->subxacts);
+ if (parsed->xinfo & XACT_XINFO_HAS_AE_LOCKS)
+ StandbyReleaseLockTree(xid, parsed->nsubxacts, parsed->subxacts);

The patch uses XACT_XINFO_HAS_AE_LOCKS during abort replay, but during
normal operation (XactLogAbortRecord()), it doesn't seem to log the
information. Is this an oversight or do you have some reason for
doing so?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Simon Riggs
simon@2ndquadrant.com
In reply to: David Rowley (#11)
1 attachment(s)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 16 March 2017 at 09:52, David Rowley <david.rowley@2ndquadrant.com> wrote:

I was just trying to verify if it was going to do the correct thing in
regards to subtransactions and I got a little confused at the comments at
the top of StandbyAcquireAccessExclusiveLock():

* We record the lock against the top-level xid, rather than individual
* subtransaction xids. This means AccessExclusiveLocks held by aborted
* subtransactions are not released as early as possible on standbys.

if this is true, and it seems to be per LogAccessExclusiveLock(), does
StandbyReleaseLockTree() need to release the locks for sub transactions too?

This code:

for (i = 0; i < nsubxids; i++)
StandbyReleaseLocks(subxids[i]);

Yes, you are correct, good catch. It's not broken, but the code does
nothing, slowly.

We have two choices... remove this code or assign locks against subxids.

1. In xact_redo_abort() we can just pass NULL to
StandbyReleaseLockTree(), keeping the other code as is for later
fixing by another approach.

2. In LogAccessExclusiveLock() we can use GetCurrentTransactionId()
rather than GetTopTransactionId(), so that we assign the lock to the
subxid rather than the top xid. That could increase lock traffic, but
less likely. It also solves the problem of early release when AELs
held by subxids.

(2) looks safe enough, so patch attached.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

assign_aels_against_subxids.v1.patchapplication/octet-stream; name=assign_aels_against_subxids.v1.patchDownload
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 6259070..90e71d3 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -590,10 +590,6 @@ StandbyLockTimeoutHandler(void)
  * RelationLockList, so we can keep track of the various entries made by
  * the Startup process's virtual xid in the shared lock table.
  *
- * We record the lock against the top-level xid, rather than individual
- * subtransaction xids. This means AccessExclusiveLocks held by aborted
- * subtransactions are not released as early as possible on standbys.
- *
  * List elements use type xl_rel_lock, since the WAL record type exactly
  * matches the information that we need to keep track of.
  *
@@ -1052,7 +1048,7 @@ LogAccessExclusiveLock(Oid dbOid, Oid relOid)
 {
 	xl_standby_lock xlrec;
 
-	xlrec.xid = GetTopTransactionId();
+	xlrec.xid = GetCurrentTransactionId();
 
 	/*
 	 * Decode the locktag back to the original values, to avoid sending lots
@@ -1083,7 +1079,7 @@ LogAccessExclusiveLockPrepare(void)
 	 * GetRunningTransactionLocks() might see a lock associated with an
 	 * InvalidTransactionId which we later assert cannot happen.
 	 */
-	(void) GetTopTransactionId();
+	(void) GetCurrentTransactionId();
 }
 
 /*
#15Simon Riggs
simon@2ndquadrant.com
In reply to: Amit Kapila (#13)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 16 March 2017 at 19:08, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Mar 16, 2017 at 6:01 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 8 March 2017 at 10:02, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 8 March 2017 at 01:13, Simon Riggs <simon@2ndquadrant.com> wrote:

Don't understand this. I'm talking about setting a flag on
commit/abort WAL records, like the attached.

There's nothing setting a flag in the attached. I only see the
checking of the flag.

Yeh, it wasn't a full patch, just a way marker to the summit for you.

We just need to track info so we can set the flag at EOXact and we're done.

Do you have an idea where to store that information? I'd assume we'd
want to store something during LogAccessExclusiveLocks() and check
that in XactLogCommitRecord() and XactLogAbortRecord(). I don't see
anything existing which might help with that today. Do you think I
should introduce some new global variable for that? Or do you propose
calling GetRunningTransactionLocks() again while generating the
Commit/Abort record?

Seemed easier to write it than explain further. Please see what you think.

@@ -5563,7 +5575,8 @@ xact_redo_abort(xl_xact_parsed_abort *parsed,
TransactionId xid)
/*
* Release locks, if any. There are no invalidations to send.
*/
- StandbyReleaseLockTree(xid, parsed->nsubxacts, parsed->subxacts);
+ if (parsed->xinfo & XACT_XINFO_HAS_AE_LOCKS)
+ StandbyReleaseLockTree(xid, parsed->nsubxacts, parsed->subxacts);

The patch uses XACT_XINFO_HAS_AE_LOCKS during abort replay, but during
normal operation (XactLogAbortRecord()), it doesn't seem to log the
information. Is this an oversight or do you have some reason for
doing so?

Good catch, thanks. Yes, I changed my mind on whether it was needed
during implementation.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Simon Riggs
simon@2ndquadrant.com
In reply to: David Rowley (#11)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 16 March 2017 at 09:52, David Rowley <david.rowley@2ndquadrant.com> wrote:

Seemed easier to write it than explain further. Please see what you think.

Thanks. I had been looking for some struct to store the flag in. I'd not
considered just adding a new global variable.

As Amit says, I don't see the gain from adding that to each xact state.

I'd suggest refactoring my patch so that the existign
MyXactAccessedTempRel becomes MyXactFlags and we can just set a flag
in the two cases (temp rels and has-aels). That way we still have one
Global but it doesn't get any uglier than it already is.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17David Rowley
david.rowley@2ndquadrant.com
In reply to: Simon Riggs (#14)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 18 March 2017 at 21:52, Simon Riggs <simon@2ndquadrant.com> wrote:

2. In LogAccessExclusiveLock() we can use GetCurrentTransactionId()
rather than GetTopTransactionId(), so that we assign the lock to the
subxid rather than the top xid. That could increase lock traffic, but
less likely. It also solves the problem of early release when AELs
held by subxids.

(2) looks safe enough, so patch attached.

I might be missing something here, but what's the point in doing this
if we're not releasing the lock any earlier?

Going by the other patch you posted we're only recording if the top
level xact has an AEL, so we'll never try to release the locks at the
end of the subxact, since the new flag bit won't be set when the
subxact ends.

Seems it'll just be better to remove the dead code from
StandbyReleaseLockTree(), as that'll just mean one pass over the
RecoveryLockList when it comes to releasing it at the end the top
level transaction. Probably we'd want to just move all of
StandbyReleaseLocks()'s code into StandbyReleaseLockTree(), since the
StandbyReleaseLockTree() would otherwise just end up calling the
static function.

Probably I misunderstood this again. Let me know if that's the case.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Kapila (#12)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 17 March 2017 at 00:04, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Mar 16, 2017 at 7:22 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

On 16 March 2017 at 13:31, Simon Riggs <simon@2ndquadrant.com> wrote:

On 8 March 2017 at 10:02, David Rowley <david.rowley@2ndquadrant.com>
wrote:

On 8 March 2017 at 01:13, Simon Riggs <simon@2ndquadrant.com> wrote:

Don't understand this. I'm talking about setting a flag on
commit/abort WAL records, like the attached.

There's nothing setting a flag in the attached. I only see the
checking of the flag.

Yeh, it wasn't a full patch, just a way marker to the summit for you.

We just need to track info so we can set the flag at EOXact and we're
done.

Do you have an idea where to store that information? I'd assume we'd
want to store something during LogAccessExclusiveLocks() and check
that in XactLogCommitRecord() and XactLogAbortRecord(). I don't see
anything existing which might help with that today. Do you think I
should introduce some new global variable for that? Or do you propose
calling GetRunningTransactionLocks() again while generating the
Commit/Abort record?

Seemed easier to write it than explain further. Please see what you think.

Thanks. I had been looking for some struct to store the flag in. I'd not
considered just adding a new global variable.

I was in fact just working on this again earlier, and I had come up with the
attached.

I think this will not work if you take AEL in one of the
subtransaction and then commit the transaction. The reason is that
you are using CurrentTransactionState to remember AEL locks and during
commit (at the time of XactLogCommitRecord) only top level
CurrentTransactionState will be available which is not same as
subtransactions CurrentTransactionState. You can try with a simple
test like below:

Create table t1(c1 int);
Begin;
insert into t1 values(1);
savepoint s1;
insert into t1 values(2);
savepoint s2;
insert into t1 values(3);
Lock t1 in Access Exclusive mode;
commit;

Now, we might want to store this info in top level transaction state
to make such cases work, but I am not sure if it is better than what
Simon has proposed in his approach. Although, I have not evaluated in
detail, but it seems Simon's patch looks better way to solve this
problem (In his patch, there is an explicit handling of two-phase
commits which seems more robust).

Few other comments on patch:
1.
@@ -5325,6 +5341,9 @@ XactLogAbortRecord(TimestampTz abort_time,
if (xl_xinfo.xinfo &
XACT_XINFO_HAS_TWOPHASE)
XLogRegisterData((char *) (&xl_twophase), sizeof
(xl_xact_twophase));

+ if (CurrentTransactionState->didLogAELock)
+ xl_xinfo.xinfo |=
XACT_XINFO_HAS_AE_LOCKS;

You need to set xl_info before the below line in XactLogAbortRecord()
to get it logged properly.
if (xl_xinfo.xinfo != 0)
info |= XLOG_XACT_HAS_INFO;

2.
Explicitly Initialize didLogAELock in StartTransaction as we do for didLogXid.

Thanks for looking over this Amit. I'll drop my version of the patch
and make the changes described by Simon about moving
MyXactAccessedTempRel and MyXactAcquiredAccessExclusiveLock into a
bitmap fiags variable.

Thanks for confirming my concerns about the subxacts. I was confused
over how it was meant to work due to StandbyReleaseLockTree() having
some dead code.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19David Rowley
david.rowley@2ndquadrant.com
In reply to: Simon Riggs (#16)
4 attachment(s)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 18 March 2017 at 21:59, Simon Riggs <simon@2ndquadrant.com> wrote:

As Amit says, I don't see the gain from adding that to each xact state.

I'd suggest refactoring my patch so that the existign
MyXactAccessedTempRel becomes MyXactFlags and we can just set a flag
in the two cases (temp rels and has-aels). That way we still have one
Global but it doesn't get any uglier than it already is.

OK, now that I understand things a bit better, I've gone over your
patch and changing things around that it combines the two variables
into a flags variable which will be better suited for future
expansion.

I also fixed up the missing code from the Abort transaction. This is patch 0001

0002:

I was thinking it might be a good idea to back patch the patch which
removes the bogus code from StandbyReleaseLockTree(). Any users of
sub-transactions are currently paying a price that they've no reason
to, especially so when there's AELs held by other concurrent
transactions. We already know this code is slow, having this mistake
in there is not helping any.

0003:

Is intended to be patched atop of 0002 (for master only) and revises
this code further to remove the StandbyReleaseLockTree() function. To
me it seems better to do this to clear up any confusion about what's
going on here. This patch does close the door a little on tracking
AELs on sub-xacts. I really think if we're going to do that, then it
won't be that hard to put it back again. Seems better to be consistent
here and not leave any code around that causes people to be confused
about the same thing as I was confused about. This patch does get rid
of StandbyReleaseLockTree() which isn't static. Are we likely to break
any 3rd party code by doing that?

0004:

Again master only, is a further small optimization and of
StandbyReleaseLocks() to further tighten the slow loop over each held
lock. I also think that it's better to be more explicit about the case
of when the function would release all locks. Although I'm not all
that sure in which case the function will actually be called with an
invalid xid.

Patch series is attached.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Speed-up-replay-of-Access-Exclusive-Locks-on-hot-sta.patchapplication/octet-stream; name=0001-Speed-up-replay-of-Access-Exclusive-Locks-on-hot-sta.patchDownload
From 932b7e0873e8938cd3ef73be6f944774120d0ef9 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 20 Mar 2017 20:06:26 +1300
Subject: [PATCH 1/4] Speed up replay of Access Exclusive Locks on hot standby

A hot standby replica keeps a list of Access Exclusive locks for a top
level transaction. These locks are released when the top level transaction
ends. Searching of this list is O(n2), and each transaction had to pay the
price of searching this list for locks, even if it didn't take any AE
locks itself.

This patch optimizes this case by having the master server track which
transactions took AE locks, and passes that along to the standby server in
the commit/abort record. This allows the standby to only try to release
locks for transactions which actually took any.

Author: Simon Riggs, slightly revised by David Rowley
---
 src/backend/access/heap/heapam.c      |  4 ++--
 src/backend/access/transam/twophase.c | 12 ++++++++++--
 src/backend/access/transam/xact.c     | 33 +++++++++++++++++++++------------
 src/backend/commands/tablecmds.c      |  2 +-
 src/backend/storage/ipc/standby.c     |  1 +
 src/include/access/xact.h             | 27 ++++++++++++++++++++++++---
 6 files changed, 59 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8526137..b147f64 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1132,7 +1132,7 @@ relation_open(Oid relationId, LOCKMODE lockmode)
 
 	/* Make note that we've accessed a temporary relation */
 	if (RelationUsesLocalBuffers(r))
-		MyXactAccessedTempRel = true;
+		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPREL;
 
 	pgstat_initstats(r);
 
@@ -1178,7 +1178,7 @@ try_relation_open(Oid relationId, LOCKMODE lockmode)
 
 	/* Make note that we've accessed a temporary relation */
 	if (RelationUsesLocalBuffers(r))
-		MyXactAccessedTempRel = true;
+		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPREL;
 
 	pgstat_initstats(r);
 
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index f09941d..4b4999f 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -2065,11 +2065,15 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	/* See notes in RecordTransactionCommit */
 	MyPgXact->delayChkpt = true;
 
-	/* Emit the XLOG commit record */
+	/*
+	 * Emit the XLOG commit record. Note that we mark 2PC commits as potentially
+	 * having AccessExclusiveLocks since we don't know whether or not they do.
+	 */
 	recptr = XactLogCommitRecord(committs,
 								 nchildren, children, nrels, rels,
 								 ninvalmsgs, invalmsgs,
 								 initfileinval, false,
+						 MyXactFlags | XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK,
 								 xid);
 
 
@@ -2146,10 +2150,14 @@ RecordTransactionAbortPrepared(TransactionId xid,
 
 	START_CRIT_SECTION();
 
-	/* Emit the XLOG abort record */
+	/*
+	 * Emit the XLOG commit record. Note that we mark 2PC aborts as potentially
+	 * having AccessExclusiveLocks since we don't know whether or not they do.
+	 */
 	recptr = XactLogAbortRecord(GetCurrentTimestamp(),
 								nchildren, children,
 								nrels, rels,
+						 MyXactFlags | XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK,
 								xid);
 
 	/* Always flush, since we're about to remove the 2PC state file */
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 02e0779..c8751c6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -109,12 +109,13 @@ int			nParallelCurrentXids = 0;
 TransactionId *ParallelCurrentXids;
 
 /*
- * MyXactAccessedTempRel is set when a temporary relation is accessed.
- * We don't allow PREPARE TRANSACTION in that case.  (This is global
- * so that it can be set from heapam.c.)
+ * Miscellaneous flag bits to record events which occur on the top level
+ * transaction. These flags are only persisted in MyXactFlags and are intended
+ * so we remember to do certain things later on in the transaction. This is
+ * globally accessible, so can be set from anywhere in the code that requires
+ * recording flags.
  */
-bool		MyXactAccessedTempRel = false;
-
+int  MyXactFlags;
 
 /*
  *	transaction states - transaction state from server perspective
@@ -1231,6 +1232,7 @@ RecordTransactionCommit(void)
 							nchildren, children, nrels, rels,
 							nmsgs, invalMessages,
 							RelcacheInitFileInval, forceSyncCommit,
+							MyXactFlags,
 							InvalidTransactionId /* plain commit */ );
 
 		if (replorigin)
@@ -1583,7 +1585,7 @@ RecordTransactionAbort(bool isSubXact)
 	XactLogAbortRecord(xact_time,
 					   nchildren, children,
 					   nrels, rels,
-					   InvalidTransactionId);
+					   MyXactFlags, InvalidTransactionId);
 
 	/*
 	 * Report the latest async abort LSN, so that the WAL writer knows to
@@ -1845,7 +1847,7 @@ StartTransaction(void)
 	XactDeferrable = DefaultXactDeferrable;
 	XactIsoLevel = DefaultXactIsoLevel;
 	forceSyncCommit = false;
-	MyXactAccessedTempRel = false;
+	MyXactFlags = 0;
 
 	/*
 	 * reinitialize within-transaction counters
@@ -2260,7 +2262,7 @@ PrepareTransaction(void)
 	 * cases, such as a temp table created and dropped all within the
 	 * transaction.  That seems to require much more bookkeeping though.
 	 */
-	if (MyXactAccessedTempRel)
+	if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPREL))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot PREPARE a transaction that has operated on temporary tables")));
@@ -5108,7 +5110,7 @@ XactLogCommitRecord(TimestampTz commit_time,
 					int nrels, RelFileNode *rels,
 					int nmsgs, SharedInvalidationMessage *msgs,
 					bool relcacheInval, bool forceSync,
-					TransactionId twophase_xid)
+					int xactflags, TransactionId twophase_xid)
 {
 	xl_xact_commit xlrec;
 	xl_xact_xinfo xl_xinfo;
@@ -5139,6 +5141,8 @@ XactLogCommitRecord(TimestampTz commit_time,
 		xl_xinfo.xinfo |= XACT_COMPLETION_UPDATE_RELCACHE_FILE;
 	if (forceSyncCommit)
 		xl_xinfo.xinfo |= XACT_COMPLETION_FORCE_SYNC_COMMIT;
+	if ((xactflags & XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK))
+		xl_xinfo.xinfo |= XACT_XINFO_HAS_AE_LOCKS;
 
 	/*
 	 * Check if the caller would like to ask standbys for immediate feedback
@@ -5251,7 +5255,7 @@ XLogRecPtr
 XactLogAbortRecord(TimestampTz abort_time,
 				   int nsubxacts, TransactionId *subxacts,
 				   int nrels, RelFileNode *rels,
-				   TransactionId twophase_xid)
+				   int xactflags, TransactionId twophase_xid)
 {
 	xl_xact_abort xlrec;
 	xl_xact_xinfo xl_xinfo;
@@ -5276,6 +5280,9 @@ XactLogAbortRecord(TimestampTz abort_time,
 
 	xlrec.xact_time = abort_time;
 
+	if ((xactflags & XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK))
+		xl_xinfo.xinfo |= XACT_XINFO_HAS_AE_LOCKS;
+
 	if (nsubxacts > 0)
 	{
 		xl_xinfo.xinfo |= XACT_XINFO_HAS_SUBXACTS;
@@ -5427,7 +5434,8 @@ xact_redo_commit(xl_xact_parsed_commit *parsed,
 		 * via their top-level xid only, so no need to provide subxact list,
 		 * which will save time when replaying commits.
 		 */
-		StandbyReleaseLockTree(xid, 0, NULL);
+		if (parsed->xinfo & XACT_XINFO_HAS_AE_LOCKS)
+			StandbyReleaseLockTree(xid, 0, NULL);
 	}
 
 	if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
@@ -5563,7 +5571,8 @@ xact_redo_abort(xl_xact_parsed_abort *parsed, TransactionId xid)
 		/*
 		 * Release locks, if any. There are no invalidations to send.
 		 */
-		StandbyReleaseLockTree(xid, parsed->nsubxacts, parsed->subxacts);
+		if (parsed->xinfo & XACT_XINFO_HAS_AE_LOCKS)
+			StandbyReleaseLockTree(xid, parsed->nsubxacts, parsed->subxacts);
 	}
 
 	/* Make sure files supposed to be dropped are dropped */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 86329e5..3b28e8c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -12471,7 +12471,7 @@ PreCommit_on_commit_actions(void)
 				 * relations, we can skip truncating ON COMMIT DELETE ROWS
 				 * tables, as they must still be empty.
 				 */
-				if (MyXactAccessedTempRel)
+				if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPREL))
 					oids_to_truncate = lappend_oid(oids_to_truncate, oc->relid);
 				break;
 			case ONCOMMIT_DROP:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 6259070..7be8fd4 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -1063,6 +1063,7 @@ LogAccessExclusiveLock(Oid dbOid, Oid relOid)
 	xlrec.relOid = relOid;
 
 	LogAccessExclusiveLocks(1, &xlrec);
+	MyXactFlags |= XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK;
 }
 
 /*
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index e7d1191..5b37c05 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -71,8 +71,27 @@ typedef enum
 /* Synchronous commit level */
 extern int	synchronous_commit;
 
-/* Kluge for 2PC support */
-extern bool MyXactAccessedTempRel;
+/*
+ * Miscellaneous flag bits to record events which occur on the top level
+ * transaction. These flags are only persisted in MyXactFlags and are intended
+ * so we remember to do certain things later in the transaction. This is
+ * globally accessible, so can be set from anywhere in the code which requires
+ * recording flags.
+ */
+extern int  MyXactFlags;
+
+/*
+ * XACT_FLAGS_ACCESSEDTEMPREL - set when a temporary relation is accessed. We
+ * don't allow PREPARE TRANSACTION in that case.
+ */
+#define XACT_FLAGS_ACCESSEDTEMPREL				(1U << 0)
+
+/*
+ * XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK - records whether the top level xact
+ * logged any Access Exclusive Locks.
+ */
+#define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
+
 
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
@@ -137,6 +156,7 @@ typedef void (*SubXactCallback) (SubXactEvent event, SubTransactionId mySubid,
 #define XACT_XINFO_HAS_INVALS			(1U << 3)
 #define XACT_XINFO_HAS_TWOPHASE			(1U << 4)
 #define XACT_XINFO_HAS_ORIGIN			(1U << 5)
+#define XACT_XINFO_HAS_AE_LOCKS			(1U << 6)
 
 /*
  * Also stored in xinfo, these indicating a variety of additional actions that
@@ -364,12 +384,13 @@ extern XLogRecPtr XactLogCommitRecord(TimestampTz commit_time,
 					int nrels, RelFileNode *rels,
 					int nmsgs, SharedInvalidationMessage *msgs,
 					bool relcacheInval, bool forceSync,
+					int xactflags,
 					TransactionId twophase_xid);
 
 extern XLogRecPtr XactLogAbortRecord(TimestampTz abort_time,
 				   int nsubxacts, TransactionId *subxacts,
 				   int nrels, RelFileNode *rels,
-				   TransactionId twophase_xid);
+				   int xactflags, TransactionId twophase_xid);
 extern void xact_redo(XLogReaderState *record);
 
 /* xactdesc.c */
-- 
1.9.5.msysgit.1

0002-Remove-bogus-code-to-release-locks-of-subxacts.patchapplication/octet-stream; name=0002-Remove-bogus-code-to-release-locks-of-subxacts.patchDownload
From c9ecf8adcf26ea02701e68ce40f150ff23f2ed1a Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 20 Mar 2017 20:26:13 +1300
Subject: [PATCH 2/4] Remove bogus code to release locks of subxacts

This code mistakenly assumed that sub transactions could have Access
Exclusive Locks stored in the RecoveryLockList, but
StandbyAcquireAccessExclusiveLock clearly states that the locks are stored
against the top level xid only. LogAccessExclusiveLock() agreed on this.

The code in question was doing no harm other than slowing down the replay of
all transaction commit/abort records of transactions which had any
subtransactions when any running transaction existed which held AELs.

Author: David Rowley
---
 src/backend/storage/ipc/standby.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 7be8fd4..60b0513 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -670,8 +670,11 @@ StandbyReleaseLocks(TransactionId xid)
 }
 
 /*
- * Release locks for a transaction tree, starting at xid down, from
- * RecoveryLockList.
+ * Release locks for a transaction tree.
+ *
+ * Technically we only track Access Exclusive Locks against the top level
+ * transaction, so all we must do here is release the locks for a single
+ * transaction ID, and ignore any subtransactions.
  *
  * Called during WAL replay of COMMIT/ROLLBACK when in hot standby mode,
  * to remove any AccessExclusiveLocks requested by a transaction.
@@ -679,12 +682,12 @@ StandbyReleaseLocks(TransactionId xid)
 void
 StandbyReleaseLockTree(TransactionId xid, int nsubxids, TransactionId *subxids)
 {
-	int			i;
-
 	StandbyReleaseLocks(xid);
 
-	for (i = 0; i < nsubxids; i++)
-		StandbyReleaseLocks(subxids[i]);
+	/*
+	 * If we ever are to track AELs on the subxact which the lock belongs to
+	 * then here marks the spot of where those should be released.
+	 */
 }
 
 /*
-- 
1.9.5.msysgit.1

0003-Get-rid-of-StandbyReleaseLockTree.patchapplication/octet-stream; name=0003-Get-rid-of-StandbyReleaseLockTree.patchDownload
From 0e3741181c354932a914161f84a740d8c21bf218 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 20 Mar 2017 20:52:27 +1300
Subject: [PATCH 3/4] Get rid of StandbyReleaseLockTree

This function didn't really have to release the lock tree as the locks
were really only stored against the top level transaction. Instead let's
just use StandbyReleaseLocks(), which should help clear up any confusion.

Author: David Rowley
---
 src/backend/access/transam/twophase.c |  2 +-
 src/backend/access/transam/xact.c     |  8 +++-----
 src/backend/storage/ipc/standby.c     | 29 +++++++----------------------
 src/include/storage/standby.h         |  3 +--
 4 files changed, 12 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 4b4999f..14c8be9 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -2014,7 +2014,7 @@ RecoverPreparedTransactions(void)
 			 * additional locks at any one time.
 			 */
 			if (InHotStandby)
-				StandbyReleaseLockTree(xid, hdr->nsubxacts, subxids);
+				StandbyReleaseLocks(xid);
 
 			/*
 			 * We're done with recovering this transaction. Clear
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index c8751c6..02a864e 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5430,12 +5430,10 @@ xact_redo_commit(xl_xact_parsed_commit *parsed,
 		/*
 		 * Release locks, if any. We do this for both two phase and normal one
 		 * phase transactions. In effect we are ignoring the prepare phase and
-		 * just going straight to lock release. At commit we release all locks
-		 * via their top-level xid only, so no need to provide subxact list,
-		 * which will save time when replaying commits.
+		 * just going straight to lock release.
 		 */
 		if (parsed->xinfo & XACT_XINFO_HAS_AE_LOCKS)
-			StandbyReleaseLockTree(xid, 0, NULL);
+			StandbyReleaseLocks(xid);
 	}
 
 	if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
@@ -5572,7 +5570,7 @@ xact_redo_abort(xl_xact_parsed_abort *parsed, TransactionId xid)
 		 * Release locks, if any. There are no invalidations to send.
 		 */
 		if (parsed->xinfo & XACT_XINFO_HAS_AE_LOCKS)
-			StandbyReleaseLockTree(xid, parsed->nsubxacts, parsed->subxacts);
+			StandbyReleaseLocks(xid);
 	}
 
 	/* Make sure files supposed to be dropped are dropped */
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 60b0513..26f6671 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -631,7 +631,13 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid)
 	LockAcquireExtended(&locktag, AccessExclusiveLock, true, false, false);
 }
 
-static void
+/*
+ * Release locks for a transaction.
+ *
+ * Called during WAL replay of COMMIT/ROLLBACK when in hot standby mode,
+ * to remove any AccessExclusiveLocks requested by a transaction.
+ */
+void
 StandbyReleaseLocks(TransactionId xid)
 {
 	ListCell   *cell,
@@ -670,27 +676,6 @@ StandbyReleaseLocks(TransactionId xid)
 }
 
 /*
- * Release locks for a transaction tree.
- *
- * Technically we only track Access Exclusive Locks against the top level
- * transaction, so all we must do here is release the locks for a single
- * transaction ID, and ignore any subtransactions.
- *
- * Called during WAL replay of COMMIT/ROLLBACK when in hot standby mode,
- * to remove any AccessExclusiveLocks requested by a transaction.
- */
-void
-StandbyReleaseLockTree(TransactionId xid, int nsubxids, TransactionId *subxids)
-{
-	StandbyReleaseLocks(xid);
-
-	/*
-	 * If we ever are to track AELs on the subxact which the lock belongs to
-	 * then here marks the spot of where those should be released.
-	 */
-}
-
-/*
  * Called at end of recovery and when we see a shutdown checkpoint.
  */
 void
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 3ecc446..4246c93 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -47,8 +47,7 @@ extern void StandbyLockTimeoutHandler(void);
  * by transactions and running-xacts snapshots.
  */
 extern void StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid);
-extern void StandbyReleaseLockTree(TransactionId xid,
-					   int nsubxids, TransactionId *subxids);
+extern void StandbyReleaseLocks(TransactionId xid);
 extern void StandbyReleaseAllLocks(void);
 extern void StandbyReleaseOldLocks(int nxids, TransactionId *xids);
 
-- 
1.9.5.msysgit.1

0004-Small-optimization-for-StandbyReleaseLocks.patchapplication/octet-stream; name=0004-Small-optimization-for-StandbyReleaseLocks.patchDownload
From 8e8c21280c6737f1588e94d439ee9512144f34e0 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 20 Mar 2017 21:01:10 +1300
Subject: [PATCH 4/4] Small optimization for StandbyReleaseLocks

Here the code would check in if the passed in xid was a valid transaction
id for each item in the RecoveryLockList. If the xid was invalid, the loop
would end up releasing all locks in a rather round about and slightly
wasteful way.  It seems better to perform this check before the start of the
loop to further tighten the loop, which is known to be quite slow when many
AELs are stored. We can simply use StandbyReleaseAllLocks() which also
makes the code more explicit about what its trying to do.

Author: David Rowley
---
 src/backend/storage/ipc/standby.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 26f6671..dc2c03f 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -644,8 +644,15 @@ StandbyReleaseLocks(TransactionId xid)
 			   *prev,
 			   *next;
 
+	/* release all locks when xid is not valid */
+	if (!TransactionIdIsValid(xid))
+	{
+		StandbyReleaseAllLocks();
+		return;
+	}
+
 	/*
-	 * Release all matching locks and remove them from list
+	 * Otherwise release all matching locks and remove them from list
 	 */
 	prev = NULL;
 	for (cell = list_head(RecoveryLockList); cell; cell = next)
@@ -654,7 +661,7 @@ StandbyReleaseLocks(TransactionId xid)
 
 		next = lnext(cell);
 
-		if (!TransactionIdIsValid(xid) || lock->xid == xid)
+		if (lock->xid == xid)
 		{
 			LOCKTAG		locktag;
 
-- 
1.9.5.msysgit.1

#20Simon Riggs
simon@2ndquadrant.com
In reply to: David Rowley (#19)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 20 March 2017 at 08:31, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 18 March 2017 at 21:59, Simon Riggs <simon@2ndquadrant.com> wrote:

As Amit says, I don't see the gain from adding that to each xact state.

I'd suggest refactoring my patch so that the existign
MyXactAccessedTempRel becomes MyXactFlags and we can just set a flag
in the two cases (temp rels and has-aels). That way we still have one
Global but it doesn't get any uglier than it already is.

OK, now that I understand things a bit better, I've gone over your
patch and changing things around that it combines the two variables
into a flags variable which will be better suited for future
expansion.

I also fixed up the missing code from the Abort transaction. This is patch 0001

This looks good to go, will final check and apply. Thanks,

0002:

I was thinking it might be a good idea to back patch the patch which
removes the bogus code from StandbyReleaseLockTree(). Any users of
sub-transactions are currently paying a price that they've no reason
to, especially so when there's AELs held by other concurrent
transactions. We already know this code is slow, having this mistake
in there is not helping any.

0003:

Is intended to be patched atop of 0002 (for master only) and revises
this code further to remove the StandbyReleaseLockTree() function. To
me it seems better to do this to clear up any confusion about what's
going on here. This patch does close the door a little on tracking
AELs on sub-xacts. I really think if we're going to do that, then it
won't be that hard to put it back again. Seems better to be consistent
here and not leave any code around that causes people to be confused
about the same thing as I was confused about. This patch does get rid
of StandbyReleaseLockTree() which isn't static. Are we likely to break
any 3rd party code by doing that?

We've solved the performance problem due to your insight, so ISTM ok
now to enable AELs on subxids, since not having them causes locks to
be held for too long, which is a problem also. Rearranging the code
doesn't gain us any further performance, but as you say it prevents
prevents us solving the AELs on subxids problem.

Given that, do you agree to me applying assign_aels_against_subxids.v1.patch
as well?

We can discuss any backpatches after the CF is done.

0004:

Again master only, is a further small optimization and of
StandbyReleaseLocks() to further tighten the slow loop over each held
lock. I also think that it's better to be more explicit about the case
of when the function would release all locks. Although I'm not all
that sure in which case the function will actually be called with an
invalid xid.

Patch series is attached.

This does improve things a little more, but we already hit 95% of the
gain, so I'd prefer to keep the code as similar as possible.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21David Rowley
david.rowley@2ndquadrant.com
In reply to: Simon Riggs (#20)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 22 March 2017 at 22:27, Simon Riggs <simon@2ndquadrant.com> wrote:

On 20 March 2017 at 08:31, David Rowley <david.rowley@2ndquadrant.com> wrote:

0003:

Is intended to be patched atop of 0002 (for master only) and revises
this code further to remove the StandbyReleaseLockTree() function. To
me it seems better to do this to clear up any confusion about what's
going on here. This patch does close the door a little on tracking
AELs on sub-xacts. I really think if we're going to do that, then it
won't be that hard to put it back again. Seems better to be consistent
here and not leave any code around that causes people to be confused
about the same thing as I was confused about. This patch does get rid
of StandbyReleaseLockTree() which isn't static. Are we likely to break
any 3rd party code by doing that?

We've solved the performance problem due to your insight, so ISTM ok
now to enable AELs on subxids, since not having them causes locks to
be held for too long, which is a problem also. Rearranging the code
doesn't gain us any further performance, but as you say it prevents
prevents us solving the AELs on subxids problem.

Given that, do you agree to me applying assign_aels_against_subxids.v1.patch
as well?

Does applying assign_aels_against_subxids.v1.patch still need to keep
the loop to release the subxacts? Won't this be gone already with the
subxact commit/abort record replays?
This does possibly mean that we perform more loops over the
RecoveryLockList even if the subxact does not have an AELs, but its
parent xact does. Wonder if this is a good price to pay for releasing
the locks earlier?

We can discuss any backpatches after the CF is done.

0004:

Again master only, is a further small optimization and of
StandbyReleaseLocks() to further tighten the slow loop over each held
lock. I also think that it's better to be more explicit about the case
of when the function would release all locks. Although I'm not all
that sure in which case the function will actually be called with an
invalid xid.

Patch series is attached.

This does improve things a little more, but we already hit 95% of the
gain, so I'd prefer to keep the code as similar as possible.

Seems fair. Its just a blemish that I saw every time I looked at the
code and was inspired not to see it anymore. I agree that there won't
be much gain from it.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Simon Riggs
simon@2ndquadrant.com
In reply to: David Rowley (#21)
Re: Patch to improve performance of replay of AccessExclusiveLock

On 22 March 2017 at 13:19, David Rowley <david.rowley@2ndquadrant.com> wrote:

Given that, do you agree to me applying assign_aels_against_subxids.v1.patch
as well?

Does applying assign_aels_against_subxids.v1.patch still need to keep
the loop to release the subxacts? Won't this be gone already with the
subxact commit/abort record replays?

No it is still required because aborts and commits might have
subcommitted subxids.

This does possibly mean that we perform more loops over the
RecoveryLockList even if the subxact does not have an AELs, but its
parent xact does. Wonder if this is a good price to pay for releasing
the locks earlier?

We'd be performing the same number of loops as we do now. It's just
now they would have a purpose.

But we aren't doing it at all unless the top level xid has at least
one AEL, so the bulk of the problem is gone.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers