suboverflowed subtransactions concurrency performance optimize

Started by Pengchengliu, over 4 years ago, 30 messages
#1 Pengchengliu
pengchengliu@tju.edu.cn
2 attachment(s)

Hi hackers,
I wrote a patch to resolve the subtransaction concurrency performance
problems that occur when snapshots are suboverflowed.

When more than PGPROC_MAX_CACHED_SUBXIDS (64) subtransactions are used per
transaction under concurrency, subtransaction performance degrades badly:
all backends get stuck acquiring the LWLock SubtransSLRULock.

The steps to reproduce on the PG master branch:

1, init a cluster and append the following to postgresql.conf:

max_connections = '2500'
max_files_per_process = '2000'
max_locks_per_transaction = '64'
max_parallel_maintenance_workers = '8'
max_parallel_workers = '60'
max_parallel_workers_per_gather = '6'
max_prepared_transactions = '15000'
max_replication_slots = '10'
max_wal_senders = '64'
max_worker_processes = '250'
shared_buffers = 8GB

2, create a table and insert some records as below:

CREATE UNLOGGED TABLE contend (
id integer,
val integer NOT NULL
)
WITH (fillfactor='50');

INSERT INTO contend (id, val)
SELECT i, 0
FROM generate_series(1, 10000) AS i;

VACUUM (ANALYZE) contend;

3, use pgbench with the script subtrans_128.sql from the attachment
(which, judging by its name, presumably opens well over 64 savepoints per
transaction, hence the suboverflow):
pgbench -d postgres -p 33800 -n -r -f subtrans_128.sql -c 500 -j 500 -T 3600

4, After a while, we get the stuck result. Querying pg_stat_activity shows
that every backend's wait event is SubtransSLRULock.
We can use perf top to find the root cause. The result of perf top is as
below:
66.20% postgres [.] pg_atomic_compare_exchange_u32_impl
29.30% postgres [.] pg_atomic_fetch_sub_u32_impl
1.46% postgres [.] pg_atomic_read_u32
1.34% postgres [.] TransactionIdIsCurrentTransactionId
0.75% postgres [.] SimpleLruReadPage_ReadOnly
0.14% postgres [.] LWLockAttemptLock
0.14% postgres [.] LWLockAcquire
0.12% postgres [.] pg_atomic_compare_exchange_u32
0.09% postgres [.] HeapTupleSatisfiesMVCC
0.06% postgres [.] heapgetpage
0.03% postgres [.] sentinel_ok
0.03% postgres [.] XidInMVCCSnapshot
0.03% postgres [.] slot_deform_heap_tuple
0.03% postgres [.] ExecInterpExpr
0.02% postgres [.] AllocSetCheck
0.02% postgres [.] HeapTupleSatisfiesVisibility
0.02% postgres [.] LWLockRelease
0.02% postgres [.] TransactionIdPrecedes
0.02% postgres [.] SubTransGetParent
0.01% postgres [.] heapgettup_pagemode
0.01% postgres [.] CheckForSerializableConflictOutNeeded

After reviewing the subtrans code, it is easy to see that the global LWLock
SubtransSLRULock is the bottleneck for concurrent subtrans access.

When a backend session assigns more than PGPROC_MAX_CACHED_SUBXIDS (64)
subtransactions, we get a snapshot that is suboverflowed.
A suboverflowed snapshot does not contain all the data required to determine
visibility, so PostgreSQL will occasionally have to resort to pg_subtrans.
Those pages are cached in shared memory (the subtrans SLRU buffers), but you
can see the overhead of looking them up in the high rank of
SimpleLruReadPage_ReadOnly in the perf output.
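
For context, here is a rough C paraphrase of the relevant branch of
XidInMVCCSnapshot(), heavily simplified (the real function also searches the
snapshot's subxip[] array and handles recovery snapshots), showing why a
suboverflowed snapshot funnels visibility checks into pg_subtrans:

static bool
XidInMVCCSnapshotSketch(TransactionId xid, Snapshot snapshot)
{
	uint32		i;

	if (snapshot->suboverflowed)
	{
		/*
		 * The snapshot's subxid array is incomplete, so map the XID to its
		 * top-level parent via pg_subtrans.  This is the call chain that
		 * takes SubtransSLRULock inside SimpleLruReadPage_ReadOnly().
		 */
		xid = SubTransGetTopmostTransaction(xid);

		/* anything before xmin is known to have finished */
		if (TransactionIdPrecedes(xid, snapshot->xmin))
			return false;
	}

	/* plain search of the top-level XIDs captured in the snapshot */
	for (i = 0; i < snapshot->xcnt; i++)
	{
		if (TransactionIdEquals(xid, snapshot->xip[i]))
			return true;
	}
	return false;
}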

To resolve this performance problem, we thought about a solution that
caches the subtrans SLRU in a backend-local cache.
First we query a parent transaction id from the subtrans SLRU and copy that
SLRU page to a local cache page.
After that, if we need to query a parent transaction id again, we can query
it from the local cache directly.
This observably reduces the number of acquisitions and releases of the
LWLock SubtransSLRULock.

From all snapshots we can get the latest xmin. Any transaction id that
precedes this xmin must already have committed or aborted,
and its parent/top transaction has already been written to the subtrans
SLRU. So we can safely copy that part of the subtrans SLRU into the local cache.
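
In outline, the lookup path then looks like this (a distilled sketch;
LocalCacheLookup and LocalCacheAlloc are placeholder names, and the attached
patch is the real implementation, which also re-reads a page when a cached
entry above the valid offset is still invalid):

static TransactionId
SubTransGetParentSketch(TransactionId xid)
{
	int			pageno = TransactionIdToPage(xid);
	int			entryno = TransactionIdToEntry(xid);
	LocalBuffer	lbuffer = LocalCacheLookup(pageno);	/* backend-local htab */

	if (lbuffer == NULL)
	{
		/* miss: read the SLRU page once, under SubtransSLRULock */
		int		slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, pageno, xid);

		lbuffer = LocalCacheAlloc(pageno);
		memcpy(lbuffer->xids, SubTransCtl->shared->page_buffer[slotno], BLCKSZ);
		LWLockRelease(SubtransSLRULock);

		/* entries below the oldest snapshot xmin can never change again */
		lbuffer->valid_offset = SubtransGetPageValidOffset(pageno);
	}

	/* hit: answered from backend-local memory, with no LWLock traffic */
	return lbuffer->xids[entryno];
}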

Using the same reproduce steps above, with our patch we no longer get the
stuck result.
Note that our GUC parameter must be appended to postgresql.conf; this
optimization is off by default:
local_cache_subtrans_pages = 128

The patch is based on PG master branch
0d906b2c0b1f0d625ff63d9ace906556b1c66a68

Our project is at https://github.com/ADBSQL/AntDB. Welcome to follow us:
AntDB, AsiaInfo's PG-based distributed database product.

Thanks
Pengcheng

Attachments:

subtrans_128.sql (application/octet-stream)
subtrans_local_optimize.patch (text/plain)
diff --git a/src/backend/access/rmgrdesc/standbydesc.c b/src/backend/access/rmgrdesc/standbydesc.c
index 01ee7ac6d2..9b047c863a 100644
--- a/src/backend/access/rmgrdesc/standbydesc.c
+++ b/src/backend/access/rmgrdesc/standbydesc.c
@@ -129,6 +129,8 @@ standby_desc_invalidations(StringInfo buf,
 			appendStringInfo(buf, " relmap db %u", msg->rm.dbId);
 		else if (msg->id == SHAREDINVALSNAPSHOT_ID)
 			appendStringInfo(buf, " snapshot %u", msg->sn.relId);
+		else if (msg->id == SUBTRANS_INVALID_PAGE_ID)
+			appendStringInfo(buf, " subtrans page %u", msg->spp.pageno);
 		else
 			appendStringInfo(buf, " unrecognized id %d", msg->id);
 	}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 6a8e521f89..339425f5b7 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -33,7 +33,9 @@
 #include "access/transam.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
-
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/inval.h"
 
 /*
  * Defines for SubTrans page sizes.  A page is the same BLCKSZ as is used
@@ -53,7 +55,23 @@
 
 #define TransactionIdToPage(xid) ((xid) / (TransactionId) SUBTRANS_XACTS_PER_PAGE)
 #define TransactionIdToEntry(xid) ((xid) % (TransactionId) SUBTRANS_XACTS_PER_PAGE)
+#define SubtransMaxUsageCount	25
+#define SubtransSubUsageCount	5
 
+typedef struct LocalBufferData
+{
+	int		page_id;		/* for debug and check */
+	bool	in_htab;		/* in SubtransLocalBuffHtab? */
+	uint8	usage_count;	/* for victim selection */
+	uint16	valid_offset;	/* how many entry is valid */
+	TransactionId	xids[SUBTRANS_XACTS_PER_PAGE];
+} LocalBufferData,*LocalBuffer;
+
+typedef struct LocalBufferHash
+{
+	int			page_id;
+	LocalBuffer	lbuffer;
+} LocalBufferHash;
 
 /*
  * Link to shared-memory data structures for SUBTRANS control
@@ -62,10 +80,392 @@ static SlruCtlData SubTransCtlData;
 
 #define SubTransCtl  (&SubTransCtlData)
 
+static HTAB *SubtransLocalBuffHtab = NULL;
+static LocalBuffer subtrans_local_buffers = NULL;
+static LocalBuffer subtrans_local_buf_end;
+static LocalBuffer subtrans_local_buf_next;
+int local_cache_subtrans_page_num = 0;
+int slru_subtrans_page_num = 32;
 
 static int	ZeroSUBTRANSPage(int pageno);
 static bool SubTransPagePrecedes(int page1, int page2);
+static LocalBuffer SubtransAllocLocalBuffer(void);
+
+#define HASH_PAGE_NO(no_)	(no_)
+static uint32
+hash_page_no(const void *key, Size keysize)
+{
+	Assert(keysize == sizeof(uint32));
+	return HASH_PAGE_NO(*(uint32*)key);
+}
+
+/* create subtrans local htab */
+static HTAB*
+CreateSubtransLocalHtab(long nelem, LocalBuffer *header)
+{
+	HASHCTL		hctl;
+	HTAB	   *htab;
+	LocalBuffer	lbuffers;
+	Assert(nelem > 0);
+
+	memset(&hctl, 0, sizeof(hctl));
+	hctl.keysize = sizeof(((LocalBufferHash*)0)->page_id);
+	hctl.entrysize = sizeof(LocalBufferHash);
+	hctl.hcxt = TopMemoryContext;
+	hctl.hash = hash_page_no;
+
+	htab = hash_create("hash subtrans local buffer",
+					   nelem,
+					   &hctl,
+					   HASH_ELEM | HASH_CONTEXT | HASH_FUNCTION);
+	lbuffers = MemoryContextAllocExtended(TopMemoryContext,
+										  sizeof(LocalBufferData) * nelem,
+										  MCXT_ALLOC_HUGE|MCXT_ALLOC_NO_OOM|MCXT_ALLOC_ZERO);
+	if (lbuffers == NULL)
+	{
+		hash_destroy(htab);
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+	}
+
+	*header = lbuffers;
+	return htab;
+}
+
+static int
+compare_lbuffer_index(const void *a, const void *b)
+{
+	LocalBuffer l = &subtrans_local_buffers[*(uint32*)a];
+	LocalBuffer r = &subtrans_local_buffers[*(uint32*)b];
+
+	StaticAssertStmt(sizeof(l->usage_count) == 1 && offsetof(LocalBufferData, usage_count) == offsetof(LocalBufferData, in_htab) + 1,
+					 "Subtrans LocalBufferData stmt error");
+	/* compare the field in_htab and usage_count in LocalBuffer */
+	return memcmp(&r->in_htab, &l->in_htab, 2);
+}
+
+/* handle reset GUC local_cache_subtrans_page_num */
+void
+SubtransResetGucCacheNum(int newval, void *extra)
+{
+	HTAB		  *volatile htabNew;
+	LocalBufferData *volatile lbuffers_new;
+	LocalBuffer				lbuffer;
+	HASH_SEQ_STATUS			seq_status;
+	LocalBufferHash		   *old_local_hash;
+	LocalBufferHash		   *new_local_hash;
+	bool					found;
+
+	if (local_cache_subtrans_page_num == newval ||
+		SubtransLocalBuffHtab == NULL)
+		return;
+
+	if (local_cache_subtrans_page_num == 0 || newval == 0)
+	{
+		if (SubtransLocalBuffHtab != NULL)
+		{
+			hash_destroy(SubtransLocalBuffHtab);
+			SubtransLocalBuffHtab = NULL;
+			pfree(subtrans_local_buffers);
+			subtrans_local_buffers = NULL;
+			subtrans_local_buf_end = NULL;
+			subtrans_local_buf_next = NULL;
+		}
+		return;
+	}
+
+	htabNew = CreateSubtransLocalHtab(newval, (LocalBuffer*)&lbuffers_new);
+	PG_TRY();
+	{
+		/* when new value more than old value. */
+		if (newval > local_cache_subtrans_page_num)
+		{
+			/* traverse the old htab, and copy it to new htab */
+			hash_seq_init(&seq_status, SubtransLocalBuffHtab);
+			while ((old_local_hash = hash_seq_search(&seq_status)) != NULL)
+			{
+				Assert(old_local_hash->lbuffer->page_id == old_local_hash->page_id);
+				new_local_hash = hash_search_with_hash_value(htabNew,
+															 &old_local_hash->page_id,
+															 HASH_PAGE_NO(old_local_hash->page_id),
+															 HASH_ENTER,
+															 &found);
+				Assert(found == false);
+				new_local_hash->lbuffer = &lbuffers_new[old_local_hash->lbuffer - subtrans_local_buffers];
+			}
+			memcpy(lbuffers_new, subtrans_local_buffers, sizeof(lbuffers_new[0]) * local_cache_subtrans_page_num);
+		}else
+		{
+			uint32	i,index;
+			uint32 *indexes = palloc(sizeof(uint32) * local_cache_subtrans_page_num);
+			for (i=0;i<local_cache_subtrans_page_num;++i)
+				indexes[i] = i;
+			/* qsort the local buffer index descending order, according in_htab and usage_count in local buffer, */
+			qsort(indexes, local_cache_subtrans_page_num, sizeof(indexes[0]), compare_lbuffer_index);
+
+			/* copy the older local buffer to the new local buffer. */
+			for (i=0;i<newval;++i)
+			{
+				index = indexes[i];
+				memcpy(&lbuffers_new[i], &subtrans_local_buffers[index], sizeof(lbuffers_new[0]));
+			}
+
+			/* add new local buffer to new htab. */
+			for (i=0;i<newval;++i)
+			{
+				lbuffer = &lbuffers_new[i];
+				if (lbuffer->in_htab == false)
+					break;
+				new_local_hash = hash_search_with_hash_value(htabNew,
+															 &lbuffer->page_id,
+															 HASH_PAGE_NO(lbuffer->page_id),
+															 HASH_ENTER,
+															 &found);
+				Assert(found == false);
+				new_local_hash->lbuffer = lbuffer;
+			}
+			pfree(indexes);
+		}
+	}PG_CATCH();
+	{
+		hash_destroy(htabNew);
+		pfree(lbuffers_new);
+		PG_RE_THROW();
+	}PG_END_TRY();
+
+	/* update subtrans_local_buffers, subtrans_local_buf_next and subtrans_local_buf_end. */
+	subtrans_local_buf_next = &lbuffers_new[subtrans_local_buf_next - subtrans_local_buffers];
+	if (subtrans_local_buf_next - lbuffers_new >= newval)
+		subtrans_local_buf_next = lbuffers_new;
+	subtrans_local_buf_end = &lbuffers_new[newval];
+
+	hash_destroy(SubtransLocalBuffHtab);
+	SubtransLocalBuffHtab = htabNew;
+	pfree(subtrans_local_buffers);
+	subtrans_local_buffers = lbuffers_new;
+}
+
+/*
+ * create local subtrans hash table.
+ */
+static void
+SubtransCreateLocalBufHtab(void)
+{
+	StaticAssertStmt(SUBTRANS_XACTS_PER_PAGE <= UINT16_MAX,
+					 "must change type of LocalBufferData::valid_offset");
+
+	Assert(local_cache_subtrans_page_num > 0);
+	SubtransLocalBuffHtab = CreateSubtransLocalHtab(local_cache_subtrans_page_num,
+													&subtrans_local_buffers);
+	subtrans_local_buf_end = &subtrans_local_buffers[local_cache_subtrans_page_num];
+	subtrans_local_buf_next = subtrans_local_buffers;
+}
+
+/* According to xmin and pageno, calculate the valid offset. */
+/* For all xids preceding the valid offset in this page, we can get the parent xid. */
+static inline uint16
+SubtransGetPageValidOffset(int pageno)
+{
+	TransactionId xmin;
+	TransactionId page_xlast;
+	TransactionId page_xfirst;
+
+	/* The xid start and end in this page */
+	page_xfirst = ((TransactionId)pageno) * SUBTRANS_XACTS_PER_PAGE;
+	page_xlast = page_xfirst + (SUBTRANS_XACTS_PER_PAGE - 1);
+
+	xmin = GetSnapshotsXmin();
+	if (xmin == InvalidTransactionId ||
+		TransactionIdPrecedes(xmin, page_xfirst))
+	{
+		return 0;
+	}else if (TransactionIdPrecedes(page_xlast, xmin))
+	{
+		return SUBTRANS_XACTS_PER_PAGE;
+	}
+	return xmin - page_xfirst;
+}
+
+/*
+ * Allocate a local buffer from the local buffer array.
+ * If there is an unused local buffer left, we can use it directly.
+ * If all local buffers have been used, pick a victim and remove it from the htab.
+ */
+static LocalBuffer
+SubtransAllocLocalBuffer(void)
+{
+refind_:
+	if(unlikely(subtrans_local_buf_next >= subtrans_local_buf_end))
+		subtrans_local_buf_next = subtrans_local_buffers;
+
+	/* Get an unused local buffer, or a local buffer whose usage_count equals 0 */
+	if (subtrans_local_buf_next->in_htab == false ||
+		subtrans_local_buf_next->usage_count == 0)
+	{
+		LocalBuffer result = subtrans_local_buf_next++;
+		if (result->in_htab)
+		{
+			/* if this local buffer has been used before, we should remove it from the htab. */
+			LocalBufferHash *item PG_USED_FOR_ASSERTS_ONLY;
+			item = hash_search_with_hash_value(SubtransLocalBuffHtab,
+											   &result->page_id,
+											   HASH_PAGE_NO(result->page_id),
+											   HASH_REMOVE,
+											   NULL);
+			Assert(item && item->lbuffer == result);
+		}
+		result->in_htab = false;
+		return result;
+	}
+
+	/* When usage_count is more than SubtransSubUsageCount, subtract SubtransSubUsageCount, */
+	/* so that we quickly find a victim page whose usage_count equals 0. */
+	if (subtrans_local_buf_next->usage_count > SubtransSubUsageCount)
+		subtrans_local_buf_next->usage_count -= SubtransSubUsageCount;
+	else
+		subtrans_local_buf_next->usage_count = 0;
+	++subtrans_local_buf_next;
+	goto refind_;
+}
+
+//#define DEBUG_LBUF_UPDATE	1
+#ifdef DEBUG_LBUF_UPDATE
+static void
+DEBUG_LOG_LBUF_UPDATE(LocalBuffer lbuffer, TransactionId xid)
+{
+	static StringInfoData msg = {.data = NULL};
+	int		i;
+
+	if (errstart(LOG, __FILE__, __LINE__, PG_FUNCNAME_MACRO, TEXTDOMAIN))
+	{
+
+		if (unlikely(msg.data == NULL))
+		{
+			MemoryContext old_contex = MemoryContextSwitchTo(TopMemoryContext);
+			initStringInfo(&msg);
+			MemoryContextSwitchTo(old_contex);
+		}
+
+		resetStringInfo(&msg);
+		for (i=0;i<lengthof(lbuffer->xids);++i)
+			appendStringInfo(&msg, " %u", lbuffer->xids[i]);
+
+		errfinish(errmsg("SUBTRANS UPDATE page=%d(%u)%s", lbuffer->page_id, xid, msg.data));
+	}
+}
+#else
+#define DEBUG_LOG_LBUF_UPDATE(lbuffer, xid)	((void)true)
+#endif
+
+/*
+ * According to xid and pageno, read the subtrans SLRU and copy it to the local buffer.
+ * Also set the valid offset for the local page based on xid.
+ * If a transaction id precedes the valid offset, its parent id can be
+ * read from the local buffer.
+ */
+static inline void
+SubtransSyncLocalBuffer(LocalBuffer lbuffer, TransactionId xid)
+{
+	int		slotno;
+	uint16	valid_offset;
+	Assert(TransactionIdToPage(xid) == lbuffer->page_id);
+
+	valid_offset = SubtransGetPageValidOffset(lbuffer->page_id);
+
+	/* read the subtrans cache and copy to local buffer. */
+	slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, lbuffer->page_id, xid);
+	memcpy(lbuffer->xids, SubTransCtl->shared->page_buffer[slotno], BLCKSZ);
+	LWLockRelease(SubtransSLRULock);
+
+	/* update the local buffer valid offset */
+	if (valid_offset > lbuffer->valid_offset)
+		lbuffer->valid_offset = valid_offset;
+
+	DEBUG_LOG_LBUF_UPDATE(lbuffer, xid);
+}
+
+/*
+ * According to pageno and xid, try to get the local buffer.
+ * On a local cache miss, read the subtrans SLRU cache and copy the whole page to the local buffer.
+ */
+static inline LocalBuffer
+SubtransReadLocalBuffer(int pageno, TransactionId xid, bool force_load, bool *foundPtr)
+{
+	LocalBufferHash *item;
+	LocalBuffer		lbuffer;
+	Assert(SubtransLocalBuffHtab != NULL);
+
+	item = hash_search_with_hash_value(SubtransLocalBuffHtab,
+									   &pageno,
+									   HASH_PAGE_NO(pageno),
+									   HASH_FIND,
+									   foundPtr);
+	if (item == NULL)
+	{
+		bool			found;
+
+		if (force_load == false)
+			return NULL;
+
+		lbuffer = SubtransAllocLocalBuffer();
+		Assert(lbuffer->in_htab == false);
+
+		lbuffer->page_id = pageno;
+		lbuffer->valid_offset = 0;
+		/* sync the subtrans SLRU buffer to the local buffer */
+		SubtransSyncLocalBuffer(lbuffer, xid);
+
+		/* Add the local buffer to the htab */
+		item = hash_search_with_hash_value(SubtransLocalBuffHtab,
+										   &pageno,
+										   HASH_PAGE_NO(pageno),
+										   HASH_ENTER,
+										   &found);
+		Assert(found == false);
+		item->lbuffer = lbuffer;
+		lbuffer->in_htab = true;
+		lbuffer->usage_count = 1;
+	}else
+	{
+		lbuffer = item->lbuffer;
+		Assert(lbuffer->page_id == pageno && item->page_id == pageno);
+
+		/* update local buffer usage_count */
+		if (lbuffer->usage_count < SubtransMaxUsageCount)
+			++(lbuffer->usage_count);
+	}
 
+	return lbuffer;
+}
+
+/*
+ * Remove local buffers whose pages precede pageno.
+ */
+void SubtransRemoveLocalPage(int pageno)
+{
+	LocalBuffer		lbuffer;
+	LocalBufferHash *item PG_USED_FOR_ASSERTS_ONLY;
+
+	if (SubtransLocalBuffHtab == NULL)
+		return;
+
+	for (lbuffer=subtrans_local_buffers; lbuffer<subtrans_local_buf_end; lbuffer++)
+	{
+		if (lbuffer->in_htab &&
+			SubTransPagePrecedes(lbuffer->page_id, pageno))
+		{
+			item = hash_search_with_hash_value(SubtransLocalBuffHtab,
+											   &lbuffer->page_id,
+											   HASH_PAGE_NO(lbuffer->page_id),
+											   HASH_REMOVE,
+											   NULL);
+			Assert(item && item->lbuffer == lbuffer);
+			lbuffer->in_htab = false;
+			lbuffer->usage_count = 0;
+		}
+	}
+}
 
 /*
  * Record the parent of a subtransaction in the subtrans log.
@@ -77,6 +477,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
 	TransactionId *ptr;
+	LocalBuffer	lbuffer = NULL;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
@@ -99,7 +500,23 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
+	if (SubtransLocalBuffHtab)
+	{
+		/* get the local buffer according to pageno and xid */
+		lbuffer = SubtransReadLocalBuffer(pageno, xid, false, NULL);
+		if (lbuffer != NULL)
+			memcpy(lbuffer->xids, SubTransCtl->shared->page_buffer[slotno], BLCKSZ);
+	}
+
 	LWLockRelease(SubtransSLRULock);
+
+	if (lbuffer != NULL)
+	{
+		Assert(lbuffer->page_id == pageno);
+		/* Update the local valid offset */
+		lbuffer->valid_offset = SubtransGetPageValidOffset(pageno);
+		DEBUG_LOG_LBUF_UPDATE(lbuffer, xid);
+	}
 }
 
 /*
@@ -113,6 +530,7 @@ SubTransGetParent(TransactionId xid)
 	int			slotno;
 	TransactionId *ptr;
 	TransactionId parent;
+	LocalBuffer	lbuffer = NULL;
 
 	/* Can't ask about stuff that might not be around anymore */
 	Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
@@ -121,6 +539,26 @@ SubTransGetParent(TransactionId xid)
 	if (!TransactionIdIsNormal(xid))
 		return InvalidTransactionId;
 
+	/* try to query the local buffer first, if the local subtrans optimization is enabled. */
+	if (local_cache_subtrans_page_num > 0)
+	{
+		bool	found;
+		if (unlikely(SubtransLocalBuffHtab == NULL))
+			SubtransCreateLocalBufHtab();
+
+		/* get the local buffer from htab or sync from SLRU cache */
+		lbuffer = SubtransReadLocalBuffer(pageno, xid, true, &found);
+		if (found == false ||								/* just synced from shared memory */
+			entryno < lbuffer->valid_offset ||				/* or the entry is known valid */
+			TransactionIdIsValid(lbuffer->xids[entryno]))	/* or the parent is valid */
+			return lbuffer->xids[entryno];
+
+		/* sync local buffer from shared memory */
+		SubtransSyncLocalBuffer(lbuffer, xid);
+
+		return lbuffer->xids[entryno];
+	}
+
 	/* lock is acquired by SimpleLruReadPage_ReadOnly */
 
 	slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, pageno, xid);
@@ -155,6 +593,56 @@ SubTransGetTopmostTransaction(TransactionId xid)
 	/* Can't ask about stuff that might not be around anymore */
 	Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
 
+	/* when the local subtrans optimization is enabled */
+	if (local_cache_subtrans_page_num > 0)
+	{
+		LocalBuffer	lbuffer = NULL;
+		int			pageno;
+		int			entryno;
+		bool		found;
+
+		/* Bootstrap and frozen XIDs have no parent */
+		if (!TransactionIdIsNormal(xid))
+			return InvalidTransactionId;
+
+		if (unlikely(SubtransLocalBuffHtab == NULL))
+			SubtransCreateLocalBufHtab();
+
+		while (TransactionIdIsValid(parentXid))
+		{
+			previousXid = parentXid;
+			if (TransactionIdPrecedes(parentXid, TransactionXmin))
+				goto end_get_;
+
+			entryno = TransactionIdToEntry(parentXid);
+			pageno = TransactionIdToPage(parentXid);
+			if (lbuffer == NULL ||
+				lbuffer->page_id != pageno)
+				lbuffer = SubtransReadLocalBuffer(pageno, parentXid, true, &found);
+
+			Assert(lbuffer->page_id == pageno);
+			if (found == true &&								/* from local cache */
+				entryno >= lbuffer->valid_offset &&				/* entry not known valid */
+				!TransactionIdIsValid(lbuffer->xids[entryno]))	/* and parent is invalid */
+			{
+				/* sync from shared memory */
+				SubtransSyncLocalBuffer(lbuffer, parentXid);
+			}
+
+			parentXid = lbuffer->xids[entryno];
+
+			/*
+			 * By convention the parent xid gets allocated first, so should always
+			 * precede the child xid. Anything else points to a corrupted data
+			 * structure that could lead to an infinite loop, so exit.
+			 */
+			if (!TransactionIdPrecedes(parentXid, previousXid))
+				elog(ERROR, "pg_subtrans contains invalid entry: xid %u points to parent xid %u",
+					 previousXid, parentXid);
+		}
+		goto end_get_;
+	}
+
 	while (TransactionIdIsValid(parentXid))
 	{
 		previousXid = parentXid;
@@ -172,6 +660,7 @@ SubTransGetTopmostTransaction(TransactionId xid)
 				 previousXid, parentXid);
 	}
 
+end_get_:
 	Assert(TransactionIdIsValid(previousXid));
 
 	return previousXid;
@@ -184,14 +673,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(slru_subtrans_page_num, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", slru_subtrans_page_num, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
@@ -351,6 +840,7 @@ TruncateSUBTRANS(TransactionId oldestXact)
 	cutoffPage = TransactionIdToPage(oldestXact);
 
 	SimpleLruTruncate(SubTransCtl, cutoffPage);
+	CacheInvalSubtransPage(cutoffPage);
 }
 
 
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 9352c68090..bf49943d7e 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -98,6 +98,7 @@
 
 #include "access/htup_details.h"
 #include "access/xact.h"
+#include "access/subtrans.h"
 #include "catalog/catalog.h"
 #include "catalog/pg_constraint.h"
 #include "miscadmin.h"
@@ -668,10 +669,33 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
 		else if (msg->sn.dbId == MyDatabaseId)
 			InvalidateCatalogSnapshot();
 	}
+	else if (msg->id == SUBTRANS_INVALID_PAGE_ID)
+	{
+		SubtransRemoveLocalPage(msg->spp.pageno);
+	}
 	else
 		elog(FATAL, "unrecognized SI message ID: %d", msg->id);
 }
 
+/*
+ *		CacheInvalSubtransPage
+ *
+ *		when a SUBTRANS SLRU page is truncated, send an invalidation message
+ *		to all backends, so they can drop the page from their local buffers.
+ */
+void
+CacheInvalSubtransPage(int pageno)
+{
+	SharedInvalidationMessage msg;
+
+	msg.spp.id = SUBTRANS_INVALID_PAGE_ID;
+	msg.spp.pageno = pageno;
+	/* check AddCatcacheInvalidationMessage() for an explanation */
+	VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
+
+	SendSharedInvalidMessages(&msg, 1);
+}
+
 /*
  *		InvalidateSystemCaches
  *
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 467b0fd6fe..2979dc8ba4 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -41,6 +41,7 @@
 #include "access/twophase.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
+#include "access/subtrans.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
 #include "catalog/storage.h"
@@ -2673,6 +2674,28 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"local_cache_subtrans_pages", PGC_SIGHUP, RESOURCES_MEM,
+			gettext_noop("Optimize subtrans perfermance, local cache subtrans page number from SLRU."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&local_cache_subtrans_page_num,
+		0, 0, UINT32_MAX/(BLCKSZ/sizeof(TransactionId))/2,
+		NULL, SubtransResetGucCacheNum, NULL
+	},
+
+	{
+		{"slru_subtrans_pages", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for subtrans SLRU."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&slru_subtrans_page_num,
+		32, 32, INT_MAX / 2,
+		NULL, NULL, NULL
+	},
+
 	/*
 	 * See also CheckRequiredParameterValues() if this parameter changes
 	 */
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 2968c7f7b7..bf35d2db87 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -901,6 +901,40 @@ xmin_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg)
 		return 0;
 }
 
+/*
+ * Get the latest xmin from all snapshots.
+ * Any transaction id that precedes this xmin has already committed or aborted.
+ */
+TransactionId
+GetSnapshotsXmin(void)
+{
+	TransactionId xmin = TransactionXmin;
+	Snapshot snaps[] = {CurrentSnapshot, FirstXactSnapshot, SecondarySnapshot, CatalogSnapshot, HistoricSnapshot};
+	uint32 i;
+
+	ActiveSnapshotElt *asnap = ActiveSnapshot;
+	while (asnap != NULL)
+	{
+		if (xmin == InvalidTransactionId ||
+			TransactionIdPrecedes(xmin, asnap->as_snap->xmin))
+			xmin = asnap->as_snap->xmin;
+		asnap = asnap->as_next;
+	}
+
+	for (i=lengthof(snaps); i>0;)
+	{
+		--i;
+		if (snaps[i] != NULL &&
+			(xmin == InvalidTransactionId ||
+			 TransactionIdPrecedes(xmin, snaps[i]->xmin)))
+		{
+			xmin = snaps[i]->xmin;
+		}
+	}
+
+	return xmin;
+}
+
 /*
  * SnapshotResetXmin
  *
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index d0ab44ae82..0eb9ab3971 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,8 +11,10 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
+/* GUC variables */
 /* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
+extern int slru_subtrans_page_num;
+extern int local_cache_subtrans_page_num;
 
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
@@ -25,5 +27,6 @@ extern void StartupSUBTRANS(TransactionId oldestActiveXID);
 extern void CheckPointSUBTRANS(void);
 extern void ExtendSUBTRANS(TransactionId newestXact);
 extern void TruncateSUBTRANS(TransactionId oldestXact);
-
+extern void SubtransRemoveLocalPage(int pageno);
+extern void SubtransResetGucCacheNum(int newval, void *extra);
 #endif							/* SUBTRANS_H */
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index f03dc23b14..4ac411c536 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -103,6 +103,14 @@ typedef struct
 
 #define SHAREDINVALSNAPSHOT_ID	(-5)
 
+typedef struct
+{
+	int8		id;				/* type field --- must be first */
+	int			pageno;			/* page no */
+} SubtransInvalPageMsg;
+
+#define SUBTRANS_INVALID_PAGE_ID	(-6)
+
 typedef struct
 {
 	int8		id;				/* type field --- must be first */
@@ -119,6 +127,7 @@ typedef union
 	SharedInvalSmgrMsg sm;
 	SharedInvalRelmapMsg rm;
 	SharedInvalSnapshotMsg sn;
+	SubtransInvalPageMsg spp;
 } SharedInvalidationMessage;
 
 
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index 770672890b..8adb1f37bb 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -50,6 +50,8 @@ extern void CacheInvalidateRelcacheByRelid(Oid relid);
 
 extern void CacheInvalidateSmgr(RelFileNodeBackend rnode);
 
+extern void CacheInvalSubtransPage(int pageno);
+
 extern void CacheInvalidateRelmap(Oid databaseId);
 
 extern void CacheRegisterSyscacheCallback(int cacheid,
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 44539fe15a..16df0ea1a2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -125,6 +125,8 @@ extern void UnregisterSnapshot(Snapshot snapshot);
 extern Snapshot RegisterSnapshotOnOwner(Snapshot snapshot, ResourceOwner owner);
 extern void UnregisterSnapshotFromOwner(Snapshot snapshot, ResourceOwner owner);
 
+extern TransactionId GetSnapshotsXmin(void);
+
 extern void AtSubCommit_Snapshot(int level);
 extern void AtSubAbort_Snapshot(int level);
 extern void AtEOXact_Snapshot(bool isCommit, bool resetXmin);
#2 Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Pengchengliu (#1)
Re: suboverflowed subtransactions concurrency performance optimize

Hi Pengcheng!

You are solving an important problem, thank you!

On 30 Aug 2021, at 13:43, Pengchengliu <pengchengliu@tju.edu.cn> wrote:

To resolve this performance problem, we thought about a solution that
caches the subtrans SLRU in a backend-local cache.
First we query a parent transaction id from the subtrans SLRU and copy that
SLRU page to a local cache page.
After that, if we need to query a parent transaction id again, we can query
it from the local cache directly.

A copy of SLRU in each backend's cache can consume a lot of memory. Why create a copy if we can optimise shared representation of SLRU?

JFYI there is a related patch to make SimpleLruReadPage_ReadOnly() faster for bigger SLRU buffers [0].
Also, Nik Samokhvalov recently published an interesting investigation on the topic, but for some reason his message did not pass moderation [1].

Also it's important to note that there was a community request to move SLRUs to shared_buffers [2].

Thanks!

Best regards, Andrey Borodin.

[0]: https://commitfest.postgresql.org/34/2627/
[1]: /messages/by-id/BE73A0BB-5929-40F4-BAF8-55323DE39561@yandex-team.ru
[2]: /messages/by-id/20180814213500.GA74618@60f81dc409fc.ant.amazon.com

#3 Pengchengliu
pengchengliu@tju.edu.cn
In reply to: Andrey Borodin (#2)
RE: suboverflowed subtransactions concurrency performance optimize

Hi Andrey,
Thanks a lot for your reply and the reference information.

The default NUM_SUBTRANS_BUFFERS is 32; in my implementation, local_cache_subtrans_pages can be adjusted dynamically.
If we configure local_cache_subtrans_pages as 64, every backend uses only an extra 64*8192 = 512KB of memory.
So the local cache is similar to a first-level cache, with the subtrans SLRU as the second-level cache.
I think the extra memory is well worth it: it really resolves the massive subtrans stuck issue I mentioned in the previous email.
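For scale: with the max_connections = 2500 from the reproduce steps above, the theoretical worst case is 2500 * 512KB, roughly 1.2GB, but pages are only cached on demand when a backend actually consults pg_subtrans, so real usage should stay far lower.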

I have viewed the patch at [0] before. Adding GUC configuration parameters for the SLRU buffer sizes is very nice,
but I think for subtrans that optimization is not enough. In SubTransGetTopmostTransaction we should take SubtransSLRULock once, then call SubTransGetParent in the loop,
preventing an acquire/release of SubtransSLRULock on every SubTransGetParent call in the loop.
Even after I applied that patch, with SubTransGetTopmostTransaction optimized this way, my test case still got stuck.
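
For reference, the loop in question, abridged from the current
SubTransGetTopmostTransaction(); every iteration re-enters
SimpleLruReadPage_ReadOnly() and so acquires SubtransSLRULock again:

	while (TransactionIdIsValid(parentXid))
	{
		previousXid = parentXid;
		/* each call takes and releases SubtransSLRULock internally */
		parentXid = SubTransGetParent(parentXid);
	}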

[1]: /messages/by-id/BE73A0BB-5929-40F4-BAF8-55323DE39561@yandex-team.ru
With the test case I mentioned in the previous mail, it was still stuck. By default there are 2048 subtrans entries in one page.
When several processes walk to the top transaction within one page, they pin/unpin and lock/unlock the same page repeatedly.
I found that some backends were stuck at pinning/unpinning the page.

Comparing test results (pgbench with subtrans_128.sql):

Concurrency   PG master   PG master with patch [0]   Local cache optimization
300           stuck       stuck                      not stuck
500           stuck       stuck                      not stuck
1000          stuck       stuck                      not stuck

Maybe we can test a different approach with my test case: with massive concurrency, if it does not get stuck, we have a good solution.

[0]: https://commitfest.postgresql.org/34/2627/
[1]: /messages/by-id/BE73A0BB-5929-40F4-BAF8-55323DE39561@yandex-team.ru

Thanks
Pengcheng


#4 Zhihong Yu
zyu@yugabyte.com
In reply to: Pengchengliu (#1)
Re: suboverflowed subtransactions concurrency performance optimize

Hi,

+ uint16 valid_offset; /* how many entry is valid */

how many entry is -> how many entries are

+int slru_subtrans_page_num = 32;

Looks like the variable represents the number of subtrans pages. Maybe name
the variable slru_subtrans_page_count ?

+ if (lbuffer->in_htab == false)

The condition can be written as 'if (!lbuffer->in_htab)'

For SubtransAllocLocalBuffer(), you could enclose the body of the function
in a while loop so that you don't need the goto statement.

Cheers

#6 Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Pengchengliu (#3)
Re: suboverflowed subtransactions concurrency performance optimize

Sorry, for some reason Mail.app converted the message to HTML and the mailing list mangled that HTML into a mess. I'm resending the previous message as plain text. Sorry for the noise.

On 31 Aug 2021, at 11:43, Pengchengliu <pengchengliu@tju.edu.cn> wrote:

Hi Andrey,
Thanks a lot for your reply and the reference information.

The default NUM_SUBTRANS_BUFFERS is 32; in my implementation, local_cache_subtrans_pages can be adjusted dynamically.
If we configure local_cache_subtrans_pages as 64, every backend uses only an extra 64*8192 = 512KB of memory.
So the local cache is similar to a first-level cache, with the subtrans SLRU as the second-level cache.
I think the extra memory is well worth it: it really resolves the massive subtrans stuck issue I mentioned in the previous email.

I have viewed the patch at [0] before. Adding GUC configuration parameters for the SLRU buffer sizes is very nice,
but I think for subtrans that optimization is not enough. In SubTransGetTopmostTransaction we should take SubtransSLRULock once, then call SubTransGetParent in the loop,
preventing an acquire/release of SubtransSLRULock on every SubTransGetParent call in the loop.
Even after I applied that patch, with SubTransGetTopmostTransaction optimized this way, my test case still got stuck.

SubTransGetParent() acquires only a shared lock on SubtransSLRULock. The problem may arise only when someone reads a page from disk. But if you have a big enough cache, this will never happen. And this cache will be much less than 512KB * max_connections.
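
For reference, a simplified sketch of SimpleLruReadPage_ReadOnly() from
slru.c (the real function also maintains LRU usage counts and handles
in-progress I/O): a buffer hit needs only the shared lock, while a miss
escalates to the exclusive lock and possibly disk I/O:

static int
SimpleLruReadPage_ReadOnlySketch(SlruCtl ctl, int pageno, TransactionId xid)
{
	SlruShared	shared = ctl->shared;
	int			slotno;

	/* cheap path: a shared lock is enough to scan the resident buffers */
	LWLockAcquire(shared->ControlLock, LW_SHARED);
	for (slotno = 0; slotno < shared->num_slots; slotno++)
	{
		if (shared->page_number[slotno] == pageno &&
			shared->page_status[slotno] == SLRU_PAGE_VALID)
			return slotno;		/* hit: return with the shared lock held */
	}

	/* miss: trade up to the exclusive lock and (maybe) read from disk */
	LWLockRelease(shared->ControlLock);
	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
	return SimpleLruReadPage(ctl, pageno, true, xid);
}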

I think if we really want to fix exclusive SubtransSLRULock I think best option would be to split SLRU control lock into array of locks - one for each bank (in v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch). With this approach we will have to rename s/bank/partition/g for consistency with locks and buffers partitions. I really liked having my own banks, but consistency is worth it anyway.

Thanks!

Best regards, Andrey Borodin.

#7 Pengchengliu
pengchengliu@tju.edu.cn
In reply to: Andrey Borodin (#6)
RE: suboverflowed subtransactions concurrency performance optimize

Hi Andrey,

I think if we really want to fix exclusive SubtransSLRULock I think best option would be to split SLRU control lock into array of locks

I agree with you. If we can resolve the performance issue with this approach, it should be a good solution.

one for each bank (in v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch)

I have tested with this patch, and I modified NUM_SUBTRANS_BUFFERS to 128. With 500-way concurrency it indeed does not get stuck, but the performance is very bad: a sequential scan of the table takes more than one minute.
I think that is unacceptable in a production environment.

postgres=# select count(*) from contend ;
count
-------
10127
(1 row)

Time: 86011.593 ms (01:26.012)
postgres=# select count(*) from contend ;
count
-------
10254
(1 row)
Time: 79399.949 ms (01:19.400)

With my local subtrans optimization approach, in the same environment with the same test script and 500-way concurrency, a sequential scan takes less than 10 seconds.

postgres=# select count(*) from contend ;
count
-------
10508
(1 row)

Time: 7104.283 ms (00:07.104)

postgres=# select count(*) from contend ;
count
-------
13175
(1 row)

Time: 6602.635 ms (00:06.603)

Thanks
Pengcheng


#8 Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Andrey Borodin (#2)
1 attachment(s)
Re: suboverflowed subtransactions concurrency performance optimize

On Mon, 30 Aug 2021 at 11:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi Pengcheng!

You are solving an important problem, thank you!

On 30 Aug 2021, at 13:43, Pengchengliu <pengchengliu@tju.edu.cn> wrote:

To resolve this performance problem, we thought about a solution that
caches the subtrans SLRU in a backend-local cache.
First we query a parent transaction id from the subtrans SLRU and copy that
SLRU page to a local cache page.
After that, if we need to query a parent transaction id again, we can query
it from the local cache directly.

A copy of SLRU in each backend's cache can consume a lot of memory.

Yes, copying the whole SLRU into local cache seems overkill.

Why create a copy if we can optimise shared representation of SLRU?

transam.c uses a single-item cache to prevent thrashing from repeated
lookups, which reduces problems with shared access to SLRUs.
multixact.c also has something similar.

I notice that subtrans.c doesn't have this, but could easily do so.
Patch attached, which seems separate from other attempts at tuning.

On review, I think it is also possible that we update subtrans ONLY if
someone uses >PGPROC_MAX_CACHED_SUBXIDS.
This would make subtrans much smaller and avoid one-entry-per-page
which is a major source of caching.
This would mean some light changes in GetSnapshotData().
Let me know if that seems interesting also?

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

subtrans_single_item_cache.v1.patch (application/octet-stream)
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 6a8e521f89..8aa3d9e53c 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -54,6 +54,14 @@
 #define TransactionIdToPage(xid) ((xid) / (TransactionId) SUBTRANS_XACTS_PER_PAGE)
 #define TransactionIdToEntry(xid) ((xid) % (TransactionId) SUBTRANS_XACTS_PER_PAGE)
 
+/*
+ * Single-item cache for results of SubTransGetTopmostTransaction.  It's worth having
+ * such a cache because we frequently find ourselves repeatedly checking the
+ * same XID, for example when scanning a table just after a bulk insert,
+ * update, or delete.
+ */
+static TransactionId cachedFetchXid = InvalidTransactionId;
+static TransactionId cachedFetchTopmostXid = InvalidTransactionId;
 
 /*
  * Link to shared-memory data structures for SUBTRANS control
@@ -155,6 +163,13 @@ SubTransGetTopmostTransaction(TransactionId xid)
 	/* Can't ask about stuff that might not be around anymore */
 	Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
 
+	/*
+	 * Before going to the subtrans log, check our single item cache to
+	 * see if we know the result from a previous/recent request.
+	 */
+	if (TransactionIdEquals(xid, cachedFetchXid))
+		return cachedFetchTopmostXid;
+
 	while (TransactionIdIsValid(parentXid))
 	{
 		previousXid = parentXid;
@@ -174,6 +189,9 @@ SubTransGetTopmostTransaction(TransactionId xid)
 
 	Assert(TransactionIdIsValid(previousXid));
 
+	cachedFetchXid = xid;
+	cachedFetchTopmostXid = previousXid;
+
 	return previousXid;
 }
 
#9 Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Simon Riggs (#8)
Re: suboverflowed subtransactions concurrency performance optimize

On 30 Nov 2021, at 17:19, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

transam.c uses a single-item cache to prevent thrashing from repeated
lookups, which reduces problems with shared access to SLRUs.
multixact.c also has something similar.

I notice that subtrans.c doesn't have this, but could easily do so.
Patch attached, which seems separate from other attempts at tuning.

I think this definitely makes sense to do.

On review, I think it is also possible that we update subtrans ONLY if
someone uses >PGPROC_MAX_CACHED_SUBXIDS.
This would make subtrans much smaller and avoid one-entry-per-page
which is a major source of caching.
This would mean some light changes in GetSnapshotData().
Let me know if that seems interesting also?

I'm afraid of unexpected performance degradation. When the system runs fine, you provision a VM with some vCPU/RAM, and then some backend uses a little more than 64 subtransactions and the whole system is stuck. Or will it affect only backends using more than 64 subtransactions?

Best regards, Andrey Borodin.

#10 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Simon Riggs (#8)
Re: suboverflowed subtransactions concurrency performance optimize

On Tue, Nov 30, 2021 at 5:49 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

transam.c uses a single-item cache to prevent thrashing from repeated
lookups, which reduces problems with shared access to SLRUs.
multixact.c also has something similar.

I notice that subtrans.c doesn't have this, but could easily do so.
Patch attached, which seems separate from other attempts at tuning.

Yeah, this definitely makes sense.

On review, I think it is also possible that we update subtrans ONLY if
someone uses >PGPROC_MAX_CACHED_SUBXIDS.
This would make subtrans much smaller and avoid one-entry-per-page
which is a major source of caching.
This would mean some light changes in GetSnapshotData().
Let me know if that seems interesting also?

Do you mean to say avoid setting the sub-transactions parent if the
number of sub-transactions is not crossing PGPROC_MAX_CACHED_SUBXIDS?
But TransactionIdDidCommit() might need to fetch the parent if
the transaction status is TRANSACTION_STATUS_SUB_COMMITTED, so how
would we handle that?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#11 Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Dilip Kumar (#10)
Re: suboverflowed subtransactions concurrency performance optimize

On Fri, 3 Dec 2021 at 01:27, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On review, I think it is also possible that we update subtrans ONLY if
someone uses >PGPROC_MAX_CACHED_SUBXIDS.
This would make subtrans much smaller and avoid one-entry-per-page
which is a major source of caching.
This would mean some light changes in GetSnapshotData().
Let me know if that seems interesting also?

Do you mean to say avoid setting the sub-transactions parent if the
number of sub-transactions is not crossing PGPROC_MAX_CACHED_SUBXIDS?
But TransactionIdDidCommit() might need to fetch the parent if
the transaction status is TRANSACTION_STATUS_SUB_COMMITTED, so how
would we handle that?

TRANSACTION_STATUS_SUB_COMMITTED is set as a transient state during
final commit.
In that case, the top-level xid is still in procarray when nsubxids <
PGPROC_MAX_CACHED_SUBXIDS
so we need not consult pg_subtrans in that case, see step 4 of
TransactionIdIsInProgress()
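
For reference, a compressed sketch of that flow, abridged from
TransactionIdIsInProgress() in procarray.c (the helper names here are
hypothetical stand-ins for the inline proc-array scans):

static bool
TransactionIdIsInProgressSketch(TransactionId xid)
{
	/* steps 1-3: scan the proc array in shared memory, no pg_subtrans */
	if (FoundAsTopLevelXidInProcArray(xid))		/* hypothetical helper */
		return true;
	if (FoundInSomeProcsSubxidCache(xid))		/* hypothetical: step-3 search */
		return true;	/* succeeds while nsubxids <= PGPROC_MAX_CACHED_SUBXIDS */

	/* step 4: consult pg_subtrans only if some subxid cache overflowed */
	if (AnySubxidCacheOverflowed())				/* hypothetical helper */
	{
		xid = SubTransGetTopmostTransaction(xid);
		return FoundAsTopLevelXidInProcArray(xid);
	}
	return false;
}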

--
Simon Riggs http://www.EnterpriseDB.com/

#12 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Simon Riggs (#11)
Re: suboverflowed subtransactions concurrency performance optimize

On Fri, Dec 3, 2021 at 5:00 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Fri, 3 Dec 2021 at 01:27, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On review, I think it is also possible that we update subtrans ONLY if
someone uses >PGPROC_MAX_CACHED_SUBXIDS.
This would make subtrans much smaller and avoid one-entry-per-page
which is a major source of caching.
This would mean some light changes in GetSnapshotData().
Let me know if that seems interesting also?

Do you mean to say avoid setting the sub-transactions parent if the
number of sun-transactions is not crossing PGPROC_MAX_CACHED_SUBXIDS?
But the TransactionIdDidCommit(), might need to fetch the parent if
the transaction status is TRANSACTION_STATUS_SUB_COMMITTED, so how
would we handle that?

TRANSACTION_STATUS_SUB_COMMITTED is set as a transient state during
final commit.
In that case, the top-level xid is still in procarray when nsubxids <
PGPROC_MAX_CACHED_SUBXIDS
so we need not consult pg_subtrans in that case, see step 4 of
TransactionIdIsInProgress()

Okay, I see: there is a rule that before calling
TransactionIdDidCommit(), we must consult TransactionIdIsInProgress()
for a non-MVCC snapshot, or XidInMVCCSnapshot(). So now I don't
have this concern; thanks for clarifying. I will think more about
this approach from other aspects.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#13 Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Andrey Borodin (#9)
Re: suboverflowed subtransactions concurrency performance optimize

On Wed, 1 Dec 2021 at 06:41, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On review, I think it is also possible that we update subtrans ONLY if
someone uses >PGPROC_MAX_CACHED_SUBXIDS.
This would make subtrans much smaller and avoid one-entry-per-page
which is a major source of caching.
This would mean some light changes in GetSnapshotData().
Let me know if that seems interesting also?

I'm afraid of unexpected performance degradation. When the system runs fine, you provision a VM with some vCPU/RAM, and then some backend uses a little more than 64 subtransactions and the whole system is stuck. Or will it affect only backends using more than 64 subtransactions?

That is the objective: to isolate the effect to only those that
overflow. It seems possible.

--
Simon Riggs http://www.EnterpriseDB.com/

#14Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Dilip Kumar (#10)
1 attachment(s)
Re: suboverflowed subtransactions concurrency performance optimize

On Fri, 3 Dec 2021 at 06:27, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Nov 30, 2021 at 5:49 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

transam.c uses a single-item cache to prevent thrashing from repeated
lookups, which reduces problems with shared access to SLRUs.
multixact.c has something similar.

I notice that subtrans.c doesn't have this, but could easily do so.
Patch attached, which seems separate from other attempts at tuning.

Yeah, this definitely makes sense.

On review, I think it is also possible that we update subtrans ONLY if
someone uses >PGPROC_MAX_CACHED_SUBXIDS.
This would make subtrans much smaller and avoid the one-entry-per-page
pattern, which is a major source of caching overhead.
This would mean some light changes in GetSnapshotData().
Let me know if that seems interesting also?

Do you mean to say avoid setting the sub-transaction's parent if the
number of sub-transactions does not cross PGPROC_MAX_CACHED_SUBXIDS?

Yes.

This patch shows where I'm going, with changes in GetSnapshotData(),
XidInMVCCSnapshot(), and XactLockTableWait().
Passes make check, but needs much more, so this is review-only at this
stage to give a flavour of what is intended.

(Nowhere near replacing the subtrans module, which is what I envisage
as the final outcome, meaning we wouldn't need ExtendSUBTRANS().)

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

rethink_subtrans.v4.patch (application/octet-stream)
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index a22bf375f8..e2ff7b6f2d 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -172,7 +172,12 @@ GetNewTransactionId(bool isSubXact)
 	 * XID before we zero the page.  Fortunately, a page of the commit log
 	 * holds 32K or more transactions, so we don't have to do this very often.
 	 *
-	 * Extend pg_subtrans and pg_commit_ts too.
+	 * Extend pg_commit_ts too. We no longer extend pg_subtrans here because
+	 * subtransaction parents are only recorded if they overflow, or if they
+	 * are used in serializable transactions. Since that considerably reduces
+	 * the read/write traffic of the subtrans module, we expect to be able
+	 * to replace that module with a simple shmem hash table rather than SLRU,
+	 * since subtrans is not persistent.
 	 */
 	ExtendCLOG(xid);
 	ExtendCommitTs(xid);
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index ca6f6d57d3..9e0a42b2a6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -247,6 +247,7 @@ static TransactionStateData TopTransactionStateData = {
  */
 static int	nUnreportedXids;
 static TransactionId unreportedXids[PGPROC_MAX_CACHED_SUBXIDS];
+static int	nAllocatedXids;
 
 static TransactionState CurrentTransactionState = &TopTransactionStateData;
 
@@ -640,7 +641,9 @@ AssignTransactionId(TransactionState s)
 	if (!isSubXact)
 		XactTopFullTransactionId = s->fullTransactionId;
 
-	if (isSubXact)
+	if (isSubXact &&
+		(nAllocatedXids++ >= PGPROC_MAX_CACHED_SUBXIDS ||
+		 IsolationIsSerializable()))
 		SubTransSetParent(XidFromFullTransactionId(s->fullTransactionId),
 						  XidFromFullTransactionId(s->parent->fullTransactionId));
 
@@ -1997,6 +2000,7 @@ StartTransaction(void)
 	 * initialize reported xid accounting
 	 */
 	nUnreportedXids = 0;
+	nAllocatedXids = 0;
 	s->didLogXid = false;
 
 	/*
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0ccfb3a20a..9a69eec6be 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -262,6 +262,9 @@ static ProcArrayStruct *procArray;
 
 static PGPROC *allProcs;
 
+static TransactionId CachedSubXid = InvalidTransactionId;
+static TransactionId CachedParentXid = InvalidTransactionId;
+
 /*
  * Bookkeeping for tracking emulated transactions in recovery
  */
@@ -1425,6 +1428,8 @@ TransactionIdIsInProgress(TransactionId xid)
 	other_xids = ProcGlobal->xids;
 	other_subxidstates = ProcGlobal->subxidStates;
 
+	CachedSubXid = CachedParentXid = InvalidTransactionId;
+
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
 	/*
@@ -1493,6 +1498,15 @@ TransactionIdIsInProgress(TransactionId xid)
 			{
 				LWLockRelease(ProcArrayLock);
 				xc_by_child_xid_inc();
+
+				/*
+				 * Remember the parent xid, for use during XactLockTableWait().
+				 * We do this because it is cheaper than looking up pg_subtrans,
+				 * and also allows us to remove pg_subtrans completely.
+				 */
+				CachedSubXid = cxid;
+				CachedParentXid = pxid;
+
 				return true;
 			}
 		}
@@ -1579,6 +1593,25 @@ TransactionIdIsInProgress(TransactionId xid)
 	return false;
 }
 
+/*
+ * Allow the topmost xid to be accessed from the cache, in a prior call to
+ * TransactionIdIsInProgress(). Specifically designed for use in
+ * XactLockTableWait()
+ */
+bool
+GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid)
+{
+	bool found = false;
+
+	if (CachedSubXid == xid)
+	{
+		*pxid = CachedParentXid;
+		found = true;
+	}
+
+	return found;
+}
+
 /*
  * TransactionIdIsActive -- is xid the top-level XID of an active backend?
  *
@@ -2354,8 +2387,7 @@ GetSnapshotData(Snapshot snapshot)
 			xip[count++] = xid;
 
 			/*
-			 * Save subtransaction XIDs if possible (if we've already
-			 * overflowed, there's no point).  Note that the subxact XIDs must
+			 * Save subtransaction XIDs, always. Note that the subxact XIDs must
 			 * be later than their parent, so no need to check them against
 			 * xmin.  We could filter against xmax, but it seems better not to
 			 * do that much work while holding the ProcArrayLock.
@@ -2368,27 +2400,23 @@ GetSnapshotData(Snapshot snapshot)
 			 *
 			 * Again, our own XIDs are not included in the snapshot.
 			 */
-			if (!suboverflowed)
+			if (subxidStates[pgxactoff].overflowed)
+				suboverflowed = true;
+
 			{
+				int			nsubxids = subxidStates[pgxactoff].count;
 
-				if (subxidStates[pgxactoff].overflowed)
-					suboverflowed = true;
-				else
+				if (nsubxids > 0)
 				{
-					int			nsubxids = subxidStates[pgxactoff].count;
-
-					if (nsubxids > 0)
-					{
-						int			pgprocno = pgprocnos[pgxactoff];
-						PGPROC	   *proc = &allProcs[pgprocno];
+					int			pgprocno = pgprocnos[pgxactoff];
+					PGPROC	   *proc = &allProcs[pgprocno];
 
-						pg_read_barrier();	/* pairs with GetNewTransactionId */
+					pg_read_barrier();	/* pairs with GetNewTransactionId */
 
-						memcpy(snapshot->subxip + subcount,
-							   (void *) proc->subxids.xids,
-							   nsubxids * sizeof(TransactionId));
-						subcount += nsubxids;
-					}
+					memcpy(snapshot->subxip + subcount,
+						   (void *) proc->subxids.xids,
+						   nsubxids * sizeof(TransactionId));
+					subcount += nsubxids;
 				}
 			}
 		}
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 2db0424ad9..a7e3472191 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -666,6 +666,8 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
 
 	for (;;)
 	{
+		TransactionId pxid = InvalidTransactionId;
+
 		Assert(TransactionIdIsValid(xid));
 		Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
 
@@ -675,6 +677,17 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
 
 		LockRelease(&tag, ShareLock, false);
 
+		/*
+		 * If a transaction has no lock, it might be a top-level
+		 * transaction, in which case it will no longer be shown as in progress.
+		 *
+		 * If a transaction is a subtransaction, then it could have committed,
+		 * but the parent (sub)transaction might still be in progress. Since
+		 * we do not remove the subxid from procarray until top-level commit
+		 * we may find it still running with this check. If not, it could
+		 * be that it is an overflowed subtransaction, in which case we need
+		 * to discover if a parent is registered for it.
+		 */
 		if (!TransactionIdIsInProgress(xid))
 			break;
 
@@ -696,7 +709,18 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
 		if (!first)
 			pg_usleep(1000L);
 		first = false;
-		xid = SubTransGetTopmostTransaction(xid);
+
+		/*
+		 * In most cases, we can get the parent xid from the cache we set
+		 * when we executed TransactionIdIsInProgress(). If not, we have
+		 * to ask subtrans for the parent.
+		 * This logic would be duplicated in ConditionalXactLockTableWait().
+		 */
+		if (GetTopmostTransactionIdFromProcArray(xid, &pxid) &&
+			TransactionIdIsValid(pxid))
+			xid = pxid;
+		else
+			xid = SubTransGetTopmostTransaction(xid);
 	}
 
 	if (oper != XLTW_None)
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index dca1bc8afc..5f95dcea74 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -2256,7 +2256,8 @@ RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
 bool
 XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 {
-	uint32		i;
+	uint32		i, j;
+	bool		parent = false;
 
 	/*
 	 * Make a quick range check to eliminate most XIDs without looking at the
@@ -2273,93 +2274,67 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 	if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
 		return true;
 
+recheck_parent:
+
 	/*
 	 * Snapshot information is stored slightly differently in snapshots taken
-	 * during recovery.
+	 * during recovery. If xip is populated, search it first, since it is
+	 * smaller and thus we assume the best place to start.
 	 */
 	if (!snapshot->takenDuringRecovery)
 	{
-		/*
-		 * If the snapshot contains full subxact data, the fastest way to
-		 * check things is just to compare the given XID against both subxact
-		 * XIDs and top-level XIDs.  If the snapshot overflowed, we have to
-		 * use pg_subtrans to convert a subxact XID to its parent XID, but
-		 * then we need only look at top-level XIDs not subxacts.
-		 */
-		if (!snapshot->suboverflowed)
-		{
-			/* we have full data, so search subxip */
-			int32		j;
-
-			for (j = 0; j < snapshot->subxcnt; j++)
-			{
-				if (TransactionIdEquals(xid, snapshot->subxip[j]))
-					return true;
-			}
-
-			/* not there, fall through to search xip[] */
-		}
-		else
-		{
-			/*
-			 * Snapshot overflowed, so convert xid to top-level.  This is safe
-			 * because we eliminated too-old XIDs above.
-			 */
-			xid = SubTransGetTopmostTransaction(xid);
-
-			/*
-			 * If xid was indeed a subxact, we might now have an xid < xmin,
-			 * so recheck to avoid an array scan.  No point in rechecking
-			 * xmax.
-			 */
-			if (TransactionIdPrecedes(xid, snapshot->xmin))
-				return false;
-		}
-
 		for (i = 0; i < snapshot->xcnt; i++)
 		{
 			if (TransactionIdEquals(xid, snapshot->xip[i]))
 				return true;
 		}
+
+		/* If we didn't find the parent here, it's definitely not in the snapshot */
+		if (parent)
+			return false;
 	}
-	else
+
+	/*
+	 * If a top-level xid did not overflow then we will find its subxids here,
+	 * if it is present in the snapshot.
+	 */
+	for (j = 0; j < snapshot->subxcnt; j++)
 	{
-		int32		j;
+		if (TransactionIdEquals(xid, snapshot->subxip[j]))
+			return true;
+	}
 
+	/*
+	 * If we didn't find the xid yet it must either not be in snapshot, or
+	 * in the case that a toplevel xid overflowed, it might be stored in
+	 * subtrans. This logic is important because it means we don't need
+	 * to write to subtrans until we overflow the procarray cache, greatly
+	 * reducing the traffic into subtrans.
+	 */
+	if (!parent && snapshot->suboverflowed)
+	{
 		/*
-		 * In recovery we store all xids in the subxact array because it is by
-		 * far the bigger array, and we mostly don't know which xids are
-		 * top-level and which are subxacts. The xip array is empty.
-		 *
-		 * We start by searching subtrans, if we overflowed.
+		 * Snapshot overflowed, so convert xid to top-level.  This is safe
+		 * because we eliminated too-old XIDs above.
 		 */
-		if (snapshot->suboverflowed)
-		{
-			/*
-			 * Snapshot overflowed, so convert xid to top-level.  This is safe
-			 * because we eliminated too-old XIDs above.
-			 */
-			xid = SubTransGetTopmostTransaction(xid);
+		TransactionId pxid = SubTransGetTopmostTransaction(xid);
 
-			/*
-			 * If xid was indeed a subxact, we might now have an xid < xmin,
-			 * so recheck to avoid an array scan.  No point in rechecking
-			 * xmax.
-			 */
-			if (TransactionIdPrecedes(xid, snapshot->xmin))
-				return false;
-		}
+		/* If xid wasn't a subxact, then we're done */
+		if (pxid == xid)
+			return false;
+
+		xid = pxid;
 
 		/*
-		 * We now have either a top-level xid higher than xmin or an
-		 * indeterminate xid. We don't know whether it's top level or subxact
-		 * but it doesn't matter. If it's present, the xid is visible.
+		 * If xid was indeed a subxact, we might now have an xid < xmin,
+		 * so recheck to avoid an array scan.  No point in rechecking
+		 * xmax.
 		 */
-		for (j = 0; j < snapshot->subxcnt; j++)
-		{
-			if (TransactionIdEquals(xid, snapshot->subxip[j]))
-				return true;
-		}
+		if (TransactionIdPrecedes(xid, snapshot->xmin))
+			return false;
+
+		parent = true;
+		goto recheck_parent;
 	}
 
 	return false;
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index b01fa52139..210001cceb 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -52,6 +52,7 @@ extern bool ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc);
 extern RunningTransactions GetRunningTransactionData(void);
 
 extern bool TransactionIdIsInProgress(TransactionId xid);
+extern bool GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid);
 extern bool TransactionIdIsActive(TransactionId xid);
 extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
 extern TransactionId GetOldestTransactionIdConsideredRunning(void);
#15Julien Rouhaud
rjuju123@gmail.com
In reply to: Simon Riggs (#14)
Re: suboverflowed subtransactions concurrency performance optimize

Hi,

On Wed, Dec 08, 2021 at 04:39:11PM +0000, Simon Riggs wrote:

This patch shows where I'm going, with changes in GetSnapshotData()
and XidInMVCCSnapshot() and XactLockTableWait().
Passes make check, but needs much more, so this is review-only at this
stage to give a flavour of what is intended.

Thanks a lot to everyone involved in this!

I can't find any entry in the commitfest for the work being done here. Did I
miss something? If not, could you create an entry in the next commitfest to
make sure that it doesn't get forgotten?

#16Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Simon Riggs (#8)
1 attachment(s)
Re: suboverflowed subtransactions concurrency performance optimize

On Tue, 30 Nov 2021 at 12:19, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Mon, 30 Aug 2021 at 11:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi Pengcheng!

You are solving important problem, thank you!

On 30 Aug 2021, at 13:43, Pengchengliu <pengchengliu@tju.edu.cn> wrote:

To resolve this performance problem, we think about a solution which cache
SubtransSLRU to local cache.
First we can query parent transaction id from SubtransSLRU, and copy the
SLRU page to local cache page.
After that if we need query parent transaction id again, we can query it
from local cache directly.

A copy of SLRU in each backend's cache can consume a lot of memory.

Yes, copying the whole SLRU into local cache seems overkill.

Why create a copy if we can optimise shared representation of SLRU?

transam.c uses a single-item cache to prevent thrashing from repeated
lookups, which reduces problems with shared access to SLRUs.
multixact.c has something similar.

I notice that subtrans.c doesn't have this, but could easily do so.
Patch attached, which seems separate from other attempts at tuning.

Re-attached, so that the CFapp isn't confused between the multiple
patches on this thread.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

subtrans_single_item_cache.v1.patch (application/octet-stream)
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 6a8e521f89..8aa3d9e53c 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -54,6 +54,14 @@
 #define TransactionIdToPage(xid) ((xid) / (TransactionId) SUBTRANS_XACTS_PER_PAGE)
 #define TransactionIdToEntry(xid) ((xid) % (TransactionId) SUBTRANS_XACTS_PER_PAGE)
 
+/*
+ * Single-item cache for results of SubTransGetTopmostTransaction.  It's worth having
+ * such a cache because we frequently find ourselves repeatedly checking the
+ * same XID, for example when scanning a table just after a bulk insert,
+ * update, or delete.
+ */
+static TransactionId cachedFetchXid = InvalidTransactionId;
+static TransactionId cachedFetchTopmostXid = InvalidTransactionId;
 
 /*
  * Link to shared-memory data structures for SUBTRANS control
@@ -155,6 +163,13 @@ SubTransGetTopmostTransaction(TransactionId xid)
 	/* Can't ask about stuff that might not be around anymore */
 	Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
 
+	/*
+	 * Before going to the subtrans log, check our single item cache to
+	 * see if we know the result from a previous/recent request.
+	 */
+	if (TransactionIdEquals(xid, cachedFetchXid))
+		return cachedFetchTopmostXid;
+
 	while (TransactionIdIsValid(parentXid))
 	{
 		previousXid = parentXid;
@@ -174,6 +189,9 @@ SubTransGetTopmostTransaction(TransactionId xid)
 
 	Assert(TransactionIdIsValid(previousXid));
 
+	cachedFetchXid = xid;
+	cachedFetchTopmostXid = previousXid;
+
 	return previousXid;
 }
 
#17Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Simon Riggs (#16)
Re: suboverflowed subtransactions concurrency performance optimize

On 17 Jan 2022, at 18:44, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

Re-attached, so that the CFapp isn't confused between the multiple
patches on this thread.

FWIW I've looked into the patch and it looks good to me. Comments describing when the cache is useful seem valid.

Thanks!

Best regards, Andrey Borodin.

#18Julien Rouhaud
rjuju123@gmail.com
In reply to: Simon Riggs (#16)
Re: suboverflowed subtransactions concurrency performance optimize

Hi,

On Mon, Jan 17, 2022 at 01:44:02PM +0000, Simon Riggs wrote:

Re-attached, so that the CFapp isn't confused between the multiple
patches on this thread.

Thanks a lot for working on this!

The patch is simple and overall looks good to me. A few comments though:

+/*
+ * Single-item cache for results of SubTransGetTopmostTransaction.  It's worth having
+ * such a cache because we frequently find ourselves repeatedly checking the
+ * same XID, for example when scanning a table just after a bulk insert,
+ * update, or delete.
+ */
+static TransactionId cachedFetchXid = InvalidTransactionId;
+static TransactionId cachedFetchTopmostXid = InvalidTransactionId;

The comment exceeds 80 chars after
s/TransactionLogFetch/SubTransGetTopmostTransaction/, and I don't think this
comment is valid for subtrans.c.

Also, maybe naming the first variable cachedFetchSubXid would make it a bit
clearer?

It would be nice to see some benchmarks, both for when this change is
enough to avoid contention (when there's a single long-running overflowed
backend) and for when it's not. That will also be useful if/when working on
the "rethink_subtrans" patch.

#19Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Julien Rouhaud (#18)
Re: suboverflowed subtransactions concurrency performance optimize

On Mon, 7 Mar 2022 at 09:49, Julien Rouhaud <rjuju123@gmail.com> wrote:

Hi,

On Mon, Jan 17, 2022 at 01:44:02PM +0000, Simon Riggs wrote:

Re-attached, so that the CFapp isn't confused between the multiple
patches on this thread.

Thanks a lot for working on this!

The patch is simple and overall looks good to me. A few comments though:

+/*
+ * Single-item cache for results of SubTransGetTopmostTransaction.  It's worth having
+ * such a cache because we frequently find ourselves repeatedly checking the
+ * same XID, for example when scanning a table just after a bulk insert,
+ * update, or delete.
+ */
+static TransactionId cachedFetchXid = InvalidTransactionId;
+static TransactionId cachedFetchTopmostXid = InvalidTransactionId;

The comment exceeds 80 chars after
s/TransactionLogFetch/SubTransGetTopmostTransaction/, and I don't think this
comment is valid for subtrans.c.

What aspect makes it invalid? The comment seems exactly applicable to
me; Andrey thinks so also.

Also, maybe naming the first variable cachedFetchSubXid would make it a bit
clearer?

Sure, that can be done.

It would be nice to see some benchmarks, both for when this change is
enough to avoid contention (when there's a single long-running overflowed
backend) and for when it's not. That will also be useful if/when working on
the "rethink_subtrans" patch.

The patch doesn't do anything about the case where there's a single
long-running overflowed backend, nor does it claim to.

The patch will speed up calls to SubTransGetTopmostTransaction(), which occur in
src/backend/access/heap/heapam.c
src/backend/utils/time/snapmgr.c
src/backend/storage/lmgr/lmgr.c
src/backend/storage/ipc/procarray.c

The patch was posted because TransactionLogFetch() has a cache while
SubTransGetTopmostTransaction() does not, even though the argument should be
identical in both cases.
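
For reference, the transam.c precedent looks roughly like this
(simplified from memory, so check current sources before relying on it):

static TransactionId cachedFetchXid = InvalidTransactionId;
static XidStatus cachedFetchXidStatus;

static XidStatus
TransactionLogFetch(TransactionId transactionId)
{
	XidStatus	xidstatus;
	XLogRecPtr	xidlsn;

	/* Single-item cache: skip the clog SLRU for a repeated xid. */
	if (TransactionIdEquals(transactionId, cachedFetchXid))
		return cachedFetchXidStatus;

	xidstatus = TransactionIdGetStatus(transactionId, &xidlsn);

	/* Do not cache unstable states, which can still change. */
	if (xidstatus != TRANSACTION_STATUS_IN_PROGRESS &&
		xidstatus != TRANSACTION_STATUS_SUB_COMMITTED)
	{
		cachedFetchXid = transactionId;
		cachedFetchXidStatus = xidstatus;
	}

	return xidstatus;
}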

--
Simon Riggs http://www.EnterpriseDB.com/

#20Julien Rouhaud
rjuju123@gmail.com
In reply to: Simon Riggs (#19)
Re: suboverflowed subtransactions concurrency performance optimize

On Mon, Mar 07, 2022 at 01:27:40PM +0000, Simon Riggs wrote:

+/*
+ * Single-item cache for results of SubTransGetTopmostTransaction.  It's worth having
+ * such a cache because we frequently find ourselves repeatedly checking the
+ * same XID, for example when scanning a table just after a bulk insert,
+ * update, or delete.
+ */
+static TransactionId cachedFetchXid = InvalidTransactionId;
+static TransactionId cachedFetchTopmostXid = InvalidTransactionId;

The comment exceeds 80 chars after
s/TransactionLogFetch/SubTransGetTopmostTransaction/, and I don't think this
comment is valid for subtrans.c.

What aspect makes it invalid? The comment seems exactly applicable to
me; Andrey thinks so also.

Sorry, I somehow missed the "for example", and was thinking that
SubTransGetTopmostTransaction was used in many more places than
TransactionIdDidCommit and friends.

It would be nice to see some benchmarks, both for when this change is
enough to avoid contention (when there's a single long-running overflowed
backend) and for when it's not. That will also be useful if/when working on
the "rethink_subtrans" patch.

The patch doesn't do anything about the case where there's a single
long-running overflowed backend, nor does it claim to.

I was thinking that having a cache for SubTransGetTopmostTransaction could help
at least to some extent with that problem; sorry if that's not the case.

I'm still curious on how much this simple optimization can help in some
scenarios, even if they're somewhat artificial.

The patch was posted because TransactionLogFetch() has a cache while
SubTransGetTopmostTransaction() does not, even though the argument should be
identical in both cases.

I totally agree with that.

#21Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#20)
Re: suboverflowed subtransactions concurrency performance optimize

On Mon, Mar 07, 2022 at 10:17:41PM +0800, Julien Rouhaud wrote:

On Mon, Mar 07, 2022 at 01:27:40PM +0000, Simon Riggs wrote:

The patch was posted because TransactionLogFetch() has a cache while
SubTransGetTopmostTransaction() does not, even though the argument should be
identical in both cases.

I totally agree with that.

Agreed as well. That's worth doing in isolation and that will save
some lookups of pg_subtrans anyway while being simple. As mentioned
upthread, this needed reindentation, and the renaming of
cachedFetchXid to cachedFetchSubXid looks appropriate. So, applied all
those things.
--
Michael

#22Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Michael Paquier (#21)
Re: suboverflowed subtransactions concurrency performance optimize

On Thu, 7 Apr 2022 at 00:36, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Mar 07, 2022 at 10:17:41PM +0800, Julien Rouhaud wrote:

On Mon, Mar 07, 2022 at 01:27:40PM +0000, Simon Riggs wrote:

The patch was posted because TransactionLogFetch() has a cache while
SubTransGetTopmostTransaction() does not, even though the argument should be
identical in both cases.

I totally agree with that.

Agreed as well. That's worth doing in isolation and that will save
some lookups of pg_subtrans anyway while being simple. As mentioned
upthread, this needed reindentation, and the renaming of
cachedFetchXid to cachedFetchSubXid looks appropriate. So, applied all
those things.

Thanks Michael, thanks all.

--
Simon Riggs http://www.EnterpriseDB.com/

#23Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#21)
Re: suboverflowed subtransactions concurrency performance optimize

Hi,

On 2022-04-07 14:36:35 +0900, Michael Paquier wrote:

On Mon, Mar 07, 2022 at 10:17:41PM +0800, Julien Rouhaud wrote:

On Mon, Mar 07, 2022 at 01:27:40PM +0000, Simon Riggs wrote:

The patch was posted because TransactionLogFetch() has a cache while
SubTransGetTopmostTransaction() does not, even though the argument should be
identical in both cases.

I totally agree with that.

Agreed as well. That's worth doing in isolation and that will save
some lookups of pg_subtrans anyway while being simple. As mentioned
upthread, this needed reindentation, and the renaming of
cachedFetchXid to cachedFetchSubXid looks appropriate. So, applied all
those things.

As is, this strikes me as dangerous. At the very least this ought to be
structured so it can have assertions verifying that the cache contents are
correct.

It's far from obvious that it is correct to me, fwiw. Potential issues:

1) The result of SubTransGetTopmostTransaction() can change between subsequent
calls. If TransactionXmin advances, the TransactionXmin cutoff can change
the result. It might be unreachable or harmless, but it's not obvious that
it is, and there's zero comments explaining why it is obvious.

2) xid wraparound. There's nothing forcing SubTransGetTopmostTransaction() to
be called regularly, so even if a backend isn't idle, the cache could just
get more and more outdated until hitting wraparound.

To me it also seems odd that we cache in SubTransGetTopmostTransaction(), but
not in SubTransGetParent(). I think it's at least as common to end up with
subtrans access via TransactionIdDidCommit(), which calls SubTransGetParent()
rather than SubTransGetTopmostTransaction()? Why is
SubTransGetTopmostTransaction() the correct layer for caching?

I tried to find a benchmark result for this patch upthread, without
success. Has there been validation this helps with anything?

Greetings,

Andres Freund

#24Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#23)
Re: suboverflowed subtransactions concurrency performance optimize

On Tue, May 24, 2022 at 04:52:50PM -0700, Andres Freund wrote:

As is, this strikes me as dangerous. At the very least this ought to be
structured so it can have assertions verifying that the cache contents are
correct.

Well, for one thing, under USE_ASSERT_CHECKING we could force a
recalculation via the full loop before re-checking and returning the
cached result.
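
As a rough illustration (the slow-path helper name here is
hypothetical):

#ifdef USE_ASSERT_CHECKING
	/*
	 * Hypothetical cross-check: redo the full SLRU walk and verify that it
	 * matches the cached answer before returning it.
	 */
	if (TransactionIdEquals(xid, cachedFetchSubXid))
		Assert(TransactionIdEquals(cachedFetchTopmostXid,
								   SubTransGetTopmostTransactionSlow(xid)));
#endif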

It's far from obvious that it is correct to me, fwiw. Potential issues:

1) The result of SubTransGetTopmostTransaction() can change between subsequent
calls. If TransactionXmin advances, the TransactionXmin cutoff can change
the result. It might be unreachable or harmless, but it's not obvious that
it is, and there's zero comments explaining why it is obvious.

I am not sure I follow on this one. A change in the TransactionXmin
cutoff does not change the result retrieved for parentXid from the
SLRU layer, because the cached xid refers to a parent still running.

2) xid wraparound. There's nothing forcing SubTransGetTopmostTransaction() to
be called regularly, so even if a backend isn't idle, the cache could just
get more and more outdated until hitting wraparound

Hence, you mean that the irregularity of the calls makes the cache more
exposed to an inconsistent result after a wraparound?

To me it also seems odd that we cache in SubTransGetTopmostTransaction(), but
not in SubTransGetParent(). I think it's at least as common to end up with
subtrans access via TransactionIdDidCommit(), which calls SubTransGetParent()
rather than SubTransGetTopmostTransaction()? Why is
SubTransGetTopmostTransaction() the correct layer for caching?

Hmm. I recall thinking about this exact point but left it out of the
caching to maintain symmetry with the setter routine that does the
reverse operation on those SLRUs. Anyway, one reason not to use
SubTransGetParent() is that it may return an invalid XID, which
we'd better not cache depending on its use (say, a serializable
transaction), and SubTransGetTopmostTransaction(), whose loop makes
sure we never hit this case, looks like the correct layer for the
cache. Well, we could also store nothing if an invalid parent is found,
but then the previous argument about the symmetry of the routines
would not apply. Caching there would be beneficial for cases like the
one at the top of the thread, where the SLRU caches are hammered by
overflowing subxids referring to the same XID. The ODBC driver is very
fond of savepoints, for example.

I tried to find a benchmark result for this patch upthread, without
success. Has there been validation this helps with anything?

I have been studying that again, and you are right that I should have
asked for much more here. A benchmark like what's presented upthread
may show some benefit for the case of the same savepoint used across
multiple queries, but only with caching of SubTransGetParent() and with
enough subxids issued to overflow the snapshots. It would be better to
revisit that work later, as the benefit is limited with only
SubTransGetTopmostTransaction(). Point 2) is something I did not
consider, and that's a good one. For now, it looks better to revert
this part rather than tweak it post-beta1.
--
Michael

#25Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#24)
Re: suboverflowed subtransactions concurrency performance optimize

On Thu, May 26, 2022 at 12:53 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, May 24, 2022 at 04:52:50PM -0700, Andres Freund wrote:

2) xid wraparound. There's nothing forcing SubTransGetTopmostTransaction() to
be called regularly, so even if a backend isn't idle, the cache could just
get more and more outdated until hitting wraparound

Hence, you mean that the non-regularity of the call makes it more
exposed to an inconsistent result after a wraparound?

Won't the similar cache in transam.c, in theory, also be prone to
similar behavior?

Anyway, how about clearing this cache for subtrans whenever
TransactionXmin is advanced and cachedFetchSubXid precedes it? The
comments atop SubTransGetTopmostTransaction seem to state that we
don't care about the exact topmost parent when the intermediate one
precedes TransactionXmin. I think that should preserve the optimization
because for such cases there is anyway a fast path in
SubTransGetTopmostTransaction.
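
As a rough sketch, with a hypothetical helper invoked from wherever
TransactionXmin is advanced:

/*
 * Hypothetical invalidation: once cachedFetchSubXid falls behind the
 * TransactionXmin cutoff, the cached answer is no longer interesting,
 * since SubTransGetTopmostTransaction() stops its walk there anyway.
 */
static void
SubTransCacheCheckXmin(void)
{
	if (TransactionIdIsValid(cachedFetchSubXid) &&
		TransactionIdPrecedes(cachedFetchSubXid, TransactionXmin))
	{
		cachedFetchSubXid = InvalidTransactionId;
		cachedFetchTopmostXid = InvalidTransactionId;
	}
}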

--
With Regards,
Amit Kapila.

#26Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#25)
Re: suboverflowed subtransactions concurrency performance optimize

Hi,

On 2022-05-27 15:44:39 +0530, Amit Kapila wrote:

On Thu, May 26, 2022 at 12:53 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, May 24, 2022 at 04:52:50PM -0700, Andres Freund wrote:

2) xid wraparound. There's nothing forcing SubTransGetTopmostTransaction() to
be called regularly, so even if a backend isn't idle, the cache could just
get more and more outdated until hitting wraparound

Hence, you mean that the non-regularity of the call makes it more
exposed to an inconsistent result after a wraparound?

Won't the similar cache in transam.c, in theory, also be prone to
similar behavior?

It's not quite the same risk, because there we are likely to actually hit the
cache regularly. Whereas quite normal workloads might not hit this cache for
days on end.

Anyway, how about clearing this cache for subtrans whenever
TransactionXmin is advanced and cachedFetchSubXid precedes it? The
comments atop SubTransGetTopmostTransaction seem to state that we
don't care about the exact topmost parent when the intermediate one
precedes TransactionXmin. I think that should preserve the optimization
because for such cases there is anyway a fast path in
SubTransGetTopmostTransaction.

There's not even proof that this speeds up anything useful! There's not a
single benchmark for the patch.

Greetings,

Andres Freund

#27Peter Geoghegan
In reply to: Andres Freund (#26)
Re: suboverflowed subtransactions concurrency performance optimize

On Fri, May 27, 2022 at 8:55 AM Andres Freund <andres@anarazel.de> wrote:

Anyway, how about clearing this cache for subtrans whenever
TransactionXmin is advanced and cachedFetchSubXid precedes it? The
comments atop SubTransGetTopmostTransaction seem to state that we
don't care about the exact topmost parent when the intermediate one
precedes TransactionXmin. I think that should preserve the optimization
because for such cases there is anyway a fast path in
SubTransGetTopmostTransaction.

There's not even proof that this speeds up anything useful! There's not a
single benchmark for the patch.

I find it hard to believe that there wasn't even a cursory effort at
performance validation before this was committed, but that's what it
looks like.

--
Peter Geoghegan

#28Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#27)
Re: suboverflowed subtransactions concurrency performance optimize

On 2022-05-27 11:48:45 -0700, Peter Geoghegan wrote:

On Fri, May 27, 2022 at 8:55 AM Andres Freund <andres@anarazel.de> wrote:

Anyway, how about clearing this cache for subtrans whenever
TransactionXmin is advanced and cachedFetchSubXid precedes it? The
comments atop SubTransGetTopmostTransaction seem to state that we
don't care about the exact topmost parent when the intermediate one
precedes TransactionXmin. I think that should preserve the optimization
because for such cases there is anyway a fast path in
SubTransGetTopmostTransaction.

There's not even proof that this speeds up anything useful! There's not a
single benchmark for the patch.

I find it hard to believe that there wasn't even a cursory effort at
performance validation before this was committed, but that's what it
looks like.

Yea. Imo this pretty clearly should be reverted. It has correctness issues,
testing issues and we don't know whether it does anything useful.

#29Peter Geoghegan
In reply to: Andres Freund (#28)
Re: suboverflowed subtransactions concurrency performance optimize

On Fri, May 27, 2022 at 11:59 AM Andres Freund <andres@anarazel.de> wrote:

On 2022-05-27 11:48:45 -0700, Peter Geoghegan wrote:

I find it hard to believe that there wasn't even a cursory effort at
performance validation before this was committed, but that's what it
looks like.

Yea. Imo this pretty clearly should be reverted. It has correctness issues,
testing issues and we don't know whether it does anything useful.

It should definitely be reverted.

--
Peter Geoghegan

#30Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#26)
Re: suboverflowed subtransactions concurrency performance optimize

On Fri, May 27, 2022 at 08:55:02AM -0700, Andres Freund wrote:

On 2022-05-27 15:44:39 +0530, Amit Kapila wrote:

Won't the similar cache in transam.c, in theory, also be prone to
similar behavior?

TransactionIdDidCommit() and TransactionIdDidAbort() are used in many
more code paths for visibility purposes, contrary to the subtrans.c
ones.

It's not quite the same risk, because there we are likely to actually hit the
cache regularly. Whereas quite normal workloads might not hit this cache for
days on end.

Yeah. In short, this mostly depends on the use of savepoints and the
number of XIDs issued until PGPROC_MAX_CACHED_SUBXIDS is reached, and
a single cache entry in this code path would reduce the pressure on
the SLRU lookups depending on the number of queries issued, for
example. One thing I know of that likes to abuse savepoints and
could cause overflows that make this easier to hit is the ODBC driver
coupled with short queries in long transactions, where its internals
enforce the use of a savepoint each time a query is issued by an
application (pretty much what the benchmark at the top of the thread
does). In this case, even the single-cache approach would not help
much, because I recall that we finish with one savepoint per query to
be able to roll back to any previous state within a given transaction
(as the ODBC APIs allow).

Adding caching within SubTransGetParent() would be more interesting,
for sure, though the invalidation needed to keep the cache clean and
robust enough may prove tricky.
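
As a rough sketch of that idea (the wrapper and variable names are
hypothetical, and the invalidation question is left open):

static TransactionId cachedLookupXid = InvalidTransactionId;
static TransactionId cachedParentXid = InvalidTransactionId;

/*
 * Hypothetical caching wrapper around SubTransGetParent(): cache only
 * valid parents, since an invalid result may simply mean the xid was
 * never registered in pg_subtrans.
 */
static TransactionId
SubTransGetParentCached(TransactionId xid)
{
	TransactionId parent;

	if (TransactionIdEquals(xid, cachedLookupXid))
		return cachedParentXid;

	parent = SubTransGetParent(xid);	/* SLRU lookup */

	if (TransactionIdIsValid(parent))
	{
		cachedLookupXid = xid;
		cachedParentXid = parent;
	}

	return parent;
}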

It took me some time to come back to this thread. The change has now
been reverted.
--
Michael