GISTSTATE is too large

Started by Andres Freundover 4 years ago5 messages
#1Andres Freund
andres@anarazel.de

Hi,

On twitter it was mentioned [1]https://twitter.com/komzpa/status/1386420422225240065 that gist index builds spend a lot of time
in FunctionCall3Coll. Which could be addressed to a good degree by not
using FunctionCall3Coll() which needs to call InitFunctionCallInfoData()
every time, but instead doing so once by including the FunctionCallInfo
in GISTSTATE.

Which made me look at GISTSTATEs layout. And, uh, I was a bit shocked:
struct GISTSTATE {
MemoryContext scanCxt; /* 0 8 */
MemoryContext tempCxt; /* 8 8 */
TupleDesc leafTupdesc; /* 16 8 */
TupleDesc nonLeafTupdesc; /* 24 8 */
TupleDesc fetchTupdesc; /* 32 8 */
FmgrInfo consistentFn[32]; /* 40 1536 */
/* --- cacheline 24 boundary (1536 bytes) was 40 bytes ago --- */
FmgrInfo unionFn[32]; /* 1576 1536 */
...
/* --- cacheline 216 boundary (13824 bytes) was 40 bytes ago --- */
Oid supportCollation[32]; /* 13864 128 */

/* size: 13992, cachelines: 219, members: 15 */
/* last cacheline: 40 bytes */
};

So the basic GISTSTATE is 14kB large. And all the information needed to
call support functions for one attribute is spaced so far appart that
it's guaranteed to be on different cachelines and to be very unlikely to
be prefetched by the hardware prefetcher.

It seems pretty clear that this should be changed to be something more
like

typedef struct GIST_COL_STATE
{
FmgrInfo consistentFn;
FmgrInfo unionFn;
FmgrInfo compressFn;
FmgrInfo decompressFn;
FmgrInfo penaltyFn;
FmgrInfo picksplitFn;
FmgrInfo equalFn;
FmgrInfo distanceFn;
FmgrInfo fetchFn;

/* Collations to pass to the support functions */
Oid supportCollation;
} GIST_COL_STATE;

typedef struct GISTSTATE
{
MemoryContext scanCxt; /* context for scan-lifespan data */
MemoryContext tempCxt; /* short-term context for calling functions */

TupleDesc leafTupdesc; /* index's tuple descriptor */
TupleDesc nonLeafTupdesc; /* truncated tuple descriptor for non-leaf
* pages */
TupleDesc fetchTupdesc; /* tuple descriptor for tuples returned in an
* index-only scan */

GIST_COL_STATE column_state[FLEXIBLE_ARRAY_MEMBER];
}

with initGISTstate allocating based on
IndexRelationGetNumberOfKeyAttributes() instead of using a constant.

And then subsequently change GIST_COL_STATE to embed the
FunctionCallInfo, rather than initializiing them on the stack for every
call.

I'm not planning on doing this work, but I thought it's sensible to send
to the list anyway.

Greetings,

Andres Freund

[1]: https://twitter.com/komzpa/status/1386420422225240065

#2Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Andres Freund (#1)
Re: GISTSTATE is too large

26 апр. 2021 г., в 03:20, Andres Freund <andres@anarazel.de> написал(а):

So the basic GISTSTATE is 14kB large. And all the information needed to
call support functions for one attribute is spaced so far appart that
it's guaranteed to be on different cachelines and to be very unlikely to
be prefetched by the hardware prefetcher.

It seems pretty clear that this should be changed to be something more
like

...

with initGISTstate allocating based on
IndexRelationGetNumberOfKeyAttributes() instead of using a constant.

And then subsequently change GIST_COL_STATE to embed the
FunctionCallInfo, rather than initializiing them on the stack for every
call.

Yes, this makes sense. Also, it's viable to reorder fields to group scan and insert routines together, currently they are interlaced.
Or maybe we could even split state into insert state and scan state.

I'm not planning on doing this work, but I thought it's sensible to send
to the list anyway.

Thanks for idea, I would give it a shot this summer, unless someone else will take it earlier.
BTW, It would make sense to avoid penalty call at all: there are many GiST-based access methods that want to see all items together to choose insertion subtree (e.g. R*-tree and RR-tree). Calling penalty function for each tuple on page often is not a good idea at all.

Thanks!

Best regards, Andrey Borodin.

#3Andreas Karlsson
andreas@proxel.se
In reply to: Andres Freund (#1)
1 attachment(s)
Re: GISTSTATE is too large

On 4/26/21 12:20 AM, Andres Freund wrote:

It seems pretty clear that this should be changed to be something more
like

[...]

with initGISTstate allocating based on
IndexRelationGetNumberOfKeyAttributes() instead of using a constant.

And then subsequently change GIST_COL_STATE to embed the
FunctionCallInfo, rather than initializiing them on the stack for every
call.

I'm not planning on doing this work, but I thought it's sensible to send
to the list anyway.

I did the first part since it seemed easy enough and an obvious win for
all workloads.

Andreas

Attachments:

0001-Shrink-GISTSTATE.patchtext/x-patch; charset=UTF-8; name=0001-Shrink-GISTSTATE.patchDownload
From 63158a51add9cbd43ca7d5d6219cd6127657f25f Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andreas@proxel.se>
Date: Sun, 30 May 2021 15:11:35 +0200
Subject: [PATCH] Shrink GISTSTATE

---
 src/backend/access/gist/gist.c      | 58 ++++++++++++++---------------
 src/backend/access/gist/gistscan.c  |  4 +-
 src/backend/access/gist/gistsplit.c | 12 +++---
 src/backend/access/gist/gistutil.c  | 38 +++++++++----------
 src/include/access/gist_private.h   | 29 +++++++++------
 5 files changed, 71 insertions(+), 70 deletions(-)

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 0683f42c25..8bccb76dd2 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1507,11 +1507,6 @@ initGISTstate(Relation index)
 	MemoryContext oldCxt;
 	int			i;
 
-	/* safety check to protect fixed-size arrays in GISTSTATE */
-	if (index->rd_att->natts > INDEX_MAX_KEYS)
-		elog(ERROR, "numberOfAttributes %d > %d",
-			 index->rd_att->natts, INDEX_MAX_KEYS);
-
 	/* Create the memory context that will hold the GISTSTATE */
 	scanCxt = AllocSetContextCreate(CurrentMemoryContext,
 									"GiST scan context",
@@ -1519,7 +1514,8 @@ initGISTstate(Relation index)
 	oldCxt = MemoryContextSwitchTo(scanCxt);
 
 	/* Create and fill in the GISTSTATE */
-	giststate = (GISTSTATE *) palloc(sizeof(GISTSTATE));
+	giststate = (GISTSTATE *) palloc(offsetof(GISTSTATE, column_state) +
+									 sizeof(GIST_COL_STATE) * index->rd_att->natts);
 
 	giststate->scanCxt = scanCxt;
 	giststate->tempCxt = scanCxt;	/* caller must change this if needed */
@@ -1541,54 +1537,54 @@ initGISTstate(Relation index)
 
 	for (i = 0; i < IndexRelationGetNumberOfKeyAttributes(index); i++)
 	{
-		fmgr_info_copy(&(giststate->consistentFn[i]),
+		fmgr_info_copy(&(giststate->column_state[i].consistentFn),
 					   index_getprocinfo(index, i + 1, GIST_CONSISTENT_PROC),
 					   scanCxt);
-		fmgr_info_copy(&(giststate->unionFn[i]),
+		fmgr_info_copy(&(giststate->column_state[i].unionFn),
 					   index_getprocinfo(index, i + 1, GIST_UNION_PROC),
 					   scanCxt);
 
 		/* opclasses are not required to provide a Compress method */
 		if (OidIsValid(index_getprocid(index, i + 1, GIST_COMPRESS_PROC)))
-			fmgr_info_copy(&(giststate->compressFn[i]),
+			fmgr_info_copy(&(giststate->column_state[i].compressFn),
 						   index_getprocinfo(index, i + 1, GIST_COMPRESS_PROC),
 						   scanCxt);
 		else
-			giststate->compressFn[i].fn_oid = InvalidOid;
+			giststate->column_state[i].compressFn.fn_oid = InvalidOid;
 
 		/* opclasses are not required to provide a Decompress method */
 		if (OidIsValid(index_getprocid(index, i + 1, GIST_DECOMPRESS_PROC)))
-			fmgr_info_copy(&(giststate->decompressFn[i]),
+			fmgr_info_copy(&(giststate->column_state[i].decompressFn),
 						   index_getprocinfo(index, i + 1, GIST_DECOMPRESS_PROC),
 						   scanCxt);
 		else
-			giststate->decompressFn[i].fn_oid = InvalidOid;
+			giststate->column_state[i].decompressFn.fn_oid = InvalidOid;
 
-		fmgr_info_copy(&(giststate->penaltyFn[i]),
+		fmgr_info_copy(&(giststate->column_state[i].penaltyFn),
 					   index_getprocinfo(index, i + 1, GIST_PENALTY_PROC),
 					   scanCxt);
-		fmgr_info_copy(&(giststate->picksplitFn[i]),
+		fmgr_info_copy(&(giststate->column_state[i].picksplitFn),
 					   index_getprocinfo(index, i + 1, GIST_PICKSPLIT_PROC),
 					   scanCxt);
-		fmgr_info_copy(&(giststate->equalFn[i]),
+		fmgr_info_copy(&(giststate->column_state[i].equalFn),
 					   index_getprocinfo(index, i + 1, GIST_EQUAL_PROC),
 					   scanCxt);
 
 		/* opclasses are not required to provide a Distance method */
 		if (OidIsValid(index_getprocid(index, i + 1, GIST_DISTANCE_PROC)))
-			fmgr_info_copy(&(giststate->distanceFn[i]),
+			fmgr_info_copy(&(giststate->column_state[i].distanceFn),
 						   index_getprocinfo(index, i + 1, GIST_DISTANCE_PROC),
 						   scanCxt);
 		else
-			giststate->distanceFn[i].fn_oid = InvalidOid;
+			giststate->column_state[i].distanceFn.fn_oid = InvalidOid;
 
 		/* opclasses are not required to provide a Fetch method */
 		if (OidIsValid(index_getprocid(index, i + 1, GIST_FETCH_PROC)))
-			fmgr_info_copy(&(giststate->fetchFn[i]),
+			fmgr_info_copy(&(giststate->column_state[i].fetchFn),
 						   index_getprocinfo(index, i + 1, GIST_FETCH_PROC),
 						   scanCxt);
 		else
-			giststate->fetchFn[i].fn_oid = InvalidOid;
+			giststate->column_state[i].fetchFn.fn_oid = InvalidOid;
 
 		/*
 		 * If the index column has a specified collation, we should honor that
@@ -1602,24 +1598,24 @@ initGISTstate(Relation index)
 		 * any cases where a GiST storage type has a nondefault collation.)
 		 */
 		if (OidIsValid(index->rd_indcollation[i]))
-			giststate->supportCollation[i] = index->rd_indcollation[i];
+			giststate->column_state[i].supportCollation = index->rd_indcollation[i];
 		else
-			giststate->supportCollation[i] = DEFAULT_COLLATION_OID;
+			giststate->column_state[i].supportCollation = DEFAULT_COLLATION_OID;
 	}
 
 	/* No opclass information for INCLUDE attributes */
 	for (; i < index->rd_att->natts; i++)
 	{
-		giststate->consistentFn[i].fn_oid = InvalidOid;
-		giststate->unionFn[i].fn_oid = InvalidOid;
-		giststate->compressFn[i].fn_oid = InvalidOid;
-		giststate->decompressFn[i].fn_oid = InvalidOid;
-		giststate->penaltyFn[i].fn_oid = InvalidOid;
-		giststate->picksplitFn[i].fn_oid = InvalidOid;
-		giststate->equalFn[i].fn_oid = InvalidOid;
-		giststate->distanceFn[i].fn_oid = InvalidOid;
-		giststate->fetchFn[i].fn_oid = InvalidOid;
-		giststate->supportCollation[i] = InvalidOid;
+		giststate->column_state[i].consistentFn.fn_oid = InvalidOid;
+		giststate->column_state[i].unionFn.fn_oid = InvalidOid;
+		giststate->column_state[i].compressFn.fn_oid = InvalidOid;
+		giststate->column_state[i].decompressFn.fn_oid = InvalidOid;
+		giststate->column_state[i].penaltyFn.fn_oid = InvalidOid;
+		giststate->column_state[i].picksplitFn.fn_oid = InvalidOid;
+		giststate->column_state[i].equalFn.fn_oid = InvalidOid;
+		giststate->column_state[i].distanceFn.fn_oid = InvalidOid;
+		giststate->column_state[i].fetchFn.fn_oid = InvalidOid;
+		giststate->column_state[i].supportCollation = InvalidOid;
 	}
 
 	MemoryContextSwitchTo(oldCxt);
diff --git a/src/backend/access/gist/gistscan.c b/src/backend/access/gist/gistscan.c
index 61e92cf0f5..19fe161366 100644
--- a/src/backend/access/gist/gistscan.c
+++ b/src/backend/access/gist/gistscan.c
@@ -258,7 +258,7 @@ gistrescan(IndexScanDesc scan, ScanKey key, int nkeys,
 			 * of function implementing filtering operator.
 			 */
 			fmgr_info_copy(&(skey->sk_func),
-						   &(so->giststate->consistentFn[skey->sk_attno - 1]),
+						   &(so->giststate->column_state[skey->sk_attno - 1].consistentFn),
 						   so->giststate->scanCxt);
 
 			/* Restore prior fn_extra pointers, if not first time */
@@ -304,7 +304,7 @@ gistrescan(IndexScanDesc scan, ScanKey key, int nkeys,
 		for (i = 0; i < scan->numberOfOrderBys; i++)
 		{
 			ScanKey		skey = scan->orderByData + i;
-			FmgrInfo   *finfo = &(so->giststate->distanceFn[skey->sk_attno - 1]);
+			FmgrInfo   *finfo = &(so->giststate->column_state[skey->sk_attno - 1].distanceFn);
 
 			/* Check we actually have a distance function ... */
 			if (!OidIsValid(finfo->fn_oid))
diff --git a/src/backend/access/gist/gistsplit.c b/src/backend/access/gist/gistsplit.c
index 526ed1218e..c8fc67f91c 100644
--- a/src/backend/access/gist/gistsplit.c
+++ b/src/backend/access/gist/gistsplit.c
@@ -378,16 +378,16 @@ genericPickSplit(GISTSTATE *giststate, GistEntryVector *entryvec, GIST_SPLITVEC
 	evec->n = v->spl_nleft;
 	memcpy(evec->vector, entryvec->vector + FirstOffsetNumber,
 		   sizeof(GISTENTRY) * evec->n);
-	v->spl_ldatum = FunctionCall2Coll(&giststate->unionFn[attno],
-									  giststate->supportCollation[attno],
+	v->spl_ldatum = FunctionCall2Coll(&giststate->column_state[attno].unionFn,
+									  giststate->column_state[attno].supportCollation,
 									  PointerGetDatum(evec),
 									  PointerGetDatum(&nbytes));
 
 	evec->n = v->spl_nright;
 	memcpy(evec->vector, entryvec->vector + FirstOffsetNumber + v->spl_nleft,
 		   sizeof(GISTENTRY) * evec->n);
-	v->spl_rdatum = FunctionCall2Coll(&giststate->unionFn[attno],
-									  giststate->supportCollation[attno],
+	v->spl_rdatum = FunctionCall2Coll(&giststate->column_state[attno].unionFn,
+									  giststate->column_state[attno].supportCollation,
 									  PointerGetDatum(evec),
 									  PointerGetDatum(&nbytes));
 }
@@ -430,8 +430,8 @@ gistUserPicksplit(Relation r, GistEntryVector *entryvec, int attno, GistSplitVec
 	 * Let the opclass-specific PickSplit method do its thing.  Note that at
 	 * this point we know there are no null keys in the entryvec.
 	 */
-	FunctionCall2Coll(&giststate->picksplitFn[attno],
-					  giststate->supportCollation[attno],
+	FunctionCall2Coll(&giststate->column_state[attno].picksplitFn,
+					  giststate->column_state[attno].supportCollation,
 					  PointerGetDatum(entryvec),
 					  PointerGetDatum(sv));
 
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 8dcd53c457..32a316b880 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -200,8 +200,8 @@ gistMakeUnionItVec(GISTSTATE *giststate, IndexTuple *itvec, int len,
 			}
 
 			/* Make union and store in attr array */
-			attr[i] = FunctionCall2Coll(&giststate->unionFn[i],
-										giststate->supportCollation[i],
+			attr[i] = FunctionCall2Coll(&giststate->column_state[i].unionFn,
+										giststate->column_state[i].supportCollation,
 										PointerGetDatum(evec),
 										PointerGetDatum(&attrsize));
 
@@ -269,8 +269,8 @@ gistMakeUnionKey(GISTSTATE *giststate, int attno,
 		}
 
 		*dstisnull = false;
-		*dst = FunctionCall2Coll(&giststate->unionFn[attno],
-								 giststate->supportCollation[attno],
+		*dst = FunctionCall2Coll(&giststate->column_state[attno].unionFn,
+								 giststate->column_state[attno].supportCollation,
 								 PointerGetDatum(evec),
 								 PointerGetDatum(&dstsize));
 	}
@@ -281,8 +281,8 @@ gistKeyIsEQ(GISTSTATE *giststate, int attno, Datum a, Datum b)
 {
 	bool		result;
 
-	FunctionCall3Coll(&giststate->equalFn[attno],
-					  giststate->supportCollation[attno],
+	FunctionCall3Coll(&giststate->column_state[attno].equalFn,
+					  giststate->column_state[attno].supportCollation,
 					  a, b,
 					  PointerGetDatum(&result));
 	return result;
@@ -554,12 +554,12 @@ gistdentryinit(GISTSTATE *giststate, int nkey, GISTENTRY *e,
 		gistentryinit(*e, k, r, pg, o, l);
 
 		/* there may not be a decompress function in opclass */
-		if (!OidIsValid(giststate->decompressFn[nkey].fn_oid))
+		if (!OidIsValid(giststate->column_state[nkey].decompressFn.fn_oid))
 			return;
 
 		dep = (GISTENTRY *)
-			DatumGetPointer(FunctionCall1Coll(&giststate->decompressFn[nkey],
-											  giststate->supportCollation[nkey],
+			DatumGetPointer(FunctionCall1Coll(&giststate->column_state[nkey].decompressFn,
+											  giststate->column_state[nkey].supportCollation,
 											  PointerGetDatum(e)));
 		/* decompressFn may just return the given pointer */
 		if (dep != e)
@@ -612,10 +612,10 @@ gistCompressValues(GISTSTATE *giststate, Relation r,
 			gistentryinit(centry, attdata[i], r, NULL, (OffsetNumber) 0,
 						  isleaf);
 			/* there may not be a compress function in opclass */
-			if (OidIsValid(giststate->compressFn[i].fn_oid))
+			if (OidIsValid(giststate->column_state[i].compressFn.fn_oid))
 				cep = (GISTENTRY *)
-					DatumGetPointer(FunctionCall1Coll(&giststate->compressFn[i],
-													  giststate->supportCollation[i],
+					DatumGetPointer(FunctionCall1Coll(&giststate->column_state[i].compressFn,
+													  giststate->column_state[i].supportCollation,
 													  PointerGetDatum(&centry)));
 			else
 				cep = &centry;
@@ -650,8 +650,8 @@ gistFetchAtt(GISTSTATE *giststate, int nkey, Datum k, Relation r)
 	gistentryinit(fentry, k, r, NULL, (OffsetNumber) 0, false);
 
 	fep = (GISTENTRY *)
-		DatumGetPointer(FunctionCall1Coll(&giststate->fetchFn[nkey],
-										  giststate->supportCollation[nkey],
+		DatumGetPointer(FunctionCall1Coll(&giststate->column_state[nkey].fetchFn,
+										  giststate->column_state[nkey].supportCollation,
 										  PointerGetDatum(&fentry)));
 
 	/* fetchFn set 'key', return it to the caller */
@@ -676,14 +676,14 @@ gistFetchTuple(GISTSTATE *giststate, Relation r, IndexTuple tuple)
 
 		datum = index_getattr(tuple, i + 1, giststate->leafTupdesc, &isnull[i]);
 
-		if (giststate->fetchFn[i].fn_oid != InvalidOid)
+		if (giststate->column_state[i].fetchFn.fn_oid != InvalidOid)
 		{
 			if (!isnull[i])
 				fetchatt[i] = gistFetchAtt(giststate, i, datum, r);
 			else
 				fetchatt[i] = (Datum) 0;
 		}
-		else if (giststate->compressFn[i].fn_oid == InvalidOid)
+		else if (giststate->column_state[i].compressFn.fn_oid == InvalidOid)
 		{
 			/*
 			 * If opclass does not provide compress method that could change
@@ -726,11 +726,11 @@ gistpenalty(GISTSTATE *giststate, int attno,
 {
 	float		penalty = 0.0;
 
-	if (giststate->penaltyFn[attno].fn_strict == false ||
+	if (giststate->column_state[attno].penaltyFn.fn_strict == false ||
 		(isNullOrig == false && isNullAdd == false))
 	{
-		FunctionCall3Coll(&giststate->penaltyFn[attno],
-						  giststate->supportCollation[attno],
+		FunctionCall3Coll(&giststate->column_state[attno].penaltyFn,
+						  giststate->column_state[attno].supportCollation,
 						  PointerGetDatum(orig),
 						  PointerGetDatum(add),
 						  PointerGetDatum(&penalty));
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 553d364e2d..7482482e42 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -59,6 +59,22 @@ typedef struct
 #define PAGE_NO_SPACE(nbp, itup) (PAGE_FREE_SPACE(nbp) < \
 										MAXALIGN(IndexTupleSize(itup)))
 
+typedef struct GIST_COL_STATE
+{
+	FmgrInfo	consistentFn;
+	FmgrInfo	unionFn;
+	FmgrInfo	compressFn;
+	FmgrInfo	decompressFn;
+	FmgrInfo	penaltyFn;
+	FmgrInfo	picksplitFn;
+	FmgrInfo	equalFn;
+	FmgrInfo	distanceFn;
+	FmgrInfo	fetchFn;
+
+	/* Collations to pass to the support functions */
+	Oid			supportCollation;
+} GIST_COL_STATE;
+
 /*
  * GISTSTATE: information needed for any GiST index operation
  *
@@ -83,18 +99,7 @@ typedef struct GISTSTATE
 	TupleDesc	fetchTupdesc;	/* tuple descriptor for tuples returned in an
 								 * index-only scan */
 
-	FmgrInfo	consistentFn[INDEX_MAX_KEYS];
-	FmgrInfo	unionFn[INDEX_MAX_KEYS];
-	FmgrInfo	compressFn[INDEX_MAX_KEYS];
-	FmgrInfo	decompressFn[INDEX_MAX_KEYS];
-	FmgrInfo	penaltyFn[INDEX_MAX_KEYS];
-	FmgrInfo	picksplitFn[INDEX_MAX_KEYS];
-	FmgrInfo	equalFn[INDEX_MAX_KEYS];
-	FmgrInfo	distanceFn[INDEX_MAX_KEYS];
-	FmgrInfo	fetchFn[INDEX_MAX_KEYS];
-
-	/* Collations to pass to the support functions */
-	Oid			supportCollation[INDEX_MAX_KEYS];
+	GIST_COL_STATE column_state[FLEXIBLE_ARRAY_MEMBER];
 } GISTSTATE;
 
 
-- 
2.30.2

#4Zhihong Yu
zyu@yugabyte.com
In reply to: Andreas Karlsson (#3)
Re: GISTSTATE is too large

On Sun, May 30, 2021 at 6:14 AM Andreas Karlsson <andreas@proxel.se> wrote:

On 4/26/21 12:20 AM, Andres Freund wrote:

It seems pretty clear that this should be changed to be something more
like

[...]

with initGISTstate allocating based on
IndexRelationGetNumberOfKeyAttributes() instead of using a constant.

And then subsequently change GIST_COL_STATE to embed the
FunctionCallInfo, rather than initializiing them on the stack for every
call.

I'm not planning on doing this work, but I thought it's sensible to send
to the list anyway.

I did the first part since it seemed easy enough and an obvious win for
all workloads.

Andreas

Hi,
Minor comment:

+   /* Collations to pass to the support functions */
+   Oid         supportCollation;

Collations -> Collation
The field used to be an array. Now it is one Oid.

Cheers

#5Andres Freund
andres@anarazel.de
In reply to: Andreas Karlsson (#3)
Re: GISTSTATE is too large

Hi,

On 2021-05-30 15:14:33 +0200, Andreas Karlsson wrote:

I did the first part since it seemed easy enough and an obvious win for all
workloads.

Cool!

+typedef struct GIST_COL_STATE
+{
+	FmgrInfo	consistentFn;
+	FmgrInfo	unionFn;
+	FmgrInfo	compressFn;
+	FmgrInfo	decompressFn;
+	FmgrInfo	penaltyFn;
+	FmgrInfo	picksplitFn;
+	FmgrInfo	equalFn;
+	FmgrInfo	distanceFn;
+	FmgrInfo	fetchFn;
+
+	/* Collations to pass to the support functions */
+	Oid			supportCollation;
+} GIST_COL_STATE;
+
/*
* GISTSTATE: information needed for any GiST index operation
*
@@ -83,18 +99,7 @@ typedef struct GISTSTATE
TupleDesc	fetchTupdesc;	/* tuple descriptor for tuples returned in an
* index-only scan */
-	FmgrInfo	consistentFn[INDEX_MAX_KEYS];
-	FmgrInfo	unionFn[INDEX_MAX_KEYS];
-	FmgrInfo	compressFn[INDEX_MAX_KEYS];
-	FmgrInfo	decompressFn[INDEX_MAX_KEYS];
-	FmgrInfo	penaltyFn[INDEX_MAX_KEYS];
-	FmgrInfo	picksplitFn[INDEX_MAX_KEYS];
-	FmgrInfo	equalFn[INDEX_MAX_KEYS];
-	FmgrInfo	distanceFn[INDEX_MAX_KEYS];
-	FmgrInfo	fetchFn[INDEX_MAX_KEYS];
-
-	/* Collations to pass to the support functions */
-	Oid			supportCollation[INDEX_MAX_KEYS];
+	GIST_COL_STATE column_state[FLEXIBLE_ARRAY_MEMBER];
} GISTSTATE;

This makes me wonder if the better design would be to keep the layout of
dense arrays for each type of function, but to make it more dense by
allocating dynamically. As above GIST_COL_STATE is 436 bytes (I think),
i.e. *well* above a cache line - far enough apart that accessing
different column's equalFn or such will be hard for the hardware
prefetcher to understand. Presumably not all functions are accessed all
the time.

So we could lay it out as

struct GISTSTATE
{
...
FmgrInfo *consistentFn;
FmgrInfo *unionFn;
...
}
[ncolumns consistentFns follow]
[ncolumns unionFn's follow]

Which'd likely end up with better cache locality for gist indexes with a
few columns. At the expense of a pointer indirection, of course.

Another angle: I don't know how it is for GIST, but for btree, the
FunctionCall2Coll() etc overhead shows up significantly - directly
allocating the FunctionCallInfo and initializing it once, instead of
every call, reduces overhead noticeably (but is a bit complicated to
implement, due to the insertion scan and stuff). I'd be surprised if we
didn't get better performance for gist if it had initialized-once
FunctionCallInfos intead of the FmgrInfos.

And that's not even just true because of needing to re-initialize
FunctionCallInfo on every call, but also because the function call
itself rarely accesses the data from the FmgrInfo, but always accesses
the FunctionCallInfo. And a FunctionCallInfos with 1 argument is the
same size as a FmgrInfo, with 2 it's 16 bytes more. So storing the
once-initialized FunctionCallInfo results in considerably better
locality, by not having all the FmgrInfos in cache.

One annoying bit: Right now it's not trivial to declare arrays of
specific-number-of-arguments FunctionCallInfos. See the uglyness of
LOCAL_FCINFO. I think we should consider having a bunch of typedefs for
1..3 argument FunctionCallInfo structs to make that easier. Probably
would still need union trickery, but it'd not be *too* bad I think.

Greetings,

Andres Freund