Reduce build times of pg_trgm GIN indexes

Started by David Geier7 days ago7 messages

geidav.pg@gmail.com

7 days ago

9 attachment(s)

Hi hackers,

Attached is a series of patches which gradually reduce the time it takes
to create GIN indexes. Most of the gains come from optimizing the
trigram extraction code in pg_trgm. A few small optimizations apply to
any GIN index operator class.

The changes are motivated by the time it takes to create GIN indexes on
large production tables, especially, on columns with long strings. Even
with multiple parallel maintenance workers I've seen this taking hours.

For testing purposes I've used two different data sets:

1. The l_comment column of the TPC-H SF 10 lineitem table. l_comment
contains relatively short strings with a minimum of 10, a maximum of 43
and an average of ~27 characters.

2. The plots from a collection of movies from Wikipedia. The plots are
much longer than l_comment, with a minimum of 15, a maximum of 36,773
and an average of ~2,165 characters. The CSV file can be downloaded here
[1]: https://github.com/kiq005/movie-recommendation/raw/refs/heads/master/src/dataset/wiki_movie_plots_deduped.csv

Testing both cases is important because a big part of the trigram
extraction is spent on removing duplicates. The longer the string, the
more duplicates are usually encountered.

The script I used for testing is attached. I ran CREATE INDEX three
times and took the fastest run. I'm getting the following results on my
i9-13950HX dev laptop in release build:

Data set | Patched (ms) | Master (ms) | Speedup
--------------------|--------------|--------------|----------
movies(plot) | 3,409 | 10,311 | 3.02x
lineitem(l_comment) | 161,569 | 256,986 | 1.59x

The attached patches do the following:

- v1-0001-Inline-ginCompareAttEntries.patch: Inline
ginCompareAttEntries() which is very frequently called by the GIN code.

- v1-0002-Optimized-comparison-functions.patch: Use FunctionCallInvoke()
instead of FunctionCall2Coll(). This saves a bunch of per-comparison
setup code, such as calling InitFunctionCallInfoData().

- v1-0003-Use-sort_template.h.patch: Use sort_template.h instead of
qsort(), to inline calls to the sort comparator. This is an interim step
that is further improved on by patch 0006.

- v1-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patch
ginExtractEntries() deduplicates and sorts the entries returned from the
extract value function. In case of pg_trgm, that is completely redundant
because the trigrams are already deduplicated and sorted. The current
version of this patch is just to demonstrate the potential. We need to
think about what we want here. Ideally, we would require the extraction
function to provide the entries deduplicated and sorted. Alternatively,
we could indicate to ginExtractEntries() if the entries are already
deduplicated and sorted. If we don't want to alter the signature of the
extract value function, we could e.g. use the MSB of the nentries argument.

- v1-0005-Make-btint4cmp-branchless.patch: Removes branches from
btint4cmp(), which is heavily called from the GIN code. This might as
well have benefit in other parts of the code base.

v1-0006-Use-radix-sort.patch: Replace the sort_template.h-based qsort()
with radix sort. For the purpose of demonstrating the possible gains,
I've only replaced the signed variant for now. I've also tried using
simplehash.h for deduplicating followed by a sort_template.h-based sort.
But that was slower.

v1-0007-Faster-qunique-comparator.patch: qunique() doesn't require a
full sort comparator (-1 = less, 0 = equal, 1 = greater) but only a
equal/unequal comparator (e.g. 0 = equal and 1 = unequal). The same
optimization can be done in plenty of other places in our code base.
Likely, in most of them the gains are insignificant.

v1-0008-Add-ASCII-fastpath-to-generate_trgm_only.patch: Typically lots
of text is actually ASCII. Hence, we provide a fast path for this case
which is exercised if the MSB of the current character is unset.

With above changes, the majority of the runtime is now spent on
inserting the trigrams into the GIN index via ginInsertBAEntry(). The
code in master uses a red-black for further deduplication and sorting.
Traversing the red-black tree and updating it is pretty slow. I haven't
looked through all the code yet, but it seems to me that we would be
better off replacing the red-black tree with a sort and/or a hash map.
But I'll leave this as future work for now.

[1]: https://github.com/kiq005/movie-recommendation/raw/refs/heads/master/src/dataset/wiki_movie_plots_deduped.csv
https://github.com/kiq005/movie-recommendation/raw/refs/heads/master/src/dataset/wiki_movie_plots_deduped.csv

--
David Geier

Attachments:

v1-0008-Add-ASCII-fastpath-to-generate_trgm_only.patchtext/x-patch; charset=UTF-8; name=v1-0008-Add-ASCII-fastpath-to-generate_trgm_only.patchDownload

From fc7bf3bd9c1bdf1336c233a796079556bb1dbf9d Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Fri, 14 Nov 2025 11:37:40 +0100
Subject: [PATCH v1 8/8] Add ASCII fastpath to generate_trgm_only()

---
 contrib/pg_trgm/trgm_op.c | 124 ++++++++++++++++++++------------------
 1 file changed, 65 insertions(+), 59 deletions(-)

diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index e07c553b1bd..42343bd1442 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -226,32 +226,6 @@ show_limit(PG_FUNCTION_ARGS)
 	PG_RETURN_FLOAT4(similarity_threshold);
 }
 
-/*
- * Finds first word in string, returns pointer to the word,
- * endword points to the character after word
- */
-static char *
-find_word(char *str, int lenstr, char **endword, int *charlen)
-{
-	char	   *beginword = str;
-
-	while (beginword - str < lenstr && !ISWORDCHR(beginword))
-		beginword += pg_mblen(beginword);
-
-	if (beginword - str >= lenstr)
-		return NULL;
-
-	*endword = beginword;
-	*charlen = 0;
-	while (*endword - str < lenstr && ISWORDCHR(*endword))
-	{
-		*endword += pg_mblen(*endword);
-		(*charlen)++;
-	}
-
-	return beginword;
-}
-
 /*
  * Reduce a trigram (three possibly multi-byte characters) to a trgm,
  * which is always exactly three bytes.  If we have three single-byte
@@ -337,58 +311,90 @@ make_trigrams(trgm *tptr, char *str, int bytelen, int charlen)
 static int
 generate_trgm_only(trgm *trg, char *str, int slen, TrgmBound *bounds)
 {
-	trgm	   *tptr;
-	char	   *buf;
-	int			charlen,
-				bytelen;
-	char	   *bword,
-			   *eword;
+	trgm *tptr = trg;
+	char *buf;
 
 	if (slen + LPADDING + RPADDING < 3 || slen == 0)
 		return 0;
 
-	tptr = trg;
+	buf = palloc_array(char, slen * pg_database_encoding_max_length() + 4 + 1);
+	memset(buf, ' ', LPADDING);
 
-	/* Allocate a buffer for case-folded, blank-padded words */
-	buf = (char *) palloc(slen * pg_database_encoding_max_length() + 4);
-
-	if (LPADDING > 0)
+	for (int i = 0; i < slen; )
 	{
-		*buf = ' ';
-		if (LPADDING > 1)
-			*(buf + 1) = ' ';
-	}
+		int num_bytes = LPADDING;
+		int num_chars = LPADDING;
+		char *word;
 
-	eword = str;
-	while ((bword = find_word(eword, slen - (eword - str), &eword, &charlen)) != NULL)
-	{
+		/* Extract next word */
+		while (i < slen)
+		{
+			if ((str[i] & 0x80) == 0) /* Fast path for ASCII-only */
+			{
+				if (isalnum(str[i]))
+				{
 #ifdef IGNORECASE
-		bword = str_tolower(bword, eword - bword, DEFAULT_COLLATION_OID);
-		bytelen = strlen(bword);
+					buf[num_bytes++] = pg_ascii_tolower(str[i++]);
 #else
-		bytelen = eword - bword;
+					buf[num_bytes++] = str[i++];
 #endif
+				}
+				else
+				{
+					i++;
+					break;
+				}
+			}
+			else 
+			{
+				const int mblen = pg_mblen(str + i);
+				Assert(mblen >= 2); /* Otherwise, it would be ASCII */
+
+				if (ISWORDCHR(str + i))
+				{
+					memcpy(buf + num_bytes, str + i, mblen);
+					num_bytes += mblen;
+					i += mblen;
+				}
+				else
+				{
+					i += mblen;
+					break;
+				}
+			}
+
+			num_chars++;
+		}
 
-		memcpy(buf + LPADDING, bword, bytelen);
+		if (num_chars > LPADDING)
+		{
+			memset(buf + num_bytes, ' ', RPADDING);
+			num_bytes += RPADDING;
+			num_chars += RPADDING;
+			word = buf;
 
 #ifdef IGNORECASE
-		pfree(bword);
+			if (num_chars != num_bytes)
+			{
+				word = str_tolower(buf, num_bytes, DEFAULT_COLLATION_OID);
+				num_bytes = strlen(word); /* String can get shorter from lower-casing */
+			}
 #endif
 
-		buf[LPADDING + bytelen] = ' ';
-		buf[LPADDING + bytelen + 1] = ' ';
+			if (bounds)
+				bounds[tptr - trg] |= TRGM_BOUND_LEFT;
+
+			tptr = make_trigrams(tptr, word, num_bytes, num_chars);
+
+			if (bounds)
+				bounds[tptr - trg - 1] |= TRGM_BOUND_RIGHT;
 
-		/* Calculate trigrams marking their bounds if needed */
-		if (bounds)
-			bounds[tptr - trg] |= TRGM_BOUND_LEFT;
-		tptr = make_trigrams(tptr, buf, bytelen + LPADDING + RPADDING,
-							 charlen + LPADDING + RPADDING);
-		if (bounds)
-			bounds[tptr - trg - 1] |= TRGM_BOUND_RIGHT;
+			if (word != buf)
+				pfree(word);
+		}
 	}
 
 	pfree(buf);
-
 	return tptr - trg;
 }
 
-- 
2.51.0

v1-0007-Faster-qunique-comparator.patchtext/x-patch; charset=UTF-8; name=v1-0007-Faster-qunique-comparator.patchDownload

From 4d101725a6101d723858d5b7561ff1cef123f671 Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Wed, 12 Nov 2025 14:27:13 +0100
Subject: [PATCH v1 7/8] Faster qunique() comparator

---
 contrib/pg_trgm/trgm_op.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 25ee049f352..e07c553b1bd 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -139,6 +139,14 @@ CMPTRGM_UNSIGNED(const void *a, const void *b)
 		   : CMPPCHAR_UNS(a, b, 2));
 }
 
+static inline int
+CMPTRGM_EQ(const void *a, const void *b)
+{
+	char *aa = (char *)a;
+	char *bb = (char *)b;
+	return aa[0] != bb[0] || aa[1] != bb[1] || aa[2] != bb[2] ? 1 : 0;
+}
+
 /*
  * This gets called on the first call. It replaces the function pointer so
  * that subsequent calls are routed directly to the chosen implementation.
@@ -482,15 +490,11 @@ generate_trgm(char *str, int slen)
 	if (len > 1)
 	{
 		if (GetDefaultCharSignedness())
-		{
 			radix_sort_trigrams_signed((trgm *)GETARR(trg), len);
-			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
-		}
 		else
-		{
 			trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
-			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED);
-		}
+
+		len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_EQ);
 	}
 
 	SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
@@ -1038,15 +1042,11 @@ generate_wildcard_trgm(const char *str, int slen)
 	if (len > 1)
 	{
 		if (GetDefaultCharSignedness())
-		{
 			radix_sort_trigrams_signed((trgm *)GETARR(trg), len);
-			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
-		}
 		else
-		{
 			trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
-			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED);
-		}
+
+		len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_EQ);
 	}
 
 	SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
-- 
2.51.0

v1-0006-Use-radix-sort.patchtext/x-patch; charset=UTF-8; name=v1-0006-Use-radix-sort.patchDownload

From 6148b4bbc702eb4ce89994f3aa4790def50ed0c5 Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Tue, 11 Nov 2025 13:18:59 +0100
Subject: [PATCH v1 6/8] Use radix sort

---
 contrib/pg_trgm/trgm_op.c | 64 +++++++++++++++++++++++++++++++++------
 1 file changed, 54 insertions(+), 10 deletions(-)

diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 875065a4670..25ee049f352 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -155,14 +155,6 @@ CMPTRGM_CHOOSE(const void *a, const void *b)
 }
 
 /* Define our specialized sort function name */
-#define ST_SORT trigram_qsort_signed
-#define ST_ELEMENT_TYPE_VOID
-#define ST_COMPARE(a, b) CMPTRGM_SIGNED(a, b)
-#define ST_SCOPE static
-#define ST_DEFINE
-#define ST_DECLARE
-#include "lib/sort_template.h"
-
 #define ST_SORT trigram_qsort_unsigned
 #define ST_ELEMENT_TYPE_VOID
 #define ST_COMPARE(a, b) CMPTRGM_UNSIGNED(a, b)
@@ -392,6 +384,58 @@ generate_trgm_only(trgm *trg, char *str, int slen, TrgmBound *bounds)
 	return tptr - trg;
 }
 
+/*
+ * Needed to properly handle negative numbers in case char is signed.
+ */
+static inline unsigned char FlipSign(char x)
+{
+	return x^0x80;
+}
+
+static void radix_sort_trigrams_signed(trgm *trg, int count)
+{
+	trgm *buffer = palloc_array(trgm, count);
+	trgm *starts[256];
+	trgm *from = trg;
+	trgm *to = buffer;
+	int freqs[3][256];
+
+	/*
+	 * Compute frequencies to partition the buffer.
+	 */
+	memset(freqs, 0, sizeof(freqs));
+
+	for (int i=0; i<count; i++)
+		for (int j=0; j<3; j++)
+			freqs[j][FlipSign(trg[i][j])]++;
+
+	/*
+	 * Do the sorting. Start with last character because that's the is "LSB"
+	 * in a trigram. Avoid unnecessary copies by ping-ponging between the buffers.
+	 */
+	for (int i=2; i>=0; i--)
+	{
+		trgm *old_from = from;
+		trgm *next = to;
+
+		for (int j=0; j<256; j++)
+		{
+			starts[j] = next;
+			next += freqs[i][j];
+		}
+
+		for (int j=0; j<count; j++)
+			memcpy(starts[FlipSign(from[j][i])]++, from[j], sizeof(trgm));
+
+		from = to;
+		to = old_from;
+	}
+
+	Assert(to == buffer);
+	memcpy(trg, buffer, sizeof(trgm) * count);
+	pfree(buffer);
+}
+
 /*
  * Guard against possible overflow in the palloc requests below.  (We
  * don't worry about the additive constants, since palloc can detect
@@ -439,7 +483,7 @@ generate_trgm(char *str, int slen)
 	{
 		if (GetDefaultCharSignedness())
 		{
-			trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
+			radix_sort_trigrams_signed((trgm *)GETARR(trg), len);
 			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
 		}
 		else
@@ -995,7 +1039,7 @@ generate_wildcard_trgm(const char *str, int slen)
 	{
 		if (GetDefaultCharSignedness())
 		{
-			trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
+			radix_sort_trigrams_signed((trgm *)GETARR(trg), len);
 			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
 		}
 		else
-- 
2.51.0

v1-0005-Make-btint4cmp-branchless.patchtext/x-patch; charset=UTF-8; name=v1-0005-Make-btint4cmp-branchless.patchDownload

From 915bd326de5ebde4149093d52fff9c710e557de2 Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Mon, 10 Nov 2025 15:40:11 +0100
Subject: [PATCH v1 5/8] Make btint4cmp() branchless

---
 src/backend/access/nbtree/nbtcompare.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/nbtree/nbtcompare.c b/src/backend/access/nbtree/nbtcompare.c
index bffc4b7709c..5ae27c22621 100644
--- a/src/backend/access/nbtree/nbtcompare.c
+++ b/src/backend/access/nbtree/nbtcompare.c
@@ -60,6 +60,7 @@
 #include "utils/fmgrprotos.h"
 #include "utils/skipsupport.h"
 #include "utils/sortsupport.h"
+#include "common/int.h"
 
 #ifdef STRESS_SORT_INT_MIN
 #define A_LESS_THAN_B		INT_MIN
@@ -202,12 +203,7 @@ btint4cmp(PG_FUNCTION_ARGS)
 	int32		a = PG_GETARG_INT32(0);
 	int32		b = PG_GETARG_INT32(1);
 
-	if (a > b)
-		PG_RETURN_INT32(A_GREATER_THAN_B);
-	else if (a == b)
-		PG_RETURN_INT32(0);
-	else
-		PG_RETURN_INT32(A_LESS_THAN_B);
+	PG_RETURN_INT32(pg_cmp_s32(a, b));
 }
 
 Datum
-- 
2.51.0

v1-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patchtext/x-patch; charset=UTF-8; name=v1-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patchDownload

From e91198730d10d301857b1ef38670f506fc0749ca Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Mon, 10 Nov 2025 14:40:37 +0100
Subject: [PATCH v1 4/8] Avoid dedup and sort in ginExtractEntries

---
 src/backend/access/gin/ginutil.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index f4139effd6e..9187264dbdc 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -498,13 +498,6 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 		return entries;
 	}
 
-	/*
-	 * If the extractValueFn didn't create a nullFlags array, create one,
-	 * assuming that everything's non-null.
-	 */
-	if (nullFlags == NULL)
-		nullFlags = (bool *) palloc0(*nentries * sizeof(bool));
-
 	/*
 	 * If there's more than one key, sort and unique-ify.
 	 *
@@ -512,8 +505,8 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 	 * pretty bad too.  For small numbers of keys it'd likely be better to use
 	 * a simple insertion sort.
 	 */
-	if (*nentries > 1)
-	{
+	if (*nentries > 1 && nullFlags != NULL)
+	{	
 		keyEntryData *keydata;
 		cmpEntriesArg arg;
 
@@ -564,9 +557,13 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 	/*
 	 * Create GinNullCategory representation from nullFlags.
 	 */
-	*categories = (GinNullCategory *) palloc0(*nentries * sizeof(GinNullCategory));
-	for (i = 0; i < *nentries; i++)
-		(*categories)[i] = (nullFlags[i] ? GIN_CAT_NULL_KEY : GIN_CAT_NORM_KEY);
+	StaticAssertStmt(GIN_CAT_NORM_KEY == 0, "Assuming GIN_CAT_NORM_KEY is 0");
+	*categories = palloc0_array(GinNullCategory, *nentries);
+
+	if (nullFlags != NULL)
+		for (i = 0; i < *nentries; i++)
+			if (nullFlags[i])
+				(*categories)[i] = GIN_CAT_NULL_KEY;
 
 	return entries;
 }
-- 
2.51.0

v1-0003-Use-sort_template.h.patchtext/x-patch; charset=UTF-8; name=v1-0003-Use-sort_template.h.patchDownload

From 5a5b3b955fffac4baf2ea1367914ce87a6f8bf5a Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Mon, 10 Nov 2025 13:35:11 +0100
Subject: [PATCH v1 3/8] Use sort_template.h

---
 contrib/pg_trgm/trgm_op.c | 49 ++++++++++++++++++++++++++++++---------
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 81182a15e07..875065a4670 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -154,6 +154,23 @@ CMPTRGM_CHOOSE(const void *a, const void *b)
 	return CMPTRGM(a, b);
 }
 
+/* Define our specialized sort function name */
+#define ST_SORT trigram_qsort_signed
+#define ST_ELEMENT_TYPE_VOID
+#define ST_COMPARE(a, b) CMPTRGM_SIGNED(a, b)
+#define ST_SCOPE static
+#define ST_DEFINE
+#define ST_DECLARE
+#include "lib/sort_template.h"
+
+#define ST_SORT trigram_qsort_unsigned
+#define ST_ELEMENT_TYPE_VOID
+#define ST_COMPARE(a, b) CMPTRGM_UNSIGNED(a, b)
+#define ST_SCOPE static
+#define ST_DEFINE
+#define ST_DECLARE
+#include "lib/sort_template.h"
+
 /*
  * Deprecated function.
  * Use "pg_trgm.similarity_threshold" GUC variable instead of this function.
@@ -209,12 +226,6 @@ show_limit(PG_FUNCTION_ARGS)
 	PG_RETURN_FLOAT4(similarity_threshold);
 }
 
-static int
-comp_trgm(const void *a, const void *b)
-{
-	return CMPTRGM(a, b);
-}
-
 /*
  * Finds first word in string, returns pointer to the word,
  * endword points to the character after word
@@ -426,12 +437,20 @@ generate_trgm(char *str, int slen)
 	 */
 	if (len > 1)
 	{
-		qsort(GETARR(trg), len, sizeof(trgm), comp_trgm);
-		len = qunique(GETARR(trg), len, sizeof(trgm), comp_trgm);
+		if (GetDefaultCharSignedness())
+		{
+			trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
+			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
+		}
+		else
+		{
+			trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
+			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED);
+		}
 	}
 
 	SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
-
+ 
 	return trg;
 }
 
@@ -974,8 +993,16 @@ generate_wildcard_trgm(const char *str, int slen)
 	 */
 	if (len > 1)
 	{
-		qsort(GETARR(trg), len, sizeof(trgm), comp_trgm);
-		len = qunique(GETARR(trg), len, sizeof(trgm), comp_trgm);
+		if (GetDefaultCharSignedness())
+		{
+			trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
+			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
+		}
+		else
+		{
+			trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
+			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED);
+		}
 	}
 
 	SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
-- 
2.51.0

v1-0002-Optimized-comparison-functions.patchtext/x-patch; charset=UTF-8; name=v1-0002-Optimized-comparison-functions.patchDownload

From eb7fe3763b31d30d48ebbd4934085fe70315249a Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Mon, 10 Nov 2025 13:35:01 +0100
Subject: [PATCH v1 2/8] Optimized comparison functions

---
 src/backend/access/gin/ginutil.c | 19 ++++++++++++-------
 src/include/access/gin_private.h | 19 +++++++++++++++----
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index d205093e21d..f4139effd6e 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -114,6 +114,7 @@ initGinState(GinState *state, Relation index)
 	for (i = 0; i < origTupdesc->natts; i++)
 	{
 		Form_pg_attribute attr = TupleDescAttr(origTupdesc, i);
+		FunctionCallInfoBaseData *fci  = &state->compareFnCallInfo[i].fcinfo;
 
 		if (state->oneCol)
 			state->tupdesc[i] = state->origTupdesc;
@@ -222,6 +223,10 @@ initGinState(GinState *state, Relation index)
 			state->supportCollation[i] = index->rd_indcollation[i];
 		else
 			state->supportCollation[i] = DEFAULT_COLLATION_OID;
+
+		InitFunctionCallInfoData(*fci, &state->compareFn[i], 2, state->supportCollation[i], NULL, NULL);
+		fci->args[0].isnull = false;
+		fci->args[1].isnull = false;
 	}
 }
 
@@ -402,8 +407,7 @@ typedef struct
 
 typedef struct
 {
-	FmgrInfo   *cmpDatumFunc;
-	Oid			collation;
+	FunctionCallInfoBaseData   *cmpFuncInfo;
 	bool		haveDups;
 } cmpEntriesArg;
 
@@ -425,9 +429,11 @@ cmpEntries(const void *a, const void *b, void *arg)
 	else if (bb->isnull)
 		res = -1;				/* not-NULL "<" NULL */
 	else
-		res = DatumGetInt32(FunctionCall2Coll(data->cmpDatumFunc,
-											  data->collation,
-											  aa->datum, bb->datum));
+	{
+		data->cmpFuncInfo->args[0].value = aa->datum;
+		data->cmpFuncInfo->args[1].value = bb->datum;
+		res = DatumGetInt32(FunctionCallInvoke(data->cmpFuncInfo));
+	}
 
 	/*
 	 * Detect if we have any duplicates.  If there are equal keys, qsort must
@@ -518,8 +524,7 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 			keydata[i].isnull = nullFlags[i];
 		}
 
-		arg.cmpDatumFunc = &ginstate->compareFn[attnum - 1];
-		arg.collation = ginstate->supportCollation[attnum - 1];
+		arg.cmpFuncInfo = &ginstate->compareFnCallInfo[attnum - 1].fcinfo;
 		arg.haveDups = false;
 		qsort_arg(keydata, *nentries, sizeof(keyEntryData),
 				  cmpEntries, &arg);
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index e155045ce8a..7cf19c8a5dc 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -51,6 +51,11 @@ typedef struct GinOptions
 #define GIN_SHARE	BUFFER_LOCK_SHARE
 #define GIN_EXCLUSIVE  BUFFER_LOCK_EXCLUSIVE
 
+typedef union CompareFuncCallInfoData
+{
+	FunctionCallInfoBaseData fcinfo;
+	char fcinfo_data[SizeForFunctionCallInfo(2)];
+} CompareFuncCallInfoData;
 
 /*
  * GinState: working data structure describing the index being worked on
@@ -77,6 +82,10 @@ typedef struct GinState
 	/*
 	 * Per-index-column opclass support functions
 	 */
+
+
+	CompareFuncCallInfoData compareFnCallInfo[INDEX_MAX_KEYS];
+
 	FmgrInfo	compareFn[INDEX_MAX_KEYS];
 	FmgrInfo	extractValueFn[INDEX_MAX_KEYS];
 	FmgrInfo	extractQueryFn[INDEX_MAX_KEYS];
@@ -504,6 +513,8 @@ ginCompareEntries(GinState *ginstate, OffsetNumber attnum,
 				  Datum a, GinNullCategory categorya,
 				  Datum b, GinNullCategory categoryb)
 {
+	FunctionCallInfoBaseData *fci;
+
 	/* if not of same null category, sort by that first */
 	if (categorya != categoryb)
 		return (categorya < categoryb) ? -1 : 1;
@@ -512,10 +523,10 @@ ginCompareEntries(GinState *ginstate, OffsetNumber attnum,
 	if (categorya != GIN_CAT_NORM_KEY)
 		return 0;
 
-	/* both not null, so safe to call the compareFn */
-	return DatumGetInt32(FunctionCall2Coll(&ginstate->compareFn[attnum - 1],
-										   ginstate->supportCollation[attnum - 1],
-										   a, b));
+	fci = &ginstate->compareFnCallInfo[attnum - 1].fcinfo;
+	fci->args[0].value = a;
+	fci->args[1].value = b;
+	return DatumGetInt32(FunctionCallInvoke(fci));
 }
 
 /*
-- 
2.51.0

v1-0001-Inline-ginCompareAttEntries.patchtext/x-patch; charset=UTF-8; name=v1-0001-Inline-ginCompareAttEntries.patchDownload

From a484e7c69bec62474b041eb4e533601e4883dab3 Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Thu, 6 Nov 2025 09:42:27 +0100
Subject: [PATCH v1 1/8] Inline ginCompareAttEntries

---
 src/backend/access/gin/ginutil.c | 38 ---------------------------
 src/include/access/gin_private.h | 44 +++++++++++++++++++++++++++-----
 2 files changed, 38 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a546cac18d3..d205093e21d 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -387,44 +387,6 @@ GinInitMetabuffer(Buffer b)
 		((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
 }
 
-/*
- * Compare two keys of the same index column
- */
-int
-ginCompareEntries(GinState *ginstate, OffsetNumber attnum,
-				  Datum a, GinNullCategory categorya,
-				  Datum b, GinNullCategory categoryb)
-{
-	/* if not of same null category, sort by that first */
-	if (categorya != categoryb)
-		return (categorya < categoryb) ? -1 : 1;
-
-	/* all null items in same category are equal */
-	if (categorya != GIN_CAT_NORM_KEY)
-		return 0;
-
-	/* both not null, so safe to call the compareFn */
-	return DatumGetInt32(FunctionCall2Coll(&ginstate->compareFn[attnum - 1],
-										   ginstate->supportCollation[attnum - 1],
-										   a, b));
-}
-
-/*
- * Compare two keys of possibly different index columns
- */
-int
-ginCompareAttEntries(GinState *ginstate,
-					 OffsetNumber attnuma, Datum a, GinNullCategory categorya,
-					 OffsetNumber attnumb, Datum b, GinNullCategory categoryb)
-{
-	/* attribute number is the first sort key */
-	if (attnuma != attnumb)
-		return (attnuma < attnumb) ? -1 : 1;
-
-	return ginCompareEntries(ginstate, attnuma, a, categorya, b, categoryb);
-}
-
-
 /*
  * Support for sorting key datums in ginExtractEntries
  *
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index b33f7cec5b4..e155045ce8a 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -97,12 +97,6 @@ extern Buffer GinNewBuffer(Relation index);
 extern void GinInitBuffer(Buffer b, uint32 f);
 extern void GinInitPage(Page page, uint32 f, Size pageSize);
 extern void GinInitMetabuffer(Buffer b);
-extern int	ginCompareEntries(GinState *ginstate, OffsetNumber attnum,
-							  Datum a, GinNullCategory categorya,
-							  Datum b, GinNullCategory categoryb);
-extern int	ginCompareAttEntries(GinState *ginstate,
-								 OffsetNumber attnuma, Datum a, GinNullCategory categorya,
-								 OffsetNumber attnumb, Datum b, GinNullCategory categoryb);
 extern Datum *ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 								Datum value, bool isNull,
 								int32 *nentries, GinNullCategory **categories);
@@ -502,6 +496,44 @@ ginCompareItemPointers(ItemPointer a, ItemPointer b)
 	return pg_cmp_u64(ia, ib);
 }
 
+/*
+ * Compare two keys of the same index column
+ */
+static inline int
+ginCompareEntries(GinState *ginstate, OffsetNumber attnum,
+				  Datum a, GinNullCategory categorya,
+				  Datum b, GinNullCategory categoryb)
+{
+	/* if not of same null category, sort by that first */
+	if (categorya != categoryb)
+		return (categorya < categoryb) ? -1 : 1;
+
+	/* all null items in same category are equal */
+	if (categorya != GIN_CAT_NORM_KEY)
+		return 0;
+
+	/* both not null, so safe to call the compareFn */
+	return DatumGetInt32(FunctionCall2Coll(&ginstate->compareFn[attnum - 1],
+										   ginstate->supportCollation[attnum - 1],
+										   a, b));
+}
+
+/*
+ * Compare two keys of possibly different index columns
+ */
+static inline int
+ginCompareAttEntries(GinState *ginstate,
+					 OffsetNumber attnuma, Datum a, GinNullCategory categorya,
+					 OffsetNumber attnumb, Datum b, GinNullCategory categoryb)
+{
+	/* attribute number is the first sort key */
+	if (attnuma != attnumb)
+		return (attnuma < attnumb) ? -1 : 1;
+
+	return ginCompareEntries(ginstate, attnuma, a, categorya, b, categoryb);
+
+}
+
 extern int	ginTraverseLock(Buffer buffer, bool searchMode);
 
 #endif							/* GIN_PRIVATE_H */
-- 
2.51.0

test_gin_optimizations.sqlapplication/sql; name=test_gin_optimizations.sqlDownload

Heikki Linnakangas

hlinnaka@iki.fi

5 days ago

In reply to: David Geier (#1)

Re: Reduce build times of pg_trgm GIN indexes

On 05/01/2026 17:01, David Geier wrote:

The script I used for testing is attached. I ran CREATE INDEX three
times and took the fastest run. I'm getting the following results on my
i9-13950HX dev laptop in release build:

Data set | Patched (ms) | Master (ms) | Speedup
--------------------|--------------|--------------|----------
movies(plot) | 3,409 | 10,311 | 3.02x
lineitem(l_comment) | 161,569 | 256,986 | 1.59x

Impressive speedup!

The attached patches do the following:

- v1-0001-Inline-ginCompareAttEntries.patch: Inline
ginCompareAttEntries() which is very frequently called by the GIN code.

Looks good.

- v1-0002-Optimized-comparison-functions.patch: Use FunctionCallInvoke()
instead of FunctionCall2Coll(). This saves a bunch of per-comparison
setup code, such as calling InitFunctionCallInfoData().

You lose the check for NULL result with this. That's probably still
worth checking.

- v1-0003-Use-sort_template.h.patch: Use sort_template.h instead of
qsort(), to inline calls to the sort comparator. This is an interim step
that is further improved on by patch 0006.

- v1-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patch
ginExtractEntries() deduplicates and sorts the entries returned from the
extract value function. In case of pg_trgm, that is completely redundant
because the trigrams are already deduplicated and sorted. The current
version of this patch is just to demonstrate the potential. We need to
think about what we want here. Ideally, we would require the extraction
function to provide the entries deduplicated and sorted. Alternatively,
we could indicate to ginExtractEntries() if the entries are already
deduplicated and sorted. If we don't want to alter the signature of the
extract value function, we could e.g. use the MSB of the nentries argument.

Yeah, this seems wrong as it is. You're assuming that if the extract
function returns nullFlags == NULL, the array is already sorted and deduped.

- v1-0005-Make-btint4cmp-branchless.patch: Removes branches from
btint4cmp(), which is heavily called from the GIN code. This might as
well have benefit in other parts of the code base.

Seems reasonable.

v1-0006-Use-radix-sort.patch: Replace the sort_template.h-based qsort()
with radix sort. For the purpose of demonstrating the possible gains,
I've only replaced the signed variant for now. I've also tried using
simplehash.h for deduplicating followed by a sort_template.h-based sort.
But that was slower.

Ok.

v1-0007-Faster-qunique-comparator.patch: qunique() doesn't require a
full sort comparator (-1 = less, 0 = equal, 1 = greater) but only a
equal/unequal comparator (e.g. 0 = equal and 1 = unequal). The same
optimization can be done in plenty of other places in our code base.
Likely, in most of them the gains are insignificant.

Makes sense. I'm a little disappointed the compiler won't do that
optimization for us..

Perhaps we should introduce a new qunique_eq() function with a different
callback signature:

/* like qunique(), but the callback function returns true/false rather
than int */
static inline size_t
qunique_eq(void *array, size_t elements, size_t width,
bool (*equal) (const void *, const void *))

v1-0008-Add-ASCII-fastpath-to-generate_trgm_only.patch: Typically lots
of text is actually ASCII. Hence, we provide a fast path for this case
which is exercised if the MSB of the current character is unset.

This uses pg_ascii_tolower() when for ASCII characters when built with
the IGNORECASE. I don't think that's correct, if the proper collation
would do something more complicated for than what pg_ascii_tolower() does.

Did you measure how big is the impact from each individual patch?
Patches 1 and 2 seem pretty much ready to be committed, but I wonder if
they make any difference on their own.

- Heikki

Kirill Reshke

reshkekirill@gmail.com

5 days ago

In reply to: Heikki Linnakangas (#2)

Re: Reduce build times of pg_trgm GIN indexes

On Tue, 6 Jan 2026 at 22:00, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 05/01/2026 17:01, David Geier wrote:

The script I used for testing is attached. I ran CREATE INDEX three
times and took the fastest run. I'm getting the following results on my
i9-13950HX dev laptop in release build:

Data set | Patched (ms) | Master (ms) | Speedup
--------------------|--------------|--------------|----------
movies(plot) | 3,409 | 10,311 | 3.02x
lineitem(l_comment) | 161,569 | 256,986 | 1.59x

Impressive speedup!

The attached patches do the following:

- v1-0001-Inline-ginCompareAttEntries.patch: Inline
ginCompareAttEntries() which is very frequently called by the GIN code.

Looks good.

- v1-0002-Optimized-comparison-functions.patch: Use FunctionCallInvoke()
instead of FunctionCall2Coll(). This saves a bunch of per-comparison
setup code, such as calling InitFunctionCallInfoData().

You lose the check for NULL result with this. That's probably still
worth checking.

- v1-0003-Use-sort_template.h.patch: Use sort_template.h instead of
qsort(), to inline calls to the sort comparator. This is an interim step
that is further improved on by patch 0006.

ok

- v1-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patch
ginExtractEntries() deduplicates and sorts the entries returned from the
extract value function. In case of pg_trgm, that is completely redundant
because the trigrams are already deduplicated and sorted. The current
version of this patch is just to demonstrate the potential. We need to
think about what we want here. Ideally, we would require the extraction
function to provide the entries deduplicated and sorted. Alternatively,
we could indicate to ginExtractEntries() if the entries are already
deduplicated and sorted. If we don't want to alter the signature of the
extract value function, we could e.g. use the MSB of the nentries argument.

Yeah, this seems wrong as it is. You're assuming that if the extract
function returns nullFlags == NULL, the array is already sorted and deduped.

- v1-0005-Make-btint4cmp-branchless.patch: Removes branches from
btint4cmp(), which is heavily called from the GIN code. This might as
well have benefit in other parts of the code base.

Seems reasonable.

v1-0006-Use-radix-sort.patch: Replace the sort_template.h-based qsort()
with radix sort. For the purpose of demonstrating the possible gains,
I've only replaced the signed variant for now. I've also tried using
simplehash.h for deduplicating followed by a sort_template.h-based sort.
But that was slower.

Ok.

v1-0007-Faster-qunique-comparator.patch: qunique() doesn't require a
full sort comparator (-1 = less, 0 = equal, 1 = greater) but only a
equal/unequal comparator (e.g. 0 = equal and 1 = unequal). The same
optimization can be done in plenty of other places in our code base.
Likely, in most of them the gains are insignificant.

Makes sense. I'm a little disappointed the compiler won't do that
optimization for us..

Perhaps we should introduce a new qunique_eq() function with a different
callback signature:

/* like qunique(), but the callback function returns true/false rather
than int */
static inline size_t
qunique_eq(void *array, size_t elements, size_t width,
bool (*equal) (const void *, const void *))

v1-0008-Add-ASCII-fastpath-to-generate_trgm_only.patch: Typically lots
of text is actually ASCII. Hence, we provide a fast path for this case
which is exercised if the MSB of the current character is unset.

This uses pg_ascii_tolower() when for ASCII characters when built with
the IGNORECASE. I don't think that's correct, if the proper collation
would do something more complicated for than what pg_ascii_tolower() does.

Did you measure how big is the impact from each individual patch?
Patches 1 and 2 seem pretty much ready to be committed, but I wonder if
they make any difference on their own.

- Heikki

Hi!
patches 0003, 0004 & 0008 applies with whitespace errors.

reshke@yezzey-cbdb-bench:~/pgpure$ git am v1-0003-Use-sort_template.h.patch
Applying: Use sort_template.h
.git/rebase-apply/patch:66: trailing whitespace.

warning: 1 line adds whitespace errors.
reshke@yezzey-cbdb-bench:~/pgpure$ git am
v1-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patch
Applying: Avoid dedup and sort in ginExtractEntries
.git/rebase-apply/patch:30: trailing whitespace.
{
warning: 1 line adds whitespace errors.
reshke@yezzey-cbdb-bench:~/pgpure$ git am
v1-0008-Add-ASCII-fastpath-to-generate_trgm_only.patch
Applying: Add ASCII fastpath to generate_trgm_only()
.git/rebase-apply/patch:101: trailing whitespace.
else
warning: 1 line adds whitespace errors.

I did run benchmarks of my VM using your data. v1-0001 solely improves
runtime by 4-5%. v2-0002 does not show runtime improvement for me.
With parallel GIN build, performance gains are 2-3 % for 2 workers and
not noticeable after that.

I will try to run some more benchmarks on this.

--
Best regards,
Kirill Reshke

David Geier

geidav.pg@gmail.com

3 days ago

In reply to: Heikki Linnakangas (#2)

8 attachment(s)

Re: Reduce build times of pg_trgm GIN indexes

Hi Heikki!

Thanks for looking at the patch set.

On 06.01.2026 18:00, Heikki Linnakangas wrote:

On 05/01/2026 17:01, David Geier wrote:

- v1-0002-Optimized-comparison-functions.patch: Use FunctionCallInvoke()
instead of FunctionCall2Coll(). This saves a bunch of per-comparison
setup code, such as calling InitFunctionCallInfoData().

You lose the check for NULL result with this. That's probably still
worth checking.

It seems like existing code where all args are not null, has that safety
check. Added it for consistency.

- v1-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patch
ginExtractEntries() deduplicates and sorts the entries returned from the
extract value function. In case of pg_trgm, that is completely redundant
because the trigrams are already deduplicated and sorted. The current
version of this patch is just to demonstrate the potential. We need to
think about what we want here. Ideally, we would require the extraction
function to provide the entries deduplicated and sorted. Alternatively,
we could indicate to ginExtractEntries() if the entries are already
deduplicated and sorted. If we don't want to alter the signature of the
extract value function, we could e.g. use the MSB of the nentries
argument.

Yeah, this seems wrong as it is. You're assuming that if the extract
function returns nullFlags == NULL, the array is already sorted and
deduped.

As said, that was just for demonstration purposes of the possible gains.
I've changed the code now such that the extractValue function of the GIN
index can indicate via the third argument uniqueAndSorted, if the
returned keys are already unique and sorted.

Unfortunately, it seems like this patch regresses performance. See
measurements below. I haven't had the time to investigate why that is.
It's pretty counter intuitive, given that this patch effectively only
removes code. Maybe you could re-test patch 0004 and share your runtimes?

v1-0007-Faster-qunique-comparator.patch: qunique() doesn't require a
full sort comparator (-1 = less, 0 = equal, 1 = greater) but only a
equal/unequal comparator (e.g. 0 = equal and 1 = unequal). The same
optimization can be done in plenty of other places in our code base.
Likely, in most of them the gains are insignificant.

Makes sense. I'm a little disappointed the compiler won't do that
optimization for us..

I thought the same.

Perhaps we should introduce a new qunique_eq() function with a different
callback signature:

/* like qunique(), but the callback function returns true/false rather
than int */
static inline size_t
qunique_eq(void *array, size_t elements, size_t width,
bool (*equal) (const void *, const void *))

I would prefer to change qunique() instead. That would enforce using an
adequate comparison function from the get go. There are only ~15 calls
to qunique(). So refactoring this should also be a fairly small patch. I
can do that if there's agreement for that approach.

v1-0008-Add-ASCII-fastpath-to-generate_trgm_only.patch: Typically lots
of text is actually ASCII. Hence, we provide a fast path for this case
which is exercised if the MSB of the current character is unset.

This uses pg_ascii_tolower() when for ASCII characters when built with
the IGNORECASE. I don't think that's correct, if the proper collation
would do something more complicated for than what pg_ascii_tolower() does.

Oh, that's evil. I had tested that specifically. But it only worked
because the code in master uses str_tolower() with
DEFAULT_COLLATION_OID. So using a different locale like in the following
example does something different than when creating a database with the
same locale.

postgres=# select lower('III' COLLATE "tr_TR");
lower
-------
ııı

postgres=# select show_trgm('III' COLLATE "tr_TR");
show_trgm
-------------------------
{" i"," ii","ii ",iii}
(1 row)

But when using tr_TR as default locale of the database the following
happens:

postgres=# select lower('III' COLLATE "tr_TR");
lower
-------
ııı

postgres=# select show_trgm('III');sü
show_trgm
---------------------------------------
{0xbbd8dd,0xf26fab,0xf31e1a,0x2af4f1}

I'm wondering if that's intentional to begin with. Shouldn't the code
instead pass PG_GET_COLLATION() to str_tolower()? Might require some
research to see how other index types handle locales.

Coming back to the original problem: the lengthy comment at the top of
pg_locale_libc.c, suggests that in some cases ASCII characters are
handled the pg_ascii_tolower() way for the default locale. See for
example tolower_libc_mb(). So a character by character conversion using
that function will yield a different result than strlower_libc_mb(). I'm
wondering why that is.

Anyways, we could limit the optimization to only kick in when the used
locale follows the same rules as pg_ascii_tolower(). We could test that
when creating the locale and store that info in pg_locale_struct.

Thoughts?

Did you measure how big is the impact from each individual patch?
Patches 1 and 2 seem pretty much ready to be committed, but I wonder if
they make any difference on their own.

Here is the impact of each patch. I ran again CREATE INDEX three times
and took the fastest run. The run of each patch includes all previous
patches as well. For example, the timings for patch 0003 were measured
with a binary that also had patch 0002 and 0001 applied. To get the
impact of each patch in isolation, the delta to the previous run was taken.

Code | movies |delta | lineitem | delta
------------------------------------|--------|-------|------------------
master | 10,311 | 0 | 256,986 | 0
v1-0001-Inline-ginCompareAttEntries | 9,694 | 617 | 239,778 | 17,208
v1-0002-Optimized-comparison-func | 9,510 | 184 | 238,094 | 1,684
v1-0003-Use-sort_template.h | 8,661 | 849 | 231,190 | 6,904
v1-0004-Avoid-dedup-and-sort-in | 9,305 | -644 | 232,472 | -1,282
v1-0005-Make-btint4cmp-branchless | 8,240 | 1,065 | 228,387 | 4,085
v1-0006-Use-radix-sort | 6,976 | 1,264 | 207,687 | 20,700
v1-0007-Faster-qunique-comparator | 5,911 | 1,065 | 203,744 | 3,943
v1-0008-Add-ASCII-fastpath | 3,409 | 2,502 | 161,469 | 42,275

Attached is v2 of the patch set with the aforementioned changes. I've
also fixed the white space errors in 0003, 0004 and 0008, as reported by
Kirill.

--
David Geier

Attachments:

v2-0008-Add-ASCII-fastpath-to-generate_trgm_only.patchtext/x-patch; charset=UTF-8; name=v2-0008-Add-ASCII-fastpath-to-generate_trgm_only.patchDownload

From 00c75bd58b7982c8bfe62d1260e9366766bc7f34 Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Fri, 14 Nov 2025 11:37:40 +0100
Subject: [PATCH v2 8/8] Add ASCII fastpath to generate_trgm_only()

---
 contrib/pg_trgm/trgm_op.c | 124 ++++++++++++++++++++------------------
 1 file changed, 65 insertions(+), 59 deletions(-)

diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 39b586f5b9a..d2087b3a45e 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -226,32 +226,6 @@ show_limit(PG_FUNCTION_ARGS)
 	PG_RETURN_FLOAT4(similarity_threshold);
 }
 
-/*
- * Finds first word in string, returns pointer to the word,
- * endword points to the character after word
- */
-static char *
-find_word(char *str, int lenstr, char **endword, int *charlen)
-{
-	char	   *beginword = str;
-
-	while (beginword - str < lenstr && !ISWORDCHR(beginword))
-		beginword += pg_mblen(beginword);
-
-	if (beginword - str >= lenstr)
-		return NULL;
-
-	*endword = beginword;
-	*charlen = 0;
-	while (*endword - str < lenstr && ISWORDCHR(*endword))
-	{
-		*endword += pg_mblen(*endword);
-		(*charlen)++;
-	}
-
-	return beginword;
-}
-
 /*
  * Reduce a trigram (three possibly multi-byte characters) to a trgm,
  * which is always exactly three bytes.  If we have three single-byte
@@ -337,58 +311,90 @@ make_trigrams(trgm *tptr, char *str, int bytelen, int charlen)
 static int
 generate_trgm_only(trgm *trg, char *str, int slen, TrgmBound *bounds)
 {
-	trgm	   *tptr;
-	char	   *buf;
-	int			charlen,
-				bytelen;
-	char	   *bword,
-			   *eword;
+	trgm *tptr = trg;
+	char *buf;
 
 	if (slen + LPADDING + RPADDING < 3 || slen == 0)
 		return 0;
 
-	tptr = trg;
-
-	/* Allocate a buffer for case-folded, blank-padded words */
-	buf = (char *) palloc(slen * pg_database_encoding_max_length() + 4);
+	buf = palloc_array(char, slen * pg_database_encoding_max_length() + 4 + 1);
+	memset(buf, ' ', LPADDING);
 
-	if (LPADDING > 0)
+	for (int i = 0; i < slen; )
 	{
-		*buf = ' ';
-		if (LPADDING > 1)
-			*(buf + 1) = ' ';
-	}
+		int num_bytes = LPADDING;
+		int num_chars = LPADDING;
+		char *word;
 
-	eword = str;
-	while ((bword = find_word(eword, slen - (eword - str), &eword, &charlen)) != NULL)
-	{
+		/* Extract next word */
+		while (i < slen)
+		{
+			if ((str[i] & 0x80) == 0) /* Fast path for ASCII-only */
+			{
+				if (isalnum(str[i]))
+				{
 #ifdef IGNORECASE
-		bword = str_tolower(bword, eword - bword, DEFAULT_COLLATION_OID);
-		bytelen = strlen(bword);
+					buf[num_bytes++] = pg_ascii_tolower(str[i++]);
 #else
-		bytelen = eword - bword;
+					buf[num_bytes++] = str[i++];
 #endif
+				}
+				else
+				{
+					i++;
+					break;
+				}
+			}
+			else
+			{
+				const int mblen = pg_mblen(str + i);
+				Assert(mblen >= 2); /* Otherwise, it would be ASCII */
+
+				if (ISWORDCHR(str + i))
+				{
+					memcpy(buf + num_bytes, str + i, mblen);
+					num_bytes += mblen;
+					i += mblen;
+				}
+				else
+				{
+					i += mblen;
+					break;
+				}
+			}
+
+			num_chars++;
+		}
 
-		memcpy(buf + LPADDING, bword, bytelen);
+		if (num_chars > LPADDING)
+		{
+			memset(buf + num_bytes, ' ', RPADDING);
+			num_bytes += RPADDING;
+			num_chars += RPADDING;
+			word = buf;
 
 #ifdef IGNORECASE
-		pfree(bword);
+			if (num_chars != num_bytes)
+			{
+				word = str_tolower(buf, num_bytes, DEFAULT_COLLATION_OID);
+				num_bytes = strlen(word); /* String can get shorter from lower-casing */
+			}
 #endif
 
-		buf[LPADDING + bytelen] = ' ';
-		buf[LPADDING + bytelen + 1] = ' ';
+			if (bounds)
+				bounds[tptr - trg] |= TRGM_BOUND_LEFT;
+
+			tptr = make_trigrams(tptr, word, num_bytes, num_chars);
+
+			if (bounds)
+				bounds[tptr - trg - 1] |= TRGM_BOUND_RIGHT;
 
-		/* Calculate trigrams marking their bounds if needed */
-		if (bounds)
-			bounds[tptr - trg] |= TRGM_BOUND_LEFT;
-		tptr = make_trigrams(tptr, buf, bytelen + LPADDING + RPADDING,
-							 charlen + LPADDING + RPADDING);
-		if (bounds)
-			bounds[tptr - trg - 1] |= TRGM_BOUND_RIGHT;
+			if (word != buf)
+				pfree(word);
+		}
 	}
 
 	pfree(buf);
-
 	return tptr - trg;
 }
 
-- 
2.51.0

v2-0007-Faster-qunique-comparator.patchtext/x-patch; charset=UTF-8; name=v2-0007-Faster-qunique-comparator.patchDownload

From 457a3d0d57a834a80237d628d353408f3e2f4378 Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Wed, 12 Nov 2025 14:27:13 +0100
Subject: [PATCH v2 7/8] Faster qunique() comparator

---
 contrib/pg_trgm/trgm_op.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 039c273f6a1..39b586f5b9a 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -139,6 +139,14 @@ CMPTRGM_UNSIGNED(const void *a, const void *b)
 		   : CMPPCHAR_UNS(a, b, 2));
 }
 
+static inline int
+CMPTRGM_EQ(const void *a, const void *b)
+{
+	char *aa = (char *)a;
+	char *bb = (char *)b;
+	return aa[0] != bb[0] || aa[1] != bb[1] || aa[2] != bb[2] ? 1 : 0;
+}
+
 /*
  * This gets called on the first call. It replaces the function pointer so
  * that subsequent calls are routed directly to the chosen implementation.
@@ -482,15 +490,11 @@ generate_trgm(char *str, int slen)
 	if (len > 1)
 	{
 		if (GetDefaultCharSignedness())
-		{
 			radix_sort_trigrams_signed((trgm *)GETARR(trg), len);
-			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
-		}
 		else
-		{
 			trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
-			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED);
-		}
+
+		len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_EQ);
 	}
 
 	SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
@@ -1038,15 +1042,11 @@ generate_wildcard_trgm(const char *str, int slen)
 	if (len > 1)
 	{
 		if (GetDefaultCharSignedness())
-		{
 			radix_sort_trigrams_signed((trgm *)GETARR(trg), len);
-			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
-		}
 		else
-		{
 			trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
-			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED);
-		}
+
+		len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_EQ);
 	}
 
 	SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
-- 
2.51.0

v2-0006-Use-radix-sort.patchtext/x-patch; charset=UTF-8; name=v2-0006-Use-radix-sort.patchDownload

From 447ce313602e768f29c43d4a5359f4c909b8db46 Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Tue, 11 Nov 2025 13:18:59 +0100
Subject: [PATCH v2 6/8] Use radix sort

---
 contrib/pg_trgm/trgm_op.c | 64 +++++++++++++++++++++++++++++++++------
 1 file changed, 54 insertions(+), 10 deletions(-)

diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 6af120fa1ad..039c273f6a1 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -155,14 +155,6 @@ CMPTRGM_CHOOSE(const void *a, const void *b)
 }
 
 /* Define our specialized sort function name */
-#define ST_SORT trigram_qsort_signed
-#define ST_ELEMENT_TYPE_VOID
-#define ST_COMPARE(a, b) CMPTRGM_SIGNED(a, b)
-#define ST_SCOPE static
-#define ST_DEFINE
-#define ST_DECLARE
-#include "lib/sort_template.h"
-
 #define ST_SORT trigram_qsort_unsigned
 #define ST_ELEMENT_TYPE_VOID
 #define ST_COMPARE(a, b) CMPTRGM_UNSIGNED(a, b)
@@ -392,6 +384,58 @@ generate_trgm_only(trgm *trg, char *str, int slen, TrgmBound *bounds)
 	return tptr - trg;
 }
 
+/*
+ * Needed to properly handle negative numbers in case char is signed.
+ */
+static inline unsigned char FlipSign(char x)
+{
+	return x^0x80;
+}
+
+static void radix_sort_trigrams_signed(trgm *trg, int count)
+{
+	trgm *buffer = palloc_array(trgm, count);
+	trgm *starts[256];
+	trgm *from = trg;
+	trgm *to = buffer;
+	int freqs[3][256];
+
+	/*
+	 * Compute frequencies to partition the buffer.
+	 */
+	memset(freqs, 0, sizeof(freqs));
+
+	for (int i=0; i<count; i++)
+		for (int j=0; j<3; j++)
+			freqs[j][FlipSign(trg[i][j])]++;
+
+	/*
+	 * Do the sorting. Start with last character because that's the is "LSB"
+	 * in a trigram. Avoid unnecessary copies by ping-ponging between the buffers.
+	 */
+	for (int i=2; i>=0; i--)
+	{
+		trgm *old_from = from;
+		trgm *next = to;
+
+		for (int j=0; j<256; j++)
+		{
+			starts[j] = next;
+			next += freqs[i][j];
+		}
+
+		for (int j=0; j<count; j++)
+			memcpy(starts[FlipSign(from[j][i])]++, from[j], sizeof(trgm));
+
+		from = to;
+		to = old_from;
+	}
+
+	Assert(to == buffer);
+	memcpy(trg, buffer, sizeof(trgm) * count);
+	pfree(buffer);
+}
+
 /*
  * Guard against possible overflow in the palloc requests below.  (We
  * don't worry about the additive constants, since palloc can detect
@@ -439,7 +483,7 @@ generate_trgm(char *str, int slen)
 	{
 		if (GetDefaultCharSignedness())
 		{
-			trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
+			radix_sort_trigrams_signed((trgm *)GETARR(trg), len);
 			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
 		}
 		else
@@ -995,7 +1039,7 @@ generate_wildcard_trgm(const char *str, int slen)
 	{
 		if (GetDefaultCharSignedness())
 		{
-			trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
+			radix_sort_trigrams_signed((trgm *)GETARR(trg), len);
 			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
 		}
 		else
-- 
2.51.0

v2-0005-Make-btint4cmp-branchless.patchtext/x-patch; charset=UTF-8; name=v2-0005-Make-btint4cmp-branchless.patchDownload

From c68aa700a245f1eb05f701a227776e24d0c0afe7 Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Mon, 10 Nov 2025 15:40:11 +0100
Subject: [PATCH v2 5/8] Make btint4cmp() branchless

---
 src/backend/access/nbtree/nbtcompare.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/nbtree/nbtcompare.c b/src/backend/access/nbtree/nbtcompare.c
index bffc4b7709c..5ae27c22621 100644
--- a/src/backend/access/nbtree/nbtcompare.c
+++ b/src/backend/access/nbtree/nbtcompare.c
@@ -60,6 +60,7 @@
 #include "utils/fmgrprotos.h"
 #include "utils/skipsupport.h"
 #include "utils/sortsupport.h"
+#include "common/int.h"
 
 #ifdef STRESS_SORT_INT_MIN
 #define A_LESS_THAN_B		INT_MIN
@@ -202,12 +203,7 @@ btint4cmp(PG_FUNCTION_ARGS)
 	int32		a = PG_GETARG_INT32(0);
 	int32		b = PG_GETARG_INT32(1);
 
-	if (a > b)
-		PG_RETURN_INT32(A_GREATER_THAN_B);
-	else if (a == b)
-		PG_RETURN_INT32(0);
-	else
-		PG_RETURN_INT32(A_LESS_THAN_B);
+	PG_RETURN_INT32(pg_cmp_s32(a, b));
 }
 
 Datum
-- 
2.51.0

v2-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patchtext/x-patch; charset=UTF-8; name=v2-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patchDownload

From 0cdc87640cf2a2d8c946aff1669706f99280c555 Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Mon, 10 Nov 2025 14:40:37 +0100
Subject: [PATCH v2 4/8] Avoid dedup and sort in ginExtractEntries

---
 contrib/pg_trgm/trgm_gin.c       |  2 ++
 doc/src/sgml/gin.sgml            |  7 ++++++-
 src/backend/access/gin/ginutil.c | 32 +++++++++++++++++++-------------
 3 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/contrib/pg_trgm/trgm_gin.c b/contrib/pg_trgm/trgm_gin.c
index 66ff6adde99..862e650efec 100644
--- a/contrib/pg_trgm/trgm_gin.c
+++ b/contrib/pg_trgm/trgm_gin.c
@@ -36,10 +36,12 @@ gin_extract_value_trgm(PG_FUNCTION_ARGS)
 {
 	text	   *val = (text *) PG_GETARG_TEXT_PP(0);
 	int32	   *nentries = (int32 *) PG_GETARG_POINTER(1);
+	bool	   *uniqueAndSorted = (bool *) PG_GETARG_POINTER(3);
 	Datum	   *entries = NULL;
 	TRGM	   *trg;
 	int32		trglen;
 
+	*uniqueAndSorted = true;
 	*nentries = 0;
 
 	trg = generate_trgm(VARDATA_ANY(val), VARSIZE_ANY_EXHDR(val));
diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml
index 82410b1fbdf..b96478731f8 100644
--- a/doc/src/sgml/gin.sgml
+++ b/doc/src/sgml/gin.sgml
@@ -167,7 +167,7 @@
   <variablelist>
     <varlistentry>
      <term><function>Datum *extractValue(Datum itemValue, int32 *nkeys,
-        bool **nullFlags)</function></term>
+        bool **nullFlags, bool *uniqueAndSorted)</function></term>
      <listitem>
       <para>
        Returns a palloc'd array of keys given an item to be indexed.  The
@@ -177,6 +177,11 @@
        <literal>*nullFlags</literal>, and set these null flags as needed.
        <literal>*nullFlags</literal> can be left <symbol>NULL</symbol> (its initial value)
        if all keys are non-null.
+       If the returned keys do not contain duplicates and are sorted w.r.t. the comparison
+       function of the GIN type's operator class, store <symbol>true</symbol> in
+       <literal>uniqueAndSorted</literal>. <literal>uniqueAndSorted</literal> can be left
+       <symbol>false</symbol> (its initial value) if the keys are either unsorted or contain
+       duplicates. In that case, duplicate removal and sorting is performed by the GIN index.
        The return value can be <symbol>NULL</symbol> if the item contains no keys.
       </para>
      </listitem>
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 75a18f457bc..22b588483d0 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -464,6 +464,7 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 {
 	Datum	   *entries;
 	bool	   *nullFlags;
+	bool		uniqueAndSorted = false;
 	int32		i;
 
 	/*
@@ -483,11 +484,12 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 	/* OK, call the opclass's extractValueFn */
 	nullFlags = NULL;			/* in case extractValue doesn't set it */
 	entries = (Datum *)
-		DatumGetPointer(FunctionCall3Coll(&ginstate->extractValueFn[attnum - 1],
+		DatumGetPointer(FunctionCall4Coll(&ginstate->extractValueFn[attnum - 1],
 										  ginstate->supportCollation[attnum - 1],
 										  value,
 										  PointerGetDatum(nentries),
-										  PointerGetDatum(&nullFlags)));
+										  PointerGetDatum(&nullFlags),
+										  PointerGetDatum(&uniqueAndSorted)));
 
 	/*
 	 * Generate a placeholder if the item contained no keys.
@@ -502,13 +504,6 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 		return entries;
 	}
 
-	/*
-	 * If the extractValueFn didn't create a nullFlags array, create one,
-	 * assuming that everything's non-null.
-	 */
-	if (nullFlags == NULL)
-		nullFlags = (bool *) palloc0(*nentries * sizeof(bool));
-
 	/*
 	 * If there's more than one key, sort and unique-ify.
 	 *
@@ -516,11 +511,18 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 	 * pretty bad too.  For small numbers of keys it'd likely be better to use
 	 * a simple insertion sort.
 	 */
-	if (*nentries > 1)
+	if (*nentries > 1 && !uniqueAndSorted)
 	{
 		keyEntryData *keydata;
 		cmpEntriesArg arg;
 
+		/*
+		 * If the extractValueFn didn't create a nullFlags array, create one,
+		 * assuming that everything's non-null.
+		 */
+		if (nullFlags == NULL)
+			nullFlags = (bool *) palloc0(*nentries * sizeof(bool));
+
 		keydata = palloc_array(keyEntryData, *nentries);
 		for (i = 0; i < *nentries; i++)
 		{
@@ -568,9 +570,13 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 	/*
 	 * Create GinNullCategory representation from nullFlags.
 	 */
-	*categories = (GinNullCategory *) palloc0(*nentries * sizeof(GinNullCategory));
-	for (i = 0; i < *nentries; i++)
-		(*categories)[i] = (nullFlags[i] ? GIN_CAT_NULL_KEY : GIN_CAT_NORM_KEY);
+	StaticAssertStmt(GIN_CAT_NORM_KEY == 0, "Assuming GIN_CAT_NORM_KEY is 0");
+	*categories = palloc0_array(GinNullCategory, *nentries);
+
+	if (nullFlags != NULL)
+		for (i = 0; i < *nentries; i++)
+			if (nullFlags[i])
+				(*categories)[i] = GIN_CAT_NULL_KEY;
 
 	return entries;
 }
-- 
2.51.0

v2-0003-Use-sort_template.h.patchtext/x-patch; charset=UTF-8; name=v2-0003-Use-sort_template.h.patchDownload

From c38f3517d05d3cefa0fc6d994f4e8f6ac14273d9 Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Mon, 10 Nov 2025 13:35:11 +0100
Subject: [PATCH v2 3/8] Use sort_template.h

---
 contrib/pg_trgm/trgm_op.c | 47 ++++++++++++++++++++++++++++++---------
 1 file changed, 37 insertions(+), 10 deletions(-)

diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 81182a15e07..6af120fa1ad 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -154,6 +154,23 @@ CMPTRGM_CHOOSE(const void *a, const void *b)
 	return CMPTRGM(a, b);
 }
 
+/* Define our specialized sort function name */
+#define ST_SORT trigram_qsort_signed
+#define ST_ELEMENT_TYPE_VOID
+#define ST_COMPARE(a, b) CMPTRGM_SIGNED(a, b)
+#define ST_SCOPE static
+#define ST_DEFINE
+#define ST_DECLARE
+#include "lib/sort_template.h"
+
+#define ST_SORT trigram_qsort_unsigned
+#define ST_ELEMENT_TYPE_VOID
+#define ST_COMPARE(a, b) CMPTRGM_UNSIGNED(a, b)
+#define ST_SCOPE static
+#define ST_DEFINE
+#define ST_DECLARE
+#include "lib/sort_template.h"
+
 /*
  * Deprecated function.
  * Use "pg_trgm.similarity_threshold" GUC variable instead of this function.
@@ -209,12 +226,6 @@ show_limit(PG_FUNCTION_ARGS)
 	PG_RETURN_FLOAT4(similarity_threshold);
 }
 
-static int
-comp_trgm(const void *a, const void *b)
-{
-	return CMPTRGM(a, b);
-}
-
 /*
  * Finds first word in string, returns pointer to the word,
  * endword points to the character after word
@@ -426,8 +437,16 @@ generate_trgm(char *str, int slen)
 	 */
 	if (len > 1)
 	{
-		qsort(GETARR(trg), len, sizeof(trgm), comp_trgm);
-		len = qunique(GETARR(trg), len, sizeof(trgm), comp_trgm);
+		if (GetDefaultCharSignedness())
+		{
+			trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
+			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
+		}
+		else
+		{
+			trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
+			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED);
+		}
 	}
 
 	SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
@@ -974,8 +993,16 @@ generate_wildcard_trgm(const char *str, int slen)
 	 */
 	if (len > 1)
 	{
-		qsort(GETARR(trg), len, sizeof(trgm), comp_trgm);
-		len = qunique(GETARR(trg), len, sizeof(trgm), comp_trgm);
+		if (GetDefaultCharSignedness())
+		{
+			trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
+			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
+		}
+		else
+		{
+			trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
+			len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED);
+		}
 	}
 
 	SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
-- 
2.51.0

v2-0002-Optimized-comparison-functions.patchtext/x-patch; charset=UTF-8; name=v2-0002-Optimized-comparison-functions.patchDownload

From ae80c2dc190493f8c9811c1949c7ebced265abd5 Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Mon, 10 Nov 2025 13:35:01 +0100
Subject: [PATCH v2 2/8] Optimized comparison functions

---
 src/backend/access/gin/ginutil.c | 23 ++++++++++++++++-------
 src/include/access/gin_private.h | 19 +++++++++++++++----
 2 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index d205093e21d..75a18f457bc 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -114,6 +114,7 @@ initGinState(GinState *state, Relation index)
 	for (i = 0; i < origTupdesc->natts; i++)
 	{
 		Form_pg_attribute attr = TupleDescAttr(origTupdesc, i);
+		FunctionCallInfoBaseData *fci  = &state->compareFnCallInfo[i].fcinfo;
 
 		if (state->oneCol)
 			state->tupdesc[i] = state->origTupdesc;
@@ -222,6 +223,10 @@ initGinState(GinState *state, Relation index)
 			state->supportCollation[i] = index->rd_indcollation[i];
 		else
 			state->supportCollation[i] = DEFAULT_COLLATION_OID;
+
+		InitFunctionCallInfoData(*fci, &state->compareFn[i], 2, state->supportCollation[i], NULL, NULL);
+		fci->args[0].isnull = false;
+		fci->args[1].isnull = false;
 	}
 }
 
@@ -402,8 +407,7 @@ typedef struct
 
 typedef struct
 {
-	FmgrInfo   *cmpDatumFunc;
-	Oid			collation;
+	FunctionCallInfoBaseData   *cmpFuncInfo;
 	bool		haveDups;
 } cmpEntriesArg;
 
@@ -425,9 +429,15 @@ cmpEntries(const void *a, const void *b, void *arg)
 	else if (bb->isnull)
 		res = -1;				/* not-NULL "<" NULL */
 	else
-		res = DatumGetInt32(FunctionCall2Coll(data->cmpDatumFunc,
-											  data->collation,
-											  aa->datum, bb->datum));
+	{
+		FunctionCallInfo fci = data->cmpFuncInfo;
+		fci->args[0].value = aa->datum;
+		fci->args[1].value = bb->datum;
+		res = DatumGetInt32(FunctionCallInvoke(fci));
+
+		if (fci->isnull)
+			elog(ERROR, "function %u returned NULL", fci->flinfo->fn_oid);
+	}
 
 	/*
 	 * Detect if we have any duplicates.  If there are equal keys, qsort must
@@ -518,8 +528,7 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 			keydata[i].isnull = nullFlags[i];
 		}
 
-		arg.cmpDatumFunc = &ginstate->compareFn[attnum - 1];
-		arg.collation = ginstate->supportCollation[attnum - 1];
+		arg.cmpFuncInfo = &ginstate->compareFnCallInfo[attnum - 1].fcinfo;
 		arg.haveDups = false;
 		qsort_arg(keydata, *nentries, sizeof(keyEntryData),
 				  cmpEntries, &arg);
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index e155045ce8a..7cf19c8a5dc 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -51,6 +51,11 @@ typedef struct GinOptions
 #define GIN_SHARE	BUFFER_LOCK_SHARE
 #define GIN_EXCLUSIVE  BUFFER_LOCK_EXCLUSIVE
 
+typedef union CompareFuncCallInfoData
+{
+	FunctionCallInfoBaseData fcinfo;
+	char fcinfo_data[SizeForFunctionCallInfo(2)];
+} CompareFuncCallInfoData;
 
 /*
  * GinState: working data structure describing the index being worked on
@@ -77,6 +82,10 @@ typedef struct GinState
 	/*
 	 * Per-index-column opclass support functions
 	 */
+
+
+	CompareFuncCallInfoData compareFnCallInfo[INDEX_MAX_KEYS];
+
 	FmgrInfo	compareFn[INDEX_MAX_KEYS];
 	FmgrInfo	extractValueFn[INDEX_MAX_KEYS];
 	FmgrInfo	extractQueryFn[INDEX_MAX_KEYS];
@@ -504,6 +513,8 @@ ginCompareEntries(GinState *ginstate, OffsetNumber attnum,
 				  Datum a, GinNullCategory categorya,
 				  Datum b, GinNullCategory categoryb)
 {
+	FunctionCallInfoBaseData *fci;
+
 	/* if not of same null category, sort by that first */
 	if (categorya != categoryb)
 		return (categorya < categoryb) ? -1 : 1;
@@ -512,10 +523,10 @@ ginCompareEntries(GinState *ginstate, OffsetNumber attnum,
 	if (categorya != GIN_CAT_NORM_KEY)
 		return 0;
 
-	/* both not null, so safe to call the compareFn */
-	return DatumGetInt32(FunctionCall2Coll(&ginstate->compareFn[attnum - 1],
-										   ginstate->supportCollation[attnum - 1],
-										   a, b));
+	fci = &ginstate->compareFnCallInfo[attnum - 1].fcinfo;
+	fci->args[0].value = a;
+	fci->args[1].value = b;
+	return DatumGetInt32(FunctionCallInvoke(fci));
 }
 
 /*
-- 
2.51.0

v2-0001-Inline-ginCompareAttEntries.patchtext/x-patch; charset=UTF-8; name=v2-0001-Inline-ginCompareAttEntries.patchDownload

From a484e7c69bec62474b041eb4e533601e4883dab3 Mon Sep 17 00:00:00 2001
From: David Geier <geidav.pg@gmail.com>
Date: Thu, 6 Nov 2025 09:42:27 +0100
Subject: [PATCH v2 1/8] Inline ginCompareAttEntries

---
 src/backend/access/gin/ginutil.c | 38 ---------------------------
 src/include/access/gin_private.h | 44 +++++++++++++++++++++++++++-----
 2 files changed, 38 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a546cac18d3..d205093e21d 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -387,44 +387,6 @@ GinInitMetabuffer(Buffer b)
 		((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
 }
 
-/*
- * Compare two keys of the same index column
- */
-int
-ginCompareEntries(GinState *ginstate, OffsetNumber attnum,
-				  Datum a, GinNullCategory categorya,
-				  Datum b, GinNullCategory categoryb)
-{
-	/* if not of same null category, sort by that first */
-	if (categorya != categoryb)
-		return (categorya < categoryb) ? -1 : 1;
-
-	/* all null items in same category are equal */
-	if (categorya != GIN_CAT_NORM_KEY)
-		return 0;
-
-	/* both not null, so safe to call the compareFn */
-	return DatumGetInt32(FunctionCall2Coll(&ginstate->compareFn[attnum - 1],
-										   ginstate->supportCollation[attnum - 1],
-										   a, b));
-}
-
-/*
- * Compare two keys of possibly different index columns
- */
-int
-ginCompareAttEntries(GinState *ginstate,
-					 OffsetNumber attnuma, Datum a, GinNullCategory categorya,
-					 OffsetNumber attnumb, Datum b, GinNullCategory categoryb)
-{
-	/* attribute number is the first sort key */
-	if (attnuma != attnumb)
-		return (attnuma < attnumb) ? -1 : 1;
-
-	return ginCompareEntries(ginstate, attnuma, a, categorya, b, categoryb);
-}
-
-
 /*
  * Support for sorting key datums in ginExtractEntries
  *
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index b33f7cec5b4..e155045ce8a 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -97,12 +97,6 @@ extern Buffer GinNewBuffer(Relation index);
 extern void GinInitBuffer(Buffer b, uint32 f);
 extern void GinInitPage(Page page, uint32 f, Size pageSize);
 extern void GinInitMetabuffer(Buffer b);
-extern int	ginCompareEntries(GinState *ginstate, OffsetNumber attnum,
-							  Datum a, GinNullCategory categorya,
-							  Datum b, GinNullCategory categoryb);
-extern int	ginCompareAttEntries(GinState *ginstate,
-								 OffsetNumber attnuma, Datum a, GinNullCategory categorya,
-								 OffsetNumber attnumb, Datum b, GinNullCategory categoryb);
 extern Datum *ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 								Datum value, bool isNull,
 								int32 *nentries, GinNullCategory **categories);
@@ -502,6 +496,44 @@ ginCompareItemPointers(ItemPointer a, ItemPointer b)
 	return pg_cmp_u64(ia, ib);
 }
 
+/*
+ * Compare two keys of the same index column
+ */
+static inline int
+ginCompareEntries(GinState *ginstate, OffsetNumber attnum,
+				  Datum a, GinNullCategory categorya,
+				  Datum b, GinNullCategory categoryb)
+{
+	/* if not of same null category, sort by that first */
+	if (categorya != categoryb)
+		return (categorya < categoryb) ? -1 : 1;
+
+	/* all null items in same category are equal */
+	if (categorya != GIN_CAT_NORM_KEY)
+		return 0;
+
+	/* both not null, so safe to call the compareFn */
+	return DatumGetInt32(FunctionCall2Coll(&ginstate->compareFn[attnum - 1],
+										   ginstate->supportCollation[attnum - 1],
+										   a, b));
+}
+
+/*
+ * Compare two keys of possibly different index columns
+ */
+static inline int
+ginCompareAttEntries(GinState *ginstate,
+					 OffsetNumber attnuma, Datum a, GinNullCategory categorya,
+					 OffsetNumber attnumb, Datum b, GinNullCategory categoryb)
+{
+	/* attribute number is the first sort key */
+	if (attnuma != attnumb)
+		return (attnuma < attnumb) ? -1 : 1;
+
+	return ginCompareEntries(ginstate, attnuma, a, categorya, b, categoryb);
+
+}
+
 extern int	ginTraverseLock(Buffer buffer, bool searchMode);
 
 #endif							/* GIN_PRIVATE_H */
-- 
2.51.0

David Geier

geidav.pg@gmail.com

3 days ago

In reply to: Kirill Reshke (#3)

Re: Reduce build times of pg_trgm GIN indexes

Hi!

Hi!
patches 0003, 0004 & 0008 applies with whitespace errors.

I've fixed the white space errors. See v2 of the patch set in [1]/messages/by-id/e5dd01c6-c469-405d-aea2-feca0b2dc34d@gmail.com.

I did run benchmarks of my VM using your data. v1-0001 solely improves
runtime by 4-5%. v2-0002 does not show runtime improvement for me.
With parallel GIN build, performance gains are 2-3 % for 2 workers and
not noticeable after that.

I will try to run some more benchmarks on this.

Thanks. I've also included the delta for each patch in [1]/messages/by-id/e5dd01c6-c469-405d-aea2-feca0b2dc34d@gmail.com. I would be
curious what you measure, especially for patch 0004, where I curiously
measured a regression.

[1]: /messages/by-id/e5dd01c6-c469-405d-aea2-feca0b2dc34d@gmail.com
/messages/by-id/e5dd01c6-c469-405d-aea2-feca0b2dc34d@gmail.com

--
David Geier

Heikki Linnakangas

hlinnaka@iki.fi

2 days ago

In reply to: David Geier (#4)

2 attachment(s)

Re: Reduce build times of pg_trgm GIN indexes

On 09/01/2026 14:06, David Geier wrote:

On 06.01.2026 18:00, Heikki Linnakangas wrote:

On 05/01/2026 17:01, David Geier wrote:

- v1-0002-Optimized-comparison-functions.patch: Use FunctionCallInvoke()
instead of FunctionCall2Coll(). This saves a bunch of per-comparison
setup code, such as calling InitFunctionCallInfoData().

You lose the check for NULL result with this. That's probably still
worth checking.

It seems like existing code where all args are not null, has that safety
check. Added it for consistency.

- v1-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patch
ginExtractEntries() deduplicates and sorts the entries returned from the
extract value function. In case of pg_trgm, that is completely redundant
because the trigrams are already deduplicated and sorted. The current
version of this patch is just to demonstrate the potential. We need to
think about what we want here. Ideally, we would require the extraction
function to provide the entries deduplicated and sorted. Alternatively,
we could indicate to ginExtractEntries() if the entries are already
deduplicated and sorted. If we don't want to alter the signature of the
extract value function, we could e.g. use the MSB of the nentries
argument.

Yeah, this seems wrong as it is. You're assuming that if the extract
function returns nullFlags == NULL, the array is already sorted and
deduped.

As said, that was just for demonstration purposes of the possible gains.
I've changed the code now such that the extractValue function of the GIN
index can indicate via the third argument uniqueAndSorted, if the
returned keys are already unique and sorted.

Unfortunately, it seems like this patch regresses performance. See
measurements below. I haven't had the time to investigate why that is.
It's pretty counter intuitive, given that this patch effectively only
removes code. Maybe you could re-test patch 0004 and share your runtimes?

Looking at 0001 and 0004 patches and ginExtractEntries(), the way
ginExtractEntries() handles NULLs looks a little inefficient. It treats
NULLs as proper entries, passing them through qsort() for deduplication
and comparing them with cmpEntries(). But the end result is always that
if the opclass's extractValueFn() function returned any NULLs, then
there's exctly one GIN_CAT_NULL_KEY as the last entry of the result
array. Surely we could be smarter about how we accomplish that. Your
0004 patch eliminates the deduplication overhead altogether, which is
great of course, but the point remains for when we still need the
deduplication.

Attached is an attempt at that. It avoids the null-checks in
cmpEntries(), saving a few cycles. That's drowned in noise with your
test cases, but with the attached test case with int arrays, I'm seeing
a 1-2 % gain. That's not much, but I think it's still worth doing
because it makes the code a little simpler too, IMHO. (I didn't test it
together with the rest of your patches.)

Perhaps we should introduce a new qunique_eq() function with a different
callback signature:

/* like qunique(), but the callback function returns true/false rather
than int */
static inline size_t
qunique_eq(void *array, size_t elements, size_t width,
bool (*equal) (const void *, const void *))

I would prefer to change qunique() instead. That would enforce using an
adequate comparison function from the get go. There are only ~15 calls
to qunique(). So refactoring this should also be a fairly small patch. I
can do that if there's agreement for that approach.

Works for me.

At quick glance, most if not all of the qunique() calls call qsort()
just before qunique(). I wonder if we should have a single "sort and
deduplicate" function, instead. It could perhaps do some deduplication
"on the go", or other optimizations.

Did you measure how big is the impact from each individual patch?
Patches 1 and 2 seem pretty much ready to be committed, but I wonder if
they make any difference on their own.

Here is the impact of each patch. I ran again CREATE INDEX three times
and took the fastest run. The run of each patch includes all previous
patches as well. For example, the timings for patch 0003 were measured
with a binary that also had patch 0002 and 0001 applied. To get the
impact of each patch in isolation, the delta to the previous run was taken.

Code | movies |delta | lineitem | delta
------------------------------------|--------|-------|------------------
master | 10,311 | 0 | 256,986 | 0
v1-0001-Inline-ginCompareAttEntries | 9,694 | 617 | 239,778 | 17,208
v1-0002-Optimized-comparison-func | 9,510 | 184 | 238,094 | 1,684
v1-0003-Use-sort_template.h | 8,661 | 849 | 231,190 | 6,904
v1-0004-Avoid-dedup-and-sort-in | 9,305 | -644 | 232,472 | -1,282
v1-0005-Make-btint4cmp-branchless | 8,240 | 1,065 | 228,387 | 4,085
v1-0006-Use-radix-sort | 6,976 | 1,264 | 207,687 | 20,700
v1-0007-Faster-qunique-comparator | 5,911 | 1,065 | 203,744 | 3,943
v1-0008-Add-ASCII-fastpath | 3,409 | 2,502 | 161,469 | 42,275

Attached is v2 of the patch set with the aforementioned changes. I've
also fixed the white space errors in 0003, 0004 and 0008, as reported by
Kirill.

Thanks, I pushed patch 0001 now, that's a simple and clear win.

- Heikki

Attachments:

0001-Make-the-deduplication-in-ginExtractEntries-a-little.patchtext/x-patch; charset=UTF-8; name=0001-Make-the-deduplication-in-ginExtractEntries-a-little.patchDownload

From a0348e8dedb9a7ee53af8f88adbba0e5aeff577d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 9 Jan 2026 19:31:19 +0200
Subject: [PATCH 1/1] Make the deduplication in ginExtractEntries() a little
 faster

The idea is to remove NULLs from the array first, and use qsort to
deduplicate only the non-NULL items. That simplifies the comparison
function a little.
---
 src/backend/access/gin/ginutil.c | 139 +++++++++++++------------------
 src/include/access/gin_private.h |   2 +-
 2 files changed, 61 insertions(+), 80 deletions(-)

diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index d205093e21d..29966b9d05b 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -387,19 +387,6 @@ GinInitMetabuffer(Buffer b)
 		((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
 }
 
-/*
- * Support for sorting key datums in ginExtractEntries
- *
- * Note: we only have to worry about null and not-null keys here;
- * ginExtractEntries never generates more than one placeholder null,
- * so it doesn't have to sort those.
- */
-typedef struct
-{
-	Datum		datum;
-	bool		isnull;
-} keyEntryData;
-
 typedef struct
 {
 	FmgrInfo   *cmpDatumFunc;
@@ -410,24 +397,14 @@ typedef struct
 static int
 cmpEntries(const void *a, const void *b, void *arg)
 {
-	const keyEntryData *aa = (const keyEntryData *) a;
-	const keyEntryData *bb = (const keyEntryData *) b;
+	const Datum *aa = (const Datum *) a;
+	const Datum *bb = (const Datum *) b;
 	cmpEntriesArg *data = (cmpEntriesArg *) arg;
 	int			res;
 
-	if (aa->isnull)
-	{
-		if (bb->isnull)
-			res = 0;			/* NULL "=" NULL */
-		else
-			res = 1;			/* NULL ">" not-NULL */
-	}
-	else if (bb->isnull)
-		res = -1;				/* not-NULL "<" NULL */
-	else
-		res = DatumGetInt32(FunctionCall2Coll(data->cmpDatumFunc,
-											  data->collation,
-											  aa->datum, bb->datum));
+	res = DatumGetInt32(FunctionCall2Coll(data->cmpDatumFunc,
+										  data->collation,
+										  *aa, *bb));
 
 	/*
 	 * Detect if we have any duplicates.  If there are equal keys, qsort must
@@ -450,11 +427,13 @@ cmpEntries(const void *a, const void *b, void *arg)
 Datum *
 ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 				  Datum value, bool isNull,
-				  int32 *nentries, GinNullCategory **categories)
+				  int32 *nentries_p, GinNullCategory **categories_p)
 {
 	Datum	   *entries;
 	bool	   *nullFlags;
-	int32		i;
+	GinNullCategory *categories;
+	bool		hasNull;
+	int32		nentries;
 
 	/*
 	 * We don't call the extractValueFn on a null item.  Instead generate a
@@ -462,42 +441,60 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 	 */
 	if (isNull)
 	{
-		*nentries = 1;
+		*nentries_p = 1;
 		entries = palloc_object(Datum);
 		entries[0] = (Datum) 0;
-		*categories = palloc_object(GinNullCategory);
-		(*categories)[0] = GIN_CAT_NULL_ITEM;
+		*categories_p = palloc_object(GinNullCategory);
+		(*categories_p)[0] = GIN_CAT_NULL_ITEM;
 		return entries;
 	}
 
 	/* OK, call the opclass's extractValueFn */
 	nullFlags = NULL;			/* in case extractValue doesn't set it */
+	nentries = 0;
 	entries = (Datum *)
 		DatumGetPointer(FunctionCall3Coll(&ginstate->extractValueFn[attnum - 1],
 										  ginstate->supportCollation[attnum - 1],
 										  value,
-										  PointerGetDatum(nentries),
+										  PointerGetDatum(&nentries),
 										  PointerGetDatum(&nullFlags)));
 
 	/*
 	 * Generate a placeholder if the item contained no keys.
 	 */
-	if (entries == NULL || *nentries <= 0)
+	if (entries == NULL || nentries <= 0)
 	{
-		*nentries = 1;
+		*nentries_p = 1;
 		entries = palloc_object(Datum);
 		entries[0] = (Datum) 0;
-		*categories = palloc_object(GinNullCategory);
-		(*categories)[0] = GIN_CAT_EMPTY_ITEM;
+		*categories_p = palloc_object(GinNullCategory);
+		(*categories_p)[0] = GIN_CAT_EMPTY_ITEM;
 		return entries;
 	}
 
 	/*
-	 * If the extractValueFn didn't create a nullFlags array, create one,
-	 * assuming that everything's non-null.
+	 * Scan the items for any NULLs.  All NULLs are considered equal, so we
+	 * just need to check and remember if there are any.  We remove them from
+	 * the array here, and if necessary, put back one NULL entry to represent
+	 * them all after deduplication.
 	 */
-	if (nullFlags == NULL)
-		nullFlags = (bool *) palloc0(*nentries * sizeof(bool));
+	hasNull = false;
+	if (nullFlags)
+	{
+		int32		numNonNulls = 0;
+
+		for (int32 i = 0; i < nentries; i++)
+		{
+			if (nullFlags[i])
+				hasNull = true;
+			else
+			{
+				entries[numNonNulls] = entries[i];
+				numNonNulls++;
+			}
+		}
+		nentries = numNonNulls;
+	}
 
 	/*
 	 * If there's more than one key, sort and unique-ify.
@@ -506,63 +503,47 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 	 * pretty bad too.  For small numbers of keys it'd likely be better to use
 	 * a simple insertion sort.
 	 */
-	if (*nentries > 1)
+	if (nentries > 1)
 	{
-		keyEntryData *keydata;
 		cmpEntriesArg arg;
 
-		keydata = palloc_array(keyEntryData, *nentries);
-		for (i = 0; i < *nentries; i++)
-		{
-			keydata[i].datum = entries[i];
-			keydata[i].isnull = nullFlags[i];
-		}
-
 		arg.cmpDatumFunc = &ginstate->compareFn[attnum - 1];
 		arg.collation = ginstate->supportCollation[attnum - 1];
 		arg.haveDups = false;
-		qsort_arg(keydata, *nentries, sizeof(keyEntryData),
+		qsort_arg(entries, nentries, sizeof(Datum),
 				  cmpEntries, &arg);
 
 		if (arg.haveDups)
 		{
 			/* there are duplicates, must get rid of 'em */
-			int32		j;
+			int32		j = 1;
 
-			entries[0] = keydata[0].datum;
-			nullFlags[0] = keydata[0].isnull;
-			j = 1;
-			for (i = 1; i < *nentries; i++)
+			for (int32 i = 1; i < nentries; i++)
 			{
-				if (cmpEntries(&keydata[i - 1], &keydata[i], &arg) != 0)
-				{
-					entries[j] = keydata[i].datum;
-					nullFlags[j] = keydata[i].isnull;
-					j++;
-				}
+				if (cmpEntries(&entries[i - 1], &entries[i], &arg) != 0)
+					entries[j++] = entries[i];
 			}
-			*nentries = j;
+			nentries = j;
 		}
-		else
-		{
-			/* easy, no duplicates */
-			for (i = 0; i < *nentries; i++)
-			{
-				entries[i] = keydata[i].datum;
-				nullFlags[i] = keydata[i].isnull;
-			}
-		}
-
-		pfree(keydata);
 	}
 
 	/*
-	 * Create GinNullCategory representation from nullFlags.
+	 * Create GinNullCategory representation.
 	 */
-	*categories = (GinNullCategory *) palloc0(*nentries * sizeof(GinNullCategory));
-	for (i = 0; i < *nentries; i++)
-		(*categories)[i] = (nullFlags[i] ? GIN_CAT_NULL_KEY : GIN_CAT_NORM_KEY);
+	categories = (GinNullCategory *) palloc((nentries + (hasNull ? 1 : 0)) * sizeof(GinNullCategory));
+	for (int32 i = 0; i < nentries; i++)
+		categories[i] = GIN_CAT_NORM_KEY;
+
+	/* Put back a NULL entry, if there were any */
+	if (hasNull)
+	{
+		entries[nentries] = (Datum) 0;
+		categories[nentries] = GIN_CAT_NULL_KEY;
+		nentries++;
+	}
 
+	*nentries_p = nentries;
+	*categories_p = categories;
 	return entries;
 }
 
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index e155045ce8a..1e97e6cb3c9 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -99,7 +99,7 @@ extern void GinInitPage(Page page, uint32 f, Size pageSize);
 extern void GinInitMetabuffer(Buffer b);
 extern Datum *ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
 								Datum value, bool isNull,
-								int32 *nentries, GinNullCategory **categories);
+								int32 *nentries_p, GinNullCategory **categories_p);
 
 extern OffsetNumber gintuple_get_attrnum(GinState *ginstate, IndexTuple tuple);
 extern Datum gintuple_get_key(GinState *ginstate, IndexTuple tuple,
-- 
2.47.3

test-gin-array.sqlapplication/sql; name=test-gin-array.sqlDownload

David Geier

geidav.pg@gmail.com

2 days ago

In reply to: Heikki Linnakangas (#6)

Re: Reduce build times of pg_trgm GIN indexes

On 09.01.2026 19:36, Heikki Linnakangas wrote:

- v1-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patch
ginExtractEntries() deduplicates and sorts the entries returned from
the
extract value function. In case of pg_trgm, that is completely
redundant
because the trigrams are already deduplicated and sorted. The current
version of this patch is just to demonstrate the potential. We need to
think about what we want here. Ideally, we would require the extraction
function to provide the entries deduplicated and sorted. Alternatively,
we could indicate to ginExtractEntries() if the entries are already
deduplicated and sorted. If we don't want to alter the signature of the
extract value function, we could e.g. use the MSB of the nentries
argument.

Yeah, this seems wrong as it is. You're assuming that if the extract
function returns nullFlags == NULL, the array is already sorted and
deduped.

As said, that was just for demonstration purposes of the possible gains.
I've changed the code now such that the extractValue function of the GIN
index can indicate via the third argument uniqueAndSorted, if the
returned keys are already unique and sorted.

Unfortunately, it seems like this patch regresses performance. See
measurements below. I haven't had the time to investigate why that is.
It's pretty counter intuitive, given that this patch effectively only
removes code. Maybe you could re-test patch 0004 and share your runtimes?

Looking at 0001 and 0004 patches and ginExtractEntries(), the way
ginExtractEntries() handles NULLs looks a little inefficient. It treats
NULLs as proper entries, passing them through qsort() for deduplication
and comparing them with cmpEntries(). But the end result is always that
if the opclass's extractValueFn() function returned any NULLs, then
there's exctly one GIN_CAT_NULL_KEY as the last entry of the result
array. Surely we could be smarter about how we accomplish that. Your
0004 patch eliminates the deduplication overhead altogether, which is
great of course, but the point remains for when we still need the
deduplication.

Good observation. I like this idea. I've focused on pg_trgm but making
this code faster is certainly useful.

Given that doing the sort on pre-sorted input apparently doesn't add
measurable overhead, according to my benchmark results, we can apply
your patch and leave mine out for the moment being.

That's btw. also the reason for why 0002 doesn't show much gain: when
the data is pre-sorted, cmpEntries() is not called as much.

Attached is an attempt at that. It avoids the null-checks in
cmpEntries(), saving a few cycles. That's drowned in noise with your
test cases, but with the attached test case with int arrays, I'm seeing
a 1-2 % gain. That's not much, but I think it's still worth doing
because it makes the code a little simpler too, IMHO. (I didn't test it
together with the rest of your patches.)

Performance is death by a thousand cuts and that's definitely the case
for the GIN index code. I'm all for putting in these small improvements
because they'll add up and what's now 2% can be 10% once other
optimization are in.

I took a look at your patch. Overall looks good to me. Just a few comments:

1) You should be able to create the categories array without the need
for the subsequent for loop as follows:

StaticAssertStmt(GIN_CAT_NORM_KEY == 0, "Assuming GIN_CAT_NORM_KEY=0");
*categories = palloc0_array(GinNullCategory, (nentries + (hasNull ? 1 : 0));

2) Your test case seems sub-optimal to show the gains. The arrays don't
contain any NULL values and are also already sorted. Or I'm missing
something.

3) Could you try your patch in conjunction with 0002 on data that is not
pre-sorted and then check if 0002 has more impact? That way cmpEntries()
should be called much more often.

4) Have you checked if there are regression tests that exercise this
code? If not, how about adding some?

Perhaps we should introduce a new qunique_eq() function with a different
callback signature:

/* like qunique(), but the callback function returns true/false rather
than int */
static inline size_t
qunique_eq(void *array, size_t elements, size_t width,
bool (*equal) (const void *, const void *))

I would prefer to change qunique() instead. That would enforce using an
adequate comparison function from the get go. There are only ~15 calls
to qunique(). So refactoring this should also be a fairly small patch. I
can do that if there's agreement for that approach.

Works for me.

At quick glance, most if not all of the qunique() calls call qsort()
just before qunique(). I wonder if we should have a single "sort and
deduplicate" function, instead. It could perhaps do some deduplication
"on the go", or other optimizations.

If it's just for deduplication purposes and the data doesn't have to end
up sorted, something based on a hash map should be even faster. How
about we start with changing the qunique() comparator signature and as a
2nd step take a closer look at how to go about providing a function that
does it in one go?

If you agree, I'll share a patch here next week.

Did you measure how big is the impact from each individual patch?
Patches 1 and 2 seem pretty much ready to be committed, but I wonder if
they make any difference on their own.

Here is the impact of each patch. I ran again CREATE INDEX three times
and took the fastest run. The run of each patch includes all previous
patches as well. For example, the timings for patch 0003 were measured
with a binary that also had patch 0002 and 0001 applied. To get the
impact of each patch in isolation, the delta to the previous run was
taken.

Code                                | movies |delta | lineitem | delta
------------------------------------|--------|-------|------------------
master                              | 10,311 | 0     | 256,986 | 0
v1-0001-Inline-ginCompareAttEntries | 9,694 | 617   | 239,778 | 17,208
v1-0002-Optimized-comparison-func   | 9,510 | 184   | 238,094 | 1,684
v1-0003-Use-sort_template.h         | 8,661 | 849   | 231,190 | 6,904
v1-0004-Avoid-dedup-and-sort-in     | 9,305 | -644 | 232,472 | -1,282
v1-0005-Make-btint4cmp-branchless   | 8,240 | 1,065 | 228,387 | 4,085
v1-0006-Use-radix-sort              | 6,976 | 1,264 | 207,687 | 20,700
v1-0007-Faster-qunique-comparator   | 5,911 | 1,065 | 203,744 | 3,943
v1-0008-Add-ASCII-fastpath          | 3,409 | 2,502 | 161,469 | 42,275

Attached is v2 of the patch set with the aforementioned changes. I've
also fixed the white space errors in 0003, 0004 and 0008, as reported by
Kirill.

Thanks, I pushed patch 0001 now, that's a simple and clear win.

Great. Thanks.

What about the other patches? 0003 and 0007 are also pretty simple and
IMHO uncontroversial while giving decent savings.

For 0006 I would make the code also work for char being unsigned. That's
still missing.

Any thoughts about 0008 and my findings regarding the to lower-case
conversion for ASCII? Adding to pg_locale_struct if it's save to use
ASCII-style tolower should be straight forward and then the code should
be correct.

--
David Geier