[PATCH] Refactor bytea_sortsupport(), take two

Started by Aleksander Alekseev7 months ago15 messages

aleksander@timescale.com

7 months ago

1 attachment(s)

Hi,

This is a follow-up to b45242fd30ff [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b45242fd30ff. Previously we separated
varlena.c into varlena.c and bytea.c. This patch makes
bytea_sortsupport() independent from varlena.c code as it was proposed
before [2]/messages/by-id/1502394.1725398354@sss.pgh.pa.us[3]/messages/by-id/CAJ7c6TO3X88dGd8C4Tb-Eq2ZDPz+9mP+KOwdzK_82BEz_cMPZg@mail.gmail.com. The benefits of this change are summarized in the
commit message that I included to the patch.

As always, your feedback is most appreciated.

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b45242fd30ff
[2]: /messages/by-id/1502394.1725398354@sss.pgh.pa.us
[3]: /messages/by-id/CAJ7c6TO3X88dGd8C4Tb-Eq2ZDPz+9mP+KOwdzK_82BEz_cMPZg@mail.gmail.com

--
Best regards,
Aleksander Alekseev

Attachments:

v1-0001-Refactor-bytea_sortsupport.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Refactor-bytea_sortsupport.patchDownload

From 768fefc8d0ca29787d55c9c1163d736b662cdcd1 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 2 Jul 2025 13:49:37 +0300
Subject: [PATCH v1] Refactor bytea_sortsupport()

Previously bytea_sortsupport() was implemented as a wrapper function for
varstr_sortsupport(). There were several problems with this.

Firstly, this was error-prone. Changing logic for string types could affect
bytea logic and vice versa.

Secondly, the bytea-related part of the code was difficult to reason about.
It was mixed with the logic for other types.

Lastly, the performance and memory consumption could be optimized a little for
the bytea case.

Use independent bytea_sortsupport() implementation for the named reasons.

Aleksander Alekseev, reviewed by TODO FIXME
Discussion: TODO FIXME
---
 src/backend/utils/adt/bytea.c   | 208 +++++++++++++++++++++++++++++++-
 src/backend/utils/adt/varlena.c |  27 ++---
 2 files changed, 213 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
index 2e539c2504e..24f836976f4 100644
--- a/src/backend/utils/adt/bytea.c
+++ b/src/backend/utils/adt/bytea.c
@@ -15,18 +15,19 @@
 #include "postgres.h"
 
 #include "access/detoast.h"
-#include "catalog/pg_collation_d.h"
-#include "catalog/pg_type_d.h"
+#include "common/hashfn.h"
 #include "common/int.h"
 #include "fmgr.h"
+#include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bitutils.h"
+#include "port/pg_bswap.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
 #include "utils/fmgrprotos.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/sortsupport.h"
-#include "utils/varlena.h"
 #include "varatt.h"
 
 /* GUC variable */
@@ -37,6 +38,19 @@ static bytea *bytea_substring(Datum str, int S, int L,
 							  bool length_not_specified);
 static bytea *bytea_overlay(bytea *t1, bytea *t2, int sp, int sl);
 
+typedef struct
+{
+	bool		abbreviate;		/* Should we abbreviate keys? */
+	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
+	hyperLogLogState full_card; /* Full key cardinality state */
+	double		prop_card;		/* Required cardinality proportion */
+}			ByteaSortSupport;
+
+/* Static function declarations for sort support */
+static int	byteafastcmp(Datum x, Datum y, SortSupport ssup);
+static Datum bytea_abbrev_convert(Datum original, SortSupport ssup);
+static bool bytea_abbrev_abort(int memtupcount, SortSupport ssup);
+
 /*
  * bytea_catenate
  *	Guts of byteacat(), broken out so it can be used by other functions
@@ -1030,16 +1044,200 @@ bytea_smaller(PG_FUNCTION_ARGS)
 	PG_RETURN_BYTEA_P(result);
 }
 
+/*
+ * sortsupport comparison func
+ */
+static int
+byteafastcmp(Datum x, Datum y, SortSupport ssup)
+{
+	bytea	   *arg1 = DatumGetByteaPP(x);
+	bytea	   *arg2 = DatumGetByteaPP(y);
+	char	   *a1p,
+			   *a2p;
+	int			len1,
+				len2,
+				result;
+
+	a1p = VARDATA_ANY(arg1);
+	a2p = VARDATA_ANY(arg2);
+
+	len1 = VARSIZE_ANY_EXHDR(arg1);
+	len2 = VARSIZE_ANY_EXHDR(arg2);
+
+	result = memcmp(a1p, a2p, Min(len1, len2));
+	if ((result == 0) && (len1 != len2))
+		result = (len1 < len2) ? -1 : 1;
+
+	/* We can't afford to leak memory here. */
+	if (PointerGetDatum(arg1) != x)
+		pfree(arg1);
+	if (PointerGetDatum(arg2) != y)
+		pfree(arg2);
+
+	return result;
+}
+
+/*
+ * Conversion routine for sortsupport.
+ * This is basically a simplified version of varstr_abbrev_convert().
+ */
+static Datum
+bytea_abbrev_convert(Datum original, SortSupport ssup)
+{
+	const size_t max_prefix_bytes = sizeof(Datum);
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	bytea	   *authoritative = DatumGetByteaPP(original);
+	char	   *authoritative_data = VARDATA_ANY(authoritative);
+	Datum		res;
+	char	   *pres;
+	int			len;
+	uint32		hash;
+
+	pres = (char *) &res;
+
+	/* memset(), so any non-overwritten bytes are NUL */
+	memset(pres, 0, max_prefix_bytes);
+	len = VARSIZE_ANY_EXHDR(authoritative);
+	memcpy(pres, authoritative_data, Min(len, max_prefix_bytes));
+
+	/*
+	 * Maintain approximate cardinality of both abbreviated keys and original
+	 * keys using HyperLogLog.
+	 */
+	hash = DatumGetUInt32(hash_any((unsigned char *) authoritative_data,
+								   Min(len, PG_CACHE_LINE_SIZE)));
+
+	if (len > PG_CACHE_LINE_SIZE)
+		hash ^= DatumGetUInt32(hash_uint32((uint32) len));
+
+	addHyperLogLog(&bss->full_card, hash);
+
+	/* Hash abbreviated key */
+#if SIZEOF_DATUM == 8
+	{
+		uint32		lohalf,
+					hihalf;
+
+		lohalf = (uint32) res;
+		hihalf = (uint32) (res >> 32);
+		hash = DatumGetUInt32(hash_uint32(lohalf ^ hihalf));
+	}
+#else							/* SIZEOF_DATUM != 8 */
+	hash = DatumGetUInt32(hash_uint32((uint32) res));
+#endif
+
+	addHyperLogLog(&bss->abbr_card, hash);
+
+	/*
+	 * Byteswap on little-endian machines.
+	 *
+	 * This is needed so that ssup_datum_unsigned_cmp() works correctly on all
+	 * platforms.
+	 */
+	res = DatumBigEndianToNative(res);
+
+	/* Don't leak memory here */
+	if (PointerGetDatum(authoritative) != original)
+		pfree(authoritative);
+
+	return res;
+}
+
+/*
+ * Callback for estimating effectiveness of abbreviated key optimization, using
+ * heuristic rules.  Returns value indicating if the abbreviation optimization
+ * should be aborted, based on its projected effectiveness.
+ */
+static bool
+bytea_abbrev_abort(int memtupcount, SortSupport ssup)
+{
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	double		abbrev_distinct,
+				key_distinct;
+
+	Assert(ssup->abbreviate);
+
+	/* Have a little patience */
+	if (memtupcount < 100)
+		return false;
+
+	abbrev_distinct = estimateHyperLogLog(&bss->abbr_card);
+	key_distinct = estimateHyperLogLog(&bss->full_card);
+
+	/*
+	 * Clamp cardinality estimates to at least one distinct value.
+	 */
+	if (abbrev_distinct <= 1.0)
+		abbrev_distinct = 1.0;
+
+	if (key_distinct <= 1.0)
+		key_distinct = 1.0;
+
+	if (trace_sort)
+	{
+		double		norm_abbrev_card = abbrev_distinct / (double) memtupcount;
+
+		elog(LOG, "bytea_abbrev: abbrev_distinct after %d: %f "
+			 "(key_distinct: %f, norm_abbrev_card: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, norm_abbrev_card,
+			 bss->prop_card);
+	}
+
+	/*
+	 * If the number of distinct abbreviated keys approximately matches the
+	 * number of distinct original keys, continue with abbreviation.
+	 */
+	if (abbrev_distinct > key_distinct * bss->prop_card)
+	{
+		/*
+		 * Decay required cardinality aggressively after 10,000 tuples.
+		 */
+		if (memtupcount > 10000)
+			bss->prop_card *= 0.65;
+
+		return false;
+	}
+
+	/*
+	 * Abort abbreviation strategy.
+	 */
+	if (trace_sort)
+		elog(LOG, "bytea_abbrev: aborted abbreviation at %d "
+			 "(abbrev_distinct: %f, key_distinct: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, bss->prop_card);
+
+	return true;
+}
+
 Datum
 bytea_sortsupport(PG_FUNCTION_ARGS)
 {
 	SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
 	MemoryContext oldcontext;
+	ByteaSortSupport *bss;
+	bool		abbreviate = ssup->abbreviate;
 
 	oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt);
 
-	/* Use generic string SortSupport, forcing "C" collation */
-	varstr_sortsupport(ssup, BYTEAOID, C_COLLATION_OID);
+	ssup->comparator = byteafastcmp;
+
+	/*
+	 * Set up abbreviation support if requested.
+	 */
+	if (abbreviate)
+	{
+		bss = palloc(sizeof(ByteaSortSupport));
+		bss->abbreviate = true;
+		bss->prop_card = 0.20;
+		initHyperLogLog(&bss->abbr_card, 10);
+		initHyperLogLog(&bss->full_card, 10);
+
+		ssup->ssup_extra = bss;
+		ssup->abbrev_full_comparator = ssup->comparator;
+		ssup->comparator = ssup_datum_unsigned_cmp;
+		ssup->abbrev_converter = bytea_abbrev_convert;
+		ssup->abbrev_abort = bytea_abbrev_abort;
+	}
 
 	MemoryContextSwitchTo(oldcontext);
 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index ffae8c23abf..39cbc72f126 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -92,7 +92,7 @@ typedef struct
 	int			last_returned;	/* Last comparison result (cache) */
 	bool		cache_blob;		/* Does buf2 contain strxfrm() blob, etc? */
 	bool		collate_c;
-	Oid			typid;			/* Actual datatype (text/bpchar/bytea/name) */
+	Oid			typid;			/* Actual datatype (text/bpchar/name) */
 	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
 	hyperLogLogState full_card; /* Full key cardinality state */
 	double		prop_card;		/* Required cardinality proportion */
@@ -1607,10 +1607,9 @@ bttextsortsupport(PG_FUNCTION_ARGS)
  * Includes locale support, and support for BpChar semantics (i.e. removing
  * trailing spaces before comparison).
  *
- * Relies on the assumption that text, VarChar, BpChar, and bytea all have the
- * same representation.  Callers that always use the C collation (e.g.
- * non-collatable type callers like bytea) may have NUL bytes in their strings;
- * this will not work with any other collation, though.
+ * Relies on the assumption that text, VarChar, and BpChar all have the
+ * same representation. Callers that use the C collation may have NUL bytes
+ * in their strings; this will not work with any other collation, though.
  */
 void
 varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
@@ -1974,7 +1973,7 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
  * representation.  Our encoding strategy is simple -- pack the first 8 bytes
  * of a strxfrm() blob into a Datum (on little-endian machines, the 8 bytes are
  * stored in reverse order), and treat it as an unsigned integer.  When the "C"
- * locale is used, or in case of bytea, just memcpy() from original instead.
+ * locale is used just memcpy() from original instead.
  */
 static Datum
 varstr_abbrev_convert(Datum original, SortSupport ssup)
@@ -2000,15 +1999,12 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		len = bpchartruelen(authoritative_data, len);
 
 	/*
-	 * If we're using the C collation, use memcpy(), rather than strxfrm(), to
-	 * abbreviate keys.  The full comparator for the C locale is always
-	 * memcmp().  It would be incorrect to allow bytea callers (callers that
-	 * always force the C collation -- bytea isn't a collatable type, but this
-	 * approach is convenient) to use strxfrm().  This is because bytea
-	 * strings may contain NUL bytes.  Besides, this should be faster, too.
+	 * If we're using the C collation, use memcpy(), rather than strxfrm(),
+	 * to abbreviate keys.  The full comparator for the C locale is also
+	 * memcmp().  This should be faster than strxfrm().
 	 *
-	 * More generally, it's okay that bytea callers can have NUL bytes in
-	 * strings because abbreviated cmp need not make a distinction between
+	 * Generally speaking, it's okay that C locale callers can have NUL bytes
+	 * in strings because abbreviated cmp need not make a distinction between
 	 * terminating NUL bytes, and NUL bytes representing actual NULs in the
 	 * authoritative representation.  Hopefully a comparison at or past one
 	 * abbreviated key's terminating NUL byte will resolve the comparison
@@ -2106,9 +2102,6 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		 * strxfrm() blob is itself NUL terminated, leaving no danger of
 		 * misinterpreting any NUL bytes not intended to be interpreted as
 		 * logically representing termination.
-		 *
-		 * (Actually, even if there were NUL bytes in the blob it would be
-		 * okay.  See remarks on bytea case above.)
 		 */
 		memcpy(pres, sss->buf2, Min(max_prefix_bytes, bsize));
 	}
-- 
2.43.0

Aleksander Alekseev

aleksander@timescale.com

6 months ago

In reply to: Aleksander Alekseev (#1)

1 attachment(s)

Re: [PATCH] Refactor bytea_sortsupport(), take two

Hi,

This is a follow-up to b45242fd30ff [1]. Previously we separated
varlena.c into varlena.c and bytea.c. This patch makes
bytea_sortsupport() independent from varlena.c code as it was proposed
before [2][3]. The benefits of this change are summarized in the
commit message that I included to the patch.

As always, your feedback is most appreciated.

cfbot indicates that v1 needs a rebase. Here is v2.

Attachments:

v2-0001-Refactor-bytea_sortsupport.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Refactor-bytea_sortsupport.patchDownload

From 5ae3320497d27045b1f64be5508349e0afcc4e06 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 2 Jul 2025 13:49:37 +0300
Subject: [PATCH v2] Refactor bytea_sortsupport()

Previously bytea_sortsupport() was implemented as a wrapper function for
varstr_sortsupport(). There were several problems with this.

Firstly, this was error-prone. Changing logic for string types could affect
bytea logic and vice versa.

Secondly, the bytea-related part of the code was difficult to reason about.
It was mixed with the logic for other types.

Lastly, the performance and memory consumption could be optimized a little for
the bytea case.

Use independent bytea_sortsupport() implementation for the named reasons.

Aleksander Alekseev, reviewed by TODO FIXME
Discussion: TODO FIXME
---
 src/backend/utils/adt/bytea.c   | 208 +++++++++++++++++++++++++++++++-
 src/backend/utils/adt/varlena.c |  27 ++---
 2 files changed, 213 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
index 6e7b914c563..f509f94074c 100644
--- a/src/backend/utils/adt/bytea.c
+++ b/src/backend/utils/adt/bytea.c
@@ -15,18 +15,19 @@
 #include "postgres.h"
 
 #include "access/detoast.h"
-#include "catalog/pg_collation_d.h"
-#include "catalog/pg_type_d.h"
+#include "common/hashfn.h"
 #include "common/int.h"
 #include "fmgr.h"
+#include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bitutils.h"
+#include "port/pg_bswap.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
 #include "utils/fmgrprotos.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/sortsupport.h"
-#include "utils/varlena.h"
 #include "varatt.h"
 
 /* GUC variable */
@@ -37,6 +38,19 @@ static bytea *bytea_substring(Datum str, int S, int L,
 							  bool length_not_specified);
 static bytea *bytea_overlay(bytea *t1, bytea *t2, int sp, int sl);
 
+typedef struct
+{
+	bool		abbreviate;		/* Should we abbreviate keys? */
+	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
+	hyperLogLogState full_card; /* Full key cardinality state */
+	double		prop_card;		/* Required cardinality proportion */
+}			ByteaSortSupport;
+
+/* Static function declarations for sort support */
+static int	byteafastcmp(Datum x, Datum y, SortSupport ssup);
+static Datum bytea_abbrev_convert(Datum original, SortSupport ssup);
+static bool bytea_abbrev_abort(int memtupcount, SortSupport ssup);
+
 /*
  * bytea_catenate
  *	Guts of byteacat(), broken out so it can be used by other functions
@@ -1001,16 +1015,200 @@ bytea_smaller(PG_FUNCTION_ARGS)
 	PG_RETURN_BYTEA_P(result);
 }
 
+/*
+ * sortsupport comparison func
+ */
+static int
+byteafastcmp(Datum x, Datum y, SortSupport ssup)
+{
+	bytea	   *arg1 = DatumGetByteaPP(x);
+	bytea	   *arg2 = DatumGetByteaPP(y);
+	char	   *a1p,
+			   *a2p;
+	int			len1,
+				len2,
+				result;
+
+	a1p = VARDATA_ANY(arg1);
+	a2p = VARDATA_ANY(arg2);
+
+	len1 = VARSIZE_ANY_EXHDR(arg1);
+	len2 = VARSIZE_ANY_EXHDR(arg2);
+
+	result = memcmp(a1p, a2p, Min(len1, len2));
+	if ((result == 0) && (len1 != len2))
+		result = (len1 < len2) ? -1 : 1;
+
+	/* We can't afford to leak memory here. */
+	if (PointerGetDatum(arg1) != x)
+		pfree(arg1);
+	if (PointerGetDatum(arg2) != y)
+		pfree(arg2);
+
+	return result;
+}
+
+/*
+ * Conversion routine for sortsupport.
+ * This is basically a simplified version of varstr_abbrev_convert().
+ */
+static Datum
+bytea_abbrev_convert(Datum original, SortSupport ssup)
+{
+	const size_t max_prefix_bytes = sizeof(Datum);
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	bytea	   *authoritative = DatumGetByteaPP(original);
+	char	   *authoritative_data = VARDATA_ANY(authoritative);
+	Datum		res;
+	char	   *pres;
+	int			len;
+	uint32		hash;
+
+	pres = (char *) &res;
+
+	/* memset(), so any non-overwritten bytes are NUL */
+	memset(pres, 0, max_prefix_bytes);
+	len = VARSIZE_ANY_EXHDR(authoritative);
+	memcpy(pres, authoritative_data, Min(len, max_prefix_bytes));
+
+	/*
+	 * Maintain approximate cardinality of both abbreviated keys and original
+	 * keys using HyperLogLog.
+	 */
+	hash = DatumGetUInt32(hash_any((unsigned char *) authoritative_data,
+								   Min(len, PG_CACHE_LINE_SIZE)));
+
+	if (len > PG_CACHE_LINE_SIZE)
+		hash ^= DatumGetUInt32(hash_uint32((uint32) len));
+
+	addHyperLogLog(&bss->full_card, hash);
+
+	/* Hash abbreviated key */
+#if SIZEOF_DATUM == 8
+	{
+		uint32		lohalf,
+					hihalf;
+
+		lohalf = (uint32) res;
+		hihalf = (uint32) (res >> 32);
+		hash = DatumGetUInt32(hash_uint32(lohalf ^ hihalf));
+	}
+#else							/* SIZEOF_DATUM != 8 */
+	hash = DatumGetUInt32(hash_uint32((uint32) res));
+#endif
+
+	addHyperLogLog(&bss->abbr_card, hash);
+
+	/*
+	 * Byteswap on little-endian machines.
+	 *
+	 * This is needed so that ssup_datum_unsigned_cmp() works correctly on all
+	 * platforms.
+	 */
+	res = DatumBigEndianToNative(res);
+
+	/* Don't leak memory here */
+	if (PointerGetDatum(authoritative) != original)
+		pfree(authoritative);
+
+	return res;
+}
+
+/*
+ * Callback for estimating effectiveness of abbreviated key optimization, using
+ * heuristic rules.  Returns value indicating if the abbreviation optimization
+ * should be aborted, based on its projected effectiveness.
+ */
+static bool
+bytea_abbrev_abort(int memtupcount, SortSupport ssup)
+{
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	double		abbrev_distinct,
+				key_distinct;
+
+	Assert(ssup->abbreviate);
+
+	/* Have a little patience */
+	if (memtupcount < 100)
+		return false;
+
+	abbrev_distinct = estimateHyperLogLog(&bss->abbr_card);
+	key_distinct = estimateHyperLogLog(&bss->full_card);
+
+	/*
+	 * Clamp cardinality estimates to at least one distinct value.
+	 */
+	if (abbrev_distinct <= 1.0)
+		abbrev_distinct = 1.0;
+
+	if (key_distinct <= 1.0)
+		key_distinct = 1.0;
+
+	if (trace_sort)
+	{
+		double		norm_abbrev_card = abbrev_distinct / (double) memtupcount;
+
+		elog(LOG, "bytea_abbrev: abbrev_distinct after %d: %f "
+			 "(key_distinct: %f, norm_abbrev_card: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, norm_abbrev_card,
+			 bss->prop_card);
+	}
+
+	/*
+	 * If the number of distinct abbreviated keys approximately matches the
+	 * number of distinct original keys, continue with abbreviation.
+	 */
+	if (abbrev_distinct > key_distinct * bss->prop_card)
+	{
+		/*
+		 * Decay required cardinality aggressively after 10,000 tuples.
+		 */
+		if (memtupcount > 10000)
+			bss->prop_card *= 0.65;
+
+		return false;
+	}
+
+	/*
+	 * Abort abbreviation strategy.
+	 */
+	if (trace_sort)
+		elog(LOG, "bytea_abbrev: aborted abbreviation at %d "
+			 "(abbrev_distinct: %f, key_distinct: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, bss->prop_card);
+
+	return true;
+}
+
 Datum
 bytea_sortsupport(PG_FUNCTION_ARGS)
 {
 	SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
 	MemoryContext oldcontext;
+	ByteaSortSupport *bss;
+	bool		abbreviate = ssup->abbreviate;
 
 	oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt);
 
-	/* Use generic string SortSupport, forcing "C" collation */
-	varstr_sortsupport(ssup, BYTEAOID, C_COLLATION_OID);
+	ssup->comparator = byteafastcmp;
+
+	/*
+	 * Set up abbreviation support if requested.
+	 */
+	if (abbreviate)
+	{
+		bss = palloc(sizeof(ByteaSortSupport));
+		bss->abbreviate = true;
+		bss->prop_card = 0.20;
+		initHyperLogLog(&bss->abbr_card, 10);
+		initHyperLogLog(&bss->full_card, 10);
+
+		ssup->ssup_extra = bss;
+		ssup->abbrev_full_comparator = ssup->comparator;
+		ssup->comparator = ssup_datum_unsigned_cmp;
+		ssup->abbrev_converter = bytea_abbrev_convert;
+		ssup->abbrev_abort = bytea_abbrev_abort;
+	}
 
 	MemoryContextSwitchTo(oldcontext);
 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index ffae8c23abf..39cbc72f126 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -92,7 +92,7 @@ typedef struct
 	int			last_returned;	/* Last comparison result (cache) */
 	bool		cache_blob;		/* Does buf2 contain strxfrm() blob, etc? */
 	bool		collate_c;
-	Oid			typid;			/* Actual datatype (text/bpchar/bytea/name) */
+	Oid			typid;			/* Actual datatype (text/bpchar/name) */
 	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
 	hyperLogLogState full_card; /* Full key cardinality state */
 	double		prop_card;		/* Required cardinality proportion */
@@ -1607,10 +1607,9 @@ bttextsortsupport(PG_FUNCTION_ARGS)
  * Includes locale support, and support for BpChar semantics (i.e. removing
  * trailing spaces before comparison).
  *
- * Relies on the assumption that text, VarChar, BpChar, and bytea all have the
- * same representation.  Callers that always use the C collation (e.g.
- * non-collatable type callers like bytea) may have NUL bytes in their strings;
- * this will not work with any other collation, though.
+ * Relies on the assumption that text, VarChar, and BpChar all have the
+ * same representation. Callers that use the C collation may have NUL bytes
+ * in their strings; this will not work with any other collation, though.
  */
 void
 varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
@@ -1974,7 +1973,7 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
  * representation.  Our encoding strategy is simple -- pack the first 8 bytes
  * of a strxfrm() blob into a Datum (on little-endian machines, the 8 bytes are
  * stored in reverse order), and treat it as an unsigned integer.  When the "C"
- * locale is used, or in case of bytea, just memcpy() from original instead.
+ * locale is used just memcpy() from original instead.
  */
 static Datum
 varstr_abbrev_convert(Datum original, SortSupport ssup)
@@ -2000,15 +1999,12 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		len = bpchartruelen(authoritative_data, len);
 
 	/*
-	 * If we're using the C collation, use memcpy(), rather than strxfrm(), to
-	 * abbreviate keys.  The full comparator for the C locale is always
-	 * memcmp().  It would be incorrect to allow bytea callers (callers that
-	 * always force the C collation -- bytea isn't a collatable type, but this
-	 * approach is convenient) to use strxfrm().  This is because bytea
-	 * strings may contain NUL bytes.  Besides, this should be faster, too.
+	 * If we're using the C collation, use memcpy(), rather than strxfrm(),
+	 * to abbreviate keys.  The full comparator for the C locale is also
+	 * memcmp().  This should be faster than strxfrm().
 	 *
-	 * More generally, it's okay that bytea callers can have NUL bytes in
-	 * strings because abbreviated cmp need not make a distinction between
+	 * Generally speaking, it's okay that C locale callers can have NUL bytes
+	 * in strings because abbreviated cmp need not make a distinction between
 	 * terminating NUL bytes, and NUL bytes representing actual NULs in the
 	 * authoritative representation.  Hopefully a comparison at or past one
 	 * abbreviated key's terminating NUL byte will resolve the comparison
@@ -2106,9 +2102,6 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		 * strxfrm() blob is itself NUL terminated, leaving no danger of
 		 * misinterpreting any NUL bytes not intended to be interpreted as
 		 * logically representing termination.
-		 *
-		 * (Actually, even if there were NUL bytes in the blob it would be
-		 * okay.  See remarks on bytea case above.)
 		 */
 		memcpy(pres, sss->buf2, Min(max_prefix_bytes, bsize));
 	}
-- 
2.43.0

John Naylor

john.naylor@enterprisedb.com

6 months ago

In reply to: Aleksander Alekseev (#2)

Re: [PATCH] Refactor bytea_sortsupport(), take two

On Thu, Aug 7, 2025 at 6:44 PM Aleksander Alekseev
<aleksander@tigerdata.com> wrote:

Hi,

This is a follow-up to b45242fd30ff [1]. Previously we separated
varlena.c into varlena.c and bytea.c. This patch makes
bytea_sortsupport() independent from varlena.c code as it was proposed
before [2][3]. The benefits of this change are summarized in the
commit message that I included to the patch.

As always, your feedback is most appreciated.

cfbot indicates that v1 needs a rebase. Here is v2.

- * Relies on the assumption that text, VarChar, BpChar, and bytea all have the
- * same representation.  Callers that always use the C collation (e.g.
- * non-collatable type callers like bytea) may have NUL bytes in their strings;
- * this will not work with any other collation, though.
+ * Relies on the assumption that text, VarChar, and BpChar all have the
+ * same representation. Callers that use the C collation may have NUL bytes
+ * in their strings; this will not work with any other collation, though.

- * More generally, it's okay that bytea callers can have NUL bytes in
- * strings because abbreviated cmp need not make a distinction between
+ * Generally speaking, it's okay that C locale callers can have NUL bytes
+ * in strings because abbreviated cmp need not make a distinction between

Don't these types disallow NUL bytes regardless of locale / character set?

--
John Naylor
Amazon Web Services

Aleksander Alekseev

aleksander@timescale.com

4 months ago

In reply to: John Naylor (#3)

1 attachment(s)

Re: [PATCH] Refactor bytea_sortsupport(), take two

Hi John,

Thanks for the review.

Don't these types disallow NUL bytes regardless of locale / character set?

You are right, they do. Here is the patch v3 with corrected comments.

--
Best regards,
Aleksander Alekseev

Attachments:

v3-0001-Refactor-bytea_sortsupport.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Refactor-bytea_sortsupport.patchDownload

From f722f62073c232b9b928afd2797317a91369fc63 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 2 Jul 2025 13:49:37 +0300
Subject: [PATCH v3] Refactor bytea_sortsupport()

Previously bytea_sortsupport() was implemented as a wrapper function for
varstr_sortsupport(). There were several problems with this.

Firstly, this was error-prone. Changing logic for string types could affect
bytea logic and vice versa.

Secondly, the bytea-related part of the code was difficult to reason about.
It was mixed with the logic for other types.

Lastly, the performance and memory consumption could be optimized a little for
the bytea case.

Use independent bytea_sortsupport() implementation for the named reasons.

Aleksander Alekseev, reviewed by John Naylor
Discussion: https://postgr.es/m/CAJ7c6TP1bAbEhUJa6+rgceN6QJWMSsxhg1=mqfSN=Nb-n6DAKg@mail.gmail.com
---
 src/backend/utils/adt/bytea.c   | 207 +++++++++++++++++++++++++++++++-
 src/backend/utils/adt/varlena.c |  42 ++-----
 2 files changed, 211 insertions(+), 38 deletions(-)

diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
index 6e7b914c563..187d5ba2814 100644
--- a/src/backend/utils/adt/bytea.c
+++ b/src/backend/utils/adt/bytea.c
@@ -15,18 +15,19 @@
 #include "postgres.h"
 
 #include "access/detoast.h"
-#include "catalog/pg_collation_d.h"
-#include "catalog/pg_type_d.h"
+#include "common/hashfn.h"
 #include "common/int.h"
 #include "fmgr.h"
+#include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bitutils.h"
+#include "port/pg_bswap.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
 #include "utils/fmgrprotos.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/sortsupport.h"
-#include "utils/varlena.h"
 #include "varatt.h"
 
 /* GUC variable */
@@ -37,6 +38,19 @@ static bytea *bytea_substring(Datum str, int S, int L,
 							  bool length_not_specified);
 static bytea *bytea_overlay(bytea *t1, bytea *t2, int sp, int sl);
 
+typedef struct
+{
+	bool		abbreviate;		/* Should we abbreviate keys? */
+	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
+	hyperLogLogState full_card; /* Full key cardinality state */
+	double		prop_card;		/* Required cardinality proportion */
+}			ByteaSortSupport;
+
+/* Static function declarations for sort support */
+static int	byteafastcmp(Datum x, Datum y, SortSupport ssup);
+static Datum bytea_abbrev_convert(Datum original, SortSupport ssup);
+static bool bytea_abbrev_abort(int memtupcount, SortSupport ssup);
+
 /*
  * bytea_catenate
  *	Guts of byteacat(), broken out so it can be used by other functions
@@ -1001,16 +1015,199 @@ bytea_smaller(PG_FUNCTION_ARGS)
 	PG_RETURN_BYTEA_P(result);
 }
 
+/*
+ * sortsupport comparison func
+ */
+static int
+byteafastcmp(Datum x, Datum y, SortSupport ssup)
+{
+	bytea	   *arg1 = DatumGetByteaPP(x);
+	bytea	   *arg2 = DatumGetByteaPP(y);
+	char	   *a1p,
+			   *a2p;
+	int			len1,
+				len2,
+				result;
+
+	a1p = VARDATA_ANY(arg1);
+	a2p = VARDATA_ANY(arg2);
+
+	len1 = VARSIZE_ANY_EXHDR(arg1);
+	len2 = VARSIZE_ANY_EXHDR(arg2);
+
+	result = memcmp(a1p, a2p, Min(len1, len2));
+	if ((result == 0) && (len1 != len2))
+		result = (len1 < len2) ? -1 : 1;
+
+	/* We can't afford to leak memory here. */
+	if (PointerGetDatum(arg1) != x)
+		pfree(arg1);
+	if (PointerGetDatum(arg2) != y)
+		pfree(arg2);
+
+	return result;
+}
+
+/*
+ * Conversion routine for sortsupport.
+ */
+static Datum
+bytea_abbrev_convert(Datum original, SortSupport ssup)
+{
+	const size_t max_prefix_bytes = sizeof(Datum);
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	bytea	   *authoritative = DatumGetByteaPP(original);
+	char	   *authoritative_data = VARDATA_ANY(authoritative);
+	Datum		res;
+	char	   *pres;
+	int			len;
+	uint32		hash;
+
+	pres = (char *) &res;
+
+	/* memset(), so any non-overwritten bytes are NUL */
+	memset(pres, 0, max_prefix_bytes);
+	len = VARSIZE_ANY_EXHDR(authoritative);
+	memcpy(pres, authoritative_data, Min(len, max_prefix_bytes));
+
+	/*
+	 * Maintain approximate cardinality of both abbreviated keys and original
+	 * keys using HyperLogLog.
+	 */
+	hash = DatumGetUInt32(hash_any((unsigned char *) authoritative_data,
+								   Min(len, PG_CACHE_LINE_SIZE)));
+
+	if (len > PG_CACHE_LINE_SIZE)
+		hash ^= DatumGetUInt32(hash_uint32((uint32) len));
+
+	addHyperLogLog(&bss->full_card, hash);
+
+	/* Hash abbreviated key */
+#if SIZEOF_DATUM == 8
+	{
+		uint32		lohalf,
+					hihalf;
+
+		lohalf = (uint32) res;
+		hihalf = (uint32) (res >> 32);
+		hash = DatumGetUInt32(hash_uint32(lohalf ^ hihalf));
+	}
+#else							/* SIZEOF_DATUM != 8 */
+	hash = DatumGetUInt32(hash_uint32((uint32) res));
+#endif
+
+	addHyperLogLog(&bss->abbr_card, hash);
+
+	/*
+	 * Byteswap on little-endian machines.
+	 *
+	 * This is needed so that ssup_datum_unsigned_cmp() works correctly on all
+	 * platforms.
+	 */
+	res = DatumBigEndianToNative(res);
+
+	/* Don't leak memory here */
+	if (PointerGetDatum(authoritative) != original)
+		pfree(authoritative);
+
+	return res;
+}
+
+/*
+ * Callback for estimating effectiveness of abbreviated key optimization, using
+ * heuristic rules.  Returns value indicating if the abbreviation optimization
+ * should be aborted, based on its projected effectiveness.
+ */
+static bool
+bytea_abbrev_abort(int memtupcount, SortSupport ssup)
+{
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	double		abbrev_distinct,
+				key_distinct;
+
+	Assert(ssup->abbreviate);
+
+	/* Have a little patience */
+	if (memtupcount < 100)
+		return false;
+
+	abbrev_distinct = estimateHyperLogLog(&bss->abbr_card);
+	key_distinct = estimateHyperLogLog(&bss->full_card);
+
+	/*
+	 * Clamp cardinality estimates to at least one distinct value.
+	 */
+	if (abbrev_distinct <= 1.0)
+		abbrev_distinct = 1.0;
+
+	if (key_distinct <= 1.0)
+		key_distinct = 1.0;
+
+	if (trace_sort)
+	{
+		double		norm_abbrev_card = abbrev_distinct / (double) memtupcount;
+
+		elog(LOG, "bytea_abbrev: abbrev_distinct after %d: %f "
+			 "(key_distinct: %f, norm_abbrev_card: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, norm_abbrev_card,
+			 bss->prop_card);
+	}
+
+	/*
+	 * If the number of distinct abbreviated keys approximately matches the
+	 * number of distinct original keys, continue with abbreviation.
+	 */
+	if (abbrev_distinct > key_distinct * bss->prop_card)
+	{
+		/*
+		 * Decay required cardinality aggressively after 10,000 tuples.
+		 */
+		if (memtupcount > 10000)
+			bss->prop_card *= 0.65;
+
+		return false;
+	}
+
+	/*
+	 * Abort abbreviation strategy.
+	 */
+	if (trace_sort)
+		elog(LOG, "bytea_abbrev: aborted abbreviation at %d "
+			 "(abbrev_distinct: %f, key_distinct: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, bss->prop_card);
+
+	return true;
+}
+
 Datum
 bytea_sortsupport(PG_FUNCTION_ARGS)
 {
 	SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
 	MemoryContext oldcontext;
+	ByteaSortSupport *bss;
+	bool		abbreviate = ssup->abbreviate;
 
 	oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt);
 
-	/* Use generic string SortSupport, forcing "C" collation */
-	varstr_sortsupport(ssup, BYTEAOID, C_COLLATION_OID);
+	ssup->comparator = byteafastcmp;
+
+	/*
+	 * Set up abbreviation support if requested.
+	 */
+	if (abbreviate)
+	{
+		bss = palloc(sizeof(ByteaSortSupport));
+		bss->abbreviate = true;
+		bss->prop_card = 0.20;
+		initHyperLogLog(&bss->abbr_card, 10);
+		initHyperLogLog(&bss->full_card, 10);
+
+		ssup->ssup_extra = bss;
+		ssup->abbrev_full_comparator = ssup->comparator;
+		ssup->comparator = ssup_datum_unsigned_cmp;
+		ssup->abbrev_converter = bytea_abbrev_convert;
+		ssup->abbrev_abort = bytea_abbrev_abort;
+	}
 
 	MemoryContextSwitchTo(oldcontext);
 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 2c398cd9e5c..0d182db1790 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -92,7 +92,7 @@ typedef struct
 	int			last_returned;	/* Last comparison result (cache) */
 	bool		cache_blob;		/* Does buf2 contain strxfrm() blob, etc? */
 	bool		collate_c;
-	Oid			typid;			/* Actual datatype (text/bpchar/bytea/name) */
+	Oid			typid;			/* Actual datatype (text/bpchar/name) */
 	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
 	hyperLogLogState full_card; /* Full key cardinality state */
 	double		prop_card;		/* Required cardinality proportion */
@@ -1606,10 +1606,8 @@ bttextsortsupport(PG_FUNCTION_ARGS)
  * Includes locale support, and support for BpChar semantics (i.e. removing
  * trailing spaces before comparison).
  *
- * Relies on the assumption that text, VarChar, BpChar, and bytea all have the
- * same representation.  Callers that always use the C collation (e.g.
- * non-collatable type callers like bytea) may have NUL bytes in their strings;
- * this will not work with any other collation, though.
+ * Relies on the assumption that text, VarChar, and BpChar all have the
+ * same representation. These text types cannot contain NUL bytes.
  */
 void
 varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
@@ -1972,7 +1970,7 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
  * representation.  Our encoding strategy is simple -- pack the first 8 bytes
  * of a strxfrm() blob into a Datum (on little-endian machines, the 8 bytes are
  * stored in reverse order), and treat it as an unsigned integer.  When the "C"
- * locale is used, or in case of bytea, just memcpy() from original instead.
+ * locale is used just memcpy() from original instead.
  */
 static Datum
 varstr_abbrev_convert(Datum original, SortSupport ssup)
@@ -1998,31 +1996,12 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		len = bpchartruelen(authoritative_data, len);
 
 	/*
-	 * If we're using the C collation, use memcpy(), rather than strxfrm(), to
-	 * abbreviate keys.  The full comparator for the C locale is always
-	 * memcmp().  It would be incorrect to allow bytea callers (callers that
-	 * always force the C collation -- bytea isn't a collatable type, but this
-	 * approach is convenient) to use strxfrm().  This is because bytea
-	 * strings may contain NUL bytes.  Besides, this should be faster, too.
+	 * If we're using the C collation, use memcpy(), rather than strxfrm(),
+	 * to abbreviate keys.  The full comparator for the C locale is also
+	 * memcmp().  This should be faster than strxfrm().
 	 *
-	 * More generally, it's okay that bytea callers can have NUL bytes in
-	 * strings because abbreviated cmp need not make a distinction between
-	 * terminating NUL bytes, and NUL bytes representing actual NULs in the
-	 * authoritative representation.  Hopefully a comparison at or past one
-	 * abbreviated key's terminating NUL byte will resolve the comparison
-	 * without consulting the authoritative representation; specifically, some
-	 * later non-NUL byte in the longer string can resolve the comparison
-	 * against a subsequent terminating NUL in the shorter string.  There will
-	 * usually be what is effectively a "length-wise" resolution there and
-	 * then.
-	 *
-	 * If that doesn't work out -- if all bytes in the longer string
-	 * positioned at or past the offset of the smaller string's (first)
-	 * terminating NUL are actually representative of NUL bytes in the
-	 * authoritative binary string (perhaps with some *terminating* NUL bytes
-	 * towards the end of the longer string iff it happens to still be small)
-	 * -- then an authoritative tie-breaker will happen, and do the right
-	 * thing: explicitly consider string length.
+	 * Note: text types (text, varchar, bpchar) cannot contain NUL bytes,
+	 * so we don't need to worry about NUL byte handling here.
 	 */
 	if (sss->collate_c)
 		memcpy(pres, authoritative_data, Min(len, max_prefix_bytes));
@@ -2104,9 +2083,6 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		 * strxfrm() blob is itself NUL terminated, leaving no danger of
 		 * misinterpreting any NUL bytes not intended to be interpreted as
 		 * logically representing termination.
-		 *
-		 * (Actually, even if there were NUL bytes in the blob it would be
-		 * okay.  See remarks on bytea case above.)
 		 */
 		memcpy(pres, sss->buf2, Min(max_prefix_bytes, bsize));
 	}
-- 
2.43.0

John Naylor

john.naylor@enterprisedb.com

4 months ago

In reply to: Aleksander Alekseev (#4)

Re: [PATCH] Refactor bytea_sortsupport(), take two

On Mon, Sep 15, 2025 at 3:40 PM Aleksander Alekseev
<aleksander@tigerdata.com> wrote:

Don't these types disallow NUL bytes regardless of locale / character set?

You are right, they do. Here is the patch v3 with corrected comments.

Hi Aleksander,

- * Relies on the assumption that text, VarChar, BpChar, and bytea all have the
- * same representation.  Callers that always use the C collation (e.g.
- * non-collatable type callers like bytea) may have NUL bytes in their strings;
- * this will not work with any other collation, though.
+ * Relies on the assumption that text, VarChar, and BpChar all have the
+ * same representation. These text types cannot contain NUL bytes.

AFAICS, the only reaon to mention NUL bytes here before was because of
bytea -- sirnce bytea is being removed from this path, I don't see a
need to mention mention NUL bytes here. It'd be relevant if any
simplification is possible in functions that previously may have had
to worry about NUL bytes. I don't immediately see any such
opportunities, though. Are there?

+ * Note: text types (text, varchar, bpchar) cannot contain NUL bytes,
+ * so we don't need to worry about NUL byte handling here.

Same here. Also, "Note" is stronger than a normal comment, maybe to
call attention to complex edge cases or weird behaviors.

+#if SIZEOF_DATUM == 8

We recently made all datums 8 bytes.

Some comments are randomly different than the equivalents in
varlena.c. It's probably better if same things remain the same, but
there's nothing wrong either.

"Lastly, the performance and memory consumption could be optimized a little for
the bytea case."

Is this a side effect of the patch, or a possibility for future work?
It's not clear.

The rest seems fine at a glance.
--
John Naylor
Amazon Web Services

Chao Li

li.evan.chao@gmail.com

4 months ago

In reply to: Aleksander Alekseev (#4)

Re: [PATCH] Refactor bytea_sortsupport(), take two

Hi Aleksander,

Overall LGTM. Just a few small comments:

On Sep 15, 2025, at 16:40, Aleksander Alekseev <aleksander@tigerdata.com> wrote:

--
Best regards,
Aleksander Alekseev
<v3-0001-Refactor-bytea_sortsupport.patch>

1.
····
+	/* We can't afford to leak memory here. */
+	if (PointerGetDatum(arg1) != x)
+		pfree(arg1);
+	if (PointerGetDatum(arg2) != y)
+		pfree(arg2);
···

Should we do:
```
PG_FREE_IF_COPY(arg1, 0);
PG_FREE_IF_COPY(arg2, 1)
```

Similar to other places.

2.
···
+/*
+ * Conversion routine for sortsupport.
+ */
+static Datum
+bytea_abbrev_convert(Datum original, SortSupport ssup)
···

The function comment is less descriptive. I would suggest something like:
```
/*
* Abbreviated key conversion for bytea sortsupport.
*
* Returns the abbreviated key as a Datum. If a detoasted copy was made,
* it is freed before returning.
*/
```

3.
```
+	if (abbrev_distinct <= 1.0)
+		abbrev_distinct = 1.0;
+
+	if (key_distinct <= 1.0)
+		key_distinct = 1.0;
```

Why <= 1.0 then set to 1.0? Shouldn't “if" takes just <1.0?

4.
```
Datum
bytea_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
MemoryContext oldcontext;
+ ByteaSortSupport *bss;
```

“Bss” can be defined in the “if” clause where it is used.

5.
```
Datum
bytea_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
MemoryContext oldcontext;
+ ByteaSortSupport *bss;
+ bool abbreviate = ssup->abbreviate;
```

The local variable “abbreviate” is only used once, do we really need to cache ssup->abbreviate into a local variable?

--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Aleksander Alekseev

aleksander@timescale.com

4 months ago

In reply to: Chao Li (#6)

Re: [PATCH] Refactor bytea_sortsupport(), take two

Hi Chao,

1.
····
+ /* We can't afford to leak memory here. */
+ if (PointerGetDatum(arg1) != x)
+ pfree(arg1);
+ if (PointerGetDatum(arg2) != y)
+ pfree(arg2);
···
Should we do:
```
PG_FREE_IF_COPY(arg1, 0);
PG_FREE_IF_COPY(arg2, 1)
```

Similar to other places.

Hmmm... to me it doesn't look more readable to be honest. I choose the
same pattern used in varlena.c and suggest keeping it for consistency.
IMO this refactoring can be discussed later in a separate thread. Good
observation though.

2.
···
+/*
+ * Conversion routine for sortsupport.
+ */
+static Datum
+bytea_abbrev_convert(Datum original, SortSupport ssup)
···
The function comment is less descriptive. I would suggest something like:
```
/*
* Abbreviated key conversion for bytea sortsupport.
*
* Returns the abbreviated key as a Datum. If a detoasted copy was made,
* it is freed before returning.
*/
```

Sorry, but I don't think I agree with this either. The comment you are
proposing exposes the implementation details. I don't think that the
caller needs to know them.

This is basically a simplified varstr_abbrev_convert(). I was not
certain if I should mention this in a comment and/or how much text I
should copy from the comment for varstr_abbrev_convert(). I'm open to
suggestions.

3.
```
+ if (abbrev_distinct <= 1.0)
+ abbrev_distinct = 1.0;
+
+ if (key_distinct <= 1.0)
+ key_distinct = 1.0;
```
Why <= 1.0 then set to 1.0? Shouldn't “if" takes just <1.0?

Good point. I'll fix this in v4. Also I'll modify
varstr_abbrev_abort() for consistency.

One could argue that the change of varstr_abbrev_abort() is irrelevant
in the context of this patch. If there are objections we can easily
change '<' back to '<='. It's not that important but '<' looks cleaner
IMO.

4.
```
Datum
bytea_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
MemoryContext oldcontext;
+ ByteaSortSupport *bss;
```

“Bss” can be defined in the “if” clause where it is used.

Agree. I'll fix this in v4.

5.
```
Datum
bytea_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
MemoryContext oldcontext;
+ ByteaSortSupport *bss;
+ bool abbreviate = ssup->abbreviate;
```

The local variable “abbreviate” is only used once, do we really need to cache ssup->abbreviate into a local variable?

Agree, thanks.

--
Best regards,
Aleksander Alekseev

Aleksander Alekseev

aleksander@timescale.com

4 months ago

In reply to: John Naylor (#5)

1 attachment(s)

Re: [PATCH] Refactor bytea_sortsupport(), take two

Hi John,

- * Relies on the assumption that text, VarChar, BpChar, and bytea all have the
- * same representation.  Callers that always use the C collation (e.g.
- * non-collatable type callers like bytea) may have NUL bytes in their strings;
- * this will not work with any other collation, though.
+ * Relies on the assumption that text, VarChar, and BpChar all have the
+ * same representation. These text types cannot contain NUL bytes.
AFAICS, the only reaon to mention NUL bytes here before was because of
bytea -- sirnce bytea is being removed from this path, I don't see a
need to mention mention NUL bytes here. It'd be relevant if any
simplification is possible in functions that previously may have had
to worry about NUL bytes. I don't immediately see any such
opportunities, though. Are there?

Doesn't seem so. I removed the mention of NUL bytes.

+ * Note: text types (text, varchar, bpchar) cannot contain NUL bytes,
+ * so we don't need to worry about NUL byte handling here.
Same here. Also, "Note" is stronger than a normal comment, maybe to
call attention to complex edge cases or weird behaviors.

Agree. Fixed.

+#if SIZEOF_DATUM == 8

We recently made all datums 8 bytes.

Right, I forgot to change the patch accordingly. Fixed.

Some comments are randomly different than the equivalents in
varlena.c. It's probably better if same things remain the same, but
there's nothing wrong either.

I did my best to keep the comments consistent between varlena.c and
bytea.c. I don't think they are going to be that consistent in the
long term anyway, so not sure how much effort we should invest into
this.

"Lastly, the performance and memory consumption could be optimized a little for
the bytea case."

Is this a side effect of the patch, or a possibility for future work?
It's not clear.

The idea was to reflect the fact that bytea-specific code becomes
simpler (less if's, etc) and uses structures of smaller size. This is
probably not that important for the commit message. I removed it in
order to avoid confusion.

PFA patch v4.

--
Best regards,
Aleksander Alekseev

Attachments:

v4-0001-Refactor-bytea_sortsupport.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Refactor-bytea_sortsupport.patchDownload

From fcb9de3f3d45f79a3a3ab920b5839a9394ece637 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 2 Jul 2025 13:49:37 +0300
Subject: [PATCH v4] Refactor bytea_sortsupport()

Previously bytea_sortsupport() was implemented as a wrapper function for
varstr_sortsupport(). There were several problems with this.

Firstly, this was error-prone. Changing logic for string types could affect
bytea logic and vice versa.

Secondly, the bytea-related part of the code was difficult to reason about.
It was mixed with the logic for other types.

Use independent bytea_sortsupport() implementation for the named reasons.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TP1bAbEhUJa6+rgceN6QJWMSsxhg1=mqfSN=Nb-n6DAKg@mail.gmail.com
---
 src/backend/utils/adt/bytea.c   | 203 +++++++++++++++++++++++++++++++-
 src/backend/utils/adt/varlena.c |  45 ++-----
 2 files changed, 207 insertions(+), 41 deletions(-)

diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
index 6e7b914c563..b22052c30ae 100644
--- a/src/backend/utils/adt/bytea.c
+++ b/src/backend/utils/adt/bytea.c
@@ -15,18 +15,19 @@
 #include "postgres.h"
 
 #include "access/detoast.h"
-#include "catalog/pg_collation_d.h"
-#include "catalog/pg_type_d.h"
+#include "common/hashfn.h"
 #include "common/int.h"
 #include "fmgr.h"
+#include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bitutils.h"
+#include "port/pg_bswap.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
 #include "utils/fmgrprotos.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/sortsupport.h"
-#include "utils/varlena.h"
 #include "varatt.h"
 
 /* GUC variable */
@@ -37,6 +38,19 @@ static bytea *bytea_substring(Datum str, int S, int L,
 							  bool length_not_specified);
 static bytea *bytea_overlay(bytea *t1, bytea *t2, int sp, int sl);
 
+typedef struct
+{
+	bool		abbreviate;		/* Should we abbreviate keys? */
+	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
+	hyperLogLogState full_card; /* Full key cardinality state */
+	double		prop_card;		/* Required cardinality proportion */
+}			ByteaSortSupport;
+
+/* Static function declarations for sort support */
+static int	byteafastcmp(Datum x, Datum y, SortSupport ssup);
+static Datum bytea_abbrev_convert(Datum original, SortSupport ssup);
+static bool bytea_abbrev_abort(int memtupcount, SortSupport ssup);
+
 /*
  * bytea_catenate
  *	Guts of byteacat(), broken out so it can be used by other functions
@@ -1001,6 +1015,166 @@ bytea_smaller(PG_FUNCTION_ARGS)
 	PG_RETURN_BYTEA_P(result);
 }
 
+/*
+ * sortsupport comparison func
+ */
+static int
+byteafastcmp(Datum x, Datum y, SortSupport ssup)
+{
+	bytea	   *arg1 = DatumGetByteaPP(x);
+	bytea	   *arg2 = DatumGetByteaPP(y);
+	char	   *a1p,
+			   *a2p;
+	int			len1,
+				len2,
+				result;
+
+	a1p = VARDATA_ANY(arg1);
+	a2p = VARDATA_ANY(arg2);
+
+	len1 = VARSIZE_ANY_EXHDR(arg1);
+	len2 = VARSIZE_ANY_EXHDR(arg2);
+
+	result = memcmp(a1p, a2p, Min(len1, len2));
+	if ((result == 0) && (len1 != len2))
+		result = (len1 < len2) ? -1 : 1;
+
+	/* We can't afford to leak memory here. */
+	if (PointerGetDatum(arg1) != x)
+		pfree(arg1);
+	if (PointerGetDatum(arg2) != y)
+		pfree(arg2);
+
+	return result;
+}
+
+/*
+ * Conversion routine for sortsupport.
+ */
+static Datum
+bytea_abbrev_convert(Datum original, SortSupport ssup)
+{
+	const size_t max_prefix_bytes = sizeof(Datum);
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	bytea	   *authoritative = DatumGetByteaPP(original);
+	char	   *authoritative_data = VARDATA_ANY(authoritative);
+	Datum		res;
+	char	   *pres;
+	int			len;
+	uint32		hash;
+
+	pres = (char *) &res;
+
+	/* memset(), so any non-overwritten bytes are NUL */
+	memset(pres, 0, max_prefix_bytes);
+	len = VARSIZE_ANY_EXHDR(authoritative);
+	memcpy(pres, authoritative_data, Min(len, max_prefix_bytes));
+
+	/*
+	 * Maintain approximate cardinality of both abbreviated keys and original
+	 * keys using HyperLogLog.
+	 */
+	hash = DatumGetUInt32(hash_any((unsigned char *) authoritative_data,
+								   Min(len, PG_CACHE_LINE_SIZE)));
+
+	if (len > PG_CACHE_LINE_SIZE)
+		hash ^= DatumGetUInt32(hash_uint32((uint32) len));
+
+	addHyperLogLog(&bss->full_card, hash);
+
+	/* Hash abbreviated key */
+	{
+		uint32		lohalf,
+					hihalf;
+
+		lohalf = (uint32) res;
+		hihalf = (uint32) (res >> 32);
+		hash = DatumGetUInt32(hash_uint32(lohalf ^ hihalf));
+	}
+
+	addHyperLogLog(&bss->abbr_card, hash);
+
+	/*
+	 * Byteswap on little-endian machines.
+	 *
+	 * This is needed so that ssup_datum_unsigned_cmp() works correctly on all
+	 * platforms.
+	 */
+	res = DatumBigEndianToNative(res);
+
+	/* Don't leak memory here */
+	if (PointerGetDatum(authoritative) != original)
+		pfree(authoritative);
+
+	return res;
+}
+
+/*
+ * Callback for estimating effectiveness of abbreviated key optimization, using
+ * heuristic rules.  Returns value indicating if the abbreviation optimization
+ * should be aborted, based on its projected effectiveness.
+ */
+static bool
+bytea_abbrev_abort(int memtupcount, SortSupport ssup)
+{
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	double		abbrev_distinct,
+				key_distinct;
+
+	Assert(ssup->abbreviate);
+
+	/* Have a little patience */
+	if (memtupcount < 100)
+		return false;
+
+	abbrev_distinct = estimateHyperLogLog(&bss->abbr_card);
+	key_distinct = estimateHyperLogLog(&bss->full_card);
+
+	/*
+	 * Clamp cardinality estimates to at least one distinct value.
+	 */
+	if (abbrev_distinct < 1.0)
+		abbrev_distinct = 1.0;
+
+	if (key_distinct < 1.0)
+		key_distinct = 1.0;
+
+	if (trace_sort)
+	{
+		double		norm_abbrev_card = abbrev_distinct / (double) memtupcount;
+
+		elog(LOG, "bytea_abbrev: abbrev_distinct after %d: %f "
+			 "(key_distinct: %f, norm_abbrev_card: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, norm_abbrev_card,
+			 bss->prop_card);
+	}
+
+	/*
+	 * If the number of distinct abbreviated keys approximately matches the
+	 * number of distinct original keys, continue with abbreviation.
+	 */
+	if (abbrev_distinct > key_distinct * bss->prop_card)
+	{
+		/*
+		 * Decay required cardinality aggressively after 10,000 tuples.
+		 */
+		if (memtupcount > 10000)
+			bss->prop_card *= 0.65;
+
+		return false;
+	}
+
+	/*
+	 * Abort abbreviation strategy.
+	 */
+	if (trace_sort)
+		elog(LOG, "bytea_abbrev: aborted abbreviation at %d "
+			 "(abbrev_distinct: %f, key_distinct: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, bss->prop_card);
+
+	return true;
+}
+
 Datum
 bytea_sortsupport(PG_FUNCTION_ARGS)
 {
@@ -1009,8 +1183,27 @@ bytea_sortsupport(PG_FUNCTION_ARGS)
 
 	oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt);
 
-	/* Use generic string SortSupport, forcing "C" collation */
-	varstr_sortsupport(ssup, BYTEAOID, C_COLLATION_OID);
+	ssup->comparator = byteafastcmp;
+
+	/*
+	 * Set up abbreviation support if requested.
+	 */
+	if (ssup->abbreviate)
+	{
+		ByteaSortSupport *bss;
+
+		bss = palloc(sizeof(ByteaSortSupport));
+		bss->abbreviate = true;
+		bss->prop_card = 0.20;
+		initHyperLogLog(&bss->abbr_card, 10);
+		initHyperLogLog(&bss->full_card, 10);
+
+		ssup->ssup_extra = bss;
+		ssup->abbrev_full_comparator = ssup->comparator;
+		ssup->comparator = ssup_datum_unsigned_cmp;
+		ssup->abbrev_converter = bytea_abbrev_convert;
+		ssup->abbrev_abort = bytea_abbrev_abort;
+	}
 
 	MemoryContextSwitchTo(oldcontext);
 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 2c398cd9e5c..02b24a13cfb 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -92,7 +92,7 @@ typedef struct
 	int			last_returned;	/* Last comparison result (cache) */
 	bool		cache_blob;		/* Does buf2 contain strxfrm() blob, etc? */
 	bool		collate_c;
-	Oid			typid;			/* Actual datatype (text/bpchar/bytea/name) */
+	Oid			typid;			/* Actual datatype (text/bpchar/name) */
 	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
 	hyperLogLogState full_card; /* Full key cardinality state */
 	double		prop_card;		/* Required cardinality proportion */
@@ -1606,10 +1606,8 @@ bttextsortsupport(PG_FUNCTION_ARGS)
  * Includes locale support, and support for BpChar semantics (i.e. removing
  * trailing spaces before comparison).
  *
- * Relies on the assumption that text, VarChar, BpChar, and bytea all have the
- * same representation.  Callers that always use the C collation (e.g.
- * non-collatable type callers like bytea) may have NUL bytes in their strings;
- * this will not work with any other collation, though.
+ * Relies on the assumption that text, VarChar, and BpChar all have the
+ * same representation.
  */
 void
 varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
@@ -1972,7 +1970,7 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
  * representation.  Our encoding strategy is simple -- pack the first 8 bytes
  * of a strxfrm() blob into a Datum (on little-endian machines, the 8 bytes are
  * stored in reverse order), and treat it as an unsigned integer.  When the "C"
- * locale is used, or in case of bytea, just memcpy() from original instead.
+ * locale is used just memcpy() from original instead.
  */
 static Datum
 varstr_abbrev_convert(Datum original, SortSupport ssup)
@@ -1998,31 +1996,9 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		len = bpchartruelen(authoritative_data, len);
 
 	/*
-	 * If we're using the C collation, use memcpy(), rather than strxfrm(), to
-	 * abbreviate keys.  The full comparator for the C locale is always
-	 * memcmp().  It would be incorrect to allow bytea callers (callers that
-	 * always force the C collation -- bytea isn't a collatable type, but this
-	 * approach is convenient) to use strxfrm().  This is because bytea
-	 * strings may contain NUL bytes.  Besides, this should be faster, too.
-	 *
-	 * More generally, it's okay that bytea callers can have NUL bytes in
-	 * strings because abbreviated cmp need not make a distinction between
-	 * terminating NUL bytes, and NUL bytes representing actual NULs in the
-	 * authoritative representation.  Hopefully a comparison at or past one
-	 * abbreviated key's terminating NUL byte will resolve the comparison
-	 * without consulting the authoritative representation; specifically, some
-	 * later non-NUL byte in the longer string can resolve the comparison
-	 * against a subsequent terminating NUL in the shorter string.  There will
-	 * usually be what is effectively a "length-wise" resolution there and
-	 * then.
-	 *
-	 * If that doesn't work out -- if all bytes in the longer string
-	 * positioned at or past the offset of the smaller string's (first)
-	 * terminating NUL are actually representative of NUL bytes in the
-	 * authoritative binary string (perhaps with some *terminating* NUL bytes
-	 * towards the end of the longer string iff it happens to still be small)
-	 * -- then an authoritative tie-breaker will happen, and do the right
-	 * thing: explicitly consider string length.
+	 * If we're using the C collation, use memcpy(), rather than strxfrm(),
+	 * to abbreviate keys.  The full comparator for the C locale is also
+	 * memcmp().  This should be faster than strxfrm().
 	 */
 	if (sss->collate_c)
 		memcpy(pres, authoritative_data, Min(len, max_prefix_bytes));
@@ -2104,9 +2080,6 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		 * strxfrm() blob is itself NUL terminated, leaving no danger of
 		 * misinterpreting any NUL bytes not intended to be interpreted as
 		 * logically representing termination.
-		 *
-		 * (Actually, even if there were NUL bytes in the blob it would be
-		 * okay.  See remarks on bytea case above.)
 		 */
 		memcpy(pres, sss->buf2, Min(max_prefix_bytes, bsize));
 	}
@@ -2187,10 +2160,10 @@ varstr_abbrev_abort(int memtupcount, SortSupport ssup)
 	 * NULLs are generally disregarded, if only NULL values were seen so far,
 	 * that might misrepresent costs if we failed to clamp.
 	 */
-	if (abbrev_distinct <= 1.0)
+	if (abbrev_distinct < 1.0)
 		abbrev_distinct = 1.0;
 
-	if (key_distinct <= 1.0)
+	if (key_distinct < 1.0)
 		key_distinct = 1.0;
 
 	/*
-- 
2.43.0

John Naylor

john.naylor@enterprisedb.com

3 months ago

In reply to: Aleksander Alekseev (#8)

Re: [PATCH] Refactor bytea_sortsupport(), take two

On Tue, Sep 16, 2025 at 4:33 PM Aleksander Alekseev
<aleksander@tigerdata.com> wrote:

Some comments are randomly different than the equivalents in
varlena.c. It's probably better if same things remain the same, but
there's nothing wrong either.

I did my best to keep the comments consistent between varlena.c and
bytea.c. I don't think they are going to be that consistent in the
long term anyway, so not sure how much effort we should invest into
this.

I see plenty of random differences by diff'ing the relevant functions
beteween the two. In a large, old codebase, consistency is one of the
most important things to maintain. Also, it takes hardly any time at
all to copy something that doesn't need changing.

After looking more closely, some of the differences actually make
things worse for maintenance IMO, so I think we need to be more
thoughtful. For example in *_abbrev_convert:

-        * Maintain approximate cardinality of both abbreviated keys
and original,
-        * authoritative keys using HyperLogLog.  Used as cheap
insurance against
-        * the worst case, where we do many string transformations for no saving
-        * in full strcoll()-based comparisons.  These statistics are used by
-        * varstr_abbrev_abort().
-        *
-        * First, Hash key proper, or a significant fraction of it.
Mix in length
-        * in order to compensate for cases where differences are past
-        * PG_CACHE_LINE_SIZE bytes, so as to limit the overhead of hashing.
+        * Maintain approximate cardinality of both abbreviated keys
and original
+        * keys using HyperLogLog.

The first part of the comment explained why we do this at all. The
bytea type does not use strcoll, so it's possible that a future patch
may decide to skip cardinality estimation entirely. It's obviously not
the responsibility of this refactoring patch to investigate that, but
throwing the reason out makes it harder for someone to discover that
bytea could do something different here. In this case, maybe we could
point to varstr_abbrev_convert instead of repeating the whole comment.

        /*
-        * Clamp cardinality estimates to at least one distinct value.  While
-        * NULLs are generally disregarded, if only NULL values were
seen so far,
-        * that might misrepresent costs if we failed to clamp.
+        * Clamp cardinality estimates to at least one distinct value.
         */

The old comment explained why we clamp, but the new comment just says
what the code is doing, and it's obvious what the code is doing.

- /*
- * In the worst case all abbreviated keys are identical, while
at the same
- * time there are differences within full key strings not captured in
- * abbreviations.
- */

Why is this gone? I could go on, but hopefully you get my point. If
it's short, just keep it, and adjust as necessary. If it's long, point
to varlena.c if the idea is the same.

Back to the actual patch:

- * More generally, it's okay that bytea callers can have NUL bytes in
- * strings because abbreviated cmp need not make a distinction between
- * terminating NUL bytes, and NUL bytes representing actual NULs in the
- * authoritative representation. Hopefully a comparison at or past one
- * abbreviated key's terminating NUL byte will resolve the comparison
- * without consulting the authoritative representation; specifically, some
- * later non-NUL byte in the longer string can resolve the comparison
- * against a subsequent terminating NUL in the shorter string. There will
- * usually be what is effectively a "length-wise" resolution there and
- * then.
- *
- * If that doesn't work out -- if all bytes in the longer string
- * positioned at or past the offset of the smaller string's (first)
- * terminating NUL are actually representative of NUL bytes in the
- * authoritative binary string (perhaps with some *terminating* NUL bytes
- * towards the end of the longer string iff it happens to still be small)
- * -- then an authoritative tie-breaker will happen, and do the right
- * thing: explicitly consider string length.

Why is this gone -- AFAICT it's still true for bytea, no matter what
file it's located in?

--
John Naylor
Amazon Web Services

#10

Aleksander Alekseev

aleksander@timescale.com

3 months ago

In reply to: John Naylor (#9)

1 attachment(s)

Re: [PATCH] Refactor bytea_sortsupport(), take two

Hi John,

Many thanks for the feedback.

For example in *_abbrev_convert:

[...]

The first part of the comment explained why we do this at all. The
bytea type does not use strcoll, so it's possible that a future patch
may decide to skip cardinality estimation entirely. It's obviously not
the responsibility of this refactoring patch to investigate that, but
throwing the reason out makes it harder for someone to discover that
bytea could do something different here. In this case, maybe we could
point to varstr_abbrev_convert instead of repeating the whole comment.

Fixed. I choose to repeat the entire comment since it is relatively
short in this case.

/*
-        * Clamp cardinality estimates to at least one distinct value.  While
-        * NULLs are generally disregarded, if only NULL values were
seen so far,
-        * that might misrepresent costs if we failed to clamp.
+        * Clamp cardinality estimates to at least one distinct value.
*/
The old comment explained why we clamp, but the new comment just says
what the code is doing, and it's obvious what the code is doing.

Fixed in the same manner.

- /*
- * In the worst case all abbreviated keys are identical, while
at the same
- * time there are differences within full key strings not captured in
- * abbreviations.
- */

Why is this gone? I could go on, but hopefully you get my point. If
it's short, just keep it, and adjust as necessary. If it's long, point
to varlena.c if the idea is the same.

OK, I made sure all the comments are in place. I referenced
corresponding varlena.c functions where the comments were too long to
my taste to repeat them entirely.

Back to the actual patch:

- * More generally, it's okay that bytea callers can have NUL bytes in
- * strings because abbreviated cmp need not make a distinction between
- * terminating NUL bytes, and NUL bytes representing actual NULs in the
- * authoritative representation. Hopefully a comparison at or past one
- * abbreviated key's terminating NUL byte will resolve the comparison
- * without consulting the authoritative representation; specifically, some
- * later non-NUL byte in the longer string can resolve the comparison
- * against a subsequent terminating NUL in the shorter string. There will
- * usually be what is effectively a "length-wise" resolution there and
- * then.
- *
- * If that doesn't work out -- if all bytes in the longer string
- * positioned at or past the offset of the smaller string's (first)
- * terminating NUL are actually representative of NUL bytes in the
- * authoritative binary string (perhaps with some *terminating* NUL bytes
- * towards the end of the longer string iff it happens to still be small)
- * -- then an authoritative tie-breaker will happen, and do the right
- * thing: explicitly consider string length.

Why is this gone -- AFAICT it's still true for bytea, no matter what
file it's located in?

Actually for bytea_abbrev_convert() this comment makes no sense IMO.
Presumably the reader is aware that '\x0000...' is a valid bytea, and
there is no such thing as "terminating NUL bytes" for bytea.

For varstr_abbrev_convert() as you correctly pointed out before [1]/messages/by-id/CANWCAZbDKESq30bn_6QPZqOyrP7JYxxz27Q5gymv0qJEDVj6_A@mail.gmail.com we
actually disallow NUL bytes [2]https://www.postgresql.org/docs/current/datatype-character.html. The reason why the comment is
currently there is that varstr_abbrev_convert() may have bytea callers
which may have NUL bytes.

So it seems to me that rewriting this particular part is exactly in
scope of the refactoring, unless I'm missing something.

[1]: /messages/by-id/CANWCAZbDKESq30bn_6QPZqOyrP7JYxxz27Q5gymv0qJEDVj6_A@mail.gmail.com
[2]: https://www.postgresql.org/docs/current/datatype-character.html

--
Best regards,
Aleksander Alekseev

Attachments:

v5-0001-Refactor-bytea_sortsupport.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Refactor-bytea_sortsupport.patchDownload

From e15d0fa9ea91b6bc553bb333c7b7c87af8dc0eac Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 2 Jul 2025 13:49:37 +0300
Subject: [PATCH v5] Refactor bytea_sortsupport()

Previously bytea_sortsupport() was implemented as a wrapper function for
varstr_sortsupport(). There were several problems with this.

Firstly, this was error-prone. Changing logic for string types could affect
bytea logic and vice versa.

Secondly, the bytea-related part of the code was difficult to reason about.
It was mixed with the logic for other types.

Use independent bytea_sortsupport() implementation for the named reasons.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TP1bAbEhUJa6+rgceN6QJWMSsxhg1=mqfSN=Nb-n6DAKg@mail.gmail.com
---
 src/backend/utils/adt/bytea.c   | 219 +++++++++++++++++++++++++++++++-
 src/backend/utils/adt/varlena.c |  45 ++-----
 2 files changed, 223 insertions(+), 41 deletions(-)

diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
index 6e7b914c563..12f6e4b2ad8 100644
--- a/src/backend/utils/adt/bytea.c
+++ b/src/backend/utils/adt/bytea.c
@@ -15,18 +15,19 @@
 #include "postgres.h"
 
 #include "access/detoast.h"
-#include "catalog/pg_collation_d.h"
-#include "catalog/pg_type_d.h"
+#include "common/hashfn.h"
 #include "common/int.h"
 #include "fmgr.h"
+#include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bitutils.h"
+#include "port/pg_bswap.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
 #include "utils/fmgrprotos.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/sortsupport.h"
-#include "utils/varlena.h"
 #include "varatt.h"
 
 /* GUC variable */
@@ -37,6 +38,19 @@ static bytea *bytea_substring(Datum str, int S, int L,
 							  bool length_not_specified);
 static bytea *bytea_overlay(bytea *t1, bytea *t2, int sp, int sl);
 
+typedef struct
+{
+	bool		abbreviate;		/* Should we abbreviate keys? */
+	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
+	hyperLogLogState full_card; /* Full key cardinality state */
+	double		prop_card;		/* Required cardinality proportion */
+}			ByteaSortSupport;
+
+/* Static function declarations for sort support */
+static int	byteafastcmp(Datum x, Datum y, SortSupport ssup);
+static Datum bytea_abbrev_convert(Datum original, SortSupport ssup);
+static bool bytea_abbrev_abort(int memtupcount, SortSupport ssup);
+
 /*
  * bytea_catenate
  *	Guts of byteacat(), broken out so it can be used by other functions
@@ -1001,6 +1015,182 @@ bytea_smaller(PG_FUNCTION_ARGS)
 	PG_RETURN_BYTEA_P(result);
 }
 
+/*
+ * sortsupport comparison func
+ */
+static int
+byteafastcmp(Datum x, Datum y, SortSupport ssup)
+{
+	bytea	   *arg1 = DatumGetByteaPP(x);
+	bytea	   *arg2 = DatumGetByteaPP(y);
+	char	   *a1p,
+			   *a2p;
+	int			len1,
+				len2,
+				result;
+
+	a1p = VARDATA_ANY(arg1);
+	a2p = VARDATA_ANY(arg2);
+
+	len1 = VARSIZE_ANY_EXHDR(arg1);
+	len2 = VARSIZE_ANY_EXHDR(arg2);
+
+	result = memcmp(a1p, a2p, Min(len1, len2));
+	if ((result == 0) && (len1 != len2))
+		result = (len1 < len2) ? -1 : 1;
+
+	/* We can't afford to leak memory here. */
+	if (PointerGetDatum(arg1) != x)
+		pfree(arg1);
+	if (PointerGetDatum(arg2) != y)
+		pfree(arg2);
+
+	return result;
+}
+
+/*
+ * Conversion routine for sortsupport.
+ */
+static Datum
+bytea_abbrev_convert(Datum original, SortSupport ssup)
+{
+	const size_t max_prefix_bytes = sizeof(Datum);
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	bytea	   *authoritative = DatumGetByteaPP(original);
+	char	   *authoritative_data = VARDATA_ANY(authoritative);
+	Datum		res;
+	char	   *pres;
+	int			len;
+	uint32		hash;
+
+	pres = (char *) &res;
+
+	/* memset(), so any non-overwritten bytes are NUL */
+	memset(pres, 0, max_prefix_bytes);
+	len = VARSIZE_ANY_EXHDR(authoritative);
+
+	memcpy(pres, authoritative_data, Min(len, max_prefix_bytes));
+
+	/*
+	 * Maintain approximate cardinality of both abbreviated keys and original,
+	 * authoritative keys using HyperLogLog.  Used as cheap insurance against
+	 * the worst case, where we do many string transformations for no saving
+	 * in full strcoll()-based comparisons.  These statistics are used by
+	 * bytea_abbrev_abort().
+	 *
+	 * First, Hash key proper, or a significant fraction of it.  Mix in length
+	 * in order to compensate for cases where differences are past
+	 * PG_CACHE_LINE_SIZE bytes, so as to limit the overhead of hashing.
+	 */
+	hash = DatumGetUInt32(hash_any((unsigned char *) authoritative_data,
+								   Min(len, PG_CACHE_LINE_SIZE)));
+
+	if (len > PG_CACHE_LINE_SIZE)
+		hash ^= DatumGetUInt32(hash_uint32((uint32) len));
+
+	addHyperLogLog(&bss->full_card, hash);
+
+	/* Hash abbreviated key */
+	{
+		uint32		lohalf,
+					hihalf;
+
+		lohalf = (uint32) res;
+		hihalf = (uint32) (res >> 32);
+		hash = DatumGetUInt32(hash_uint32(lohalf ^ hihalf));
+	}
+
+	addHyperLogLog(&bss->abbr_card, hash);
+
+	/*
+	 * Byteswap on little-endian machines.
+	 *
+	 * This is needed so that ssup_datum_unsigned_cmp() works correctly on all
+	 * platforms.
+	 */
+	res = DatumBigEndianToNative(res);
+
+	/* Don't leak memory here */
+	if (PointerGetDatum(authoritative) != original)
+		pfree(authoritative);
+
+	return res;
+}
+
+/*
+ * Callback for estimating effectiveness of abbreviated key optimization, using
+ * heuristic rules.  Returns value indicating if the abbreviation optimization
+ * should be aborted, based on its projected effectiveness.
+ */
+static bool
+bytea_abbrev_abort(int memtupcount, SortSupport ssup)
+{
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	double		abbrev_distinct,
+				key_distinct;
+
+	Assert(ssup->abbreviate);
+
+	/* Have a little patience */
+	if (memtupcount < 100)
+		return false;
+
+	abbrev_distinct = estimateHyperLogLog(&bss->abbr_card);
+	key_distinct = estimateHyperLogLog(&bss->full_card);
+
+	/*
+	 * Clamp cardinality estimates to at least one distinct value.  While
+	 * NULLs are generally disregarded, if only NULL values were seen so far,
+	 * that might misrepresent costs if we failed to clamp.
+	 */
+	if (abbrev_distinct < 1.0)
+		abbrev_distinct = 1.0;
+
+	if (key_distinct < 1.0)
+		key_distinct = 1.0;
+
+	if (trace_sort)
+	{
+		double		norm_abbrev_card = abbrev_distinct / (double) memtupcount;
+
+		elog(LOG, "bytea_abbrev: abbrev_distinct after %d: %f "
+			 "(key_distinct: %f, norm_abbrev_card: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, norm_abbrev_card,
+			 bss->prop_card);
+	}
+
+	/*
+	 * If the number of distinct abbreviated keys approximately matches the
+	 * number of distinct original keys, continue with abbreviation.
+	 *
+	 * See varstr_abbrev_abort() for more details.
+	 */
+	if (abbrev_distinct > key_distinct * bss->prop_card)
+	{
+		/*
+		 * Decay required cardinality aggressively after 10,000 tuples.
+		 *
+		 * See varstr_abbrev_abort() for more details.
+		 */
+		if (memtupcount > 10000)
+			bss->prop_card *= 0.65;
+
+		return false;
+	}
+
+	/*
+	 * Abort abbreviation strategy.
+	 *
+	 * See varstr_abbrev_abort() for more details.
+	 */
+	if (trace_sort)
+		elog(LOG, "bytea_abbrev: aborted abbreviation at %d "
+			 "(abbrev_distinct: %f, key_distinct: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, bss->prop_card);
+
+	return true;
+}
+
 Datum
 bytea_sortsupport(PG_FUNCTION_ARGS)
 {
@@ -1009,8 +1199,27 @@ bytea_sortsupport(PG_FUNCTION_ARGS)
 
 	oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt);
 
-	/* Use generic string SortSupport, forcing "C" collation */
-	varstr_sortsupport(ssup, BYTEAOID, C_COLLATION_OID);
+	ssup->comparator = byteafastcmp;
+
+	/*
+	 * Set up abbreviation support if requested.
+	 */
+	if (ssup->abbreviate)
+	{
+		ByteaSortSupport *bss;
+
+		bss = palloc(sizeof(ByteaSortSupport));
+		bss->abbreviate = true;
+		bss->prop_card = 0.20;
+		initHyperLogLog(&bss->abbr_card, 10);
+		initHyperLogLog(&bss->full_card, 10);
+
+		ssup->ssup_extra = bss;
+		ssup->abbrev_full_comparator = ssup->comparator;
+		ssup->comparator = ssup_datum_unsigned_cmp;
+		ssup->abbrev_converter = bytea_abbrev_convert;
+		ssup->abbrev_abort = bytea_abbrev_abort;
+	}
 
 	MemoryContextSwitchTo(oldcontext);
 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 3894457ab40..331a30d5264 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -92,7 +92,7 @@ typedef struct
 	int			last_returned;	/* Last comparison result (cache) */
 	bool		cache_blob;		/* Does buf2 contain strxfrm() blob, etc? */
 	bool		collate_c;
-	Oid			typid;			/* Actual datatype (text/bpchar/bytea/name) */
+	Oid			typid;			/* Actual datatype (text/bpchar/name) */
 	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
 	hyperLogLogState full_card; /* Full key cardinality state */
 	double		prop_card;		/* Required cardinality proportion */
@@ -1606,10 +1606,8 @@ bttextsortsupport(PG_FUNCTION_ARGS)
  * Includes locale support, and support for BpChar semantics (i.e. removing
  * trailing spaces before comparison).
  *
- * Relies on the assumption that text, VarChar, BpChar, and bytea all have the
- * same representation.  Callers that always use the C collation (e.g.
- * non-collatable type callers like bytea) may have NUL bytes in their strings;
- * this will not work with any other collation, though.
+ * Relies on the assumption that text, VarChar, and BpChar all have the
+ * same representation.
  */
 void
 varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
@@ -1972,7 +1970,7 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
  * representation.  Our encoding strategy is simple -- pack the first 8 bytes
  * of a strxfrm() blob into a Datum (on little-endian machines, the 8 bytes are
  * stored in reverse order), and treat it as an unsigned integer.  When the "C"
- * locale is used, or in case of bytea, just memcpy() from original instead.
+ * locale is used just memcpy() from original instead.
  */
 static Datum
 varstr_abbrev_convert(Datum original, SortSupport ssup)
@@ -1998,31 +1996,9 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		len = bpchartruelen(authoritative_data, len);
 
 	/*
-	 * If we're using the C collation, use memcpy(), rather than strxfrm(), to
-	 * abbreviate keys.  The full comparator for the C locale is always
-	 * memcmp().  It would be incorrect to allow bytea callers (callers that
-	 * always force the C collation -- bytea isn't a collatable type, but this
-	 * approach is convenient) to use strxfrm().  This is because bytea
-	 * strings may contain NUL bytes.  Besides, this should be faster, too.
-	 *
-	 * More generally, it's okay that bytea callers can have NUL bytes in
-	 * strings because abbreviated cmp need not make a distinction between
-	 * terminating NUL bytes, and NUL bytes representing actual NULs in the
-	 * authoritative representation.  Hopefully a comparison at or past one
-	 * abbreviated key's terminating NUL byte will resolve the comparison
-	 * without consulting the authoritative representation; specifically, some
-	 * later non-NUL byte in the longer string can resolve the comparison
-	 * against a subsequent terminating NUL in the shorter string.  There will
-	 * usually be what is effectively a "length-wise" resolution there and
-	 * then.
-	 *
-	 * If that doesn't work out -- if all bytes in the longer string
-	 * positioned at or past the offset of the smaller string's (first)
-	 * terminating NUL are actually representative of NUL bytes in the
-	 * authoritative binary string (perhaps with some *terminating* NUL bytes
-	 * towards the end of the longer string iff it happens to still be small)
-	 * -- then an authoritative tie-breaker will happen, and do the right
-	 * thing: explicitly consider string length.
+	 * If we're using the C collation, use memcpy(), rather than strxfrm(),
+	 * to abbreviate keys.  The full comparator for the C locale is also
+	 * memcmp().  This should be faster than strxfrm().
 	 */
 	if (sss->collate_c)
 		memcpy(pres, authoritative_data, Min(len, max_prefix_bytes));
@@ -2104,9 +2080,6 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		 * strxfrm() blob is itself NUL terminated, leaving no danger of
 		 * misinterpreting any NUL bytes not intended to be interpreted as
 		 * logically representing termination.
-		 *
-		 * (Actually, even if there were NUL bytes in the blob it would be
-		 * okay.  See remarks on bytea case above.)
 		 */
 		memcpy(pres, sss->buf2, Min(max_prefix_bytes, bsize));
 	}
@@ -2187,10 +2160,10 @@ varstr_abbrev_abort(int memtupcount, SortSupport ssup)
 	 * NULLs are generally disregarded, if only NULL values were seen so far,
 	 * that might misrepresent costs if we failed to clamp.
 	 */
-	if (abbrev_distinct <= 1.0)
+	if (abbrev_distinct < 1.0)
 		abbrev_distinct = 1.0;
 
-	if (key_distinct <= 1.0)
+	if (key_distinct < 1.0)
 		key_distinct = 1.0;
 
 	/*
-- 
2.43.0

#11

John Naylor

john.naylor@enterprisedb.com

2 months ago

In reply to: Aleksander Alekseev (#10)

Re: [PATCH] Refactor bytea_sortsupport(), take two

On Mon, Nov 10, 2025 at 6:22 PM Aleksander Alekseev
<aleksander@tigerdata.com> wrote:

OK, I made sure all the comments are in place. I referenced
corresponding varlena.c functions where the comments were too long to
my taste to repeat them entirely.

Looks mostly okay. The "See varstr_abbrev_abort() for more details."
are repeated several times close together, so I would replace those
with a header comment like "This is based on varstr_abbrev_abort(),
but some comments have been elided for brevity. See there for more
details."

+ * the worst case, where we do many string transformations for no saving
+ * in full strcoll()-based comparisons.  These statistics are used by
+ * bytea_abbrev_abort().

s/strcoll/memcmp/, and maybe s/transformations/abbreviations/ since
that was talking about strxfrm().

- * More generally, it's okay that bytea callers can have NUL bytes in
- * strings because abbreviated cmp need not make a distinction between
- * terminating NUL bytes, and NUL bytes representing actual NULs in the
- * authoritative representation. Hopefully a comparison at or past one

Why is this gone -- AFAICT it's still true for bytea, no matter what
file it's located in?

Actually for bytea_abbrev_convert() this comment makes no sense IMO.
Presumably the reader is aware that '\x0000...' is a valid bytea, and
there is no such thing as "terminating NUL bytes" for bytea.

It makes perfect sense to me. The abbreviated datum has terminating
NUL bytes if the source length is shorter than 8 bytes. The comment is
explaining why comparing bytea abbreviated datums still works. It
seems like everything starting with "abbreviated cmp need not ..." is
still appropriate for bytea.c.

Likewise, this part of varlena.c is still factually accurate as an aside:

- * (Actually, even if there were NUL bytes in the blob it would be
- * okay. See remarks on bytea case above.)

...although with the above, we'll have to point to the relevant place
in bytea.c.

I only have one real nitpick left (diff'ing between files):

    /* Hash abbreviated key */
    {
-         uint32      tmp;
+         uint32      lohalf,
+                     hihalf;

-         tmp = DatumGetUInt32(res) ^ (uint32) (DatumGetUInt64(res) >> 32);
-         hash = DatumGetUInt32(hash_uint32(tmp));
+         lohalf = (uint32) res;
+         hihalf = (uint32) (res >> 32);
+         hash = DatumGetUInt32(hash_uint32(lohalf ^ hihalf));
    }

If we're going to do anything different here at all, the logical thing
would be to simplify with murmurhash64() since datums are now 8 bytes
everywhere. That's material for a separate patch, though.

--
John Naylor
Amazon Web Services

#12

Aleksander Alekseev

aleksander@timescale.com

2 months ago

In reply to: John Naylor (#11)

1 attachment(s)

Re: [PATCH] Refactor bytea_sortsupport(), take two

Hi John,

Looks mostly okay. The "See varstr_abbrev_abort() for more details."
are repeated several times close together, so I would replace those
with a header comment like "This is based on varstr_abbrev_abort(),
but some comments have been elided for brevity. See there for more
details."

OK, fixed.

+ * the worst case, where we do many string transformations for no saving
+ * in full strcoll()-based comparisons.  These statistics are used by
+ * bytea_abbrev_abort().
s/strcoll/memcmp/, and maybe s/transformations/abbreviations/ since
that was talking about strxfrm().

Agree. Fixed.

- * More generally, it's okay that bytea callers can have NUL bytes in
- * strings because abbreviated cmp need not make a distinction between
- * terminating NUL bytes, and NUL bytes representing actual NULs in the
- * authoritative representation. Hopefully a comparison at or past one

Why is this gone -- AFAICT it's still true for bytea, no matter what
file it's located in?

Actually for bytea_abbrev_convert() this comment makes no sense IMO.
Presumably the reader is aware that '\x0000...' is a valid bytea, and
there is no such thing as "terminating NUL bytes" for bytea.

It makes perfect sense to me. The abbreviated datum has terminating
NUL bytes if the source length is shorter than 8 bytes. The comment is
explaining why comparing bytea abbreviated datums still works. It
seems like everything starting with "abbreviated cmp need not ..." is
still appropriate for bytea.c.

OK, I returned the comment to bytea.c, but replaced "string" with
"bytea". IMO mentioning "strings" is confusing in this context.

I still don't like the fact that the comment is reasoning about
"terminating NUL bytes" in the context of bytea, but I kept it as is
for now. It seems confusing to me. Do you think we should rewrite
this?

Likewise, this part of varlena.c is still factually accurate as an aside:

- * (Actually, even if there were NUL bytes in the blob it would be
- * okay. See remarks on bytea case above.)

...although with the above, we'll have to point to the relevant place
in bytea.c.

Not sure if I follow. Previously we agreed that we disallow NUL bytes
within strings, except for terminating ones. What we are doing here is
refactoring / simplifying the code. Why keep irrelevant and confusing
comments?

I only have one real nitpick left (diff'ing between files):
/* Hash abbreviated key */
{
-         uint32      tmp;
+         uint32      lohalf,
+                     hihalf;
-         tmp = DatumGetUInt32(res) ^ (uint32) (DatumGetUInt64(res) >> 32);
-         hash = DatumGetUInt32(hash_uint32(tmp));
+         lohalf = (uint32) res;
+         hihalf = (uint32) (res >> 32);
+         hash = DatumGetUInt32(hash_uint32(lohalf ^ hihalf));
}
If we're going to do anything different here at all, the logical thing
would be to simplify with murmurhash64() since datums are now 8 bytes
everywhere. That's material for a separate patch, though.

Good point. Fixed.

--
Best regards,
Aleksander Alekseev

Attachments:

v6-0001-Refactor-bytea_sortsupport.patchtext/x-patch; charset=US-ASCII; name=v6-0001-Refactor-bytea_sortsupport.patchDownload

From 9c9addd13a725d1a52b6031bd0d196cc9173efe8 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 2 Jul 2025 13:49:37 +0300
Subject: [PATCH v6] Refactor bytea_sortsupport()

Previously bytea_sortsupport() was implemented as a wrapper function for
varstr_sortsupport(). There were several problems with this.

Firstly, this was error-prone. Changing logic for string types could affect
bytea logic and vice versa.

Secondly, the bytea-related part of the code was difficult to reason about.
It was mixed with the logic for other types.

Use independent bytea_sortsupport() implementation for the named reasons.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TP1bAbEhUJa6+rgceN6QJWMSsxhg1=mqfSN=Nb-n6DAKg@mail.gmail.com
---
 src/backend/utils/adt/bytea.c   | 233 +++++++++++++++++++++++++++++++-
 src/backend/utils/adt/varlena.c |  45 ++----
 2 files changed, 237 insertions(+), 41 deletions(-)

diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
index 6e7b914c563..0ff1c7ee5e5 100644
--- a/src/backend/utils/adt/bytea.c
+++ b/src/backend/utils/adt/bytea.c
@@ -15,18 +15,19 @@
 #include "postgres.h"
 
 #include "access/detoast.h"
-#include "catalog/pg_collation_d.h"
-#include "catalog/pg_type_d.h"
+#include "common/hashfn.h"
 #include "common/int.h"
 #include "fmgr.h"
+#include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bitutils.h"
+#include "port/pg_bswap.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
 #include "utils/fmgrprotos.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/sortsupport.h"
-#include "utils/varlena.h"
 #include "varatt.h"
 
 /* GUC variable */
@@ -37,6 +38,19 @@ static bytea *bytea_substring(Datum str, int S, int L,
 							  bool length_not_specified);
 static bytea *bytea_overlay(bytea *t1, bytea *t2, int sp, int sl);
 
+typedef struct
+{
+	bool		abbreviate;		/* Should we abbreviate keys? */
+	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
+	hyperLogLogState full_card; /* Full key cardinality state */
+	double		prop_card;		/* Required cardinality proportion */
+}			ByteaSortSupport;
+
+/* Static function declarations for sort support */
+static int	byteafastcmp(Datum x, Datum y, SortSupport ssup);
+static Datum bytea_abbrev_convert(Datum original, SortSupport ssup);
+static bool bytea_abbrev_abort(int memtupcount, SortSupport ssup);
+
 /*
  * bytea_catenate
  *	Guts of byteacat(), broken out so it can be used by other functions
@@ -1001,6 +1015,196 @@ bytea_smaller(PG_FUNCTION_ARGS)
 	PG_RETURN_BYTEA_P(result);
 }
 
+/*
+ * sortsupport comparison func
+ */
+static int
+byteafastcmp(Datum x, Datum y, SortSupport ssup)
+{
+	bytea	   *arg1 = DatumGetByteaPP(x);
+	bytea	   *arg2 = DatumGetByteaPP(y);
+	char	   *a1p,
+			   *a2p;
+	int			len1,
+				len2,
+				result;
+
+	a1p = VARDATA_ANY(arg1);
+	a2p = VARDATA_ANY(arg2);
+
+	len1 = VARSIZE_ANY_EXHDR(arg1);
+	len2 = VARSIZE_ANY_EXHDR(arg2);
+
+	result = memcmp(a1p, a2p, Min(len1, len2));
+	if ((result == 0) && (len1 != len2))
+		result = (len1 < len2) ? -1 : 1;
+
+	/* We can't afford to leak memory here. */
+	if (PointerGetDatum(arg1) != x)
+		pfree(arg1);
+	if (PointerGetDatum(arg2) != y)
+		pfree(arg2);
+
+	return result;
+}
+
+/*
+ * Conversion routine for sortsupport.
+ */
+static Datum
+bytea_abbrev_convert(Datum original, SortSupport ssup)
+{
+	const size_t max_prefix_bytes = sizeof(Datum);
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	bytea	   *authoritative = DatumGetByteaPP(original);
+	char	   *authoritative_data = VARDATA_ANY(authoritative);
+	Datum		res;
+	char	   *pres;
+	int			len;
+	uint32		hash;
+
+	pres = (char *) &res;
+
+	/* memset(), so any non-overwritten bytes are NUL */
+	memset(pres, 0, max_prefix_bytes);
+	len = VARSIZE_ANY_EXHDR(authoritative);
+
+	/*
+	 * It's okay that callers can have NUL bytes in bytea because abbreviated
+	 * cmp need not make a distinction between terminating NUL bytes, and NUL
+	 * bytes representing actual NULs in the authoritative representation.
+	 * Hopefully a comparison at or past one abbreviated key's terminating NUL
+	 * byte will resolve the comparison without consulting the authoritative
+	 * representation; specifically, some later non-NUL byte in the longer
+	 * bytea can resolve the comparison against a subsequent terminating NUL
+	 * in the shorter bytea.  There will usually be what is effectively a
+	 * "length-wise" resolution there and then.
+	 *
+	 * If that doesn't work out -- if all bytes in the longer bytea positioned
+	 * at or past the offset of the smaller bytea (first) terminating NUL are
+	 * actually representative of NUL bytes in the authoritative binary bytea
+	 * (perhaps with some *terminating* NUL bytes towards the end of the
+	 * longer bytea iff it happens to still be small) -- then an authoritative
+	 * tie-breaker will happen, and do the right thing: explicitly consider
+	 * bytea length.
+	 */
+	memcpy(pres, authoritative_data, Min(len, max_prefix_bytes));
+
+	/*
+	 * Maintain approximate cardinality of both abbreviated keys and original,
+	 * authoritative keys using HyperLogLog.  Used as cheap insurance against
+	 * the worst case, where we do many string abbreviations for no saving in
+	 * full memcmp()-based comparisons.  These statistics are used by
+	 * bytea_abbrev_abort().
+	 *
+	 * First, Hash key proper, or a significant fraction of it.  Mix in length
+	 * in order to compensate for cases where differences are past
+	 * PG_CACHE_LINE_SIZE bytes, so as to limit the overhead of hashing.
+	 */
+	hash = DatumGetUInt32(hash_any((unsigned char *) authoritative_data,
+								   Min(len, PG_CACHE_LINE_SIZE)));
+
+	if (len > PG_CACHE_LINE_SIZE)
+		hash ^= DatumGetUInt32(hash_uint32((uint32) len));
+
+	addHyperLogLog(&bss->full_card, hash);
+
+	/* Hash abbreviated key */
+	{
+		uint32		tmp;
+
+		tmp = DatumGetUInt32(res) ^ (uint32) (DatumGetUInt64(res) >> 32);
+		hash = DatumGetUInt32(hash_uint32(tmp));
+	}
+
+	addHyperLogLog(&bss->abbr_card, hash);
+
+	/*
+	 * Byteswap on little-endian machines.
+	 *
+	 * This is needed so that ssup_datum_unsigned_cmp() works correctly on all
+	 * platforms.
+	 */
+	res = DatumBigEndianToNative(res);
+
+	/* Don't leak memory here */
+	if (PointerGetDatum(authoritative) != original)
+		pfree(authoritative);
+
+	return res;
+}
+
+/*
+ * Callback for estimating effectiveness of abbreviated key optimization, using
+ * heuristic rules.  Returns value indicating if the abbreviation optimization
+ * should be aborted, based on its projected effectiveness.
+ *
+ * This is based on varstr_abbrev_abort(), but some comments have been elided
+ * for brevity. See there for more details.
+ */
+static bool
+bytea_abbrev_abort(int memtupcount, SortSupport ssup)
+{
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	double		abbrev_distinct,
+				key_distinct;
+
+	Assert(ssup->abbreviate);
+
+	/* Have a little patience */
+	if (memtupcount < 100)
+		return false;
+
+	abbrev_distinct = estimateHyperLogLog(&bss->abbr_card);
+	key_distinct = estimateHyperLogLog(&bss->full_card);
+
+	/*
+	 * Clamp cardinality estimates to at least one distinct value.  While
+	 * NULLs are generally disregarded, if only NULL values were seen so far,
+	 * that might misrepresent costs if we failed to clamp.
+	 */
+	if (abbrev_distinct < 1.0)
+		abbrev_distinct = 1.0;
+
+	if (key_distinct < 1.0)
+		key_distinct = 1.0;
+
+	if (trace_sort)
+	{
+		double		norm_abbrev_card = abbrev_distinct / (double) memtupcount;
+
+		elog(LOG, "bytea_abbrev: abbrev_distinct after %d: %f "
+			 "(key_distinct: %f, norm_abbrev_card: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, norm_abbrev_card,
+			 bss->prop_card);
+	}
+
+	/*
+	 * If the number of distinct abbreviated keys approximately matches the
+	 * number of distinct original keys, continue with abbreviation.
+	 */
+	if (abbrev_distinct > key_distinct * bss->prop_card)
+	{
+		/*
+		 * Decay required cardinality aggressively after 10,000 tuples.
+		 */
+		if (memtupcount > 10000)
+			bss->prop_card *= 0.65;
+
+		return false;
+	}
+
+	/*
+	 * Abort abbreviation strategy.
+	 */
+	if (trace_sort)
+		elog(LOG, "bytea_abbrev: aborted abbreviation at %d "
+			 "(abbrev_distinct: %f, key_distinct: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, bss->prop_card);
+
+	return true;
+}
+
 Datum
 bytea_sortsupport(PG_FUNCTION_ARGS)
 {
@@ -1009,8 +1213,27 @@ bytea_sortsupport(PG_FUNCTION_ARGS)
 
 	oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt);
 
-	/* Use generic string SortSupport, forcing "C" collation */
-	varstr_sortsupport(ssup, BYTEAOID, C_COLLATION_OID);
+	ssup->comparator = byteafastcmp;
+
+	/*
+	 * Set up abbreviation support if requested.
+	 */
+	if (ssup->abbreviate)
+	{
+		ByteaSortSupport *bss;
+
+		bss = palloc(sizeof(ByteaSortSupport));
+		bss->abbreviate = true;
+		bss->prop_card = 0.20;
+		initHyperLogLog(&bss->abbr_card, 10);
+		initHyperLogLog(&bss->full_card, 10);
+
+		ssup->ssup_extra = bss;
+		ssup->abbrev_full_comparator = ssup->comparator;
+		ssup->comparator = ssup_datum_unsigned_cmp;
+		ssup->abbrev_converter = bytea_abbrev_convert;
+		ssup->abbrev_abort = bytea_abbrev_abort;
+	}
 
 	MemoryContextSwitchTo(oldcontext);
 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 3894457ab40..331a30d5264 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -92,7 +92,7 @@ typedef struct
 	int			last_returned;	/* Last comparison result (cache) */
 	bool		cache_blob;		/* Does buf2 contain strxfrm() blob, etc? */
 	bool		collate_c;
-	Oid			typid;			/* Actual datatype (text/bpchar/bytea/name) */
+	Oid			typid;			/* Actual datatype (text/bpchar/name) */
 	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
 	hyperLogLogState full_card; /* Full key cardinality state */
 	double		prop_card;		/* Required cardinality proportion */
@@ -1606,10 +1606,8 @@ bttextsortsupport(PG_FUNCTION_ARGS)
  * Includes locale support, and support for BpChar semantics (i.e. removing
  * trailing spaces before comparison).
  *
- * Relies on the assumption that text, VarChar, BpChar, and bytea all have the
- * same representation.  Callers that always use the C collation (e.g.
- * non-collatable type callers like bytea) may have NUL bytes in their strings;
- * this will not work with any other collation, though.
+ * Relies on the assumption that text, VarChar, and BpChar all have the
+ * same representation.
  */
 void
 varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
@@ -1972,7 +1970,7 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
  * representation.  Our encoding strategy is simple -- pack the first 8 bytes
  * of a strxfrm() blob into a Datum (on little-endian machines, the 8 bytes are
  * stored in reverse order), and treat it as an unsigned integer.  When the "C"
- * locale is used, or in case of bytea, just memcpy() from original instead.
+ * locale is used just memcpy() from original instead.
  */
 static Datum
 varstr_abbrev_convert(Datum original, SortSupport ssup)
@@ -1998,31 +1996,9 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		len = bpchartruelen(authoritative_data, len);
 
 	/*
-	 * If we're using the C collation, use memcpy(), rather than strxfrm(), to
-	 * abbreviate keys.  The full comparator for the C locale is always
-	 * memcmp().  It would be incorrect to allow bytea callers (callers that
-	 * always force the C collation -- bytea isn't a collatable type, but this
-	 * approach is convenient) to use strxfrm().  This is because bytea
-	 * strings may contain NUL bytes.  Besides, this should be faster, too.
-	 *
-	 * More generally, it's okay that bytea callers can have NUL bytes in
-	 * strings because abbreviated cmp need not make a distinction between
-	 * terminating NUL bytes, and NUL bytes representing actual NULs in the
-	 * authoritative representation.  Hopefully a comparison at or past one
-	 * abbreviated key's terminating NUL byte will resolve the comparison
-	 * without consulting the authoritative representation; specifically, some
-	 * later non-NUL byte in the longer string can resolve the comparison
-	 * against a subsequent terminating NUL in the shorter string.  There will
-	 * usually be what is effectively a "length-wise" resolution there and
-	 * then.
-	 *
-	 * If that doesn't work out -- if all bytes in the longer string
-	 * positioned at or past the offset of the smaller string's (first)
-	 * terminating NUL are actually representative of NUL bytes in the
-	 * authoritative binary string (perhaps with some *terminating* NUL bytes
-	 * towards the end of the longer string iff it happens to still be small)
-	 * -- then an authoritative tie-breaker will happen, and do the right
-	 * thing: explicitly consider string length.
+	 * If we're using the C collation, use memcpy(), rather than strxfrm(),
+	 * to abbreviate keys.  The full comparator for the C locale is also
+	 * memcmp().  This should be faster than strxfrm().
 	 */
 	if (sss->collate_c)
 		memcpy(pres, authoritative_data, Min(len, max_prefix_bytes));
@@ -2104,9 +2080,6 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		 * strxfrm() blob is itself NUL terminated, leaving no danger of
 		 * misinterpreting any NUL bytes not intended to be interpreted as
 		 * logically representing termination.
-		 *
-		 * (Actually, even if there were NUL bytes in the blob it would be
-		 * okay.  See remarks on bytea case above.)
 		 */
 		memcpy(pres, sss->buf2, Min(max_prefix_bytes, bsize));
 	}
@@ -2187,10 +2160,10 @@ varstr_abbrev_abort(int memtupcount, SortSupport ssup)
 	 * NULLs are generally disregarded, if only NULL values were seen so far,
 	 * that might misrepresent costs if we failed to clamp.
 	 */
-	if (abbrev_distinct <= 1.0)
+	if (abbrev_distinct < 1.0)
 		abbrev_distinct = 1.0;
 
-	if (key_distinct <= 1.0)
+	if (key_distinct < 1.0)
 		key_distinct = 1.0;
 
 	/*
-- 
2.43.0

#13

John Naylor

john.naylor@enterprisedb.com

2 months ago

In reply to: Aleksander Alekseev (#12)

Re: [PATCH] Refactor bytea_sortsupport(), take two

On Mon, Nov 17, 2025 at 5:57 PM Aleksander Alekseev
<aleksander@tigerdata.com> wrote:

- * More generally, it's okay that bytea callers can have NUL bytes in
- * strings because abbreviated cmp need not make a distinction between
- * terminating NUL bytes, and NUL bytes representing actual NULs in the
- * authoritative representation. Hopefully a comparison at or past one

Why is this gone -- AFAICT it's still true for bytea, no matter what
file it's located in?

Actually for bytea_abbrev_convert() this comment makes no sense IMO.
Presumably the reader is aware that '\x0000...' is a valid bytea, and
there is no such thing as "terminating NUL bytes" for bytea.

It makes perfect sense to me. The abbreviated datum has terminating
NUL bytes if the source length is shorter than 8 bytes. The comment is
explaining why comparing bytea abbreviated datums still works. It
seems like everything starting with "abbreviated cmp need not ..." is
still appropriate for bytea.c.

OK, I returned the comment to bytea.c, but replaced "string" with
"bytea". IMO mentioning "strings" is confusing in this context.

I still don't like the fact that the comment is reasoning about
"terminating NUL bytes" in the context of bytea, but I kept it as is
for now. It seems confusing to me. Do you think we should rewrite
this?

+ /*
+ * It's okay that callers can have NUL bytes in bytea because abbreviated
+ * cmp need not make a distinction between terminating NUL bytes, and NUL
+ * bytes representing actual NULs in the authoritative representation.

I mentioned everything starting with "abbreviated cmp need not ..."
was okay, so I agree that "It's okay that callers can have NUL bytes
in bytea" is redundant. Was that the confusing part for you? How about
this:

"Short byteas will have terminating NUL bytes in the abbreviated
datum. Abbreviated comparison need not make a distinction between
thse NUL bytes, and NUL bytes representing actual NULs in the
authoritative representation." [...]

After that, the rest seems to flow better at a quick glance.

Likewise, this part of varlena.c is still factually accurate as an aside:

- * (Actually, even if there were NUL bytes in the blob it would be
- * okay. See remarks on bytea case above.)

...although with the above, we'll have to point to the relevant place
in bytea.c.

Not sure if I follow. Previously we agreed that we disallow NUL bytes
within strings, except for terminating ones. What we are doing here is
refactoring / simplifying the code. Why keep irrelevant and confusing
comments?

Okay, I'll concede that's it's no longer useful and maybe a distraction..

--
John Naylor
Amazon Web Services

#14

Aleksander Alekseev

aleksander@timescale.com

about 2 months ago

In reply to: John Naylor (#13)

1 attachment(s)

Re: [PATCH] Refactor bytea_sortsupport(), take two

Hi John,

How about this:

"Short byteas will have terminating NUL bytes in the abbreviated
datum. Abbreviated comparison need not make a distinction between
thse NUL bytes, and NUL bytes representing actual NULs in the
authoritative representation." [...]

After that, the rest seems to flow better at a quick glance.

Yes, it is much better now, thanks! Previously the comment was
reasoning about NUL bytes as if normally bytea can't have them which
IMO was confusing.

--
Best regards,
Aleksander Alekseev

Attachments:

v7-0001-Refactor-bytea_sortsupport.patchtext/x-patch; charset=US-ASCII; name=v7-0001-Refactor-bytea_sortsupport.patchDownload

From 87f4588436d4ca5b223f25b5e020c043333bd813 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 2 Jul 2025 13:49:37 +0300
Subject: [PATCH v7] Refactor bytea_sortsupport()

Previously bytea_sortsupport() was implemented as a wrapper function for
varstr_sortsupport(). There were several problems with this.

Firstly, this was error-prone. Changing logic for string types could affect
bytea logic and vice versa.

Secondly, the bytea-related part of the code was difficult to reason about.
It was mixed with the logic for other types.

Use independent bytea_sortsupport() implementation for the named reasons.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TP1bAbEhUJa6+rgceN6QJWMSsxhg1=mqfSN=Nb-n6DAKg@mail.gmail.com
---
 src/backend/utils/adt/bytea.c   | 235 +++++++++++++++++++++++++++++++-
 src/backend/utils/adt/varlena.c |  45 ++----
 2 files changed, 239 insertions(+), 41 deletions(-)

diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
index 6e7b914c563..0d314270535 100644
--- a/src/backend/utils/adt/bytea.c
+++ b/src/backend/utils/adt/bytea.c
@@ -15,18 +15,19 @@
 #include "postgres.h"
 
 #include "access/detoast.h"
-#include "catalog/pg_collation_d.h"
-#include "catalog/pg_type_d.h"
+#include "common/hashfn.h"
 #include "common/int.h"
 #include "fmgr.h"
+#include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bitutils.h"
+#include "port/pg_bswap.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
 #include "utils/fmgrprotos.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/sortsupport.h"
-#include "utils/varlena.h"
 #include "varatt.h"
 
 /* GUC variable */
@@ -37,6 +38,19 @@ static bytea *bytea_substring(Datum str, int S, int L,
 							  bool length_not_specified);
 static bytea *bytea_overlay(bytea *t1, bytea *t2, int sp, int sl);
 
+typedef struct
+{
+	bool		abbreviate;		/* Should we abbreviate keys? */
+	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
+	hyperLogLogState full_card; /* Full key cardinality state */
+	double		prop_card;		/* Required cardinality proportion */
+}			ByteaSortSupport;
+
+/* Static function declarations for sort support */
+static int	byteafastcmp(Datum x, Datum y, SortSupport ssup);
+static Datum bytea_abbrev_convert(Datum original, SortSupport ssup);
+static bool bytea_abbrev_abort(int memtupcount, SortSupport ssup);
+
 /*
  * bytea_catenate
  *	Guts of byteacat(), broken out so it can be used by other functions
@@ -1001,6 +1015,198 @@ bytea_smaller(PG_FUNCTION_ARGS)
 	PG_RETURN_BYTEA_P(result);
 }
 
+/*
+ * sortsupport comparison func
+ */
+static int
+byteafastcmp(Datum x, Datum y, SortSupport ssup)
+{
+	bytea	   *arg1 = DatumGetByteaPP(x);
+	bytea	   *arg2 = DatumGetByteaPP(y);
+	char	   *a1p,
+			   *a2p;
+	int			len1,
+				len2,
+				result;
+
+	a1p = VARDATA_ANY(arg1);
+	a2p = VARDATA_ANY(arg2);
+
+	len1 = VARSIZE_ANY_EXHDR(arg1);
+	len2 = VARSIZE_ANY_EXHDR(arg2);
+
+	result = memcmp(a1p, a2p, Min(len1, len2));
+	if ((result == 0) && (len1 != len2))
+		result = (len1 < len2) ? -1 : 1;
+
+	/* We can't afford to leak memory here. */
+	if (PointerGetDatum(arg1) != x)
+		pfree(arg1);
+	if (PointerGetDatum(arg2) != y)
+		pfree(arg2);
+
+	return result;
+}
+
+/*
+ * Conversion routine for sortsupport.
+ */
+static Datum
+bytea_abbrev_convert(Datum original, SortSupport ssup)
+{
+	const size_t max_prefix_bytes = sizeof(Datum);
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	bytea	   *authoritative = DatumGetByteaPP(original);
+	char	   *authoritative_data = VARDATA_ANY(authoritative);
+	Datum		res;
+	char	   *pres;
+	int			len;
+	uint32		hash;
+
+	pres = (char *) &res;
+
+	/* memset(), so any non-overwritten bytes are NUL */
+	memset(pres, 0, max_prefix_bytes);
+	len = VARSIZE_ANY_EXHDR(authoritative);
+
+	/*
+	 * Short byteas will have terminating NUL bytes in the abbreviated datum.
+	 * Abbreviated comparison need not make a distinction between these NUL
+	 * bytes, and NUL bytes representing actual NULs in the authoritative
+	 * representation.
+	 *
+	 * Hopefully a comparison at or past one abbreviated key's terminating NUL
+	 * byte will resolve the comparison without consulting the authoritative
+	 * representation; specifically, some later non-NUL byte in the longer
+	 * bytea can resolve the comparison against a subsequent terminating NUL
+	 * in the shorter bytea.  There will usually be what is effectively a
+	 * "length-wise" resolution there and then.
+	 *
+	 * If that doesn't work out -- if all bytes in the longer bytea positioned
+	 * at or past the offset of the smaller bytea (first) terminating NUL are
+	 * actually representative of NUL bytes in the authoritative binary bytea
+	 * (perhaps with some *terminating* NUL bytes towards the end of the
+	 * longer bytea iff it happens to still be small) -- then an authoritative
+	 * tie-breaker will happen, and do the right thing: explicitly consider
+	 * bytea length.
+	 */
+	memcpy(pres, authoritative_data, Min(len, max_prefix_bytes));
+
+	/*
+	 * Maintain approximate cardinality of both abbreviated keys and original,
+	 * authoritative keys using HyperLogLog.  Used as cheap insurance against
+	 * the worst case, where we do many string abbreviations for no saving in
+	 * full memcmp()-based comparisons.  These statistics are used by
+	 * bytea_abbrev_abort().
+	 *
+	 * First, Hash key proper, or a significant fraction of it.  Mix in length
+	 * in order to compensate for cases where differences are past
+	 * PG_CACHE_LINE_SIZE bytes, so as to limit the overhead of hashing.
+	 */
+	hash = DatumGetUInt32(hash_any((unsigned char *) authoritative_data,
+								   Min(len, PG_CACHE_LINE_SIZE)));
+
+	if (len > PG_CACHE_LINE_SIZE)
+		hash ^= DatumGetUInt32(hash_uint32((uint32) len));
+
+	addHyperLogLog(&bss->full_card, hash);
+
+	/* Hash abbreviated key */
+	{
+		uint32		tmp;
+
+		tmp = DatumGetUInt32(res) ^ (uint32) (DatumGetUInt64(res) >> 32);
+		hash = DatumGetUInt32(hash_uint32(tmp));
+	}
+
+	addHyperLogLog(&bss->abbr_card, hash);
+
+	/*
+	 * Byteswap on little-endian machines.
+	 *
+	 * This is needed so that ssup_datum_unsigned_cmp() works correctly on all
+	 * platforms.
+	 */
+	res = DatumBigEndianToNative(res);
+
+	/* Don't leak memory here */
+	if (PointerGetDatum(authoritative) != original)
+		pfree(authoritative);
+
+	return res;
+}
+
+/*
+ * Callback for estimating effectiveness of abbreviated key optimization, using
+ * heuristic rules.  Returns value indicating if the abbreviation optimization
+ * should be aborted, based on its projected effectiveness.
+ *
+ * This is based on varstr_abbrev_abort(), but some comments have been elided
+ * for brevity. See there for more details.
+ */
+static bool
+bytea_abbrev_abort(int memtupcount, SortSupport ssup)
+{
+	ByteaSortSupport *bss = (ByteaSortSupport *) ssup->ssup_extra;
+	double		abbrev_distinct,
+				key_distinct;
+
+	Assert(ssup->abbreviate);
+
+	/* Have a little patience */
+	if (memtupcount < 100)
+		return false;
+
+	abbrev_distinct = estimateHyperLogLog(&bss->abbr_card);
+	key_distinct = estimateHyperLogLog(&bss->full_card);
+
+	/*
+	 * Clamp cardinality estimates to at least one distinct value.  While
+	 * NULLs are generally disregarded, if only NULL values were seen so far,
+	 * that might misrepresent costs if we failed to clamp.
+	 */
+	if (abbrev_distinct < 1.0)
+		abbrev_distinct = 1.0;
+
+	if (key_distinct < 1.0)
+		key_distinct = 1.0;
+
+	if (trace_sort)
+	{
+		double		norm_abbrev_card = abbrev_distinct / (double) memtupcount;
+
+		elog(LOG, "bytea_abbrev: abbrev_distinct after %d: %f "
+			 "(key_distinct: %f, norm_abbrev_card: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, norm_abbrev_card,
+			 bss->prop_card);
+	}
+
+	/*
+	 * If the number of distinct abbreviated keys approximately matches the
+	 * number of distinct original keys, continue with abbreviation.
+	 */
+	if (abbrev_distinct > key_distinct * bss->prop_card)
+	{
+		/*
+		 * Decay required cardinality aggressively after 10,000 tuples.
+		 */
+		if (memtupcount > 10000)
+			bss->prop_card *= 0.65;
+
+		return false;
+	}
+
+	/*
+	 * Abort abbreviation strategy.
+	 */
+	if (trace_sort)
+		elog(LOG, "bytea_abbrev: aborted abbreviation at %d "
+			 "(abbrev_distinct: %f, key_distinct: %f, prop_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, bss->prop_card);
+
+	return true;
+}
+
 Datum
 bytea_sortsupport(PG_FUNCTION_ARGS)
 {
@@ -1009,8 +1215,27 @@ bytea_sortsupport(PG_FUNCTION_ARGS)
 
 	oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt);
 
-	/* Use generic string SortSupport, forcing "C" collation */
-	varstr_sortsupport(ssup, BYTEAOID, C_COLLATION_OID);
+	ssup->comparator = byteafastcmp;
+
+	/*
+	 * Set up abbreviation support if requested.
+	 */
+	if (ssup->abbreviate)
+	{
+		ByteaSortSupport *bss;
+
+		bss = palloc(sizeof(ByteaSortSupport));
+		bss->abbreviate = true;
+		bss->prop_card = 0.20;
+		initHyperLogLog(&bss->abbr_card, 10);
+		initHyperLogLog(&bss->full_card, 10);
+
+		ssup->ssup_extra = bss;
+		ssup->abbrev_full_comparator = ssup->comparator;
+		ssup->comparator = ssup_datum_unsigned_cmp;
+		ssup->abbrev_converter = bytea_abbrev_convert;
+		ssup->abbrev_abort = bytea_abbrev_abort;
+	}
 
 	MemoryContextSwitchTo(oldcontext);
 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 3894457ab40..331a30d5264 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -92,7 +92,7 @@ typedef struct
 	int			last_returned;	/* Last comparison result (cache) */
 	bool		cache_blob;		/* Does buf2 contain strxfrm() blob, etc? */
 	bool		collate_c;
-	Oid			typid;			/* Actual datatype (text/bpchar/bytea/name) */
+	Oid			typid;			/* Actual datatype (text/bpchar/name) */
 	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
 	hyperLogLogState full_card; /* Full key cardinality state */
 	double		prop_card;		/* Required cardinality proportion */
@@ -1606,10 +1606,8 @@ bttextsortsupport(PG_FUNCTION_ARGS)
  * Includes locale support, and support for BpChar semantics (i.e. removing
  * trailing spaces before comparison).
  *
- * Relies on the assumption that text, VarChar, BpChar, and bytea all have the
- * same representation.  Callers that always use the C collation (e.g.
- * non-collatable type callers like bytea) may have NUL bytes in their strings;
- * this will not work with any other collation, though.
+ * Relies on the assumption that text, VarChar, and BpChar all have the
+ * same representation.
  */
 void
 varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
@@ -1972,7 +1970,7 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
  * representation.  Our encoding strategy is simple -- pack the first 8 bytes
  * of a strxfrm() blob into a Datum (on little-endian machines, the 8 bytes are
  * stored in reverse order), and treat it as an unsigned integer.  When the "C"
- * locale is used, or in case of bytea, just memcpy() from original instead.
+ * locale is used just memcpy() from original instead.
  */
 static Datum
 varstr_abbrev_convert(Datum original, SortSupport ssup)
@@ -1998,31 +1996,9 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		len = bpchartruelen(authoritative_data, len);
 
 	/*
-	 * If we're using the C collation, use memcpy(), rather than strxfrm(), to
-	 * abbreviate keys.  The full comparator for the C locale is always
-	 * memcmp().  It would be incorrect to allow bytea callers (callers that
-	 * always force the C collation -- bytea isn't a collatable type, but this
-	 * approach is convenient) to use strxfrm().  This is because bytea
-	 * strings may contain NUL bytes.  Besides, this should be faster, too.
-	 *
-	 * More generally, it's okay that bytea callers can have NUL bytes in
-	 * strings because abbreviated cmp need not make a distinction between
-	 * terminating NUL bytes, and NUL bytes representing actual NULs in the
-	 * authoritative representation.  Hopefully a comparison at or past one
-	 * abbreviated key's terminating NUL byte will resolve the comparison
-	 * without consulting the authoritative representation; specifically, some
-	 * later non-NUL byte in the longer string can resolve the comparison
-	 * against a subsequent terminating NUL in the shorter string.  There will
-	 * usually be what is effectively a "length-wise" resolution there and
-	 * then.
-	 *
-	 * If that doesn't work out -- if all bytes in the longer string
-	 * positioned at or past the offset of the smaller string's (first)
-	 * terminating NUL are actually representative of NUL bytes in the
-	 * authoritative binary string (perhaps with some *terminating* NUL bytes
-	 * towards the end of the longer string iff it happens to still be small)
-	 * -- then an authoritative tie-breaker will happen, and do the right
-	 * thing: explicitly consider string length.
+	 * If we're using the C collation, use memcpy(), rather than strxfrm(),
+	 * to abbreviate keys.  The full comparator for the C locale is also
+	 * memcmp().  This should be faster than strxfrm().
 	 */
 	if (sss->collate_c)
 		memcpy(pres, authoritative_data, Min(len, max_prefix_bytes));
@@ -2104,9 +2080,6 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 		 * strxfrm() blob is itself NUL terminated, leaving no danger of
 		 * misinterpreting any NUL bytes not intended to be interpreted as
 		 * logically representing termination.
-		 *
-		 * (Actually, even if there were NUL bytes in the blob it would be
-		 * okay.  See remarks on bytea case above.)
 		 */
 		memcpy(pres, sss->buf2, Min(max_prefix_bytes, bsize));
 	}
@@ -2187,10 +2160,10 @@ varstr_abbrev_abort(int memtupcount, SortSupport ssup)
 	 * NULLs are generally disregarded, if only NULL values were seen so far,
 	 * that might misrepresent costs if we failed to clamp.
 	 */
-	if (abbrev_distinct <= 1.0)
+	if (abbrev_distinct < 1.0)
 		abbrev_distinct = 1.0;
 
-	if (key_distinct <= 1.0)
+	if (key_distinct < 1.0)
 		key_distinct = 1.0;
 
 	/*
-- 
2.43.0

#15

John Naylor

john.naylor@enterprisedb.com

about 1 month ago

In reply to: Aleksander Alekseev (#14)

Re: [PATCH] Refactor bytea_sortsupport(), take two

On Thu, Nov 27, 2025 at 7:45 PM Aleksander Alekseev
<aleksander@tigerdata.com> wrote:

"Short byteas will have terminating NUL bytes in the abbreviated
datum. Abbreviated comparison need not make a distinction between
thse NUL bytes, and NUL bytes representing actual NULs in the
authoritative representation." [...]

After that, the rest seems to flow better at a quick glance.

Yes, it is much better now, thanks! Previously the comment was
reasoning about NUL bytes as if normally bytea can't have them which
IMO was confusing.

Pushed v7 with a few small adjustments, mostly for pgindent and the
new practice to prefer palloc_object().

--
John Naylor
Amazon Web Services