tuple radix sort

Started by John Naylor · 6 months ago · 46 messages · hackers
#1 John Naylor
john.naylor@enterprisedb.com

First, a quick demonstration of what this PoC can do on 1 million
random not-NULL bigints:

set wip_radix_sort = 'off'; select * from test order by a offset 1_000_000_000;
240ms

set wip_radix_sort = 'on'; select * from test order by a offset 1_000_000_000;
140ms

Background: Peter Geoghegan recently mentioned to me off-list an
interesting set of techniques for sorting in the context of databases.
I'm not yet sure how to approach certain aspects of that architecture,
so I won't go into the full picture at this point. However, there is
one piece that already fits well within our existing architecture, and
that is using radix sort on datum1. The basic sequence is:

1. Partition tuples on first key NULL and not-NULL, according to NULLS
FIRST or NULLS LAST.
2. Do normal qsort on the NULL partition using the tiebreak comparator.
3. Create a "conditioned" or "normalized" datum that encodes datum1
such that unsigned comparison is order-preserving, accounting for ASC
/ DESC as well. I've reused space now unused during in-memory not-NULL
sorts:

typedef struct
{
    void       *tuple;          /* the tuple itself */
    Datum       datum1;         /* value of first key column */

    union
    {
        struct
        {
            bool        isnull1;    /* is first key column NULL? */
            int         srctape;    /* source tape number */
        };
        Datum       cond_datum1;    /* sort key for radix sort */
    };
} SortTuple;

4. Radix sort on cond_datum1. For the PoC I've based it on the
implementation in "ska sort" [1] (C++, Boost license). For
medium-sized sorts it uses "American flag sort" (there is a paper [3]
co-authored by M. D. McIlroy, same as in the paper we reference for
quicksort). For larger sorts it's similar, but performs multiple
passes, which takes better advantage of modern CPUs. Upon recursion,
sorts on small partitions divert to quicksort. Any necessary tiebreaks
are handled by quicksort, either after the end of radix sort, or when
diverting to small quicksort.
5. Reset isnull1 to "false" before returning to the caller. This also
must be done when diverting to quicksort.
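As a rough illustration of step 3 (a sketch under assumptions, not the patch's actual code; the function name is made up), conditioning a signed 64-bit first key might look like:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of step 3: encode a signed 64-bit datum1 as an
 * unsigned key on which plain unsigned comparison preserves sort order.
 * Flipping the sign bit maps INT64_MIN..INT64_MAX monotonically onto
 * 0..UINT64_MAX; inverting all bits afterwards reverses the order for
 * DESC.
 */
uint64_t
condition_int64(int64_t datum1, bool descending)
{
    uint64_t    key = (uint64_t) datum1 ^ UINT64_C(0x8000000000000000);

    return descending ? ~key : key;
}
```

Unsigned comparison of the conditioned keys then agrees with signed comparison of the original values, which is what lets a byte-wise radix sort produce the right order.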

Next steps: Try to find regressions (help welcome here). The v1 patch
has some optimizations, but in other ways things are simple and/or
wasteful. Exactly how things fit together will be informed by what, if
anything, has to be done to avoid regressions. I suspect the challenge
will be multikey sorts when the first key has low cardinality. This is
because tiebreaks are necessarily postponed rather than taken care of
up front. I'm optimistic, since low cardinality cases can be even
faster than our B&M qsort, so we have some headroom:

drop table if exists test;
create unlogged table test (a bigint);
insert into test select
(1_000_000_000 * random())::bigint % 8 -- mod
-- (1_000_000_000 * random())::bigint -- random, for the case at the top
from generate_series(1,1_000_000,1) i;
vacuum freeze test;

select pg_prewarm('test');
set work_mem = '64MB';

set wip_radix_sort = 'off'; select * from test order by a offset 1_000_000_000;
95ms

set wip_radix_sort = 'on'; select * from test order by a offset 1_000_000_000;
84ms
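For readers unfamiliar with the technique, here is a simplified single-byte sketch of the in-place "American flag" distribution from step 4 (an illustration under assumptions, not the patch's or ska_sort's actual code): count bucket sizes, turn the counts into bucket boundaries, then cycle each key into its home bucket.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of one "American flag" distribution pass on the
 * most significant byte of already-conditioned uint64 keys.
 */
void
radix_pass_msb(uint64_t *keys, size_t n)
{
    size_t      count[256] = {0};
    size_t      start[256];
    size_t      end[256];
    size_t      total = 0;

    /* histogram of the most significant byte */
    for (size_t i = 0; i < n; i++)
        count[keys[i] >> 56]++;

    /* prefix sums give each bucket's [start, end) range */
    for (int b = 0; b < 256; b++)
    {
        start[b] = total;
        total += count[b];
        end[b] = total;
    }

    /* cycle keys into their home buckets in place */
    for (int b = 0; b < 256; b++)
    {
        while (start[b] < end[b])
        {
            uint64_t    x = keys[start[b]];
            int         d = (int) (x >> 56);

            while (d != b)
            {
                /* place x in bucket d's next free slot, take its occupant */
                uint64_t    tmp = keys[start[d]];

                keys[start[d]] = x;
                start[d]++;
                x = tmp;
                d = (int) (x >> 56);
            }
            keys[start[b]] = x;
            start[b]++;
        }
    }
}
```

A real implementation would recurse into each non-trivial bucket on the next byte and divert small buckets to quicksort, as described above.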

[1]: https://github.com/skarupke/ska_sort/tree/master
[2]: https://probablydance.com/2016/12/27/i-wrote-a-faster-sorting-algorithm/
[3]: http://static.usenix.org/publications/compsystems/1993/win_mcilroy.pdf

--
John Naylor
Amazon Web Services

Attachments:

v1-0001-Use-radix-sort-when-datum1-is-an-integer-type.patch (application/x-patch, +641 -21)
#2 Chao Li
li.evan.chao@gmail.com
In reply to: John Naylor (#1)
Re: tuple radix sort

On Oct 29, 2025, at 14:28, John Naylor <johncnaylorls@gmail.com> wrote:

I suspect the challenge
will be multikey sorts when the first key has low cardinality.

As you predicted, when the first key has very low cardinality, radix is a little bit slower. I built a test that proves that:

```
evantest=# drop table if exists test_multi;
evantest=# create unlogged table test_multi (category int, name text);
— first column has only 4 distinct values
evantest=# insert into test_multi select (random() * 4)::int as category, md5(random()::text) || md5(random()::text) as name from generate_series(1, 1000000);
evantest=# vacuum freeze test_multi;
evantest=# select count(*) from test_multi;
evantest=# set work_mem = '64MB’;

evantest-# \timing on
Timing is on.
evantest=# set wip_radix_sort = 'off';
Time: 0.403 ms
evantest=# \o /dev/null
evantest=# select * from test_multi order by category, name;
Time: 5607.336 ms (00:05.607)
evantest=# select * from test_multi order by category, name;
Time: 5703.555 ms (00:05.704)
evantest=# select * from test_multi order by category, name;
Time: 5692.644 ms (00:05.693)

evantest=# set wip_radix_sort = 'on';
Time: 0.859 ms
evantest=# select * from test_multi order by category, name;
Time: 5822.979 ms (00:05.823)
evantest=# select * from test_multi order by category, name;
Time: 5881.256 ms (00:05.881)
evantest=# select * from test_multi order by category, name;
Time: 5976.351 ms (00:05.976)
```

Roughly 5% slower for this corner case.

However, when I recreate the test table with a high-cardinality first column, wip_radix_sort still seems slower:

```
evantest=# \o
evantest=# drop table if exists test_multi;
DROP TABLE
evantest=# create unlogged table test_multi (category int, name text);
CREATE TABLE
evantest=# insert into test_multi
evantest-# select (random() * 1000000)::int as category, md5(random()::text) || md5(random()::text) as name from generate_series(1, 1000000);
INSERT 0 1000000
evantest=# vacuum freeze test_multi;
VACUUM
evantest=# select count(*) from test_multi;
count
---------
1000000
(1 row)

evantest=# select * from test_multi limit 5;
category | name
----------+------------------------------------------------------------------
607050 | c555126a5afea9f5ffe3880248c89944d211bc378f8c3b6d125b4360fe8619b7
843579 | 69b5a1dba76f52ff238566a3f88315a7425116d5d271fb54701b6e49d4afd8ce
106298 | a96e8674db219e12463ecdbb405b99c631767972e489093045c97976c17c6561
621860 | 5e6739ea9f533f9cdb0b8db76e3d4ce63be6b2b612c8aff06c4b80451f8f2edc
290110 | 56944320e5abd3a854fffdd185b969727e8d414448d440725a392cda4c6355c4
(5 rows)

evantest=# \timing on
Timing is on.

evantest=# \o /dev/null
evantest=# set wip_radix_sort = 'off';
Time: 0.904 ms
evantest=# select * from test_multi limit 5;
Time: 0.983 ms
evantest=# select * from test_multi order by category, name;
Time: 593.578 ms
evantest=# select * from test_multi order by category, name;
Time: 597.329 ms
evantest=# select * from test_multi order by category, name;
Time: 600.050 ms

evantest=# set wip_radix_sort = 'on';
Time: 0.737 ms
evantest=# select * from test_multi order by category, name;
Time: 611.604 ms
evantest=# select * from test_multi order by category, name;
Time: 613.115 ms
evantest=# select * from test_multi order by category, name;
Time: 615.003 ms
```

This seems like a real regression.

Then I tried sorting on only the first column, and now radix is indeed faster:

```
evantest=# set wip_radix_sort = 'off’;
evantest=# select * from test_multi order by category;
Time: 445.498 ms
evantest=# select * from test_multi order by category;
Time: 451.834 ms
evantest=# select * from test_multi order by category;
Time: 454.531 ms

evantest=# set wip_radix_sort = 'on';
Time: 0.329 ms
evantest=# select * from test_multi order by category;
Time: 402.829 ms
evantest=# select * from test_multi order by category;
Time: 408.014 ms
evantest=# select * from test_multi order by category;
Time: 415.340 ms
evantest=# select * from test_multi order by category;
Time: 413.969 ms
```

Hope the test helps. (The test was run on a MacBook M4.)

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#3 John Naylor
john.naylor@enterprisedb.com
In reply to: Chao Li (#2)
Re: tuple radix sort

On Wed, Oct 29, 2025 at 3:25 PM Chao Li <li.evan.chao@gmail.com> wrote:

On Oct 29, 2025, at 14:28, John Naylor <johncnaylorls@gmail.com> wrote:

I suspect the challenge
will be multikey sorts when the first key has low cardinality.

As you predicted, when the first key has very low cardinality, radix is a little bit slower. I built a test that proves that:

```
evantest=# drop table if exists test_multi;
evantest=# create unlogged table test_multi (category int, name text);
— first column has only 4 distinct values

Thanks for testing. Note it's actually 5 because of rounding. Your
text also seems to have em-dashes and unicode apostrophes where it
should have dashes / single quotes. That's not great if you expect
others to try to reproduce. I'm also not thrilled about having to
remove your psql prompt.

drop table if exists test_multi;
create unlogged table test_multi (category int, name text);
insert into test_multi select (random() * 4)::int as category,
md5(random()::text) || md5(random()::text) as name from
generate_series(1, 1000000);
vacuum freeze test_multi;

Anyway, because this table is larger than my first example, the input
no longer fits into 64MB of work_mem and it switches to an external
merge sort. Normally I set work_mem to 1GB for testing sorts so I
don't have to think about it, but neglected to in my first email. I
don't know if that explains the disparity, but I've made that change
for my quick tests below.

evantest=# \o /dev/null
evantest=# select * from test_multi order by category, name;

[...]

Roughly 5% slower for this corner case.

Seems fine for me on this old Intel laptop (min of 5 runs):

set wip_radix_sort = 'off';
2368.369

set wip_radix_sort = 'on';
2290.654

It's close enough that I'll want to test more closely at a range of
low-cardinality inputs. I haven't done any rigorous scripted testing
yet, so take this with a grain of salt.

However, when I recreate the test table with a high-cardinality first column, wip_radix_sort still seems slower:

drop table if exists test_multi;
create unlogged table test_multi (category int, name text);
insert into test_multi select (random() * 1000000)::int as category,
md5(random()::text) || md5(random()::text) as name from
generate_series(1, 1000000);
vacuum freeze test_multi;

evantest=# set wip_radix_sort = 'off';
Time: 0.904 ms

evantest=# select * from test_multi order by category, name;
Time: 593.578 ms
evantest=# select * from test_multi order by category, name;
Time: 597.329 ms
evantest=# select * from test_multi order by category, name;
Time: 600.050 ms

evantest=# set wip_radix_sort = 'on';
Time: 0.737 ms
evantest=# select * from test_multi order by category, name;
Time: 611.604 ms
evantest=# select * from test_multi order by category, name;
Time: 613.115 ms
evantest=# select * from test_multi order by category, name;
Time: 615.003 ms
```

This seems like a real regression.

It's better for me here (min of 5 again), although the time scanning
the table probably dominates:

off:
536.257

on:
471.345

--
John Naylor
Amazon Web Services

#4 Chao Li
li.evan.chao@gmail.com
In reply to: John Naylor (#3)
Re: tuple radix sort

On Oct 29, 2025, at 19:29, John Naylor <johncnaylorls@gmail.com> wrote:

On Wed, Oct 29, 2025 at 3:25 PM Chao Li <li.evan.chao@gmail.com> wrote:

On Oct 29, 2025, at 14:28, John Naylor <johncnaylorls@gmail.com> wrote:

I suspect the challenge
will be multikey sorts when the first key has low cardinality.

As you predicted, when the first key has very low cardinality, radix is a little bit slower. I built a test that proves that:

```
evantest=# drop table if exists test_multi;
evantest=# create unlogged table test_multi (category int, name text);
— first column has only 4 distinct values

Thanks for testing. Note it's actually 5 because of rounding.

Yes, 0-4, so 5 in total.

Your
text also seems to have em-dashes and unicode apostrophes where it
should have dashes / single quotes. That's not great if you expect
others to try to reproduce.

I just copied the content from psql (running in iTerm). I did a Google search and found that this was because of Mac Mail's "smart quotes" substitution. It looks like even if I manually type a pair of single quotes, it still does the substitution. I will try to see how to disable that, but I don't want to switch to another mail app.

I'm also not thrilled about having to
remove your psql prompt.

I just wanted to show my entire test process, so I simply copied everything from psql. In the future, I will remove psql prompts from the reproduction steps.

drop table if exists test_multi;
create unlogged table test_multi (category int, name text);
insert into test_multi select (random() * 4)::int as category,
md5(random()::text) || md5(random()::text) as name from
generate_series(1, 1000000);
vacuum freeze test_multi;

Anyway, because this table is larger than my first example, the input
no longer fits into 64MB of work_mem and it switches to an external
merge sort. Normally I set work_mem to 1GB for testing sorts so I
don't have to think about it, but neglected to in my first email.

I changed work_mem to 1GB and reran the test. As the high-cardinality data was still there, I reran that case first:

```
evantest=# set work_mem = '1GB';
Time: 0.301 ms
evantest=#
evantest=# select * from test_multi order by category, name;
Time: 575.247 ms
evantest=# select * from test_multi order by category, name;
Time: 554.351 ms
evantest=# select * from test_multi order by category, name;
Time: 565.100 ms
evantest=#
evantest=# set wip_radix_sort = 'on';
Time: 0.752 ms
evantest=# select * from test_multi order by category, name;
Time: 558.057 ms
evantest=# select * from test_multi order by category, name;
Time: 565.542 ms
evantest=# select * from test_multi order by category, name;
Time: 559.973 ms
```

With radix_sort on and off, execution times are almost the same.

Then I restored the low-cardinality data; off is still faster than on:
```
evantest=# set wip_radix_sort = ‘off';
Time: 0.549 ms
evantest=# select * from test_multi order by category, name;
Time: 5509.075 ms (00:05.509)
evantest=# select * from test_multi order by category, name;
Time: 5553.566 ms (00:05.554)
evantest=# select * from test_multi order by category, name;
Time: 5598.595 ms (00:05.599)
evantest=# set wip_radix_sort = ‘on';
Time: 0.786 ms
evantest=#
evantest=# select * from test_multi order by category, name;
Time: 5770.964 ms (00:05.771)
evantest=# select * from test_multi order by category, name;
Time: 5779.755 ms (00:05.780)
evantest=# select * from test_multi order by category, name;
Time: 5851.134 ms (00:05.851)
evantest=#
evantest=# set work_mem = '2GB’; # increasing work_mem to 2GB doesn’t help
Time: 0.404 ms
evantest=#
evantest=# select * from test_multi order by category, name;
Time: 5781.005 ms (00:05.781)
evantest=# select * from test_multi order by category, name;
Time: 5826.025 ms (00:05.826)
evantest=# select * from test_multi order by category, name;
Time: 5937.919 ms (00:05.938)
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#5 Chao Li
li.evan.chao@gmail.com
In reply to: John Naylor (#1)
Re: tuple radix sort

On Oct 29, 2025, at 14:28, John Naylor <johncnaylorls@gmail.com> wrote:

<v1-0001-Use-radix-sort-when-datum1-is-an-integer-type.patch>

I just went quickly through the code change. I guess I need more time to understand the entire logic, but I found a typo that might affect the tests:

```
+ Assert(last = first);
```

"=" should be "=="

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#6 John Naylor
john.naylor@enterprisedb.com
In reply to: Chao Li (#5)
Re: tuple radix sort

On Thu, Oct 30, 2025 at 8:56 AM Chao Li <li.evan.chao@gmail.com> wrote:

I changed work_mem to 1GB and reran the test. As the high-cardinality data was still there, I reran that case first:

With radix_sort on and off, execution times are almost the same.

Are you by chance running with asserts on? It's happened before, so I
have to make sure. That makes a big difference here because I disabled
diversion thresholds in assert builds so that regression tests (few
cases with large inputs) cover the paths I want, in addition to my
running a standalone stress test.

Speaking of tests, I forgot to mention that regression tests will fail
since in-place radix sort is an unstable sort, as qsort is as well,
but in a different way -- this is expected. In assert builds, the
patch verifies the sort after the fact with the standard comparator,
and will throw an error if it's wrong.

On Thu, Oct 30, 2025 at 9:19 AM Chao Li <li.evan.chao@gmail.com> wrote:

I just went quickly through the code change. I guess I need more time to understand the entire logic, but I found a typo that might affect the tests:

```
+ Assert(last = first);
```

"=" should be "=="

Yes, you're quite right, thanks for spotting! I reran my stress test
that has randomly distributed NULLs and the assert still holds, so
nothing further to fix yet. The NULL partitioning part of the code
hasn't been well tested in its current form, and I may arrange things
so that that step and the datum conditioning step happen in a single
pass. I'm not yet sure if it's important enough to justify the
additional complexity.

--
John Naylor
Amazon Web Services

#7 Chao Li
li.evan.chao@gmail.com
In reply to: John Naylor (#6)
Re: tuple radix sort

On Oct 30, 2025, at 11:40, John Naylor <johncnaylorls@gmail.com> wrote:

On Thu, Oct 30, 2025 at 8:56 AM Chao Li <li.evan.chao@gmail.com> wrote:

I changed work_mem to 1GB and reran the test. As the high-cardinality data was still there, I reran that case first:

With radix_sort on and off, execution times are almost the same.

Are you by chance running with asserts on? It's happened before, so I
have to make sure. That makes a big difference here because I disabled
diversion thresholds in assert builds so that regression tests (few
cases with large inputs) cover the paths I want, in addition to my
running a standalone stress test.

Yes, assert is always enabled in my sandbox. I can disable assert and rerun the test later.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#8 David Rowley
dgrowleyml@gmail.com
In reply to: Chao Li (#7)
Re: tuple radix sort

On Thu, 30 Oct 2025 at 16:46, Chao Li <li.evan.chao@gmail.com> wrote:

On Oct 30, 2025, at 11:40, John Naylor <johncnaylorls@gmail.com> wrote:
Are you by chance running with asserts on? It's happened before, so I
have to make sure. That makes a big difference here because I disabled
diversion thresholds in assert builds so that regression tests (few
cases with large inputs) cover the paths I want, in addition to my
running a standalone stress test.

Yes, assert is always enabled in my sandbox. I can disable assert and rerun the test later.

Never expect anything meaningful to come from running performance
tests on Assert builds. You should always be rebuilding without
Asserts before doing performance tests.

David

#9 Chao Li
li.evan.chao@gmail.com
In reply to: David Rowley (#8)
Re: tuple radix sort

On Oct 30, 2025, at 13:01, David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 30 Oct 2025 at 16:46, Chao Li <li.evan.chao@gmail.com> wrote:

On Oct 30, 2025, at 11:40, John Naylor <johncnaylorls@gmail.com> wrote:
Are you by chance running with asserts on? It's happened before, so I
have to make sure. That makes a big difference here because I disabled
diversion thresholds in assert builds so that regression tests (few
cases with large inputs) cover the paths I want, in addition to my
running a standalone stress test.

Yes, assert is always enabled in my sandbox. I can disable assert and rerun the test later.

Never expect anything meaningful to come from running performance
tests on Assert builds. You should always be rebuilding without
Asserts before doing performance tests.

Sure, good to learn. Actually I am very new to PG development, so any guidance is greatly appreciated.

I just did a distclean, then configured without any parameters. The overall execution time is now ~10% lower than with asserts. With the low-cardinality data, off and on are very close:

```
evantest=# set wip_radix_sort = 'off';
Time: 0.206 ms
evantest=# select * from test_multi order by category, name;
Time: 5070.277 ms (00:05.070)
evantest=# select * from test_multi order by category, name;
Time: 5158.748 ms (00:05.159)
evantest=# select * from test_multi order by category, name;
Time: 5072.708 ms (00:05.073)

evantest=# set wip_radix_sort = 'on';
Time: 0.177 ms
evantest=# select * from test_multi order by category, name;
Time: 4992.516 ms (00:04.993)
evantest=# select * from test_multi order by category, name;
Time: 5145.361 ms (00:05.145)
evantest=# select * from test_multi order by category, name;
Time: 5101.800 ms (00:05.102)

evantest=# \o
evantest=# show work_mem;
work_mem
----------
1GB
(1 row)

Time: 0.186 ms
evantest=# explain select * from test_multi order by category, name;
QUERY PLAN
---------------------------------------------------------------------------
Sort (cost=122003.84..124503.84 rows=1000000 width=69)
Sort Key: category, name
-> Seq Scan on test_multi (cost=0.00..22346.00 rows=1000000 width=69)
(3 rows)
```

And with the high-cardinality test data, on has a big win:
```
evantest=# set wip_radix_sort = 'off';
Time: 0.174 ms
evantest=# select * from test_multi order by category, name;
Time: 353.702 ms
evantest=# select * from test_multi order by category, name;
Time: 375.549 ms
evantest=# select * from test_multi order by category, name;
Time: 367.967 ms
evantest=# set wip_radix_sort = 'on';
Time: 0.147 ms
evantest=# select * from test_multi order by category, name;
Time: 279.537 ms
evantest=# select * from test_multi order by category, name;
Time: 278.114 ms
evantest=# select * from test_multi order by category, name;
Time: 284.273 ms
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#10 John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#1)
Re: tuple radix sort

I wrote:

The v1 patch
has some optimizations, but in other ways things are simple and/or
wasteful. Exactly how things fit together will be informed by what, if
anything, has to be done to avoid regressions.

In v1, radix sort diverts to qsort_tuple for small partitions (similar
to how quicksort diverts to insertion sort), but qsort_tuple is
inefficient because the comparator is called via a function pointer.

I also thought having two different radix sorts was too complex, so I
wondered if it'd be better to get rid of the smaller radix sort (whose
control flow I find harder to understand, even ignoring the unsightly
goto) and have the larger sort divert to a new quicksort
specialization that compares on the conditioned datum. That allows
skipping branches for NULL comparisons and order reversal. I've done
this in v2. It makes sense to replace the three current
integer-comparison quicksorts with one.

v1 was careful to restore isnull1 to false when diverting to quicksort
for the tiebreak. v2 doesn't bother, since the only tiebreak in core
that looks at isnull1 is comparetup_datum_tiebreak, which is not
reachable by radix sort, requiring a pass-by-value datum that compares
like an integer. This is a bit of a risk, since it's possible a third
party fork could be doing something weird. Seems unlikely, but
something to keep in mind.

I used a standalone program (attached) to microbenchmark this new
fallback qsort vs. a pass of radix sort on one byte to get a decent
threshold value. This is not quite fair, since the quicksort will then
be finished, but the radix sort could still need to recurse to the
next byte(s), so these numbers could underestimate the threshold. This
is just to get an idea.
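For concreteness, the kind of cycle reading involved might look like the following sketch (an assumption on my part, not necessarily what the attached program does; serious measurement would also serialize with RDTSCP/CPUID and take the minimum over many runs):

```c
#include <stdint.h>

/*
 * Read the x86-64 time-stamp counter (GCC/Clang inline assembly).
 * Ticks per element is then (rdtsc() after - rdtsc() before) / n
 * around the sort being measured.
 */
uint64_t
rdtsc(void)
{
    uint32_t    lo,
                hi;

    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t) hi << 32) | lo;
}
```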

The numbers are in RDTSC ticks per element sorted.

cardinality: 256
number of elements: 100 qsort: 35.4 radix: 49.2
number of elements: 200 qsort: 34.9 radix: 38.1
number of elements: 400 qsort: 42.4 radix: 34.4
number of elements: 800 qsort: 95.0 radix: 29.2
number of elements: 1600 qsort: 115.0 radix: 22.4
number of elements: 3200 qsort: 125.5 radix: 19.4
number of elements: 6400 qsort: 128.1 radix: 17.6

With the highest cardinality possible on a single byte, radix sort is
actually not bad at low inputs. Notice that the time per element is
consistently going down with larger inputs. Smaller inputs have large
constant overheads, made worse by my unrolling the counting step.

cardinality: 2
number of elements: 100 qsort: 09.2 radix: 28.0
number of elements: 200 qsort: 09.1 radix: 19.5
number of elements: 400 qsort: 10.4 radix: 15.7
number of elements: 800 qsort: 10.1 radix: 14.5
number of elements: 1600 qsort: 10.4 radix: 13.7
number of elements: 3200 qsort: 15.8 radix: 13.6
number of elements: 6400 qsort: 22.2 radix: 13.8

This is an extreme best case for B&M quicksort, which is basically
O(n) -- the point at which the per-element time goes up seems purely
due to exceeding L1 cache. Radix sort takes a big input to catch up,
but it doesn't seem awful, either.

cardinality: 16
number of elements: 100 qsort: 19.5 radix: 34.5
number of elements: 200 qsort: 18.7 radix: 22.6
number of elements: 400 qsort: 18.5 radix: 17.2
number of elements: 800 qsort: 25.0 radix: 14.8
number of elements: 1600 qsort: 43.8 radix: 13.8
number of elements: 3200 qsort: 51.2 radix: 13.2
number of elements: 6400 qsort: 59.0 radix: 12.8

This is still low cardinality, but behaves more like the high cardinality case.

I've set the threshold to 400 for now, but I'm not claiming that's the
end story. In addition to the underestimation mentioned above,
unrolling the counting step is a factor. Unrolling makes smaller
inputs worse (which we can reach by recursing from larger inputs), but
unrolling seems important for large inputs with low cardinality (a few
percent, but I haven't shared numbers yet). We've found that a large
input with only 4-5 distinct values just barely wins with radix sort.
I'll be curious to see if unrolling is actually needed to prevent
regressions there.

Other things to consider:

- I don't quite like how the NULL partitioning step looks, and it
could be costly when the distribution of NULL is not predictable, so
I'm thinking of turning part of that into a branch-free cyclic
permutation, similar to
/messages/by-id/CANWCAZbAmaZ7P+ARjS97sJLXsBB5CPZyzFgqNDiqe-L+BqXzug@mail.gmail.com

- The quicksort on the NULL partition still compares isnull1 -- the
branches are predictable but perhaps it's worth it to add a
specialization that skips that.

--
John Naylor
Amazon Web Services

Attachments:

v2-0001-Use-radix-sort-when-datum1-is-an-integer-type.patch (text/x-patch, +506 -21)
test-ska-byte-sort-threshold.c (text/x-csrc)
#11 John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#10)
Re: tuple radix sort

I wrote:

I've set the threshold to 400 for now, but I'm not claiming that's the
end story. In addition to the underestimation mentioned above,
unrolling the counting step is a factor. Unrolling makes smaller
inputs worse (which we can reach by recursing from larger inputs), but
unrolling seems important for large inputs with low cardinality (a few
percent, but I haven't shared numbers yet). We've found that a large
input with only 4-5 distinct values just barely wins with radix sort.
I'll be curious to see if unrolling is actually needed to prevent
regressions there.

Looking more closely at my development history, it turns out I added
loop unrolling before adding common prefix detection. Most real-world
non-negative integers have the upper bytes the same, especially since
the datum is 8 bytes regardless of underlying type. For those bytes,
the radix sort finds only one unique byte and continues on to the next
byte. By detecting the common prefix as we condition the datums, it
matters less how fast we can count since we simply skip some useless
work. (This is not as relevant when we have an abbreviated datum)

Repeating part of the microbenchmark from last time with no unrolling,
a threshold of 200 works for all but the lowest cardinality inputs:

cardinality: 256
number of elements: 100 qsort: 34.8 radix: 38.3
number of elements: 200 qsort: 34.9 radix: 29.7
number of elements: 400 qsort: 40.8 radix: 29.2

cardinality: 16
number of elements: 100 qsort: 19.3 radix: 26.2
number of elements: 200 qsort: 18.9 radix: 18.2
number of elements: 400 qsort: 18.5 radix: 14.5

cardinality: 2
number of elements: 100 qsort: 09.3 radix: 21.6
number of elements: 200 qsort: 08.9 radix: 15.4
number of elements: 400 qsort: 10.3 radix: 14.0

To test further, I dug up a test from [1] that stresses low
cardinality on multiple sort keys (attached in a form to allow turning
radix sort on and off via a command line argument), and found no
regression with or without loop unrolling:

V2:
off:
4 ^ 8: latency average = 101.070 ms
5 ^ 8: latency average = 704.862 ms
6 ^ 8: latency average = 3651.015 ms
7 ^ 8: latency average = 15141.412 ms

on:
4 ^ 8: latency average = 99.939 ms
5 ^ 8: latency average = 683.018 ms
6 ^ 8: latency average = 3545.626 ms
7 ^ 8: latency average = 14095.677 ms

V3:
off:
4 ^ 8: latency average = 99.486 ms
5 ^ 8: latency average = 693.434 ms
6 ^ 8: latency average = 3607.940 ms
7 ^ 8: latency average = 14602.325 ms

on:
4 ^ 8: latency average = 97.664 ms
5 ^ 8: latency average = 678.752 ms
6 ^ 8: latency average = 3361.765 ms
7 ^ 8: latency average = 14121.190 ms

So v3 gets rid of loop unrolling and reduces the threshold to 200.

[1]: /messages/by-id/CAApHDvqkHZsT2gaAWFM7D=7qyQ=eKXQvvn+BkwCn4Rvj1w4EKQ@mail.gmail.com

TODOs:
- adding a "sorted pre-check" to keep up with our qsort for ascending inputs
- further performance validation
- possibly doing NULL partitioning differently
- possibly specializing qsort on the NULL partition
- code cleanup

--
John Naylor
Amazon Web Services

Attachments:

bench_cartesiansort.sh (application/x-sh)
v3-0001-Use-radix-sort-when-datum1-is-an-integer-type.patch (application/x-patch, +487 -22)
#12 John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#10)
Re: tuple radix sort

On Mon, Nov 3, 2025 at 8:24 PM I wrote:

v1 was careful to restore isnull1 to false when diverting to quicksort
for the tiebreak. v2 doesn't bother, since the only tiebreak in core
that looks at isnull1 is comparetup_datum_tiebreak, which is not
reachable by radix sort, requiring a pass-by-value datum that compares
like an integer. This is a bit of a risk, since it's possible a third
party fork could be doing something weird. Seems unlikely, but
something to keep in mind.

I decided I wasn't quite comfortable with the full normalized datum
sharing space in SortTuple with isnull1. There's too much of a
cognitive burden involved in deciding when we do or don't need to
reset isnull1, and there's a non-zero risk of difficult-to-detect
bugs. For v4 I've instead used one byte of padding space in SortTuple
to store only the byte used for the current pass. That means we must
compute the normalized datum on every pass. That's actually better
than it sounds, since that one byte can now be used directly during
the "deal" step, rather than having to extract the byte from the
normalized datum by shifting and masking. That extraction step might
add significant cycles in cases where a pass requires multiple
iterations through the "deal" loop. It doesn't seem to make much
difference in practice, performance-wise, even with the following
pessimization:

I had to scrap the qsort specialization on the normalized datum for
small sorts, since it's no longer stored. It could still be worth it
to compute the "next byte of the normalized datum" and perform a qsort
on that (falling back to the comparator function in the usual way),
but I haven't felt the need to resort to that yet. For v4, I just
divert to qsort_tuple in non-assert builds, with a threshold of 40.
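To make the per-pass byte computation concrete, here is a minimal sketch (function names invented, not from the patch) for a signed 64-bit key sorted ASC: flipping the sign bit makes unsigned byte-wise comparison order-preserving, and the byte for any given pass can then be recomputed on the fly instead of being stored:

```c
#include <stdint.h>

/*
 * Hypothetical sketch: turn a signed 64-bit key into an unsigned
 * value whose byte-wise comparison is order-preserving, by flipping
 * the sign bit.  (For DESC, the whole value could be inverted too.)
 */
static inline uint64_t
normalize_int64(int64_t key)
{
	return (uint64_t) key ^ ((uint64_t) 1 << 63);
}

/*
 * Extract the radix byte for a given pass, most-significant byte
 * first (level 0 = MSB).  Computing this per pass avoids storing the
 * full normalized datum, at the cost of redoing the normalization.
 */
static inline uint8_t
radix_byte(int64_t key, int level)
{
	return (uint8_t) (normalize_int64(key) >> (56 - 8 * level));
}
```

The sign-bit flip maps INT64_MIN to 0 and INT64_MAX to UINT64_MAX, so unsigned ordering of the normalized values matches signed ordering of the originals.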

- I don't quite like how the NULL partitioning step looks, and it
could be costly when the distribution of NULL is not predictable, so
I'm thinking of turning part of that into a branch-free cyclic
permutation, similar to
/messages/by-id/CANWCAZbAmaZ7P+ARjS97sJLXsBB5CPZyzFgqNDiqe-L+BqXzug@mail.gmail.com

This is done. Even though the inner loop is mysterious at first
glance, it's really quite elegant.
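The patch itself uses an in-place cyclic permutation; as a much simpler illustration of the branch-free idea, here is a hypothetical out-of-place two-way partition (NULLS FIRST) whose inner loop selects the destination index arithmetically, with no data-dependent branch (all names are invented for the sketch):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative sketch only, not the patch's in-place cyclic
 * permutation: partition keys by a NULL flag, NULLS FIRST, into a
 * scratch array.
 */
static void
partition_nulls_first(const int64_t *keys, const bool *isnull,
					  size_t n, int64_t *out)
{
	size_t		nulls = 0;

	/* one counting pass to find the partition boundary */
	for (size_t i = 0; i < n; i++)
		nulls += isnull[i];

	size_t		null_pos = 0;		/* next slot in the NULL partition */
	size_t		notnull_pos = nulls;	/* next slot in the not-NULL partition */

	for (size_t i = 0; i < n; i++)
	{
		size_t		is_null = isnull[i];

		/* select the destination arithmetically, without branching */
		size_t		dst = null_pos * is_null + notnull_pos * (1 - is_null);

		out[dst] = keys[i];
		null_pos += is_null;
		notnull_pos += 1 - is_null;
	}
}
```

The point of avoiding the branch is that a data-dependent branch on an unpredictable NULL distribution would suffer mispredictions, which is the cost concern mentioned above.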

I made an attempt at clean-up, but it's still under-commented. The
common prefix detection has moved to a separate patch (v4-0004).

I've been forcing all eligible sorts to use radix sort in assert
builds, even when small enough that qsort would be faster. Since both
qsort and in-place radix sort are unstable, it's expected that some
regression tests need adjustment (v4-0002). One thing surprised me,
however: The pg_upgrade TAP test that runs regression tests on the old
cluster showed additional failures that I can't explain. I haven't
seen this before, but it's possible I never ran TAP tests when testing
new sort algorithms previously. This doesn't happen if you change the
current insertion sort threshold, so I haven't been able to reproduce
it aside from this patch. For that reason I can't completely rule out
an actual bug, although I actually have more confidence in the
verification of correct sort order in v4, since isnull1 now never
changes, just as in master. I found that changing some tests to have
additional sort keys seems to fix it (v4-0003). I did this in a rather
quick and haphazard fashion. There's probably a longer conversation to
be had about making test output more deterministic while still
covering the intended executor paths.

Aside from that, this seems like a good place to settle down, so I'm
going to create a CF entry for this. I'll start more rigorous
performance testing in the near future.

--
John Naylor
Amazon Web Services

Attachments:

v4-0004-Detect-common-prefix-to-avoid-wasted-work-during-.patch (text/x-patch, +52 −2)
v4-0002-WIP-Adjust-regression-tests.patch (text/x-patch, +300 −302)
v4-0003-WIP-make-some-regression-tests-sort-order-more-de.patch (text/x-patch, +55 −54)
v4-0001-Use-radix-sort-when-datum1-is-an-integer-type.patch (text/x-patch, +389 −20)
#13David Geier
geidav.pg@gmail.com
In reply to: John Naylor (#1)
Re: tuple radix sort

On 29.10.2025 07:28, John Naylor wrote:

Next steps: Try to find regressions (help welcome here). The v1 patch
has some optimizations, but in other ways things are simple and/or
wasteful. Exactly how things fit together will be informed by what, if
anything, has to be done to avoid regressions. I suspect the challenge
will be multikey sorts when the first key has low cardinality. This is
because tiebreaks are necessarily postponed rather than taken care of
up front. I'm optimistic, since low cardinality cases can be even
faster than our B&M qsort, so we have some headroom:

Hi John,

I've also been looking into radix sort the last days to accelerate GIN
index builds. Ordering and removing duplicates requires a fast sort in
generate_trgm(). My own implementation (likely slower than the
algorithms you used) also showed a decent speedup.

Beyond that there are many more places in the code base that could be
changed to use radix sort instead of qsort.

What would be great is if we could build a generic radix sort
implementation, similar to sort_template.h, that can be used in other
places. We would have to think a bit about the interface because instead
of a comparator we would require some radix extraction callback.

If you're open to that idea I could give abstracting the code a try.

--
David Geier

#14John Naylor
john.naylor@enterprisedb.com
In reply to: David Geier (#13)
Re: tuple radix sort

On Wed, Nov 12, 2025 at 9:28 PM David Geier <geidav.pg@gmail.com> wrote:

I've also been looking into radix sort the last days to accelerate GIN
index builds. Ordering and removing duplicates requires a fast sort in
generate_trgm().

If that's the case then I suggest first seeing if dfd8e6c73ee made
things any worse. A simpler possible improvement is to use a similar
normalization step for the chars, if needed, then do the sort and
qunique with a specialization for unsigned chars. (We don't yet
specialize qunique, but that can be remedied). If you're interested,
please start a separate thread for that.

What would be great is if we could build a generic radix sort
implementation, similarly to sort_template.h that can be used in other
places. We would have to think a bit about the interface because instead
of a comparator we would require some radix extraction callback.

That's moving the goalposts too far IMO. I want to get to a place
where I feel comfortable with the decisions made, and that already
requires a lot of testing. Also, I don't want to risk introducing
abstractions that make future improvements to tuplesort more
cumbersome.

--
John Naylor
Amazon Web Services

#15David Geier
geidav.pg@gmail.com
In reply to: John Naylor (#14)
Re: tuple radix sort

Hi John!

On 13.11.2025 05:01, John Naylor wrote:

If that's the case then I suggest first seeing if dfd8e6c73ee made
things any worse. A simpler possible improvement is to use a similar
normalization step for the chars, if needed, then do the sort and
quinique with a specialization for unsigned chars. (We don't yet
specialize qunique, but that can be remedied). If you're interested,
please start a separate thread for that.

It did but only a bit. I worked around it by having two sort
specializations, one for signed and one for unsigned. I also wanted to
try to use a hash table to filter out duplicates and then only sort the
remaining unique trigrams, which are, most of the time, far fewer.

Generally speaking, the GIN code is death by a thousand cuts. I've got a
patch coming up that cuts CREATE INDEX runtime in half for columns with
relatively short strings and yields even better results for columns with
longer strings. But that's not only changing the sort but requires a few
changes in a couple of places. More details in the upcoming thread.

I thought qunique() was already pretty optimal because it's defined in
a header file. I believe that even the comparator gets inlined. What
would be useful, though, is if qunique() used an equality comparator
which only returns true/false instead of a sort comparator. In the GIN
code this also shaved off a few percent. I'll take a closer look at
qunique() and open a thread with the findings / ideas for changes.
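A minimal sketch of that idea (illustrative names, not the actual qunique.h interface): a dedup pass over a sorted array driven by an equality predicate instead of a three-way comparator:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* equality predicate: returns true/false, not -1/0/+1 */
static inline bool
int32_eq(int32_t a, int32_t b)
{
	return a == b;
}

/*
 * Remove adjacent duplicates from a sorted array in place; returns
 * the new element count.  Hypothetical sketch, not qunique.h.
 */
static size_t
unique_int32(int32_t *elements, size_t nelems)
{
	if (nelems <= 1)
		return nelems;

	size_t		write = 0;

	for (size_t read = 1; read < nelems; read++)
	{
		/* keep the element only if it differs from the last kept one */
		if (!int32_eq(elements[read], elements[write]))
			elements[++write] = elements[read];
	}
	return write + 1;
}
```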

Anyways. In this context GIN was just one example for where a generic
radix sort would be useful and there are certainly more.

That's moving the goalposts too far IMO. I want to get to a place
where I feel comfortable with the decisions made, and that already
requires a lot of testing. Also, I don't want to risk introducing
abstractions that make future improvements to tuplesort more
cumbersome.

On a quick glance it looks like you didn't specialize much. So the
testing seems related to whether the new algorithm introduces
regressions, not whether the abstraction would cause problems. So it
should be possible to extract the code fairly easily without
invalidating your existing benchmark results.

I understand that you want to make progress with the use case at hand
but I feel like we're missing out on a lot of opportunity where the
introduced code would also be very beneficial. Beyond that we could
nicely test the new sort code in the spirit of test_rbtree.c and
friends. Maybe you want to give it a 2nd thought.

--
David Geier

#16John Naylor
john.naylor@enterprisedb.com
In reply to: David Geier (#15)
Re: tuple radix sort

On Sat, Nov 15, 2025 at 1:05 AM David Geier <geidav.pg@gmail.com> wrote:

I understand that you want to make progress with the use case at hand
but I feel like we're missing out on a lot of opportunity where the
introduced code would also be very beneficial.

The patch is independently beneficial, but is also just a stepping
stone toward something larger, and I don't yet know exactly how it's
going to look. Premature abstractions are just going to get in the
way. I'd be open to hearing proposals for possible wider application
after the dust settles, but that's not going to happen during the PG19
cycle.

--
John Naylor
Amazon Web Services


#17David Geier
geidav.pg@gmail.com
In reply to: John Naylor (#16)
Re: tuple radix sort

Hi John!

On 15.11.2025 03:47, John Naylor wrote:

On Sat, Nov 15, 2025 at 1:05 AM David Geier <geidav.pg@gmail.com> wrote:

I understand that you want to make progress with the use case at hand
but I feel like we're missing out on a lot of opportunity where the
introduced code would also be very beneficial.

The patch is independently beneficial, but is also just a stepping
stone toward something larger, and I don't yet know exactly how it's
going to look. Premature abstractions are just going to get in the
way. I'd be open to hear proposals for possible wider application
after the dust settles, but that's not going to happen during the PG19
cycle.

That sounds like a good compromise. Let's see what else can profit from
the new sorting code once we've got the tuple sort in.

--
David Geier

#18John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#12)
Re: tuple radix sort

I wrote:

Aside from that, this seems like a good place to settle down, so I'm
going to create a CF entry for this. I'll start more rigorous
performance testing in the near future.

Here are the first systematic test results, with scripts. Overall, I'm
very pleased. With extremely low cardinality, it's close enough to our
B&M quicksort that any difference (a hair slower or faster) can be
discarded as insignificant. It's massively faster with most other
inputs, so I'll just highlight the exceptions:

"ascending" - Our qsort runs a "presorted check" before every
partitioning step, and I hadn't done this for radix sort yet because I
wanted to see what the "natural" difference is. I'm inclined to put in
a single precheck at the beginning (people have come to expect that to
be there), but not one at every recursion because I don't think that's
useful. (Aside: that precheck at every recursion should be replaced by
something that detects ascending/descending runs at the very start,
but that's a separate thread)
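A single upfront precheck could look something like this minimal sketch (hypothetical name, one linear scan before any partitioning rather than a check at every recursion):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of a single upfront presorted check for the
 * radix sort path: bail out early if the input is already ordered.
 */
static bool
is_sorted_int64(const int64_t *a, size_t n)
{
	for (size_t i = 1; i < n; i++)
	{
		if (a[i - 1] > a[i])
			return false;
	}
	return true;
}
```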

"stagger" with multiplier = no. records / 2 - This seems to be a case
where the qsort's presorted check happens to get lucky. As I said
above, we should actually detect more sorted runs with something more
comprehensive.

"p5" - This is explicitly designed to favor the B&M qsort. The input
is 95% zeros, 2.5% negative numbers, and 2.5% positive numbers. The
first qsort pivot is pretty much guaranteed to be zero, and the first
partitioning step completes very quickly. Radix sort must do a lot
more work, but it is no different from the amount of work it does with
other patterns -- it's much less sensitive to the input distribution
than qsort. In this case, there's a mix of negative and positive
bigints. That defeats common prefix detection, and the first iteration
deals into two piles: negative and non-negative. Then a few recursions
happen where there is only a single distinct byte, so no useful work
happens. I suppose I could try common prefix detection at every
recursion, but I don't think that's widely beneficial for integers.
Maybe the single-byte-plus comparator small qsort would help a little,
and I'm considering adding that anyway. In one sense this is the most
worrying, since there doesn't seem to be a widely-useful mitigation,
but in another sense it's the least worrying, since this case is
deliberately constructed to be at a disadvantage.
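To illustrate why this input defeats the upfront detection, here is a sketch of common-prefix detection over normalized keys (hypothetical code, not the patch's, assuming a GCC/Clang-style __builtin_clzll):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * OR together the XOR of every key with the first; leading zero bytes
 * of the result are bytes all keys share, so those radix passes can
 * be skipped.  Illustrative sketch only.
 */
static int
common_prefix_bytes(const uint64_t *keys, size_t n)
{
	uint64_t	diff = 0;

	for (size_t i = 1; i < n; i++)
		diff |= keys[0] ^ keys[i];

	if (diff == 0)
		return 8;				/* all keys identical: nothing to sort on */
	return __builtin_clzll(diff) / 8;
}
```

With p5-style input, normalized negative keys have their sign bit clear and non-negative keys have it set, so the keys differ already in the first byte: the common prefix is zero bytes, and the first pass deals into just two piles.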

--
John Naylor
Amazon Web Services

Attachments:

BM-perf-test-rand-stag-sawtooth-20251120.sh (application/x-shellscript)
BM-perf-test-misc-20251120.sh (application/x-shellscript)
v4-test-stagger.png
v4-test-sawtooth.png
v4-test-misc.png
v4-test-random.png
#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: John Naylor (#12)
Re: tuple radix sort

On 2025-Nov-12, John Naylor wrote:

+/*
+ * Based on implementation in https://github.com/skarupke/ska_sort (Boost license),
+ * with the following noncosmetic change:
+ *  - count sorted partitions in every pass, rather than maintaining a
+ *    list of unsorted partitions
+ */
+static void
+radix_sort_tuple(SortTuple *begin, size_t n_elems, int level, Tuplesortstate *state)

I think given https://www.boost.org/LICENSE_1_0.txt you should include a
copy of the Boost license in this comment, as well as the copyright
statement from the hpp file,

// Copyright Malte Skarupke 2016.
// Distributed under the Boost Software License, Version 1.0.
// (See http://www.boost.org/LICENSE_1_0.txt)

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#20John Naylor
john.naylor@enterprisedb.com
In reply to: Alvaro Herrera (#19)
Re: tuple radix sort

On Thu, Nov 20, 2025 at 6:13 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:

I think given https://www.boost.org/LICENSE_1_0.txt you should include a
copy of the Boost license in this comment, as well as the copyright
statement from the hpp file,

Will do, next time I do some polishing.

(While thinking about it, I need to sprinkle in some
CHECK_FOR_INTERRUPTS(), too).

--
John Naylor
Amazon Web Services

#21Chengpeng Yan
chengpeng_yan@Outlook.com
In reply to: John Naylor (#12)
#22John Naylor
john.naylor@enterprisedb.com
In reply to: Chengpeng Yan (#21)
#23John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#22)
#24John Naylor
john.naylor@enterprisedb.com
In reply to: Chengpeng Yan (#21)
#25Chao Li
li.evan.chao@gmail.com
In reply to: John Naylor (#24)
#26John Naylor
john.naylor@enterprisedb.com
In reply to: Chao Li (#25)
#27Chao Li
li.evan.chao@gmail.com
In reply to: John Naylor (#26)
#28John Naylor
john.naylor@enterprisedb.com
In reply to: Chao Li (#27)
#29Chao Li
li.evan.chao@gmail.com
In reply to: John Naylor (#28)
#30Chao Li
li.evan.chao@gmail.com
In reply to: Chao Li (#29)
#31John Naylor
john.naylor@enterprisedb.com
In reply to: Chao Li (#30)
#32Chao Li
li.evan.chao@gmail.com
In reply to: John Naylor (#31)
#33John Naylor
john.naylor@enterprisedb.com
In reply to: Chao Li (#32)
#34zengman
zengman@halodbtech.com
In reply to: John Naylor (#33)
#35John Naylor
john.naylor@enterprisedb.com
In reply to: zengman (#34)
#36zengman
zengman@halodbtech.com
In reply to: John Naylor (#35)
#37cca5507
cca5507@qq.com
In reply to: zengman (#36)
#38John Naylor
john.naylor@enterprisedb.com
In reply to: cca5507 (#37)
#39cca5507
cca5507@qq.com
In reply to: John Naylor (#38)
#40John Naylor
john.naylor@enterprisedb.com
In reply to: cca5507 (#39)
#41John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#40)
#42cca5507
cca5507@qq.com
In reply to: John Naylor (#41)
#43John Naylor
john.naylor@enterprisedb.com
In reply to: cca5507 (#42)
#44cca5507
cca5507@qq.com
In reply to: John Naylor (#43)
#45John Naylor
john.naylor@enterprisedb.com
In reply to: cca5507 (#44)
#46cca5507
cca5507@qq.com
In reply to: John Naylor (#45)